About Me
Lead Data Scientist with 15+ years of total IT experience across Machine Learning, Deep Learning and AI with R & Python (4.10 years), Big Data (MapReduce, Pig & Hive) (1.5 years) & Java. About 4.10 years is in the Banking (BFS) domain, the rest in Retail...
Portfolio Projects
Description
Project#1 CASA Loss prediction
For a bank, one of the cheapest sources of funds is savings and current account balances (CASA balance). The goal was to identify customers whose CASA balance would fall by 60% or more in the next 6 months, so that the bank could target those customers with corrective actions to avoid the CASA balance loss. May 2018 to Oct 2018 data was used to train the model and Nov 2018 to Apr 2019 data to validate it; a 6-month window of data was used for validation. The model captured 52% of leads in the top 3 deciles. The top 5 predictors were Total AUM, cash inflow, cash outflow, tenure and debit transactions, along with the customer's trend of savings balance over the last 6 months.
Client: 2nd largest bank in Malaysia.
Responsibilities:
Performed exploratory analysis and drew inferences by visualizing the data
Removed insignificant variables using dimensionality reduction techniques
Used Logistic Regression, Random Forest and XGBoost.
Achievements:
Identified RM2.6 billion of potential CASA loss over the next 6 months. The bank acted to retain this amount by targeting those customers with offers such as bonds.
Environment/Technology Stack: Citrix Server, Big Data environment, Hue for Hive & Jupyter Notebook.
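The "52% of leads in the top 3 deciles" figure above is a decile-capture rate: rank customers by model score and count how many actual CASA-loss cases land in the top deciles. A minimal stdlib-only sketch of that calculation (the scoring model itself is assumed; the data here is illustrative):

```python
def top_decile_capture(scores, labels, n_deciles=3):
    """Fraction of all positive labels captured in the top n_deciles
    when customers are ranked by descending model score."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    cutoff = len(ranked) * n_deciles // 10  # size of the top n deciles
    captured = sum(label for _, label in ranked[:cutoff])
    total = sum(labels)
    return captured / total if total else 0.0

# Toy example: 10 customers, 4 actual CASA-loss cases
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1,   1,   0,   1,   0,   0,   0,   1,   0,   0]
print(top_decile_capture(scores, labels))  # 2 of 4 positives in top 3 deciles -> 0.5
```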
Description
Project#2 Customer FD Price Sensitivity.
The bank wanted to know which customers are price insensitive, so that it could offer them a better (preferred) interest rate than the normal (board) rate. It also wanted an FD propensity model to predict who will take their next FD, whether an existing or a new customer. Jan 2018 to Dec 2018 data was used to train the model and Jan 2019 to Feb 2019 data to validate it; 2 months of data was used for validation. The model captured 46% of leads in the top 3 deciles. The top 5 predictors were average online debit transactions in the last 1 year, age, customer segment, the customer's SA balance mean of the initial 6 months relative to the mean of the next 6 months, and average branch debit transactions in the last 1 year. By taking just the top 3 deciles, the business can identify 46% of price-insensitive customers and target an outflow of RM4.8 billion in FD balances as of 2018.
Client: 2nd largest bank in Malaysia.
Responsibilities:
Performed exploratory analysis and drew inferences by visualizing the data
Removed insignificant variables using dimensionality reduction techniques
Used Logistic Regression, Random Forest and XGBoost.
Random Forest gave the best results.
Environment/Technology Stack: Citrix Server, Big Data environment, Hue for Hive & Jupyter Notebook.
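The "removed insignificant variables" step listed in the responsibilities can be approximated with a simple variance filter, one common pre-step before heavier dimensionality reduction: near-constant features carry little signal and can be dropped. A stdlib-only sketch (column names are illustrative, not from the actual project):

```python
def low_variance_columns(rows, threshold=1e-3):
    """Return names of columns whose variance falls below threshold
    (near-constant features carry little predictive signal)."""
    drop = []
    for name in rows[0]:
        vals = [r[name] for r in rows]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        if var < threshold:
            drop.append(name)
    return drop

# Hypothetical feature rows: segment_flag never varies, so it gets dropped
rows = [
    {"age": 30, "segment_flag": 1, "balance": 5000.0},
    {"age": 45, "segment_flag": 1, "balance": 7200.0},
    {"age": 52, "segment_flag": 1, "balance": 1100.0},
]
print(low_variance_columns(rows))  # ['segment_flag']
```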
Description
Project#3 Inbound Call Depletion
The bank wanted to reduce the number of inbound calls to its call centre. The inbound calls related to debit card, savings account, current account and credit card queries. Feb 2016 to Feb 2019 (3 years) data was used to train the model and Apr 2019 data to validate it; one month of data was used for validation. The top 5 predictors used to build the model were tenure, total savings account balance, active click user, age group and transaction amount on day 2. The model captured 61% of leads in the top 3 deciles.
Client: 2nd largest bank in Malaysia.
Responsibilities:
Performed exploratory analysis and drew inferences by visualizing the data
Removed insignificant variables using dimensionality reduction techniques
Used Logistic Regression and Random Forest.
Random Forest gave the best results.
Environment/Technology Stack: Citrix Server, Big Data environment, Hue for Hive & Jupyter Notebook.
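All three banking projects above validate out of time: train on one period, then score a later, unseen period (e.g. train on Feb 2016 to Feb 2019, validate on Apr 2019). A hedged sketch of that split, with the record layout assumed for illustration:

```python
from datetime import date

def out_of_time_split(records, train_start, train_end, valid_start, valid_end):
    """Split records into train/validation sets by observation date,
    so the model is always validated on a period it has never seen."""
    train = [r for r in records if train_start <= r["obs_date"] <= train_end]
    valid = [r for r in records if valid_start <= r["obs_date"] <= valid_end]
    return train, valid

# Toy records with an assumed obs_date field
records = [
    {"obs_date": date(2018, 6, 1), "calls": 3},
    {"obs_date": date(2019, 4, 15), "calls": 1},
]
train, valid = out_of_time_split(
    records,
    date(2016, 2, 1), date(2019, 2, 28),
    date(2019, 4, 1), date(2019, 4, 30),
)
print(len(train), len(valid))  # 1 1
```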
Description
Project#4 Transaction Fraud Detection
Bank transaction data was explored and insights were drawn from it. Labelled data is provided for the given transactions; the model learns the fraud patterns and can predict whether a given transaction is fraudulent or not.
Responsibilities:
Performed exploratory analysis and drew inferences by visualizing the data
Removed insignificant variables using dimensionality reduction techniques
Used Logistic Regression and XGBoost.
Environment/Technology Stack: Windows Server 2012 & Machine Learning with Python
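The fraud detector above is a supervised classifier fitted on the provided labels; logistic regression (one of the two models used) can be sketched end to end with only the standard library. The features and data here are made up purely for illustration, not taken from the project:

```python
import math

def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit a tiny logistic regression by stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # sigmoid -> fraud probability
            err = p - yi                     # gradient of log loss w.r.t. z
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict_fraud(w, b, xi):
    """True if the model scores the transaction as more likely fraud."""
    z = sum(wj * xj for wj, xj in zip(w, xi)) + b
    return 1.0 / (1.0 + math.exp(-z)) >= 0.5

# Toy data: [amount_zscore, is_foreign] -> fraud label
X = [[0.1, 0], [0.2, 0], [2.5, 1], [3.0, 1]]
y = [0, 0, 1, 1]
w, b = train_logistic(X, y)
print(predict_fraud(w, b, [2.8, 1]))   # large foreign transaction -> True
print(predict_fraud(w, b, [0.15, 0]))  # small domestic transaction -> False
```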
Description
Project#5 Credit card Sanction Predictor
The applicant's data and credit bureau data are available. The model is developed to predict whether a customer will repay the credit card bill or default. Helped a bank decide whether or not to sanction a credit card to an applicant based on the given domestic and credit bureau data.
Responsibilities:
Performed exploratory analysis and drew inferences by visualizing the data
Removed insignificant variables using dimensionality reduction techniques
Used Logistic Regression and Random Forest models.
Environment/Technology Stack: Windows Server 2012 & Machine Learning with Python
Description
Project#6 Telecom Churn
Identified churning customers by analysing customer data. Identified the best model out of KNN, Naive Bayes and Logistic Regression.
Responsibilities:
Explored the data using different visualization techniques.
Improved the quality of data by removing inconsistent data, missing values & outliers.
Used algorithms such as KNN, Naive Bayes and Logistic Regression.
Environment/Technology Stack: Windows XP & Machine Learning with R.
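Of the three algorithms tried, KNN is the simplest to illustrate: classify a customer by majority vote among the most similar customers. A toy sketch in Python (the project itself used R; the churn features here are invented):

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    dists = sorted((math.dist(x, query), label) for x, label in train)
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy churn data: (monthly_minutes, complaints) -> churned?
train = [((700, 0), "stay"), ((650, 1), "stay"),
         ((120, 4), "churn"), ((90, 5), "churn"), ((100, 3), "churn")]
print(knn_predict(train, (110, 4)))  # nearest neighbours are churners -> 'churn'
```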
Description
Project#7 Spark Funds Investment
Helped the Spark Funds investment company identify the geographies and sectors for its investments to maximise returns in the start-up ecosystem.
Responsibilities:
Extracted the data from the client and built an understanding of it
Performed exploratory analysis and cleansed the data
Identified the top countries with high investments
Identified the top sectors to invest in.
Environment/Technology Stack: Windows XP & Machine Learning with R.
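The top-country and top-sector analysis above is essentially a group-by-and-rank over investment records; a stdlib-only sketch with made-up data (the real analysis was done in R):

```python
from collections import defaultdict

def top_groups(investments, key, n=2):
    """Sum investment amounts per group and return the n largest groups."""
    totals = defaultdict(float)
    for inv in investments:
        totals[inv[key]] += inv["amount_usd"]
    return sorted(totals, key=totals.get, reverse=True)[:n]

# Hypothetical funding rounds
investments = [
    {"country": "USA", "sector": "Fintech", "amount_usd": 9e6},
    {"country": "USA", "sector": "Health",  "amount_usd": 4e6},
    {"country": "IND", "sector": "Fintech", "amount_usd": 6e6},
    {"country": "GBR", "sector": "Health",  "amount_usd": 2e6},
]
print(top_groups(investments, "country"))  # ['USA', 'IND']
print(top_groups(investments, "sector"))   # ['Fintech', 'Health']
```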
Description
Project#8 SPC
This application is a widely implemented strategy for managing sales & logistics. It uses technology to organize, automate and synchronize business processes, principally logistics activities but also sales. The overall goal is to handle the logistics and sales processes efficiently and to reduce the costs involved. It includes a management system for tracking and recording every stage, from initial logistics to the final sale.
Responsibilities:
Worked on a live 20-node Hadoop cluster running CDH.
Extracted the data from Oracle RDBMS into HDFS using Sqoop.
Created and worked with Sqoop jobs to populate Hive External tables.
Developed Hive scripts for end user / analyst requirements to perform ad hoc analysis.
Developed Oozie workflow for scheduling the ETL process.
Environment/Technology Stack: RHEL, Hadoop, HDFS, MapReduce, Hive and HBase.
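The Sqoop extraction step listed above boils down to an `sqoop import` invocation from Oracle into HDFS. A hedged Python sketch that only assembles the argument list (connection details, table and paths are placeholders; nothing is executed here):

```python
def sqoop_import_cmd(jdbc_url, table, target_dir, user):
    """Build the argument list for a Sqoop RDBMS -> HDFS import.
    Values below are placeholders, not the project's real settings."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", user,
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", "4",  # parallel map tasks for the import
    ]

cmd = sqoop_import_cmd(
    "jdbc:oracle:thin:@dbhost:1521:ORCL", "SALES_ORDERS",
    "/data/raw/sales_orders", "etl_user",
)
print(" ".join(cmd))
```

In practice this list would be handed to a scheduler (such as the Oozie workflow mentioned above) or run via `subprocess.run(cmd)`.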