About Me
8.5 years of IT experience in analyzing complex problems and translating them into scalable and efficient Data Science problems. Experience in Machine Learning, Deep Learning, Natural Language Processing and Big Data Engineering. E...
Skills
Positions
Portfolio Projects
Description
Provide PR servicing by making offers to inactive credit card customers, routing them to PR agents when they call Citi. Two models were built, one for the Responder group and one for the Control group. Together, the two models determine whether an inactive caller should be redirected to a PR agent. The R-squared value is 40% for the Responder model and 35% for the Control model.
Description
Developed PySpark jobs that read data from AWS S3, transform it, and write it back to S3 as Parquet files, including user-defined functions in PySpark. Optimized the performance of existing scripts. Built a streaming application using Spark and Kafka. Implemented REST APIs in Python. Created data pipelines using Airflow.
Description
Built a Credit Engine that calculates the probability of loan default for SME customers. The Credit Engine receives data from external and internal APIs and processes it. The processed data is passed as input to a classification model, which calculates the probability of default.
Description
Developed a Credit Score Engine to calculate credit score variables for SME customers. The Credit Engine receives data from external and internal APIs, processes it, and decides whether a loan should be approved or rejected for the customer. Created parsers for and XML input files. Processed data using the Pandas and NumPy libraries. Developed a REST API using Flask. Improved performance by using asynchronous programming to process data from different APIs. Implemented data loading into the MariaDB Columnar store.
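The asynchronous processing mentioned above can be illustrated with a minimal sketch. It assumes Python's standard `asyncio` module; the function `fetch_source`, the source names and the delays are hypothetical stand-ins for the real API calls, which are not described in this profile.

```python
import asyncio


async def fetch_source(name: str, delay: float) -> dict:
    """Hypothetical stand-in for one API call; sleep simulates network latency."""
    await asyncio.sleep(delay)
    return {"source": name, "status": "ok"}


async def gather_inputs() -> list:
    # Start all calls together and wait for them as a group, so the total
    # wait is close to the slowest call rather than the sum of all calls.
    calls = [
        fetch_source("external_api_a", 0.03),  # hypothetical source names
        fetch_source("external_api_b", 0.02),
        fetch_source("internal_api", 0.01),
    ]
    return await asyncio.gather(*calls)


results = asyncio.run(gather_inputs())
for item in results:
    print(item["source"], item["status"])
```

`asyncio.gather` returns results in the order the calls were passed in, regardless of which finishes first, which keeps downstream processing deterministic.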
Description
Developed a supply chain finance model that predicts whether finance should be provided to sellers. The model takes inventory details, accounts receivable and accounts payable data as input. Created a data model from the OFBiz ERP model. Ingested data into Hadoop using Sqoop. Logistic regression is used to evaluate the credit risk of SMEs.
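As a rough illustration of how a fitted logistic regression turns input features into a risk probability, here is a minimal sketch using only the Python standard library. The feature values, weights and bias are hypothetical placeholders, not the coefficients of the actual model.

```python
import math


def risk_probability(features, weights, bias):
    """Logistic regression scoring: sigmoid of a weighted sum of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical, already-scaled inputs: inventory, receivables, payables.
example_features = [0.4, 0.7, 0.2]
# Hypothetical coefficients; a real model would learn these from data.
example_weights = [-1.2, -0.8, 1.5]
example_bias = 0.1

score = risk_probability(example_features, example_weights, example_bias)
print(f"estimated risk probability: {score:.3f}")
```

The output always lies strictly between 0 and 1, so a threshold on it can drive a provide/decline decision.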
Description
Designed and developed web scrapers to extract text from Singapore and Hong Kong customs websites. Implemented web crawlers using the Scrapy framework in Python. Deployed the web crawlers on AWS with rotating proxies, which prevent websites from blocking the crawlers.
Description
Identify the profile of customers who have a propensity to lapse their insurance policies. The initial model was developed for three product categories: Term Life, Whole Life and Universal Life policies. Level Premium policies were then identified within these three major product categories. The alternate approach develops the model with four reclassified product categories: Level Premium Period, Term Life, Whole Life and Universal Life. This brought a significant improvement in model accuracy. Algorithms and language: decision trees were used to derive rules in R, and logistic regression was used in R to predict the probability of customer churn.
Description
Provide a solution to analyze agent performance based on several attributes such as demography, products sold and new business. The goal is to improve the existing knowledge used for agent segmentation in a supervised predictive framework and to predict the Policy Inforce Quantity. Approach: univariate and bivariate analysis of different variables; handling of outliers; feature engineering; summary statistics by agency; model building. Algorithms implemented in Python: decision trees and neural networks.
Description
A health care provider uses a ticketing system for all telephone calls received across all departments. Calls can concern New Appointment, Cancellation, Lab Queries, Medical Refills, Insurance Related, General Doctor Advice, etc. The challenge is to classify each ticket into the appropriate category based on the text in the Summary and Description of the call. Approach: cleaning the data, which involves converting it to the required format; corpus creation; pre-processing; document-term matrix creation; splitting the data into train, validation and test datasets; fitting the models on the train dataset and validating them on the validation dataset. Algorithms used in R: SVM, Random Forest and Naive Bayes.
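The document-term matrix and data-splitting steps can be sketched as follows. The project itself used R; this is a Python illustration using only the standard library, and the sample tickets, the helper names and the 60/20/20 split ratio are assumptions made for the example.

```python
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (basic pre-processing)."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(docs):
    """Return (vocabulary, matrix), where matrix[i][j] counts term j in document i."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


def split_rows(rows, train=0.6, validation=0.2, seed=0):
    """Shuffle with a fixed seed, then cut into train, validation and test parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train)
    n_val = int(len(rows) * validation)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


# Made-up ticket summaries, for illustration only.
tickets = [
    "Need a new appointment next week",
    "Please cancel my appointment",
    "Question about lab results",
    "Refill request for medication",
    "Insurance claim question",
]
vocab, dtm = document_term_matrix(tickets)
train_rows, val_rows, test_rows = split_rows(dtm)
print(len(vocab), "terms;", len(train_rows), len(val_rows), len(test_rows), "rows")
```

Each row of the matrix can then be paired with its ticket category and passed to a classifier such as the SVM, Random Forest or Naive Bayes models named above.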
Description
This project for a telecom client involves processing monthly bills for fixed-line customers and generating PDF files of mobile bills for end users. Developed Informatica mappings enabling the extraction, transport and loading of data into target tables. Analyzed, designed, developed, implemented and maintained moderate to complex initial-load and incremental-load mappings to provide data for the enterprise data warehouse. Worked with memory cache for better throughput of sessions containing Rank, Lookup, Joiner, Sorter and Aggregator transformations. Responsible for migrating the project between environments (Dev, QA, UAT, Prod).