About Me
A dynamic professional with 12 years of experience in data engineering, data management, business intelligence, machine learning, and data science. Currently solving business problems for global clients by:
- building analytical solutions in Python (NumPy, pandas, scikit-learn, NLTK, Gensim)
- building data pipelines using SQL Server and ETL processes
- working with SQL Server (DDL, DML, advanced queries, stored procedures), advanced MS Excel, and Power BI for dashboarding
Skills
Data & Analytics
Others
Web Development
Programming Language
Database
Positions
Portfolio Projects
Exploring depthwise convolution operators in Keras
https://medium.com/@gaddepavan/how-to-use-depth-wise-separable-convolution-to-reduce-the-parameters-in-a-cnn-f537ba59874b
Description
Convolutional neural networks are the standard approach to image classification. However, they require a large number of parameters (total weights) to train a model to decent accuracy. One way to reduce the overall parameter count is to use depthwise separable convolution layers instead of standard convolution layers. This article explores that idea and shows how to cut the overall parameters by 4x without reducing accuracy.
Keras - coding
Network design
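The parameter saving can be checked with a back-of-the-envelope calculation. The sketch below uses illustrative layer sizes (not the article's exact network) to compare the weight count of a standard convolution layer with a depthwise separable one:

```python
def conv_params(k, c_in, c_out):
    # Standard convolution: one k x k x c_in kernel per output channel (biases ignored)
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    # Depthwise step: one k x k kernel per input channel,
    # followed by a 1 x 1 pointwise convolution that mixes channels
    return k * k * c_in + c_in * c_out

standard = conv_params(3, 64, 64)             # 36864 weights
separable = separable_conv_params(3, 64, 64)  # 4672 weights
print(standard, separable, round(standard / separable, 1))
```

For a single 3 x 3 layer with 64 channels in and out, the separable version needs roughly an eighth of the weights; the saving over a whole network (the article's ~4x) depends on how many layers are swapped.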
Tools
PyCharm
Fraud detection model for early fraud tagging in Motor and Property Insurance claims
Description
In developed markets such as Europe and North America, insurance penetration is high and coverage is often mandatory. This also leads to a high volume of fraudulent claims being lodged in these markets. An advanced analytics model was therefore developed to detect fraud at an early stage of the claim life cycle in motor and property insurance.
1. Techniques used: spatial analysis, anomaly detection, text mining using NLP, link analysis
2. Tools used: Python (pandas, NLTK, Gensim, NetworkX), Excel
3. Analysis: advanced multivariate data analysis to identify the significant variables
4. Algorithms used: Random Forest, TF-IDF, LDA for topic modeling
5. 30% new fraud identified
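A minimal sketch of how the text-mining leg of such a model can be wired together with scikit-learn. The notes and labels below are invented placeholders, and the production model also combined spatial, anomaly, and link-analysis features:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy claim-handler notes (placeholders for real claim text)
notes = [
    "minor bumper scratch repaired at approved garage",
    "vehicle reported stolen day after policy inception",
    "whiplash claim by all four passengers same clinic",
    "windscreen chip repaired routine claim",
]
labels = [0, 1, 1, 0]  # 1 = flagged as fraud in historical data

# TF-IDF turns the free-text notes into numeric features
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(notes)

# Random Forest trained on the TF-IDF features
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X, labels)

print(model.predict(vectorizer.transform(["bumper scratch repaired"])))
```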
Tools
PyCharm
FNOL claim recovery opportunity prediction in UK motor insurance claims
Description
A recovery (subrogation) opportunity arises in a claim when the insured vehicle is damaged through a third party's fault and the insurance company has the right to claim against the third party's insurer to recover the damages paid to its insured. Identifying this recovery opportunity in a given set of accident circumstances is tricky, as it can involve multiple versions of statements from the parties involved in the accident. Using the handler notes captured during the claim life cycle and the techniques below, a recovery opportunity (Yes/No) flag was generated.
1. Natural language processing: TF-IDF with bi- and tri-grams for correlation; word embeddings created using Word2Vec for context understanding
2. XGBoost classifier
3. Tools used: Python (pandas, NLTK, Gensim, the 're' package), Flask
4. Data processing: extensive NLP cleaning - stop-word removal, spell correction, stemming, lemmatization, special-character removal
5. Accuracy achieved: 88%
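The cleaning steps in point 4 can be sketched with the 're' package alone. The stop-word list and suffix rules here are tiny illustrative stand-ins for NLTK's; the real pipeline also spell-corrected and lemmatized:

```python
import re

STOP_WORDS = {"the", "a", "an", "of", "to", "was", "is"}  # tiny illustrative subset

def clean_note(text):
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)                      # special-character removal
    tokens = [t for t in text.split() if t not in STOP_WORDS]  # stop-word removal
    # crude suffix stripping as a stand-in for a real stemmer
    tokens = [re.sub(r"(ing|ed)$", "", t) if len(t) > 5 else t for t in tokens]
    return " ".join(tokens)

print(clean_note("The 3rd party admitted liability; recovery was pursued!"))
```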
Skills
Data Cleansing, Data Science, Flask, Python
Automation of hire vehicle identification process from Injury Claim notification form
Description
Every personal injury claim has a claim notification form (CNF) generated by the claimant. This is often sent as a PDF containing 30-40 pages of information on the injury, claimant, accident, and hire details. Going through all 30-40 pages to identify the hire vehicle is a tedious job for claim handlers; summarising the details from these documents takes 30-40 minutes. The hire vehicle is the high-impact item, as the cost of failing to identify whether a hire exists in a claim during claim notification is very high. Using Python packages, each PDF is converted to text, which is cleaned and processed to find hire-related information. The process is automated so that the Python script reads all the CNFs from a folder, identifies the hire, and updates an Excel file with the CNF name and hire status (Yes/No). The project achieved 100% accuracy.
1. Tools used: Python packages (glob3, datefinder, PyPDF2), Excel, NLTK
2. Accuracy: 100%
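The hire-identification step can be sketched as a keyword scan over the extracted CNF text. The phrase list is a hypothetical illustration; the real script extracted the text with PyPDF2 and layered NLTK and datefinder on top:

```python
import re

# Hypothetical hire-related phrases a CNF might contain
HIRE_PATTERN = re.compile(
    r"\b(hire vehicle|replacement vehicle|courtesy car|credit hire)\b",
    re.IGNORECASE,
)

def hire_status(cnf_text):
    """Return 'Yes' if the extracted CNF text mentions a hire, else 'No'."""
    return "Yes" if HIRE_PATTERN.search(cnf_text) else "No"

print(hire_status("The claimant was provided a courtesy car for 14 days."))  # Yes
print(hire_status("No vehicle damage; injury-only claim."))                  # No
```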
Tools
PyCharm
Unified dashboard using Excel, SQL Server, and Power BI
Description
Merged three sub-level dashboards, each containing 4-5 micro-level dashboards, into one single KPI dashboard.
Features:
1. Single-page summary to track important KPIs across business units (Inbound, Outbound, and Back Office) and sub-processes (call-center desks - Elite, Mass, New Acquisition, etc.)
2. Advanced ribbon tabs to dynamically select sub-levels
3. Interactive charts for finding quick insights
4. Very light dashboard, enabling insights within a few clicks
5. Data pipeline design in SQL Server, merging data from 15 different sources into a single database
- joining data from multiple sources, with stored procedures implemented for automation
- scripts to automatically download and upload data to SQL Server - the entire ETL process automated
- auto email trigger of key KPIs and auto refresh using command-line scripts
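The merge step of such a pipeline can be illustrated in miniature with Python's built-in sqlite3 module. The real pipeline ran SQL Server stored procedures over 15 sources; the table and column names below are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two of the (hypothetical) source tables
cur.execute("CREATE TABLE inbound (desk TEXT, calls INTEGER)")
cur.execute("CREATE TABLE outbound (desk TEXT, calls INTEGER)")
cur.executemany("INSERT INTO inbound VALUES (?, ?)", [("Elite", 120), ("Mass", 300)])
cur.executemany("INSERT INTO outbound VALUES (?, ?)", [("Elite", 80), ("Mass", 150)])

# Unified table feeding the single KPI dashboard
cur.execute("""
    CREATE TABLE kpi AS
    SELECT 'Inbound' AS unit, desk, calls FROM inbound
    UNION ALL
    SELECT 'Outbound' AS unit, desk, calls FROM outbound
""")
print(cur.execute("SELECT COUNT(*) FROM kpi").fetchone()[0])  # 4
```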
Identification of key drivers impacting high fiber count optical fiber cables
Description
A study was done to identify the key drivers (product features) that impact the usage of high-fiber-count optical fiber cables.
1. Data used: primary research data from academicians; secondary data - historic sales data with product type, fiber count, diameter, and end-user application
2. Conjoint analysis to identify the utility of each driver - the significance of each product feature
3. Tools used: SPSS
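Conjoint part-worth utilities are, at heart, a regression of preference ratings on dummy-coded product features. A minimal NumPy illustration (the study itself used SPSS; the feature levels and ratings below are invented):

```python
import numpy as np

# Each row is a product profile: [intercept, high_fiber_count, small_diameter] (dummy-coded)
X = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
], dtype=float)
# Hypothetical respondent ratings for each profile
y = np.array([3.0, 7.0, 4.0, 8.0])

# Least-squares fit: the coefficients are the part-worth utilities
utilities, *_ = np.linalg.lstsq(X, y, rcond=None)
print(utilities)  # [base rating, utility of high fiber count, utility of small diameter]
```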
Skills
Oracle
Tools
Excel
Customer churn prediction
Description
Customer churn is extremely painful for organizations that have spent heavily on acquiring new customers. Identifying customers who are likely to churn out of the existing product portfolio helps the organization quickly understand them, design customized solutions, address any grievances, and retain them, thus sustaining revenue. Below are the data and techniques used to predict churn at multiple stages of the customer lifecycle for a leading DTH (telecom) company.
1. Customer information (base packs subscribed, demographics, socio-economic status, call-center interactions)
2. Multiple models developed based on the stage of prediction: on the day of acquisition, 60 days after acquisition, and after 6 months
3. Logistic regression used
4. Principal component analysis used to reduce the number of variables (~760 to <100)
5. Variable transformation (scaling to normalize numerical variables - ARPU, income, age, etc.)
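Steps 3-5 can be sketched as a scikit-learn pipeline: scale, reduce with PCA, then fit logistic regression. The data below is synthetic (the real model reduced ~760 customer variables to fewer than 100 components):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for customer data: 50 observed variables driven by 5 latent factors
latent = rng.normal(size=(200, 5))
X = latent @ rng.normal(size=(5, 50)) + 0.1 * rng.normal(size=(200, 50))
y = (latent[:, 0] > 0).astype(int)  # synthetic churn flag

# Scale -> PCA (variable reduction) -> logistic regression
model = make_pipeline(StandardScaler(), PCA(n_components=5), LogisticRegression())
model.fit(X, y)
print(round(model.score(X, y), 2))
```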
Skills
Python
Tools
PyCharm
Motor Vehicle Total Loss prediction using image classification
Description
Based on vehicle accident images, a classification model was developed to identify motor total loss (non-driveable) status.
1. Data labeling: 5,000 images classified into total loss and non-total loss; images cropped to 224 x 224
2. Pretrained model used to train on the image data
- ResNet-50
3. Tools used: Keras, Google Colab for GPU training
4. Accuracy achieved: 85%
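The 224 x 224 preprocessing from step 1 can be sketched as a NumPy center crop (ResNet-50 expects 224 x 224 inputs; the actual training ran in Keras on Colab):

```python
import numpy as np

def center_crop(img, size=224):
    """Center-crop an H x W (x C) image array to size x size."""
    h, w = img.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return img[top:top + size, left:left + size]

# Fake accident photo: 480 x 640 RGB
img = np.zeros((480, 640, 3), dtype=np.uint8)
print(center_crop(img).shape)  # (224, 224, 3)
```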
Skills
Keras
Tools
Google Cloud