Pavan G.

Data science professional with experience in building machine learning models for various sectors

Bengaluru, India

Experience: 12 Years

48000 USD / Year

Availability: Immediate


About Me

A dynamic professional with 12 years of experience in data engineering, data management, business intelligence, machine learning and data science. Currently solving business problems for global clients using a...

Building analytical solutions using Python - NumPy, pandas, scikit-learn, NLTK and Gensim

Building data pipelines using SQL Server and ETL

SQL Server (DDL, DML, advanced queries, stored procedures), advanced MS Excel, Power BI for dashboarding


Portfolio Projects

Exploring depthwise convolution operators in Keras

Description

Convolutional neural networks are the workhorse of image classification. However, they require a large number of parameters (weights) to train a model to decent accuracy. One way to reduce the overall parameter count is to use depthwise separable convolution layers instead of standard convolution layers. This article explores that idea and shows how to reduce the overall parameter count by a factor of four without reducing accuracy.
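A minimal sketch of the comparison, assuming a single 3x3 layer with 64 filters on RGB input (illustrative sizes, not the article's full network); the parameter counts show the saving:

    # Compare parameter counts of a standard convolution and its
    # depthwise separable counterpart in Keras.
    from tensorflow.keras import layers, models

    def param_count(separable):
        layer_cls = layers.SeparableConv2D if separable else layers.Conv2D
        model = models.Sequential([
            layers.Input(shape=(224, 224, 3)),
            layer_cls(64, kernel_size=3, padding="same", activation="relu"),
        ])
        return model.count_params()

    print("standard :", param_count(False))  # 3*3*3*64 + 64 = 1,792
    print("separable:", param_count(True))   # 3*3*3 + 3*64 + 64 = 283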

 



Skills

Keras, Python, SQL

Tools

PyCharm

Fraud detection model for early fraud tagging in Motor and Property Insurance claims


Description

In developed markets such as Europe and North America, insurance penetration is high and cover is often mandatory, which leads to a large volume of fraudulent claims. To address this, an advanced analytics model was developed to detect fraud at an early stage of the claim life cycle in motor and property insurance.

1. Techniques used - spatial analysis, anomaly detection, text mining using NLP, link analysis

2. Tools used - Python (pandas), NLTK, Gensim, NetworkX, Excel

3. Analysis - advanced multivariate data analysis to identify the significant variables

4. Algorithms used - Random Forest, TF-IDF, LDA for topic modeling (see the sketch after this list)

5. Outcome - 30% new fraud identified
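A minimal sketch of the text-mining leg only, assuming a claims table with a free-text notes column and a binary fraud label; the file and column names are illustrative, not the project's:

    # TF-IDF features from handler notes feeding a Random Forest classifier.
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.model_selection import train_test_split

    claims = pd.read_csv("claims.csv")            # hypothetical input file
    X = TfidfVectorizer(max_features=5000, stop_words="english").fit_transform(claims["notes"])
    y = claims["fraud"]

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)
    clf = RandomForestClassifier(n_estimators=300, random_state=42).fit(X_tr, y_tr)
    print("held-out accuracy:", clf.score(X_te, y_te))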


Tools

PyCharm

FNOL claim recovery opportunity prediction in UK motor insurance claims


Description

A recovery (subrogation) opportunity arises in a claim when the insured vehicle is damaged through a third party's fault, giving the insurance company the right to claim against the third party's insurer to recover the damages incurred by its insured. Identifying this recovery opportunity in a given set of accident circumstances is tricky, as it can involve multiple versions of statements from the parties involved in the accident. Using the handler notes captured during the claim life cycle and applying the techniques below, a recovery opportunity (Yes/No) flag was generated.

 

1. Natural language processing - TF-IDF with bi-grams and tri-grams; word embeddings created using Word2Vec for context understanding

2. XGBoost classifier (a minimal sketch follows this list)

3. Tools used - Python (pandas), NLTK, Gensim, the 're' package, Flask

4. Data processing - extensive data cleaning as part of the NLP pipeline: stop-word removal, spell correction, stemming, lemmatization, special-character removal

5. Accuracy achieved: 88%
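A minimal sketch of the modelling step, assuming the notes are already tokenized; averaging each claim's Word2Vec vectors is one simple way to obtain fixed-length features for XGBoost (the tokens and labels below are illustrative):

    # Average Word2Vec embeddings per claim, then fit an XGBoost classifier.
    import numpy as np
    from gensim.models import Word2Vec
    from xgboost import XGBClassifier

    notes = [["third", "party", "hit", "insured", "vehicle"],
             ["insured", "reversed", "into", "stationary", "car"]]
    labels = [1, 0]                               # recovery opportunity yes/no

    w2v = Word2Vec(sentences=notes, vector_size=100, min_count=1, seed=42)
    features = np.array([w2v.wv[tokens].mean(axis=0) for tokens in notes])

    model = XGBClassifier(n_estimators=200, eval_metric="logloss")
    model.fit(features, labels)
    print(model.predict(features))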

 


Tools

NumPy, PyCharm

Automation of hire vehicle identification process from Injury Claim notification form


Description

Every personal injury claim has a claim notification form (CNF) generated by the claimant. This is often sent as a PDF of 30-40 pages covering the injury, claimant, accident and hire details. Reading all 30-40 pages to identify the hire vehicle is a very tedious job for claim handlers, and summarising the details from these documents takes 30-40 minutes. The hire vehicle is the highest-impact item, as the cost of failing to identify at notification whether a hire exists in a claim is very high. Using Python packages, each PDF is converted to text, and the text is cleaned and processed to find hire-related information. The process is automated so that the Python script reads all the CNFs from a folder, identifies the hire, and updates an Excel file with the CNF name and hire status (Yes/No). The project is executed with 100% accuracy.

1. Tools used: Python packages - glob3, datefinder, PyPDF2, Excel, NLTK (a minimal sketch of the loop follows)

2. Accuracy: 100%
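A minimal sketch of the folder-scan loop, assuming the CNF PDFs sit in one directory and that a simple keyword test stands in for the real hire-detection rules, which are not shown here:

    # Read every CNF PDF, flag hire status, and write the results to Excel.
    import glob
    import pandas as pd
    from PyPDF2 import PdfReader

    rows = []
    for path in glob.glob("cnf_forms/*.pdf"):      # hypothetical folder
        text = " ".join(page.extract_text() or "" for page in PdfReader(path).pages)
        has_hire = "hire vehicle" in text.lower()  # illustrative rule only
        rows.append({"cnf": path, "hire": "Yes" if has_hire else "No"})

    pd.DataFrame(rows).to_excel("hire_status.xlsx", index=False)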

 


Tools

PyCharm

Unified dashboard using Excel, SQL Server and PowerBI


Description

Merged three sub-level dashboards, each containing 4-5 micro-level dashboards, into one single KPI dashboard.

Features:

1. Single-page summary to track important KPIs across business units (Inbound, Outbound and Back office) and sub-processes (call center desks - Elite, Mass, New acquisition desk, etc.)

2. Advanced ribbon tabs to dynamically select sub-levels

3. Interactive charts for finding quick insights

4. Very light dashboard - insights reachable in a few clicks

5. Data pipeline design in SQL Server - merging data from 15 different sources into a single database (a minimal sketch of the load step follows this list)

    - joining data from multiple sources - stored procedure implementation for automation

    - script to automatically download and upload data to SQL Server - the entire ETL process automated

    - automatic email trigger of key KPIs and auto-refresh using command-line scripts
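A minimal sketch of the load step only, assuming CSV extracts land in a folder and a SQL Server connection is available; the connection string, folder and table names are illustrative:

    # Gather the source extracts and bulk-load them into a SQL Server staging table.
    import glob
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine(
        "mssql+pyodbc://user:pass@server/kpi_db?driver=ODBC+Driver+17+for+SQL+Server")

    frames = [pd.read_csv(path) for path in glob.glob("extracts/*.csv")]
    pd.concat(frames, ignore_index=True).to_sql(
        "staging_kpi", engine, if_exists="replace", index=False)
    # A stored procedure can then merge staging_kpi into the reporting tables.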


Identification of key drivers impacting high fiber count optical fiber cables


Description

A study was done to identify the key drivers (product features) that impact the usage of high fiber count optical fiber cables.

1. Data used: primary research data from academicians; secondary data - historic sales data with product type, fiber count, diameter and end-user application

2. Conjoint analysis to identify the utilities of each driver - the significance of each product feature (a minimal sketch follows this list)

3. Tools used: SPSS
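The project used SPSS; as a stand-in, here is a minimal sketch of traditional full-profile conjoint estimation via dummy-coded OLS in Python, with illustrative attribute levels and preference scores:

    # Part-worth utilities estimated by regressing preference ratings
    # on dummy-coded product attributes.
    import pandas as pd
    import statsmodels.formula.api as smf

    profiles = pd.DataFrame({
        "fiber_count": ["144", "288", "432", "144", "288", "432"],
        "diameter":    ["small", "small", "small", "large", "large", "large"],
        "rating":      [4, 6, 8, 3, 5, 9],     # illustrative respondent scores
    })
    fit = smf.ols("rating ~ C(fiber_count) + C(diameter)", data=profiles).fit()
    print(fit.params)   # coefficients approximate the part-worth utilities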


Skills

Oracle

Tools

Excel sheets

Customer churn prediction


Description

Customer churn is extremely painful for organizations that have spent heavily on acquiring new customers. Identifying customers who are likely to churn out of the existing product portfolio helps the organization quickly understand them, design customized solutions, address any grievances and retain them, thus sustaining revenue. Below are the data and techniques used to predict churn at multiple stages of the customer life cycle for a leading DTH (telecom) company:

1. Customer information (base packs subscribed, demographics, socio-economic status, interactions at the call center)

2. Multiple models developed based on the stage of prediction - on the day of acquisition, 60 days after acquisition and after 6 months

3. Logistic regression used

4. Principal component analysis used to reduce the number of variables (~760 to <100)

5. Variable transformation (scaling to normalize numerical variables - ARPU, income, age, etc.; a minimal pipeline sketch follows this list)
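A minimal sketch of the scaling, PCA and logistic regression steps as one scikit-learn pipeline; the input file, column names and component count are illustrative, not the project's exact values:

    # Scale the raw variables, compress them with PCA, then fit logistic regression.
    import pandas as pd
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    data = pd.read_csv("subscribers.csv")         # hypothetical input file
    X, y = data.drop(columns=["churned"]), data["churned"]

    pipe = make_pipeline(
        StandardScaler(),                 # normalize ARPU, income, age, etc.
        PCA(n_components=100),            # ~760 raw variables -> <100 components
        LogisticRegression(max_iter=1000),
    )
    pipe.fit(X, y)
    print("training accuracy:", pipe.score(X, y))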

 


Skills

Python

Tools

PyCharm

Motor Vehicle Total Loss prediction using image classification


Description

Based on vehicle accident images, a classification model was developed to identify motor total loss (non-driveable) status.

1. Data labeling: 5,000 images classified into Total loss and Non total loss - images cropped to 224 x 224

2. Pretrained model used to train on the image data - ResNet50 (a minimal transfer-learning sketch follows this list)

3. Tools used - Keras, Google Colab for GPU training

4. Accuracy achieved - 85%
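A minimal transfer-learning sketch in Keras, freezing an ImageNet-pretrained ResNet50 and adding a binary head; the hyperparameters are illustrative and the training datasets are assumed to be built elsewhere:

    # Frozen ResNet50 backbone with a sigmoid head for total loss vs. non total loss.
    from tensorflow.keras import layers, models
    from tensorflow.keras.applications import ResNet50

    base = ResNet50(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
    base.trainable = False                       # keep the pretrained weights fixed

    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(1, activation="sigmoid"),   # P(total loss)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    # model.fit(train_ds, validation_data=val_ds, epochs=10)  # datasets built elsewhere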


Skills

Keras

Tools

Google Cloud