About Me
Data Engineer with more than 5 years of work experience in Analytics, Research AI and Pre-Sales in the IT and wholesale e-commerce domains. Proven track record of initiating and delivering successful assignments in Business Analysis, Product Delivery & Tec...
Skills
Portfolio Projects
Description
Tools used: Python, MySQL, AWS (Glue, S3), PySpark. In this project, we developed a recommendation engine based on both Association Rule Learning (ARL) and Collaborative Filtering to curate posts according to users' specialties. We built an end-to-end data pipeline that handles the whole ETL process and, on top of it, applied our in-house modeling technique to produce the results we needed. The average lift score we worked with was between 0.3 and 0.4.
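For readers unfamiliar with the lift score used in Association Rule Learning, here is a minimal sketch of how it can be computed for item pairs. It is not the in-house technique from the project; the function name `pairwise_lift`, the specialty names and the sample baskets are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def pairwise_lift(baskets):
    """Return {(a, b): lift} for every item pair that co-occurs at least once.

    lift(a, b) = P(a and b) / (P(a) * P(b)); values above 1 mean the two items
    appear together more often than independence would predict.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))          # de-duplicate within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    return {
        (a, b): (count / n) / ((item_counts[a] / n) * (item_counts[b] / n))
        for (a, b), count in pair_counts.items()
    }


# Hypothetical data: each set holds the specialties of posts one user engaged with.
baskets = [
    {"cardiology", "radiology"},
    {"cardiology", "radiology"},
    {"cardiology", "oncology"},
    {"oncology"},
]
lifts = pairwise_lift(baskets)
```

Pairs are keyed in alphabetical order, so each pair appears once regardless of the order in which items were seen.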
Description
In this project, we classified a large set of audio clips (~8,000) into 4 classes, which include different vehicles, groups of men, etc. We used extensive feature engineering and feature selection to derive and select the features for training the model. I used an XGBoost classifier because the data has high variance. The model reached an accuracy of 85.6%, and I built an end-to-end project with a UI that records audio from the user and predicts which of the 4 classes it belongs to.
Description
In this project, we predicted whether or not a donor will give blood the next time. Using the RFM (Recency, Frequency, Monetary) model, we identified the target donors likely to donate blood next time. The models were built on the 5 given features using TPOT (an automated ML tool) and Logistic Regression, with additional feature engineering. Overall accuracy was 79% with the TPOT-based pipeline and 78.90% with Logistic Regression.
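As an illustration of what RFM features look like for a donor, the sketch below derives recency, frequency and monetary values from a list of donation records. The record layout, the function name `rfm_features` and the sample history are assumptions; the project's actual dataset and feature definitions are not described in enough detail to reproduce.

```python
from datetime import date


def rfm_features(donations, as_of):
    """Summarise one donor's history as (recency_days, frequency, monetary).

    donations: list of (donation_date, volume_cc) tuples for a single donor.
    as_of:     reference date used to measure recency.
    """
    if not donations:
        raise ValueError("donor has no recorded donations")
    last_date = max(d for d, _ in donations)
    recency_days = (as_of - last_date).days      # days since the latest donation
    frequency = len(donations)                   # number of donations
    monetary = sum(v for _, v in donations)      # total volume donated (c.c.)
    return recency_days, frequency, monetary


# Hypothetical donor history, for illustration only.
history = [
    (date(2023, 1, 10), 250),
    (date(2023, 4, 12), 250),
    (date(2023, 9, 1), 250),
]
features = rfm_features(history, as_of=date(2023, 10, 1))
```

In a blood donation setting, "monetary" is usually read as the total volume donated rather than money spent, which is the interpretation assumed here.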
Description
The dataset contains data from a wearable accelerometer mounted on the chest. Uncalibrated accelerometer data were collected from 15 participants performing 7 activities. The dataset is intended for activity recognition research and poses challenges for identifying and authenticating people from their motion patterns. Anomaly detection used z-scores and EllipticEnvelope, as the data is normally distributed and multimodal. Outliers made up 10% of the whole data. On the test data, accuracy was 79.63% for the KNN classifier, 73.62% for the Decision Tree classifier and 78.84% for the Random Forest classifier. In K-fold cross-validation, KNN performed better on the training set, showing less variance in its accuracy score.
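The z-score part of the anomaly detection step can be sketched as follows. This covers only the z-score rule, not EllipticEnvelope, and the threshold, function name `zscore_outliers` and sample readings are assumptions rather than values from the project.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Return indices of values whose absolute z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)        # population standard deviation
    if sigma == 0:
        return []                 # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical single-axis accelerometer readings with one obvious spike.
readings = [1950, 1962, 1948, 1955, 1960, 1951, 1957, 1953, 1949, 1958, 2600]
outlier_idx = zscore_outliers(readings, threshold=2.5)
```

A lower threshold than the usual 3.0 is used here because, with so few samples, a single spike inflates the standard deviation and caps the z-score it can reach.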
Description
In this project, we predict hydraulic system failure 15 or 30 minutes in advance in order to save the machine from total failure, which amounts to monitoring the condition of hydraulic systems. Overall accuracy was 93.57%, with a roc_auc_score of 96.58%.
Data: open-source data (604 MB); custom data is also to be produced by our team.
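One possible way to frame "predict a failure 15 or 30 minutes ahead" as a classification target is to label each sensor sample by whether a failure follows within the chosen horizon. The sketch below shows that labeling step only; it is an assumption about the problem framing, not the project's documented method, and the function name `label_pre_failure` and the timeline are hypothetical.

```python
def label_pre_failure(sample_minutes, failure_minutes, horizon):
    """Label each sample 1 if a failure occurs within `horizon` minutes after it.

    sample_minutes:  timestamps of sensor samples, in minutes.
    failure_minutes: timestamps of recorded failures, in minutes.
    horizon:         lead time to predict ahead, e.g. 15 or 30.
    """
    labels = []
    for t in sample_minutes:
        # A sample is positive only if a failure lies strictly ahead of it
        # and no further away than the horizon.
        upcoming = any(0 < f - t <= horizon for f in failure_minutes)
        labels.append(1 if upcoming else 0)
    return labels


# Hypothetical timeline: samples every 10 minutes, one failure at minute 60.
samples = [0, 10, 20, 30, 40, 50, 60]
labels_15 = label_pre_failure(samples, [60], horizon=15)
labels_30 = label_pre_failure(samples, [60], horizon=30)
```

With labels built this way, the 15-minute and 30-minute variants become two separate binary classification problems over the same sensor features.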
Description
In this project, we had to submit a report on whether the two units of the organisation could be merged. We used Python libraries such as numpy, pandas, seaborn and matplotlib, and carried out data preprocessing to prepare the reports. We concluded that the two units cannot be merged, as they have different structures for promotions, terminations, etc. for their employees.