About Me
Experienced Data Scientist/Data Solution Architect in designing and developing enterprise-class systems.
Hands-on experience in data collection, feature extraction, feature selection, and machine learning (SVM, Random Forest, Apriori, Regression).
Hands-on in Python and libraries like pandas, scikit-learn, matplotlib, and NumPy.
Hands-on experience in working with boosting frameworks like XGBoost and LightGBM.
Hands-on experience in end-to-end data pipeline creation in enterprise systems using ETL tools such as Apache Flume, Kafka, and Sqoop in on-premise and cloud environments.
Hands-on experience in NoSQL databases like MongoDB and RDBMS like SQL Server.
TOGAF 9.1 Certified.
Well versed with Agile/Waterfall methodologies, CMMI Level 5 process, estimation techniques, requirement gathering and elicitation, and design using UML techniques.
Hands-on expertise in data governance, data lineage, data processes, DML, data architecture control execution, Master Data Management (MDM), and Metadata Management.
Functional Domain Exposure: Oil and Gas, Insurance, GIS, e-Governance, Enterprise Security.
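As a minimal illustration of one of the listed techniques, the frequent-itemset step of Apriori can be sketched in plain Python (the transaction data and function name here are hypothetical, not taken from any project):

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Apriori frequent-itemset mining: grow candidate itemsets level by
    level, keeping only those meeting the minimum support threshold."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    size = 1
    candidates = [frozenset([i]) for i in items]
    while candidates:
        # Support = fraction of transactions containing the candidate itemset
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        current = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(current)
        # Candidate generation: join frequent k-itemsets into (k+1)-itemsets
        size += 1
        candidates = list({a | b for a, b in combinations(current, 2)
                           if len(a | b) == size})
    return frequent
```

For example, with four market-basket transactions and `min_support=0.5`, all single items and all pairs that co-occur in at least half the baskets survive, while rarer triples are pruned.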
Skills
Web Development
Data & Analytics
Development Tools
Programming Language
Database
Others
Positions
Portfolio Projects
Company
Managed Detection and Response
Role
Full-Stack Developer
Description
Machine Learning: LightGBM, Support Vector Machine, and neural networks (CNN, RNN)
Technology: Python, Apache Spark, PyTorch, Apache Kafka, Flume, MongoDB, Jenkins, Git
Supported Platforms: Ubuntu 17, CentOS
Team Size: 12
Duration: Dec 2017 to June 2019
Overview:
Managed Detection and Response is a combination of technology and skills to deliver advanced threat detection, deep threat analytics, global threat intelligence, faster incident mitigation, and collaborative breach response on a 24x7 basis.
The endpoints (IoT devices and enterprise servers) are scanned by the EMS system or a scan component. Apache Flume agents capture the logs and send them to a topic in Apache Kafka.
The queue is consumed using the Apache Spark Streaming component. The data is reduced and stored in Cassandra for machine learning.
The data is processed using machine learning for threat detection. The output is stored in MongoDB and displayed in a dashboard.
Machine learning techniques such as Support Vector Machine, LightGBM, CNN, and RNN, including ensemble learning and boosting, are applied to derive the best possible result.
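The reduce step in the pipeline above (raw logs in from Kafka, compact records out to Cassandra) can be sketched in plain Python; the event schema and function name below are illustrative, not the project's actual code:

```python
from collections import Counter

def reduce_logs(events):
    """Reduce raw log events to per-endpoint, per-event-type counts,
    mimicking the stream-reduce step before storage for machine learning.
    Each event is a dict with 'endpoint' and 'event_type' keys
    (a hypothetical schema)."""
    counts = Counter((e["endpoint"], e["event_type"]) for e in events)
    return [
        {"endpoint": ep, "event_type": et, "count": n}
        for (ep, et), n in sorted(counts.items())
    ]
```

In the real pipeline this aggregation would run continuously over micro-batches in Spark Streaming rather than over an in-memory list.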
Role & Responsibilities:
· Part of Product Architecture Team.
· Leading design and development of the data ingestion and log processing components using Apache Spark, Flume, Kafka, HDFS, and MongoDB.
· Feature selection and engineering for web attack, network attack, and malware attack detection using LightGBM, Support Vector Machine, and neural networks (CNN, RNN).
· Collaborating in an agile environment with cross-functional teams.
Company
EDPR Machine Learning
Role
Full-Stack Developer
Description
Machine Learning: LightGBM, Support Vector Machine, and neural networks (CNN, RNN)
Technology: Apache Spark, Python, Apache Kafka, Flume, MongoDB, Jenkins, Git
Supported Platforms: Windows and Linux.
Team Size: 8
Duration: Jan 2017 to Nov 2017
Overview:
Endpoint Detection and Protection Response detects, protects against, and responds to cyberattacks. Each point product adds an agent to the endpoint, which adds to the complexity of securing the enterprise, and is often managed independently of the other security technologies present on that endpoint.
Machine Learning: This involves Feature Extraction and Feature Engineering for malware based on Static Analysis for PE and PDF file types.
The metadata is extracted from malware samples. Thereafter, data pre-processing and data cleaning are done.
Based on exploratory analysis, the model is regularly created/updated and validated.
Machine learning techniques such as Support Vector Machine and LightGBM, including ensemble learning and boosting, are applied to derive the best possible result.
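A classic static-analysis feature of the kind extracted above is byte entropy, since packed or encrypted malware sections tend to score near the maximum; a minimal sketch (the helper name is illustrative, not the project's actual code):

```python
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of a byte sequence in bits per byte (0.0 to 8.0).
    Packed or encrypted file sections typically score close to 8."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Computed per section or over a sliding window of a PE or PDF file, this yields one numeric feature per region for the downstream classifier.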
Role & Responsibilities:
· Part of Product Architecture Team.
· Model Creation, Data Pre-Processing, Data Cleaning.
· Feature Selection and Engineering.
· Implementing machine learning techniques such as Support Vector Machine and LightGBM, including ensemble learning and boosting, to derive the best possible result.
· Collaborating in a fast-paced, agile, dynamic environment with cross-functional teams.
Skills
Machine Learning, Python
Tools
Git, Jupyter Notebook
Company
IGA Data Analytics
Role
Full-Stack Developer
Description
Machine Learning: LightGBM and Support Vector Machine
Technology: Apache Spark, Python, Apache Kafka, Flume, MongoDB, Jenkins, Git
Supported Platforms: Windows and Linux.
Team Size: 4
Duration: Apr 2017 to Dec 2017
Overview:
IGA is an integrated access management and governance product which takes care of the entire life cycle of employee engagement (onboarding and exit). During onboarding, an employee ID is created and access to different systems is granted after approval. During exit, all access and IDs are revoked.
Machine Learning
The data is collected from multiple systems like the attendance system, leave portal, access management, training system, appraisal system, and other client systems.
The collected data is cleansed, parsed, and validated; thereafter feature selection and engineering and exploratory data analysis are applied to derive multiple metrics. A powerful dashboard is created using Tableau.
Machine learning techniques such as Support Vector Machine and LightGBM, including ensemble learning and boosting, are applied to derive the best possible result.
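Deriving a metric by joining records from several HR systems, as described above, can be sketched in plain Python; the schema and function name are hypothetical, not the product's actual code:

```python
def attendance_rate(attendance, leaves, working_days):
    """Derive a per-employee attendance-rate metric by joining records
    from the (hypothetical) attendance and leave systems.
    attendance: {emp_id: days_present}; leaves: {emp_id: approved_leave_days}."""
    metrics = {}
    for emp_id, present in attendance.items():
        leave = leaves.get(emp_id, 0)  # employees with no leave record default to 0
        expected = max(working_days - leave, 1)  # avoid division by zero
        metrics[emp_id] = round(present / expected, 3)
    return metrics
```

Metrics like this, computed per employee per period, are what feed the Tableau dashboard.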
Role & Responsibilities:
· Part of Product Architecture Team.
· Model Creation, Data Pre-Processing, Data Cleaning.
· Feature Selection and Engineering.
· Implementing machine learning techniques such as Support Vector Machine and LightGBM, including ensemble learning and boosting, to derive the best possible result.
· Collaborating in a fast-paced, agile, dynamic environment with cross-functional teams.
Skills
Machine Learning, Python
Tools
Git, Jupyter Notebook
Company
RBC detection in blood
Role
Full-Stack Developer
Description
· Machine Learning: YOLO, Neural Network, Google Colab
· The project detects red blood cells in a blood sample. The training data came from the following dataset: https://github.com/cosmicad/dataset
· The dataset contains blood images and annotated files for training.
· The YOLO algorithm was used for training on the dataset. YOLO is an extremely fast real-time multi-object detection algorithm: it looks at the image just once, dividing it into a grid of 13 by 13 cells. Each cell is responsible for predicting 5 bounding boxes, each describing a rectangle that encloses an object. YOLO also outputs a confidence score that tells how certain it is that a predicted bounding box actually encloses some object; this score says nothing about what kind of object is in the box, only whether the shape of the box is any good. For each bounding box, the cell also predicts a class. This works just like a classifier: it gives a probability distribution over all the possible classes. YOLO was trained on the PASCAL VOC dataset. The confidence score for the bounding box and the class prediction are combined into one final score that gives the probability that the bounding box contains a specific type of object. Since there are 13 × 13 = 169 grid cells and each cell predicts 5 bounding boxes, there are 845 bounding boxes in total. Most of these boxes have very low confidence scores, so only boxes whose final score is 30% or more are kept (the threshold can be changed depending on how accurate the detector needs to be).
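The box arithmetic and score-threshold filtering described above can be sketched as follows (the function names and the prediction tuples are illustrative, not the project's code):

```python
def yolo_box_count(grid=13, boxes_per_cell=5):
    """Total candidate boxes in one YOLO pass: 13 x 13 cells x 5 boxes each."""
    return grid * grid * boxes_per_cell

def keep_boxes(predictions, threshold=0.30):
    """Filter candidates by final score = box confidence x class probability.
    `predictions` is a list of (confidence, class_probability) pairs."""
    return [
        (conf, prob) for conf, prob in predictions
        if conf * prob >= threshold
    ]
```

With the default 30% threshold, almost all of the 845 candidates are discarded and only high-scoring boxes, here the red-blood-cell detections, remain.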
Skills
Machine Learning, Neural Networks, YOLO
Tools
Jupyter Notebook