About Me
My top skills are Hadoop, Spark, and Scala, and I am certified in Hadoop and Spark on both the Hortonworks and Cloudera distributions.
Skills
Software Engineering
Web Development
Development Tools
Data & Analytics
Others
Programming Language
Database
Operating System
Graphic Design
Positions
Portfolio Projects
Company
Renault (R-solve)
Description
R-Solve is a sales management system that tracks sales generated from leads, and leads generated from contacts. It is used to monitor the conversion of leads into sales.
A lead can be created in several ways: lead forms may be submitted by customers, customer relationship managers, showrooms, auto shows, etc.
The lead management tool manages the different leads, maps them to their lead sources, and routes each lead to the appropriate dealer. The dealer can then follow up with the customer to convert the lead into a sale.
Skills
Agile Software Development, Apache Maven, Apache Spark, Atlassian Confluence, Jira, Big Data, CI/CD, Cloudera, Data Cleansing, Data Dictionary, Data Ingestion, Digital Engineering, ElasticSearch, Git, Hadoop, Hive, IntelliJ IDEA, Jenkins, Oozie, Scala, Scrum Framework, Unit Testing, Workflow, Zeplin
Tools
Apache Hive
Company
mFISH
Description
Project Name : mFisH (Multi Fluorescent in situ Hybridization)
Client : GSK.
Environment : PySpark, Hive, PyCharm, shell script, AWS S3, Airflow and Agile (Scrum)
Role : Hadoop and Spark Developer.
Description:
The mFish Cell Line Stability Network provides a deep learning workflow to confirm the production stability of a cell line. It has two main parts: an ML pipeline, which performs deep learning on images and creates thumbnails, and an ETL pipeline, which copies data and images to RDIP Hive and the object store.
Roles and Responsibilities:
- Created shell scripts for the ETL pipeline to connect to and import CSV files into Hive tables.
- Created Hive databases and tables and queried the required information.
- Created PySpark scripts using Python.
- Stored thumbnails in an AWS S3 bucket using the AWS CLI.
- Created DAGs using the Airflow workflow scheduler.
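The CSV-to-Hive import step above can be sketched as a small helper that derives a Hive `CREATE TABLE` statement from a CSV header. This is a minimal illustration, not the project's actual script: the database name, table name, and sample columns are hypothetical, and every column is typed `STRING` for simplicity where a real pipeline would infer or configure types.

```python
import csv
import io

def hive_ddl_from_csv(csv_text, db, table):
    """Derive a simple Hive CREATE TABLE DDL from a CSV header.

    All columns are typed STRING for simplicity; a real ETL pipeline
    would map each column to an appropriate Hive type.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    cols = ",\n  ".join(f"`{c.strip()}` STRING" for c in header)
    return (
        f"CREATE TABLE IF NOT EXISTS {db}.{table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE"
    )

# Hypothetical sample data resembling the imaging pipeline's CSV output.
ddl = hive_ddl_from_csv("cell_id,image_path,score\n1,a.png,0.9\n",
                        "mfish", "cell_images")
print(ddl)
```

A shell wrapper would then pass this DDL to `beeline` or `hive -e` and load the CSV with `LOAD DATA INPATH`.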
Company
Beacon
Description
Project Name : Beacon Image processing tool
Client : GSK.
Environment : PySpark, Hive, PyCharm, Pandas, NumPy, Sqlalchemy, Airflow and Agile (Scrum)
Role : Hadoop and Spark Developer.
Description:
The Beacon imaging tool enables scientists to reliably process their experiments and save considerable time in the production of monoclonal antibody-producing cell lines.
The Beacon instrument is a core element of the cell line development (CLD) process; it uses sophisticated technology to reduce CLD workflows from 13 weeks to 8 weeks.
Roles and Responsibilities:
- Created Python scripts using the PyCharm IDE.
- Used Python libraries such as Pandas, NumPy, and OpenCV (cv2).
- Created Hive databases and tables and queried the required information.
- Created pipeline scripts using Python.
- Created DAGs using the Airflow workflow scheduler.
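The DAG scheduling mentioned above amounts to declaring task dependencies and letting the scheduler resolve an execution order. Airflow itself is not shown here; this stdlib-only sketch (using `graphlib`) illustrates the same idea, and the task names are hypothetical stand-ins for the Beacon pipeline steps:

```python
from graphlib import TopologicalSorter

# Hypothetical Beacon pipeline tasks mapped to their upstream
# dependencies, mirroring how an Airflow DAG wires operators together.
deps = {
    "extract_images": set(),
    "process_images": {"extract_images"},  # Pandas/NumPy/cv2 step
    "load_hive": {"process_images"},       # write results to Hive tables
    "publish_report": {"load_hive"},
}

# Resolve the dependency graph into a valid execution order.
order = list(TopologicalSorter(deps).static_order())
print(order)  # → ['extract_images', 'process_images', 'load_hive', 'publish_report']
```

In Airflow the same chain would be expressed with operators and the `>>` dependency syntax inside a `DAG` definition.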