About Me
I have 6.8 years of IT experience in the development and enhancement of data warehouses and have worked on Big Data applications using tools and technologies such as HDFS, MapReduce, Hive, Sqoop, Impala, Python, ...
Skills
Web Development
Data & Analytics
Development Tools
Programming Language
Database
Operating System
Others
Software Engineering
Positions
Portfolio Projects
Company
Project High rise - a health insurance project delivered for Humana; my role was Big Data developer.
Tools
Hue, Jupyter Notebook
Company
Mortgage Servicing Point, Bank of America
Description
- Ingested data from various RDBMS systems, along with Kafka streaming data from multiple producers, into the loading zone.
- Integrated Kafka and Spark to load the streaming data into Spark DataFrames (a minimal sketch of this flow follows this list).
- Built a streaming/real-time processing framework using Kafka and Spark.
- Implemented reprocessing of failed Kafka messages using offset IDs.
- Implemented Kafka producer and consumer applications on a Kafka cluster set up with ZooKeeper.
- Performed data quality checks in the loading zone using PySpark.
- Loaded the data into the raw zone (Hadoop HDFS, Hive) after the data quality checks.
- Performed standard data cleansing and validation and placed the data into the trusted zone.
- Developed PySpark programs to apply business rules to the data.
- Created Hive partitioned tables and loaded Parquet data into them (see the second sketch after this list).
- Created Impala views for business users and analysts to consume the data in the refined zone.
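Below is a minimal sketch of the Kafka-to-Spark streaming ingestion and loading-zone quality check described above. The broker address, topic name, message schema, and HDFS paths are illustrative assumptions rather than the actual project configuration.

# Minimal sketch of the Kafka -> Spark -> raw zone flow (assumed names/paths).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("loading-zone-ingest").getOrCreate()

# Assumed JSON payload schema for the producer messages.
schema = StructType([
    StructField("loan_id", StringType()),
    StructField("event_type", StringType()),
    StructField("amount", DoubleType()),
])

# Read the Kafka stream into a DataFrame (requires the spark-sql-kafka package).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")  # assumed broker
       .option("subscribe", "mortgage-events")              # assumed topic
       .option("startingOffsets", "latest")
       .load())

# Parse the message value; keep partition and offset so failed messages
# can later be replayed from a known offset id.
events = (raw
          .select(F.col("partition"), F.col("offset"),
                  F.from_json(F.col("value").cast("string"), schema).alias("msg"))
          .select("partition", "offset", "msg.*"))

# Basic data-quality gate in the loading zone: drop records missing key fields.
clean = events.filter(F.col("loan_id").isNotNull() & F.col("amount").isNotNull())

# Land the cleansed stream in the raw zone as Parquet on HDFS.
query = (clean.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/raw_zone/mortgage_events")       # assumed path
         .option("checkpointLocation", "hdfs:///checkpoints/mortgage")  # assumed path
         .start())

query.awaitTermination()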
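The second sketch shows a partitioned Hive table being created and loaded with Parquet data from PySpark; database, table, and column names are again assumptions. An Impala view over the refined table would then expose the data to business users.

# Sketch of the Hive partitioned-table load (assumed database/table/columns).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("raw-to-refined-load")
         .enableHiveSupport()
         .getOrCreate())

# Partitioned Hive table stored as Parquet (created once).
spark.sql("""
    CREATE TABLE IF NOT EXISTS refined_db.mortgage_events (
        loan_id    STRING,
        event_type STRING,
        amount     DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    STORED AS PARQUET
""")

# Allow dynamic partition inserts.
spark.conf.set("hive.exec.dynamic.partition", "true")
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

# Read the cleansed Parquet files from the raw zone and append them into
# the partition for the current load date.
df = spark.read.parquet("hdfs:///data/raw_zone/mortgage_events")  # assumed path
(df.select("loan_id", "event_type", "amount")
   .withColumn("load_date", F.current_date().cast("string"))
   .write.mode("append")
   .insertInto("refined_db.mortgage_events"))

# In Impala, analysts would then be given a view such as:
#   CREATE VIEW refined_db.v_mortgage_events AS
#   SELECT loan_id, event_type, amount, load_date FROM refined_db.mortgage_events;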
Company
BAC Mortgage data HUB, Bank of America
Description
BAC Mortgage data HUB is the enterprise-wide data warehouse that stores data for all non-ADS mortgage loans. The mortgage loan data is processed and pushed into tables for end users.
Responsibilities:
- Assessed business rules, collaborated with stakeholders, and performed source-to-target data mapping, design, and review.
- Created Hive and UNIX scripts to pull data from SOR files, push it into HDFS, and process it in Hive.
- Developed MapReduce programs to apply business rules to the data.
- Created Hive partitioned tables and loaded Parquet data into them.
- Developed PySpark code to apply business rules to the data.
- Implemented CDC (change data capture) logic in Spark SQL (see the sketch after this list).
- Streamed logs from the server and captured the important details in Hive tables.
- Worked with YARN logs to gain a better understanding of Spark job logs; this removed the team's need to go through the Cloudera link for logging requirements.
- Tested raw data and executed performance scripts.
- Supported code/design analysis, strategy development, and project planning.
- Developed a tool to sync Hadoop application code from the production environment to lower environments as part of process improvement and automation.
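As an illustration of the CDC bullet above, here is a small Spark SQL sketch that flags incoming records as inserts or updates by joining on a business key and comparing a hash of the tracked columns. The table names, key, and columns are assumptions, not the actual BAC Mortgage data HUB schema.

# Sketch of CDC classification in Spark SQL (assumed tables and columns).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("cdc-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Assumed Hive tables: the current target table and today's incoming extract.
spark.table("mortgage_db.loans_current").createOrReplaceTempView("target")
spark.table("mortgage_db.loans_incoming").createOrReplaceTempView("incoming")

# Flag each incoming record as INSERT, UPDATE, or NOCHANGE.
cdc = spark.sql("""
    SELECT i.*,
           CASE
             WHEN t.loan_id IS NULL THEN 'INSERT'
             WHEN md5(concat_ws('|', i.status, i.balance))
               <> md5(concat_ws('|', t.status, t.balance)) THEN 'UPDATE'
             ELSE 'NOCHANGE'
           END AS cdc_flag
    FROM incoming i
    LEFT JOIN target t
      ON i.loan_id = t.loan_id
""")

# Only changed records move on to the downstream load.
changes = cdc.filter("cdc_flag IN ('INSERT', 'UPDATE')")
changes.write.mode("overwrite").saveAsTable("mortgage_db.loans_delta")  # assumed table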
Skills
Big Data, Hadoop, PySpark, Informatica
Tools
Hue, Informatica PowerCenter