About Me
6.5+ years of extensive IT experience in development, support, and implementation of software applications, data pipelines, and data lakes, including 6+ years in Big Data using Hadoop, Hive, Sqoop, Spark, Scala, Spark Streaming, Kafka, Oozie, ZooKee...
Skills
Portfolio Projects
Description
Principal Financial Group is an American global financial investment management and insurance company headquartered in Des Moines, Iowa, U.S.A. Four segments comprise the company: Retirement and Income Solutions, Principal Global Investors, Principal International, and U.S. Insurance Solutions.
Roles & Responsibilities:
- Responsible for building an AWS Data Reservoir using AWS CDK, Lambda, Glue, PySpark, Athena, and S3.
- Acting as a Cloud Specialist for building the AWS Data Reservoir.
- Building AWS services and data pipelines with the AWS CDK in Python (see the sketch after this list).
- Handling data coming from different APIs into S3 for ingestion and curation.
- Used PySpark with Glue for ETL operations and Athena for reporting.
- Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory tuning.
- Leading and mentoring a team to deliver the Data Reservoir as a low-cost solution compared to traditional data marts.
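A minimal sketch of what the CDK-with-Python piece could look like: one stack that provisions an S3 bucket for the raw layer and a Lambda that writes into it. The construct IDs, handler name, and asset path are illustrative assumptions, not taken from the project.

```python
# Hypothetical AWS CDK (v2) stack in Python for a small ingestion slice.
from aws_cdk import App, Stack, aws_s3 as s3, aws_lambda as _lambda
from constructs import Construct

class DataReservoirStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # S3 bucket for the raw ingestion layer (name is illustrative).
        raw_bucket = s3.Bucket(self, "RawIngestionBucket", versioned=True)

        # Lambda that lands API payloads into the raw bucket.
        ingest_fn = _lambda.Function(
            self, "IngestFunction",
            runtime=_lambda.Runtime.PYTHON_3_9,
            handler="ingest.handler",              # hypothetical handler
            code=_lambda.Code.from_asset("lambda"),
        )
        raw_bucket.grant_write(ingest_fn)

app = App()
DataReservoirStack(app, "DataReservoirStack")
app.synth()
```

Defining the pipeline this way keeps the infrastructure versioned alongside the ETL code, which is one reason teams reach for the CDK over hand-built consoles.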
Description
EarlySalary is an innovative lending platform changing the way loans are taken in India. As a new-age online brand, it brings together new credit scoring systems for superior customer profiling.
Roles & Responsibilities:
- Responsible for building an AWS data lake using Kinesis, Glue, Athena, and S3.
- Acting as a Data Architect for building the AWS data lake.
- Handling real-time and batch data coming from different systems such as app streams, PostgreSQL, MongoDB, MySQL, and FTP servers.
- Used PySpark with Glue for ETL operations, Kinesis streams for real-time streaming, and Athena for reporting (see the Glue job sketch after this list).
- Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory tuning.
- Leading and mentoring a team to deliver the data lake as a low-cost solution compared to traditional data marts.
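A hedged sketch of the Glue/PySpark ETL leg: read raw JSON from S3, curate it lightly, and write partitioned Parquet that Athena can query directly. The S3 paths, the loan_id key, and the column names are assumptions for illustration.

```python
# Glue-style PySpark batch job: raw JSON in, curated Parquet out.
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

raw = spark.read.json("s3://example-datalake/raw/loans/")   # hypothetical path
curated = (raw
           .withColumn("ingest_date", F.current_date())
           .dropDuplicates(["loan_id"]))                    # hypothetical key

(curated.write
        .mode("append")
        .partitionBy("ingest_date")
        .parquet("s3://example-datalake/curated/loans/"))   # hypothetical path

job.commit()
```

Partitioning the output by ingestion date keeps Athena scans, and therefore query cost, proportional to the date range being queried.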
Description
Blue Dart Express, or Blue Dart, is an Indian logistics company providing courier delivery services, headquartered in Chennai, Tamil Nadu. It has a subsidiary cargo airline, Blue Dart Aviation, that operates in South Asian countries.
Roles & Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Ingesting data from GoldenGate to HBase with Kafka and Spark Streaming (see the streaming sketch after this list).
- Used Spark Streaming APIs to perform the necessary transformations and actions on the fly for building the common learner data model, which gets data from Kafka in near real time and persists it into Parquet and HBase.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 and 2.3 for data aggregation, queries, and writing data into Elasticsearch.
- Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and memory tuning.
- Optimizing existing Hadoop algorithms using Spark Context, Spark SQL, and DataFrames.
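The Kafka-to-Parquet leg might look like the following on the Structured Streaming API available in Spark 2.3, shown here in PySpark rather than the project's Scala. The broker address, topic, event schema, and paths are placeholders; the HBase sink is omitted since it needs a separate connector.

```python
# Kafka -> Structured Streaming -> partitioned Parquet, in near real time.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("learner-model-stream").getOrCreate()

schema = StructType([                      # assumed event shape
    StructField("shipment_id", StringType()),
    StructField("status", StringType()),
    StructField("event_time", TimestampType()),
])

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
          .option("subscribe", "shipment-events")            # hypothetical topic
          .load()
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

query = (events.writeStream
         .format("parquet")
         .option("path", "/data/learner_model/")             # placeholder
         .option("checkpointLocation", "/chk/learner_model/")
         .trigger(processingTime="30 seconds")  # the "batch interval" knob
         .start())
query.awaitTermination()
```

The trigger interval is the same batch-interval tuning lever mentioned above: shorter intervals lower latency but produce more, smaller Parquet files.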
Description
Health Care Service Corporation (HCSC) is the largest customer-owned health insurer in the United States. Operating as a Mutual Legal Reserve Company, HCSC offers a wide variety of health and life insurance products and related services through its operating divisions and subsidiaries.
The project consolidates membership data from around 16 source systems.
Roles & Responsibilities:
- Data mapping across data lake layers.
- Streaming data from Kafka with Spark Streaming as the consumer.
- Creating Hive managed and external current tables from source tables.
- Writing Spark/Scala transformations for the final gold table.
- Creating DataFrames for loading data from the temp table into the gold table.
- Writing a Hive merge script to maintain updated records (the upsert logic is sketched after this list).
- Worked in Development, QA, Preprod, and Production environments.
- Providing guidance and mentorship to team members on writing Spark/Scala transformations and resolving project issues.
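The merge itself ran as a Hive script; the same keep-the-latest-record idea can be expressed as a PySpark window dedupe, sketched below with table and column names assumed for illustration.

```python
# "Latest record wins" dedupe feeding a gold table, as a window function.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("gold-table-merge")
         .enableHiveSupport()
         .getOrCreate())

current = spark.table("staging.member_updates")   # hypothetical temp table
latest = Window.partitionBy("member_id").orderBy(F.col("update_ts").desc())

gold = (current
        .withColumn("rn", F.row_number().over(latest))
        .filter(F.col("rn") == 1)                 # newest row per member
        .drop("rn"))

gold.write.mode("overwrite").saveAsTable("gold.members")  # hypothetical table
```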
Description
This project serves the credit card and loan departments of a bank. We receive transactional data for the card and loan departments, approximately 40 to 60 GB per day. The objectives of this project are to analyze data before issuing a new loan or card to a customer, to follow up on upgrading existing customers' cards based on their payment records and types of expenses, and to follow up on NPS accounts based on business rules written into reports.
Roles & Responsibilities:
- Data mapping across data lake layers.
- Migration from Oracle into Hive using Sqoop.
- Involved in creating tables, partitioning and bucketing tables, and creating UDFs in Hive (a table-layout sketch follows this list).
- Experience with Hive query performance tuning.
- Implemented custom UDFs in Hive.
- Scheduling jobs in Oozie workflows.
- Monitoring the different jobs on a daily basis.
- Worked in Development, QA, Preprod, and Production environments.
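A sketch of how a day's Sqoop-imported transactions might land in a partitioned, bucketed Hive table via PySpark; the paths, table names, date, and bucket count are all illustrative assumptions.

```python
# Load one day of staged transactions into a partitioned, bucketed table.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("card-txn-load")
         .enableHiveSupport()
         .getOrCreate())

txns = (spark.read.parquet("/staging/card_txns/2020-01-15/")  # hypothetical path
        .withColumn("txn_date", F.lit("2020-01-15")))         # partition value

(txns.write
     .mode("append")
     .partitionBy("txn_date")   # daily partitions keep scans small
     .bucketBy(32, "card_id")   # pre-shuffles data on the join key
     .sortBy("card_id")
     .format("parquet")
     .saveAsTable("bank.card_transactions"))                  # hypothetical table
```

Bucketing on card_id means joins and aggregations keyed on the card can avoid a full shuffle, which matters at 40 to 60 GB of new data per day.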