Vasu B.

Data Engineer (Spark + Big Data)

Nagpur, India

Experience: 9 Years

53,382.40 USD / Year

  • Start Date / Notice Period end date: 2022-08-03

About Me

Working as a Data Engineer and BI Analyst for a leading firm. I have 6 years of experience in developing data warehouses and data marts and in analyzing data for business intelligence reporting and forums. My technical stack includes Hadoop, Hive, Spark, ...

Portfolio Projects

Description

Created a POC for predicting metrics based on past values. Used Facebook Prophet for forecasting, implemented in Python. The app forecasts the values of Clicks and Views based on past trends and generates an alert when a value passes the specified threshold.
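A minimal sketch of this forecasting loop, assuming daily click counts arrive in a pandas DataFrame; the threshold and the placeholder history below are hypothetical, while the ds/y column names are required by Prophet:

```python
import pandas as pd
from prophet import Prophet  # on older installs: from fbprophet import Prophet

CLICK_THRESHOLD = 10_000  # hypothetical alerting threshold

# Placeholder for the real metric history.
history = pd.DataFrame({
    "ds": pd.date_range("2021-01-01", periods=90, freq="D"),
    "y": range(90),  # observed daily clicks
})

model = Prophet()
model.fit(history)

# Forecast the next 7 days and alert when a prediction crosses the threshold.
future = model.make_future_dataframe(periods=7)
forecast = model.predict(future)

for _, row in forecast.tail(7).iterrows():
    if row.yhat > CLICK_THRESHOLD:
        print(f"ALERT: predicted clicks {row.yhat:.0f} on {row.ds.date()} exceed threshold")
```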

• Created shell scripts for Hive data processing

• Created DDL/DML scripts to process data using Hive

• Wrote Python code implementing Facebook Prophet forecasting

• Implemented partitioning and bucketing to optimize Hive queries (see the DDL sketch after this list)

• Provided support in case of issues/failures
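The partitioning and bucketing mentioned above boil down to table DDL. A sketch under assumed names (table, columns, and bucket count are all hypothetical), issued here through a Hive-enabled SparkSession; the same statement also runs directly in the Hive CLI:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Partitioning by date lets queries filtered on event_date skip whole
# directories; bucketing by user_id reduces shuffling for joins on that key.
spark.sql("""
    CREATE TABLE IF NOT EXISTS clicks (
        user_id BIGINT,
        url     STRING,
        clicks  INT
    )
    PARTITIONED BY (event_date STRING)
    CLUSTERED BY (user_id) INTO 32 BUCKETS
    STORED AS ORC
""")
```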

Description

ProgrammaticX deals with programmatic advertising data, binding advertisers, deals, and various real-time factors into a reporting system to make decision making a simple process.

As part of various mappings, this project reads data from an external table. The role included processing and aggregating data to support business reports.

• Created shell scripts for Spark job execution, data loading, alerting, and logging

• Created DDL/DML scripts to process data using Spark on a Hive data store

• Created Spark UDFs in Scala and wrote code to process structured data (a PySpark sketch of this pattern follows this list)

• Used Spark to transfer data between the two environments (Hive and Teradata)

• Built Informatica mappings to load source data into mapping tables

• Implemented partitioning and bucketing to optimize Hive-Spark queries.

• Implemented all the data processing rules as per business logic

• Created deployment documents

• Wrote design documents

• Performed code reviews

• Monitored the ETL jobs

• Provided support in case of issues/failures
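The UDF and cross-environment transfer steps above might look like the following. The original UDFs were written in Scala; this sketch uses PySpark only to keep one language across these examples, and the table names, JDBC URL, normalization rule, and credentials are all hypothetical:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# A simple UDF normalizing free-text deal identifiers (hypothetical rule).
normalize_deal = F.udf(lambda s: s.strip().upper() if s else None, StringType())

deals = (spark.table("programmaticx.deals_raw")
              .withColumn("deal_id", normalize_deal(F.col("deal_id"))))

# Push the processed rows from the Hive store into Teradata over JDBC;
# the Teradata JDBC driver jar must be on the Spark classpath.
(deals.write
      .format("jdbc")
      .option("url", "jdbc:teradata://td-host/DATABASE=reporting")
      .option("driver", "com.teradata.jdbc.TeraDriver")
      .option("dbtable", "reporting.deals")
      .option("user", "etl_user")
      .option("password", "***")
      .mode("append")
      .save())
```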

Description

As a high volume of data was involved, unnecessary data movement and temporary storage or staging of the data had to be avoided; the project also demanded on-the-fly Hadoop cluster creation and termination to save cost.

The EMR clusters were created on the fly during each ETL run and terminated after the successful completion of the processing.
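With boto3, this transient-cluster pattern is a single API call: the cluster bootstraps, runs its steps, and terminates itself, so cost accrues only for the ETL run. All names, versions, instance types, and S3 paths below are hypothetical:

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="etl-transient",
    ReleaseLabel="emr-6.9.0",
    Applications=[{"Name": "Hive"}, {"Name": "Spark"}],
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        # Auto-terminate once all steps finish: the key to the cost saving.
        "KeepJobFlowAliveWhenNoSteps": False,
    },
    Steps=[{
        "Name": "run-etl",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://my-bucket/jobs/etl.py"],
        },
    }],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("Started transient cluster:", response["JobFlowId"])
```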

• Created shell scripts implementing secure (encrypted) data transfer

• Created DDL/DML scripts to process data using Hive

• Created custom user-defined functions as per the requirements and deployed them on Hive (a streaming-UDF sketch follows this list)

• Implemented partitioning and bucketing to optimize Hive queries.

• Designed and developed Talend jobs to automate the flow of loading data from source files into the cluster's ephemeral storage

• Implemented all the data processing rules as per business logic

• Created deployment documents

• Monitored the ETL jobs

• Provided support in case of issues/failures
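Custom Hive UDFs are usually written in Java; one way to sketch the same idea while staying in Python is Hive's TRANSFORM streaming interface, which pipes rows through an external script. The script below and its field names are hypothetical:

```python
#!/usr/bin/env python3
# mask_udf.py: a streaming stand-in for a custom Hive UDF. Hive sends rows to
# stdin as tab-separated text and reads the transformed rows back from stdout.
import sys

for line in sys.stdin:
    user_id, email = line.rstrip("\n").split("\t")
    # Mask the local part of the e-mail address before it reaches reports.
    local, _, domain = email.partition("@")
    masked = (local[0] + "***@" + domain) if local and domain else email
    print(f"{user_id}\t{masked}")
```

Deployment is then a matter of ADD FILE mask_udf.py in the Hive session, followed by SELECT TRANSFORM(user_id, email) USING 'python3 mask_udf.py' AS (user_id, masked_email) FROM users.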
