Abhishek K.

Lead Data/Cloud Engineer

Bengaluru, India

Experience: 11 Years

About Me

I am a Lead Big Data and Cloud Engineer with 11+ years of commercial experience. My key skills include Spark with Scala, PySpark, Hive, Sqoop, HBase, Unix shell scripting, SQL, PL/SQL, Airflow, the AWS, Azure, and GCP clouds, and programming languages ...

Have implemented more than 5 Big Data projects end to end, on cloud and on-premises, and also have experience with on-premises-to-cloud migration.

Have worked across domains including banking, payment gateways, healthcare, insurance, and travel and hospitality.

Portfolio Projects

Description

I am leading this project, which is intended to migrate the on-premises Hadoop/Spark clusters to the AWS cloud environment. It involves migrating terabytes of data along with the metadata and the ongoing scheduled jobs in Autosys, and building the new Big Data workflow in EMR. Alongside the migration, new development is being performed in parallel by our team using Airflow. Data migration is performed with the WANdisco tool.

My role and responsibilities:

· Working on the migration of large data volumes by creating replication rules in WANdisco.

· Creating shell scripts for Hive metadata migration.

· Developed shell scripts for manual data migration from HDFS to S3 when WANdisco faces downtime.

· Doing code remediation, in which Spark, Hive, Sqoop, etc. jobs are migrated from on-premises to AWS EMR and run there. This is the most challenging step and requires a lot of code changes and troubleshooting skills.

· Created Airflow DAGs for historical loads from different data sources using PySpark.

· Developed Airflow DAGs for incremental loads using Hive and implemented SCD Type 2 (a sketch of this pattern follows this list).

· Created PySpark DataFrames, performed joins and complex aggregations on them, and finally stored the results in Hive tables.

· Modified existing HQL scripts, DAGs, and PySpark code to fix bugs.

· Developed DAGs to migrate data from Hive to Snowflake.

· Fixed many bugs arising from version mismatches between the on-premises Hadoop clusters and EMR, as well as from differing file formats.

· Helping other team members resolve their issues and bugs.

· Managing an 8-member team and resolving the technical/functional issues encountered by team members.
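
As referenced in the list above, below is a minimal sketch of the Airflow-plus-SCD Type 2 pattern used for the incremental loads. It is an illustration only: the DAG id, script paths, and bucket names are invented placeholders, and it assumes an Airflow 2.x environment where spark-submit and the Hive CLI are available to the workers.

# Minimal Airflow DAG sketch: daily incremental load followed by an SCD Type 2 merge.
# All names (DAG id, paths, schedule) are illustrative placeholders, not project code.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="incremental_load_scd2_example",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Step 1: pull the day's delta into a staging table via a (hypothetical) PySpark job.
    load_incremental = BashOperator(
        task_id="load_incremental",
        bash_command=(
            "spark-submit --deploy-mode cluster "
            "s3://example-bucket/jobs/load_incremental.py --run-date {{ ds }}"
        ),
    )

    # Step 2: apply the SCD Type 2 merge in Hive: expire the previous version of each
    # changed row (end date / current flag) and insert the new version as current.
    scd2_merge = BashOperator(
        task_id="scd2_merge",
        bash_command="hive -f /opt/etl/hql/scd2_merge.hql -hiveconf run_date={{ ds }}",
    )

    load_incremental >> scd2_merge

The merge logic itself lives in the HQL/PySpark scripts; the DAG only sequences the two steps and passes the run date through Airflow templating.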

Description

The client was a well-known payment gateway based in the US, and this project addressed the India Data Localization rule, under which all transactions done in India must be localized to Indian data centers. A Big Data platform was developed, and the flows and scheduled jobs along with the application data were moved to India data centers; the applications were also migrated from tools such as Ab Initio to Spark and Hive to process huge data volumes in an optimal and faster way.

My role was Senior Data Engineer and my responsibilities were as follows:

· Interacted with the client and Business Analysts and was involved in requirement-gathering meetings for the flows to be migrated.

· Worked on the migration of data from the existing Ab Initio system to Hive in the Hortonworks environment using Spark. Understood the Ab Initio transformations and created equivalent Spark functions as part of the application migration to Big Data (an illustrative sketch of this pattern follows this list).

· Developed Spark applications in Scala for implementing business rules and processing data.

· Developed the Spark Scala code using functional programming to migrate business logic from Ab Initio to Spark and Hive.

· Created Unix shell scripts to integrate business flow logic with Spark applications.

· Developed unit test cases for each function using the FunSuite and BeforeAndAfterAll frameworks.

· Created Confluence documentation for Scrum demonstrations.
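
The production migration described above was written in Spark with Scala; the sketch below is only an illustration, in PySpark with invented table and column names, of the general pattern of replacing an Ab Initio reformat-plus-rollup graph with equivalent DataFrame operations.

# Illustrative sketch only: re-implementing an Ab Initio style "reformat + rollup"
# as Spark DataFrame operations. Table/column names are invented; the real project
# code was written in Spark with Scala.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder.appName("abinitio_to_spark_example")
    .enableHiveSupport()
    .getOrCreate()
)

# Reformat: derive and rename fields from the raw transactions feed.
txns = (
    spark.table("staging.transactions_raw")
    .withColumn("txn_date", F.to_date("txn_ts"))
    .withColumn("amount_usd", F.col("amount") * F.col("fx_rate"))
)

# Rollup: aggregate per merchant and day, the equivalent of an Ab Initio rollup component.
daily_summary = (
    txns.groupBy("merchant_id", "txn_date")
    .agg(
        F.count("*").alias("txn_count"),
        F.sum("amount_usd").alias("total_amount_usd"),
    )
)

# Persist the result to a Hive table for downstream consumption.
daily_summary.write.mode("overwrite").saveAsTable("curated.merchant_daily_summary")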

Description

The client was an American multinational manufacturer and marketer of prestige skincare, makeup, fragrance, and hair care products. The project was to create a Big Data platform to process the huge volume of data and then load it into the processing layer for reporting purposes.

My roles and responsibilities:

· Developed data integration pipelines using Azure Data Factory to ingest data from Blob storage to ADLS and then process it in Databricks.

· Developed data transformation code using Azure HDInsight, Spark, and Azure Databricks (a sketch of this step follows this list).

· Developed SQL Stored Procedure scripts to transform and load data in SQL Server Database.

· Developed PolyBase scripts to perform bulk loads of transformed data into Azure Data Warehouse.

· Developed unit test cases for each function using the FunSuite and BeforeAndAfterAll frameworks.

· Reviewed code and prepared documentation for the support team for the modules that we moved to production.
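
Below is a minimal Databricks-style PySpark sketch of the transformation step referenced above: read the raw files that Data Factory landed in ADLS, apply a simple transformation, and write a curated output for the downstream PolyBase load. The storage account, container, and column names are hypothetical placeholders, not the actual project code.

# Sketch only: read raw CSVs from ADLS, transform, and write curated Parquet.
# Storage account, container, and column names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("adls_transform_example").getOrCreate()

raw_path = "abfss://raw@examplestorage.dfs.core.windows.net/sales/"

orders = (
    spark.read.option("header", "true").option("inferSchema", "true").csv(raw_path)
    .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
    .withColumn("net_amount", F.col("gross_amount") - F.col("discount"))
)

# Write the curated output as Parquet; a separate PolyBase / stored-procedure step
# would then bulk-load it into the data warehouse.
orders.write.mode("overwrite").parquet(
    "abfss://curated@examplestorage.dfs.core.windows.net/sales_orders/"
)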

Description

The project was to implement a Big Data solution in GCP.

My Roles & Responsibilities:

· Involved in customer calls for requirements gathering and for scrum meetings.

· Created a target dataset in BigQuery for each source system and loaded the history data using the appropriate component in GCP for all the entities identified as part of data modeling.

· Created tables in various layers/datasets (Stage layer, Conformed layer) in BigQuery and loaded data from the landing/staging layer, with transformations, based on the given Functional Mapping Document.

· Created Spark DataFrames and performed various joins, complex aggregations, and other transformations as per the business logic (a sketch of this pattern follows this list).

· Reviewed code and prepared documentation for the support team for the modules that we moved to production.
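
Below is a minimal PySpark sketch of the stage-to-conformed load referenced above. It assumes the spark-bigquery connector is available on the cluster, and the project, dataset, bucket, and column names are placeholders rather than the actual project objects.

# Sketch only: join/aggregate staging tables and write the result to a conformed
# BigQuery dataset via the spark-bigquery connector. All names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("stage_to_conformed_example").getOrCreate()

customers = (
    spark.read.format("bigquery").option("table", "example-project.stage.customers").load()
)
orders = (
    spark.read.format("bigquery").option("table", "example-project.stage.orders").load()
)

# Join and aggregate per a (hypothetical) functional mapping: total order value per customer.
conformed = (
    orders.join(customers, "customer_id")
    .groupBy("customer_id", "customer_name")
    .agg(F.sum("order_amount").alias("total_order_amount"))
)

(
    conformed.write.format("bigquery")
    .option("table", "example-project.conformed.customer_order_summary")
    .option("temporaryGcsBucket", "example-temp-bucket")
    .mode("overwrite")
    .save()
)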

Description

The client was an American multinational technology company headquartered in California, one of the largest companies in the world, that designs, develops, and sells consumer electronics, computer software, and online services.

The project was a log aggregation and analytics engine to process real-time logs coming from proxy servers, NMS routers/switches, Juniper network devices, load balancers, etc. The application provides near-real-time monitoring and reporting of logging events to operations teams. Security incidents and network health statistics are extracted from log events in real time.

My Responsibilities:

• Developed Spark Streaming data pipelines to read and process messages coming from the Kafka message bus (a sketch of this flow follows this list).

• Developed Spark batch jobs to ingest data into the Hive layer.

• Used Spark SQL DataFrame API and aggregate functions extensively for transforming the raw data to meaningful data for visualization.

• Designed and implemented a Kafka producer application to produce messages using the Kafka Connect framework.

• Implemented Schema Registry to define the schema of the Kafka messages.

• Used AvroConverter for serialization and deserialization of the Kafka data.

• Used the Kafka JDBC source connector to populate the Kafka topic from an Oracle DB table as the source of transactional data.

• Used the Kafka HDFS sink connector to export data from Kafka topics to HDFS files and integrated it with Hive to make data immediately available for querying with HiveQL.

• Used the Kafka Datagen connector during development to generate test data.

• Did POCs exploring KSQL for the project.
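
Below is a minimal Structured Streaming sketch of the Kafka-to-HDFS flow referenced above. The broker, topic, and paths are placeholders; it assumes the spark-sql-kafka package is on the classpath; and it simply casts the message value to a string, whereas the real pipeline used Avro with Schema Registry for deserialization.

# Sketch only: consume log events from a Kafka topic and land them on HDFS so Hive
# can query them. Broker, topic, and path names are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("kafka_log_ingest_example").getOrCreate()

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "proxy-logs")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers key/value as binary; keep the value as a string plus the event timestamp.
events = raw.select(
    F.col("value").cast("string").alias("raw_event"),
    F.col("timestamp").alias("event_ts"),
)

query = (
    events.writeStream.format("parquet")
    .option("path", "hdfs:///data/logs/proxy/")
    .option("checkpointLocation", "hdfs:///checkpoints/proxy_logs/")
    .outputMode("append")
    .start()
)

query.awaitTermination()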
