Dinesh P.

Kafka, Spark and GCP developer

Pune, India

Experience: 6 Years

56051.5 USD / Year

  • Notice Period: Days

About Me

  • Overall 7.2+ years of IT experience as a Data Engineer in Big Data, Hadoop, and Spark technologies.
  • Hands-on experience in Hadoop technologies such as HDFS, Hive, Sqoop, Apache Spark, Spark SQL, ...
  • Experience in architecting and designing solutions leveraging services such as Cloud BigQuery, Cloud Pub/Sub, Azure Databricks, and Data Lake.
  • Involved in capability building for developing real-time streaming applications using Apache Spark.
  • Performance tuning of Spark using file formats such as Avro, ORC, and Parquet.
  • Hands-on experience in data modelling in Hive.
  • Experience in CDH, CDP 7, and Hortonworks environments.
  • Experience in building Apache NiFi jobs.
  • Experience in Azure Blob Storage, Databricks, and cloud engineering with GCP.
  • Strong knowledge of Agile (Scrum) methodology.
  • Domain experience in Pharmaceutical, Retail, Telecommunications, and Banking.
  • Involved in release and go-live activities.
  • Willing to update my knowledge and learn new skills according to business requirements.

Portfolio Projects

Description

Technology: Hadoop, Hive, Kafka, Apache NiFi, HDP.

Description: The project delivered a pilot model for a centralised data storage and data management platform. It is used to store and process data from log files that come from various IT and network applications into ArcSight.

Role & Responsibilities:

  1. Built NiFi jobs to move data from Kafka topics to HDFS (raw zone).
  2. Built the streaming flow for the serving layer (stream data).
  3. Developed DDL to store the data in Hive tables (see the sketch below).
  4. Implemented performance tuning.
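
A minimal sketch of the kind of raw-zone Hive DDL this step involved, assuming hypothetical database, table, column, and HDFS path names (the real schema came from the ArcSight log files):

    from pyspark.sql import SparkSession

    # Illustrative only: database, table, columns, and location are assumptions.
    spark = (SparkSession.builder
             .appName("raw-zone-ddl")
             .enableHiveSupport()
             .getOrCreate())

    spark.sql("CREATE DATABASE IF NOT EXISTS raw_zone")

    # External table over the raw-zone directory that the NiFi flow writes to.
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS raw_zone.arcsight_logs (
            event_time  STRING,
            device_host STRING,
            severity    STRING,
            message     STRING
        )
        PARTITIONED BY (ingest_date STRING)
        STORED AS PARQUET
        LOCATION '/data/raw/arcsight_logs'
    """)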

Description

Technology: Hadoop, Sqoop, Hive, Maven, shell script, Spark, Spark Streaming, Scala, Kafka.

Description: This project developed a data lake to analyse customer data for a leading bank, giving the client a clear picture of customers and their activities. The project has two phases: ingestion and data manipulation. The first phase develops an ingest mechanism to pull data from source systems such as DB2 and CDC into the HDFS platform.

Role & Responsibilities:

  1. Developed automation for Sqoop jobs across multiple tables with the provided mapper count.
  2. Developed Spark code to move data from the staging layer to the schema, raw, and archive layers, adding extra audit columns.
  3. Developed Spark code to collect data from Kafka topics into the staging layer (HDFS); see the sketch below.
  4. Executed all scripts using batch scripting.
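
A minimal sketch of the Kafka-to-staging flow, assuming hypothetical broker, topic, and HDFS path names; the project itself implemented this in Scala with Spark Streaming, so this PySpark Structured Streaming version is only an illustration:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    # Illustrative only: broker, topic, and paths are assumptions.
    spark = SparkSession.builder.appName("kafka-to-staging").getOrCreate()

    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")
           .option("subscribe", "customer_events")
           .option("startingOffsets", "latest")
           .load())

    # Kafka key/value arrive as bytes; cast to strings before landing in staging.
    events = raw.select(col("key").cast("string"), col("value").cast("string"))

    query = (events.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/staging/customer_events")
             .option("checkpointLocation", "hdfs:///checkpoints/customer_events")
             .outputMode("append")
             .start())

    query.awaitTermination()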

Description

Technology: Hadoop, S3, EMR, EC2, AWS, Spark Core, and Python.

Description: AdTech ingestion pulls data from multiple source locations into a data lake stored on S3. Data manipulation and cleansing are performed using Spark and Python, and the data is validated in the data warehouse using Hive and Redshift.

Role & Responsibilities:

  1. Managed data coming from different databases to S3 using Spark with Python (see the sketch below).
  2. Wrote CLI commands for HDFS and S3.
  3. Created Hive tables and loaded data, which runs internally as MapReduce.
  4. Implemented complex Hive and Redshift queries to validate data.
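
A minimal sketch of pulling a source table into the S3 data lake with Spark and Python, assuming hypothetical JDBC connection details, table, column, and bucket names:

    from pyspark.sql import SparkSession

    # Illustrative only: URL, credentials, table, column, and bucket are assumptions.
    spark = SparkSession.builder.appName("adtech-db-to-s3").getOrCreate()

    # Read the source table over JDBC (the driver jar must be on the classpath).
    source_df = (spark.read
                 .format("jdbc")
                 .option("url", "jdbc:postgresql://source-db:5432/adtech")
                 .option("dbtable", "public.impressions")
                 .option("user", "etl_user")
                 .option("password", "etl_password")
                 .load())

    # Basic cleansing before landing in the lake: drop empty rows and duplicates.
    cleaned_df = source_df.dropna(how="all").dropDuplicates()

    # Write partitioned Parquet to S3 for later validation in Hive and Redshift
    # (assumes the source exposes an event_date column).
    (cleaned_df.write
     .mode("append")
     .partitionBy("event_date")
     .parquet("s3://adtech-data-lake/raw/impressions/"))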
