Rajat J.

Data Engineer with Spark, Azure

Bengaluru, India

Experience: 6 Years

54179.6 USD / Year

  • Availability: Immediate


About Me

===I have solutions to all of your data-related problems here===

My core expertise is writing highly scalable ELT jobs with Python, Spark, and Hadoop.

 


Below is my detailed skillset, so that we can quickly get started and solve your problem.

 

 

===Data Engineering===

• Creating data pipelines on cloud platforms like Amazon Web Services (AWS).

• Writing Extract-Load-Transform (ELT) jobs for data processing using technologies like Hive and PySpark (a minimal sketch follows this list).

• Building real-time data pipelines for streaming data using Apache Kafka.
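
A minimal sketch of such an ELT job in PySpark (the bucket, paths, column names, and transformation are illustrative assumptions, not code from an actual engagement):

# Extract raw CSV from S3, transform it, and load it back as partitioned Parquet.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("elt-sketch").getOrCreate()

# Extract: read raw CSV landed in S3 (s3a:// access needs the hadoop-aws package).
raw = spark.read.option("header", True).csv("s3a://my-bucket/raw/events/")

# Transform: parse the timestamp, drop malformed rows, derive a partition column.
clean = (
    raw.withColumn("event_ts", F.to_timestamp("event_ts", "yyyy-MM-dd HH:mm:ss"))
       .filter(F.col("event_ts").isNotNull())
       .withColumn("event_date", F.to_date("event_ts"))
)

# Load: write partitioned Parquet for downstream Hive/Spark consumers.
clean.write.mode("overwrite").partitionBy("event_date").parquet("s3a://my-bucket/curated/events/")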

 

===Data Engineering Skills===

Expertise:

Amazon Web Services:

S3

Tools & Libraries:

PySpark, Spark, Scala, Python, Hadoop, Hive, SparkML, Airflow.

 

Database:

Postgres, MySQL, Oracle, DynamoDB, MongoDB, MSSQL

 

===Image Processing===

• Tesseract

• ABBYY FineReader

 

===Tools & Libraries===

Jupyter Notebook, PyCharm, SQL

Portfolio Projects

Description

  • Extracted data from scanned PDF images using Python with Tesseract OCR and ABBYY OCR (a sketch of this step follows the list).
  • Trained the OCR to read characters in the proper format and saved the resulting configuration for reuse on other complex files.
  • Developed a self-service portal based on the Python Flask framework that displays the extracted data based on dynamic queries.
  • Supervised job scheduling via Oozie and managed data ingestion.
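
A hedged sketch of the PDF text-extraction step with pytesseract (the file name, DPI, and page-segmentation mode are assumptions for illustration; pdf2image additionally requires Poppler):

import pytesseract
from pdf2image import convert_from_path  # renders each PDF page as a PIL image

# Render the scanned PDF at 300 DPI for better OCR accuracy.
pages = convert_from_path("scanned_document.pdf", dpi=300)

for i, page in enumerate(pages):
    # --psm 6 assumes a single uniform block of text per page.
    text = pytesseract.image_to_string(page, config="--psm 6")
    with open(f"page_{i}.txt", "w", encoding="utf-8") as f:
        f.write(text)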

Description

1) For address data of the United States, I provided data engineering solutions covering transformation, validation, and formatting using Spark 3, Python 3, PySpark, AWS, and other Python libraries for address validation.

2) For election data of the United States, I provided data engineering solutions covering transformation, validation, and formatting using Spark 3, Python 3, PySpark, and AWS, and validated US voter data against 240 million records with the above solution.

3) Completed the Logic App functionality course and implemented an automated workflow for loading data from different sources into Microsoft Azure SQL Database.

4) Completed the Azure Data Factory fundamentals course and implemented an automated, trigger-based workflow for loading data from different sources such as Salesforce, SQL Database, and SharePoint lists (sketches of the validation pass and a pipeline trigger follow this list).
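
A hedged sketch of the kind of validation/formatting pass described in items 1 and 2 (the schema, the ZIP-code regex, and the paths are illustrative assumptions):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("address-validation").getOrCreate()

addresses = spark.read.parquet("s3a://my-bucket/raw/addresses/")

validated = (
    addresses
    .withColumn("zip", F.trim(F.col("zip")))
    # Flag rows whose ZIP matches the 5-digit (optionally ZIP+4) US format.
    .withColumn("zip_valid", F.col("zip").rlike(r"^\d{5}(-\d{4})?$"))
    # Normalize state codes to upper case for downstream matching.
    .withColumn("state", F.upper(F.trim(F.col("state"))))
)

validated.write.mode("overwrite").parquet("s3a://my-bucket/curated/addresses/")

And a minimal sketch of triggering an Azure Data Factory pipeline run from Python, as in item 4 (the subscription ID, resource group, factory, pipeline name, and parameters are placeholders):

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

credential = DefaultAzureCredential()
adf_client = DataFactoryManagementClient(credential, "<subscription-id>")

# Kick off the load pipeline with a hypothetical run parameter.
run = adf_client.pipelines.create_run(
    resource_group_name="my-rg",
    factory_name="my-data-factory",
    pipeline_name="load_salesforce_to_sql",
    parameters={"load_date": "2024-01-01"},
)
print(run.run_id)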

Description

  • Worked on exposing RESTful services using the Bottle framework and Python (a minimal sketch follows this list).
  • Wrote Python scripts for storing and fetching client metadata in MongoDB.
  • Used PyUnit for testing the Python scripts internally.
  • Handled deployment workflows for the production and development environments.
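
A minimal sketch of such a Bottle service backed by MongoDB (the database and collection names, fields, and port are illustrative assumptions):

from bottle import Bottle, request, run
from pymongo import MongoClient

app = Bottle()
metadata = MongoClient("mongodb://localhost:27017")["clients"]["metadata"]

@app.post("/metadata")
def store_metadata():
    # Store the JSON body sent by the client and return the new document id.
    doc = request.json
    result = metadata.insert_one(doc)
    return {"id": str(result.inserted_id)}

@app.get("/metadata/<client_id>")
def fetch_metadata(client_id):
    # Fetch by client id, hiding Mongo's internal _id field.
    return metadata.find_one({"client_id": client_id}, {"_id": 0}) or {}

if __name__ == "__main__":
    run(app, host="localhost", port=8080)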
