About Me
Results-driven Data Engineer with 8+ years of experience in software development, debugging, and process improvement. Skilled in Python, PySpark, MySQL, Hadoop, Hive, Sqoop, and AWS services such as Glue, Redshift, S3, Athena, Boto3, EC2, SES, IAM, ...
Skills
Positions
Portfolio Projects
Description
Responsibilities:
- Read and analyzed data using PySpark.
- Handled files using PySpark and created DataFrames from them.
- Worked with Spark SQL and wrote interactive queries against DataFrames.
- Performed transformations and actions on Spark RDDs using PySpark.
- Gained exposure to Map/Reduce functions through PySpark.
- Developed and maintained PL/SQL functions and procedures.
- Used PL/SQL analytical functions for data analysis and report creation.
- Involved in requirement gathering and understanding the current system.
- Built a data validation tool to validate tables between SAS and Hive.
- Used PySpark operations and fixed code according to the given conditions.
- Converted SAS code into PySpark code.
- Performed data visualisation; built and queried tables using AWS Athena and maintained Athena objects in staging, mart, and detail layers.
- Created tables in Hive, using Athena and other AWS services.
- Analyzed data to identify the source of data errors.
Description
Responsibilities:
- Read and analyzed data using pandas, Boto3, and S3.
- Handled files using pandas and created DataFrames from them.
- Worked with SQL and wrote interactive queries against DataFrames.
- Developed and maintained PL/SQL functions and procedures.
- Used PL/SQL analytical functions for data analysis and report creation.
- Involved in requirement gathering and understanding the current system.
- Built data pipelines to validate tables from S3.
- Used Python operations and AWS services, fixing code according to the given conditions.
- Performed data visualisation; built and queried tables using AWS Redshift and maintained Redshift objects in mart, detail, and other layers.
- Created tables in AWS Redshift.
- Analyzed data to identify the source of data errors.
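A pipeline combining pandas, Boto3, and S3 as described above could be sketched as follows: one helper pulls a CSV object from S3 into a DataFrame, another runs basic validation checks. The bucket/key names, the `client` injection parameter, and the specific checks are hypothetical illustrations.

```python
# A minimal sketch of a pandas + Boto3 validation pipeline. Bucket,
# key, and column names are hypothetical.
import io
import pandas as pd

def load_csv_from_s3(bucket, key, client=None):
    # Read a CSV object from S3 into a pandas DataFrame. The client
    # parameter lets tests inject a stub instead of a real S3 client.
    if client is None:
        import boto3  # imported lazily so validation alone needs no AWS deps
        client = boto3.client("s3")
    body = client.get_object(Bucket=bucket, Key=key)["Body"].read()
    return pd.read_csv(io.BytesIO(body))

def validate_frame(df, required_columns):
    # Basic data-validation checks of the kind a pipeline might run:
    # required columns present, no fully-null columns, no duplicate rows.
    errors = []
    missing = [c for c in required_columns if c not in df.columns]
    if missing:
        errors.append(f"missing columns: {missing}")
    for col in df.columns:
        if df[col].isna().all():
            errors.append(f"column entirely null: {col}")
    if df.duplicated().any():
        errors.append("duplicate rows found")
    return errors
```

Keeping the validation logic separate from the S3 I/O makes the checks easy to unit-test on in-memory frames.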
Description
Responsibilities:
- Read and analyzed data using PySpark in AWS Glue jobs.
- Performed data extraction, transformation, and loading using PySpark and Spark SQL in AWS Glue.
- Handled and managed large DataFrames.
- Worked with Spark SQL and wrote interactive queries against DataFrames.
- Performed transformations and actions on Spark RDDs using PySpark.
- Gained exposure to Map/Reduce functions through PySpark.
- Used the Python pandas library for file validation and transformation in a Python Windows environment.
- Performed data visualisation; built and queried tables using AWS Athena and maintained Athena objects in staging and mart layers.
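Building and maintaining Athena tables over S3 data, as in the last bullet, typically means issuing `CREATE EXTERNAL TABLE` DDL. The sketch below builds such a statement; the database, table, column names, and S3 path are hypothetical placeholders.

```python
# A minimal sketch of generating Athena DDL for a table backed by S3
# data. All names and paths are hypothetical.
def athena_create_table_ddl(database, table, columns, s3_location):
    """Build a CREATE EXTERNAL TABLE statement exposing S3 data to Athena.

    `columns` is a list of (name, athena_type) pairs.
    """
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{s3_location}'"
    )

# Submitting the DDL would go through Boto3's Athena client (requires
# AWS credentials), e.g.:
#   boto3.client("athena").start_query_execution(
#       QueryString=ddl,
#       QueryExecutionContext={"Database": "staging"},
#       ResultConfiguration={"OutputLocation": "s3://..."},
#   )
```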