About Me
Data Engineer with over 5 years of extensive hands-on experience designing cloud-based data solutions and working in related fields.
-> I have expertise in the following:
Programming Languages - Python, SQL
Big Data - Spark, PySpark, Spark SQL, Hadoop
Azure - Azure Databricks, Azure Data Factory, Azure Data Lake, Azure SQL Database, Azure Synapse
AWS - AWS S3, AWS EMR, AWS Redshift
Scheduling Tools - Airflow
-> Certifications:
Microsoft Certified Azure Data Engineer
Microsoft Azure Data Fundamentals
AWS Certified Cloud Practitioner
-> Below are a few of my responsibilities:
Designed and developed an end-to-end enterprise data analytics solution on Azure-native services such as Azure Data Factory, Azure Databricks (PySpark, Spark SQL), and Azure Synapse Analytics.
Developed an API for data cleaning and metadata validation on large-scale data using Python and Spark (PySpark) on Azure Databricks, reducing ELT pipeline computation time by a factor of 10.
Built and maintained scalable ETL pipelines on AWS Glue to extract data from multiple legacy source systems (SQL, NoSQL), load it into AWS S3 and AWS DynamoDB, and analyze it with interactive queries on AWS Athena.