About Me
I have professional experience building and maintaining data lakes, working with a stack of Scala, Spark, Python, Airflow, AWS, EMR, S3, ECS, and ECR. My most recent project involved funnel analytics, where I built a...
Skills
Development Tools
Web Development
Data & Analytics
Database
Programming Language
Others
Networking & Security
Software Engineering
Positions
Portfolio Projects
Company
Data Lake
Description
· Ingested data from disparate sources to create a data lake on S3.
· Set up access control on AWS using SAML identity providers.
· Used Sqoop to capture data changes in Netezza.
· Optimized Netezza ingestion process to reduce overall time by 4 hours.
· Used AWS EMR task nodes to run Spark tasks, reducing costs by 10% compared with on-demand machines.
· Integrated Datadog with AWS services like ECS and EMR.
· Set up an AWS EMR cluster to deploy Spark.
· Optimized the CI/CD pipeline to run tests in parallel in CircleCI.
· Anonymized PII data to handle CCPA requests.
· Implemented Airflow dependency management using Poetry.
· Set up a Lambda process triggered via an AWS SES ruleset.
· Automated data governance capability for the ETL jobs.
· Implemented an Airflow root DAG to track the status of all DAGs and email a status report.
· Set up a process that calculates domain recency metrics and emails them on a daily cadence.
· Implemented data validation functionality that checks the schema of incoming JSON events before transformation.
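The schema-check step above can be sketched roughly as follows; the field names and expected types are illustrative assumptions, not details from the actual project:

```python
# Hypothetical sketch of pre-transformation schema validation for incoming
# JSON events. EVENT_SCHEMA is an assumed example, not the project's schema.
import json

# Expected schema: field name -> required Python type
EVENT_SCHEMA = {"event_id": str, "timestamp": str, "payload": dict}

def validate_event(raw: str) -> bool:
    """Return True only if the raw string parses as a JSON object
    containing every schema field with the expected type."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(event, dict):
        return False
    return all(
        key in event and isinstance(event[key], expected)
        for key, expected in EVENT_SCHEMA.items()
    )
```

In a pipeline like the one described, events failing this check would typically be routed to a dead-letter location rather than passed to the Spark transformation.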
Company
Enterprise Data Lake - Funnel Analytics
Description
Set up a Databricks cluster to run Spark transformations that feed Power BI reports.
Skills
Power BI, Apache Spark