In my professional journey, I have built and maintained data lakes. The tech stack involved Scala, Spark, Python, Airflow, and AWS (EMR, S3, ECS, ECR). In my most recent project I worked on funnel analytics, where I built a...
Data & Analytics
Networking & Security
· Ingested data from disparate sources to create a data lake on S3.
· Set up access control on AWS using SAML identity providers.
· Used Sqoop to capture data changes in Netezza.
· Optimized the Netezza ingestion process, reducing overall runtime by 4 hours.
· Ran Spark tasks on AWS EMR task nodes, cutting cost by 10% compared to on-demand machines.
· Integrated Datadog with AWS services such as ECS and EMR.
· Set up an AWS EMR cluster to deploy Spark.
· Optimized the CI/CD pipeline to run tests in parallel in CircleCI.
· Anonymized PII data to handle CCPA requests.
· Implemented Airflow dependency management using Poetry.
· Set up a Lambda process triggered via an AWS SES ruleset.
· Automated data governance capabilities for the ETL jobs.
· Implemented an Airflow root DAG to track the status of all DAGs and send a report over email.
· Set up a process that calculates domain recency metrics and emails them on a daily cadence.
· Implemented data validation functionality that checks the schema of incoming JSON events before transformation.
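The schema-validation step above can be sketched in Python. This is a minimal illustration, not the production implementation: the field names in EVENT_SCHEMA are hypothetical placeholders, since the actual event schema is not given here.

```python
import json

# Hypothetical minimal schema: required fields and their expected types.
# (Field names are illustrative, not the real production schema.)
EVENT_SCHEMA = {
    "event_id": str,
    "user_id": str,
    "event_type": str,
    "timestamp": str,
}

def validate_event(raw: str) -> bool:
    """Return True if the raw JSON event parses and matches the expected schema."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        return False
    # Every required field must be present with the expected type.
    return all(
        field in event and isinstance(event[field], expected)
        for field, expected in EVENT_SCHEMA.items()
    )

good = '{"event_id": "e1", "user_id": "u1", "event_type": "click", "timestamp": "2022-01-01T00:00:00Z"}'
bad = '{"event_id": "e1"}'
print(validate_event(good))  # True
print(validate_event(bad))   # False
```

Rejecting malformed events before the transformation step keeps bad records from propagating into downstream tables.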
Enterprise Data Lake - Funnel Analytics
Cloud Data Lake
(2022). Data lake implemented on AWS S3 containing web clickstream data, with Databricks used as the tool for Spark transformations. The exposed data was used to create Power BI reports. Tech: Databricks, Spark, Power BI.
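The funnel-analytics idea behind this project can be illustrated with a small, self-contained sketch: counting how many distinct users reach each stage of a funnel from an ordered clickstream. The stage names and event shape are assumptions for illustration only; the real pipeline ran as Spark transformations in Databricks.

```python
from collections import defaultdict

# Illustrative funnel stages; real stage names would come from the product.
FUNNEL = ["visit", "signup", "purchase"]

def funnel_counts(events):
    """Count distinct users reaching each stage.

    `events` is an ordered iterable of (user_id, stage) tuples. A user is
    credited with a stage only after completing all earlier stages.
    """
    seen = defaultdict(set)  # stage -> set of user_ids that reached it
    for user, stage in events:
        idx = FUNNEL.index(stage)
        if all(user in seen[prev] for prev in FUNNEL[:idx]):
            seen[stage].add(user)
    return {stage: len(seen[stage]) for stage in FUNNEL}

events = [
    ("u1", "visit"), ("u1", "signup"), ("u1", "purchase"),
    ("u2", "visit"), ("u2", "signup"),
    ("u3", "visit"),
]
print(funnel_counts(events))  # {'visit': 3, 'signup': 2, 'purchase': 1}
```

In the actual project, the same per-stage aggregation would be expressed over clickstream tables in Spark, with the results surfaced to Power BI.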