About Me
8+ years of total experience designing and developing Big Data and Python applications. 6+ years of experience developing, deploying, and supporting high-quality, fault-tolerant data pipelines (with various distributed data-movement technologies ...
Skills
Positions
Portfolio Projects
Description
True Influence Marketing Cloud – This cloud platform collects and catalogues decision-makers' content consumption across the entire internet and helps our customers target decision-makers who are actively researching a purchase decision.
Responsibilities:
• Developed and debugged data pipelines to ingest unstructured and semi-structured data from various sources, then clean, transform, and load it into a ClickHouse database for analytical purposes.
• Learned the ClickHouse data warehouse in one week and performed query optimizations that cut query response times from 400+ seconds to 10 seconds.
• Processed terabytes of data in daily and weekly jobs at 40% lower cloud cost.
• Implemented pipeline stages using Google BigQuery, Cloud Storage, and Composer, as well as AWS EMR, Athena, S3, Glue, Redshift, etc.
• Orchestrated the pipelines using Airflow (a minimal DAG sketch follows this list).
• Ran massive ephemeral clusters on Spot nodes at up to 70% lower cost than on-demand clusters.
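For illustration, a minimal sketch of how such an ingest-transform-load pipeline might be wired up in Airflow; the DAG id, task names, and schedule are hypothetical, and the task bodies are stubs:

from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest_raw():
    """Pull semi-structured source files into a staging area (stub)."""

def transform():
    """Clean and normalize the staged records (stub)."""

def load_clickhouse():
    """Bulk-insert the transformed rows into a ClickHouse table (stub)."""

with DAG(
    dag_id="ingest_to_clickhouse",  # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="ingest_raw", python_callable=ingest_raw)
    clean = PythonOperator(task_id="transform", python_callable=transform)
    load = PythonOperator(task_id="load_clickhouse", python_callable=load_clickhouse)

    # Linear dependency chain: ingest, then transform, then load.
    ingest >> clean >> load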
Description
American Express – MatrixKRI (Key Risk Indicators) – This platform helps identify and manage existing and emerging risks that stem from business activities, ensuring they are escalated so they can be measured, monitored, and controlled.
Responsibilities:
• Developed optimized PySpark and Hive jobs to proactively identify and report business and legal risks, saving millions of dollars in operational and legal costs (an illustrative job sketch follows this list).
• Managed data collection to support daily, weekly, monthly, and other recurring reports as well as interactive dashboards.
• Analyzed data storage and compute usage to optimize the performance of Hive/Spark jobs.
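For illustration, a hedged sketch of the general shape of such a PySpark risk-reporting job; the Hive table, columns, and breach threshold are invented placeholders, not the actual MatrixKRI logic:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("kri_daily_report").enableHiveSupport().getOrCreate()

# Hypothetical Hive table and columns; the real schema is internal.
txns = spark.table("risk_db.transactions")

# Flag records breaching an invented exposure threshold and roll them up
# into a per-unit, per-day key-risk-indicator summary.
kri = (
    txns.withColumn("breach", (F.col("exposure_usd") > 1000000).cast("int"))
        .groupBy("business_unit", "txn_date")
        .agg(F.count("*").alias("txn_count"),
             F.sum("breach").alias("breach_count"))
)

# Persist the aggregated indicators for downstream reports and dashboards.
kri.write.mode("overwrite").saveAsTable("risk_db.kri_daily")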
Description
United Airlines – UDH (United Data Hub) – A data lake solution that brings various data silos together into an open data platform for the organization.
Responsibilities:
• Migrated the existing data warehouse to the cloud, achieving zero wait time for ETL jobs by using ephemeral clusters with Spot nodes.
• Developed PySpark and Java Spark transformations on the Palantir Foundry platform, creating pipelines and workflows for data ingestion, extraction, data modelling, and processing of structured and unstructured data so the business can make informed, data-driven decisions (a minimal transform sketch follows this list).
• Proactively automated monitoring and alerting based on workflow needs.
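For illustration, a minimal PySpark transform using Foundry's transforms API; the dataset paths, column names, and cleaning rule are hypothetical, since the real UDH datasets are internal:

from transforms.api import transform_df, Input, Output
from pyspark.sql import functions as F

@transform_df(
    Output("/UDH/clean/flights"),   # hypothetical output dataset path
    raw=Input("/UDH/raw/flights"),  # hypothetical input dataset path
)
def clean_flights(raw):
    # Drop rows missing a key identifier and normalize a timestamp column.
    return (raw.dropna(subset=["flight_id"])
               .withColumn("departure_ts", F.to_timestamp("departure_ts")))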
Description
Responsibilities:
- Analyzed billing and prepared cost estimates for GCP, Azure, and Kyvos.
- Set up data warehouses: Azure Synapse and BigQuery.
- Set up and launched a GCP VM and an Azure Synapse data warehouse.
- Set up test configurations and JMeter on these instances.
- Executed concurrency tests on the tables via JDBC connections using JMeter (a Python analogue of this test is sketched after this list).
- Prepared benchmark comparison and cost comparison reports.
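For illustration, a small Python analogue of the JMeter concurrency test (the actual benchmark used JMeter over JDBC); make_connection stands in for whichever DB-API driver is used, and the query and concurrency level are assumptions:

import statistics
import time
from concurrent.futures import ThreadPoolExecutor

QUERY = "SELECT count(*) FROM benchmark_table"  # hypothetical test query
CONCURRENCY = 16  # number of simultaneous clients

def run_query(make_connection):
    # One timed round-trip over a DB-API connection (driver-agnostic).
    conn = make_connection()
    try:
        cur = conn.cursor()
        start = time.perf_counter()
        cur.execute(QUERY)
        cur.fetchall()
        return time.perf_counter() - start
    finally:
        conn.close()

def benchmark(make_connection, runs=100):
    # Fire `runs` queries across CONCURRENCY threads and summarize latency.
    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        latencies = list(pool.map(lambda _: run_query(make_connection), range(runs)))
    return {"p50_s": statistics.median(latencies), "max_s": max(latencies)}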