About Me
Big Data Engineer with 6+ years of experience in the IT industry, proficient in Python, Scala, Big Data, Hadoop, Hive, MapReduce, Spark RDD, Spark SQL, shell scripting, Airflow, SnowSQL, AWS EC2, S3, Glue, EMR, Microsoft Azure Blob, VM, Data Factor...
Skills
Database
Programming Language
Software Engineering
Development Tools
Data & Analytics
Others
Web Development
Operating System
Portfolio Projects
Company
Data ingestion and transformation from different sources using Big Data for US Retail Client
Role
Backend Developer
Description
- Created a Big Data pipeline and implemented a Spark-based model for batch processing of data.
- Developed product features end to end.
- Ingested data from an SFTP server into Snowflake via AWS S3 using a JDBC connection.
- Ingested data from an SFTP server into PostgreSQL via Snowflake using a REST API.
- Trained interns in Big Data technologies.
- Developed PySpark code to validate data and load it from S3 into Snowflake.
- Ingested data from Azure Blob and loaded it into PostgreSQL.
- Scheduled the entire flow using Apache Airflow.
Skills
Apache Airflow, AWS, AWS EMR, AWS Glue, AWS EC2, Azure Blob, PostgreSQL, PySpark, Shell Scripting, Snowflake
Tools
GitHub
Company
Data Ingestion And Transformation From Different Sources Using Big Data in Banking Domain
Description
- Ingested data from SQL databases into Hive, applying business transformations.
- Created HiveQL scripts to generate reports from Hive for the BI team.
- Contributed to enhancements and performance improvements of the process.
- Worked on Hive performance improvements for joins, grouping, and aggregation.
- Created an automation process to schedule all scripts using Oozie/Falcon.
Tools
PyCharm
Company
PNDA - Platform for Network Data Analytics
Description
PNDA is an open source Platform for Network Data Analytics. It efficiently distributes data with a publish and subscribe model, and processes bulk data in batches or streaming data in real time. Worked as a core developer of PNDA: added new features and fixed many issues. Developed custom Kafka producers and several batch and streaming Spark applications in Python for PNDA.
Skills
Apache Kafka, PySpark, Python, Spark Streaming
Tools
PyCharm