About Me
Overall 8.3+ years of IT experience as a Data Engineer in Big Data, Hadoop, and Spark technologies. Hands-on experience with Hadoop technologies like HDFS, Hive, Sqoop, Apache Spark, Spark SQL, Spark Streaming, Apache Kafka, Apache NiFi, Splunk/Scribe, Ge...
Portfolio Projects
Description
Technology: Hadoop, Hive, Kafka, Apache NiFi, HDP.
Description: The project delivers a pilot model for a centralised data storage and data management platform, used to store and process the log files that flow from various IT and network applications into ArcSight.
Role & Responsibilities:
- Built the NiFi job that moves data from the Kafka topic to HDFS (raw zone).
- Built the streaming flow for the serving layer (stream data).
- Developed the DDL to store the data in Hive tables (see the sketch after this list).
- Implemented performance tuning.
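A minimal PySpark sketch of the Hive DDL step, assuming hypothetical database, column, and HDFS path names; it exposes the raw zone that the NiFi flow lands Kafka log data into as an external Hive table.

```python
# Minimal sketch (hypothetical names/paths): expose the NiFi-landed raw zone
# in Hive so downstream jobs can query the Kafka log data by ingest date.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("raw-zone-hive-ddl")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("CREATE DATABASE IF NOT EXISTS raw_zone")

# External table over the HDFS raw-zone directory that the NiFi flow writes to.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS raw_zone.arcsight_logs (
        event_time  STRING,
        source_app  STRING,
        message     STRING
    )
    PARTITIONED BY (ingest_date STRING)
    STORED AS TEXTFILE
    LOCATION 'hdfs:///data/raw/arcsight_logs'
""")

# Register partitions already landed by the NiFi flow.
spark.sql("MSCK REPAIR TABLE raw_zone.arcsight_logs")
```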
Description
Technology: Hadoop, Sqoop, Hive, Maven, shell scripting, Spark, Spark Streaming, Scala, Kafka.
Description: This project developed a data lake to analyse customer data for a leading bank, giving the client a clear picture of its customers and their activities. The project has two phases: ingestion and data manipulation. The first phase develops an ingestion mechanism to pull data from source systems such as DB2 and CDC onto the HDFS platform.
Role & Responsibilities:
- Developed automation for Sqoop jobs across multiple tables with the provided mapper count.
- Developed Spark code to move data from the staging layer to the schema, raw, and archive layers, adding extra audit columns (see the sketch after this list).
- Developed Spark code to collect data from the Kafka topic into the staging layer (HDFS).
- Executed all scripts using batch scripting.
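A minimal PySpark sketch of the staging-to-raw/archive promotion step, assuming hypothetical HDFS paths, Parquet storage, and audit column names.

```python
# Minimal sketch (hypothetical paths/columns): promote staged data to the raw
# and archive layers while stamping audit columns.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("staging-to-raw-archive").getOrCreate()

staged = spark.read.parquet("hdfs:///datalake/staging/customers")

audited = (staged
           .withColumn("audit_load_ts", F.current_timestamp())
           .withColumn("audit_source", F.lit("db2_cdc"))
           .withColumn("audit_batch_id", F.lit("20240101_01")))  # normally supplied by the batch script

# Append the audited data to both the raw and archive layers.
audited.write.mode("append").parquet("hdfs:///datalake/raw/customers")
audited.write.mode("append").parquet("hdfs:///datalake/archive/customers")
```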
Description
Technology: Hadoop, S3, EMR, EC2, AWS, Spark Core, and Python.
Description: Adtech ingestion brings data from multiple source locations into a data lake stored on S3. Data manipulation and cleansing are performed using Spark and Python, and the data is validated in the data warehouse using Hive and Redshift.
Role & Responsibilities:
- Involved in managing data coming from different databases to S3 using Spark with Python (see the sketch after this list).
- Wrote CLI commands against HDFS and S3.
- Involved in creating Hive tables and loading them with data, which runs internally as MapReduce jobs.
- Implemented complex Hive and Redshift queries to validate data.
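A minimal PySpark sketch of the database-to-S3 ingestion step, assuming hypothetical JDBC connection details, column names, and an S3 bucket.

```python
# Minimal sketch (hypothetical connection details/columns): pull a source table
# over JDBC, apply basic cleansing, and land it in the S3 data lake as Parquet.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("adtech-s3-ingestion").getOrCreate()

source = (spark.read.format("jdbc")
          .option("url", "jdbc:mysql://source-db:3306/adtech")  # assumed source endpoint
          .option("dbtable", "impressions")
          .option("user", "etl_user")
          .option("password", "***")
          .load())

cleansed = (source
            .dropDuplicates(["impression_id"])
            .filter(F.col("event_ts").isNotNull())
            .withColumn("event_date", F.to_date("event_ts")))

cleansed.write.mode("overwrite").partitionBy("event_date") \
        .parquet("s3://adtech-datalake/raw/impressions/")
```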
Description
Engaging with potential clients to understand their business requirements and challenges related to Big Data and analytics. Collaborating with the sales team to develop technical solutions and proposals that address clients' specific needs using Big Data technologies, Hadoop, and Azure Cloud services. Creating high-level architecture designs and diagrams that illustrate the proposed solutions and their integration with clients' existing systems. Participating in workshops, conferences, and industry events to promote the organization's expertise in Big Data and Azure Cloud services. Involved in Splunk and Geneos alert mechanism configuration. Designing PySpark jobs per client requirements and providing problem solutions to the team. Tracking team activity and helping the team resolve issues.
Description
Designing and deploying dynamically scalable, available, fault-tolerant, and reliable applications on the cloud. Selecting appropriate cloud services to design and deploy an application based on given requirements. Built a POC for the HS search real-time data ingestion pipeline using Pub/Sub on GCP (sketched below). Designing and deploying enterprise-wide scalable operations on cloud platforms. Experience in cloud migration technologies including Azure Migrate and Cloudamize. Experience in defining cloud migration roadmaps (timeline, sequencing, priorities, etc.).
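A minimal sketch of the Pub/Sub consumer side of such a real-time ingestion POC, assuming hypothetical GCP project and subscription names; the downstream hand-off is only indicated in comments.

```python
# Minimal sketch (hypothetical project/subscription names): consume search events
# from a Pub/Sub subscription as the entry point of a real-time ingestion pipeline.
from concurrent.futures import TimeoutError
from google.cloud import pubsub_v1

project_id = "hs-search-poc"           # assumed GCP project
subscription_id = "search-events-sub"  # assumed subscription

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(project_id, subscription_id)

def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    # In the POC this is where the event would be parsed and forwarded downstream
    # (e.g. to GCS or BigQuery); here we simply print and acknowledge it.
    print(f"Received: {message.data.decode('utf-8')}")
    message.ack()

streaming_pull = subscriber.subscribe(subscription_path, callback=callback)
print(f"Listening on {subscription_path} ...")

try:
    streaming_pull.result(timeout=60)  # run for one minute in this sketch
except TimeoutError:
    streaming_pull.cancel()
    streaming_pull.result()
```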
Description
Supported managing data coming from relational databases to HDFS using Talend. Wrote CLI commands against HDFS. Implemented complex Hive queries with partitioning (see the sketch below). Analysed, designed, implemented, and maintained high-value data using the Talend big data platform. Involved in analysing data in the Hive warehouse using Hive Query Language (HQL) and sending results to reporting tools.
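A minimal PySpark sketch of a partition-pruned Hive query feeding a reporting extract, assuming hypothetical warehouse table, column, and output path names.

```python
# Minimal sketch (hypothetical table/columns): a partition-pruned Hive query run
# through Spark's Hive support, with the result handed off to a reporting extract.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-reporting-extract")
         .enableHiveSupport()
         .getOrCreate())

# Filtering on the partition column (load_date) lets Hive prune partitions
# instead of scanning the whole warehouse table.
report = spark.sql("""
    SELECT customer_id,
           COUNT(*)        AS txn_count,
           SUM(txn_amount) AS txn_total
    FROM   warehouse.transactions
    WHERE  load_date BETWEEN '2024-01-01' AND '2024-01-31'
    GROUP  BY customer_id
""")

# Single CSV extract for the downstream reporting tool.
report.coalesce(1).write.mode("overwrite").option("header", True) \
      .csv("hdfs:///reports/monthly_txn_summary")
```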