Ankit A.

BigData Engineer

Bengaluru, India

Experience: 5 Years

21606.4 USD / Year

  • Notice Period: Days

About Me

Hadoop Developer with 3+ years of experience installing, configuring, and leveraging the Hadoop ecosystem to glean meaningful insights from semi-structured and unstructured data; currently living in Bangalore.

  • 3.6 years of experience...


Portfolio Projects

Description

Role: Informatica Developer

Details:

HMC is the only healthcare organization outside the United States to receive simultaneous Joint Commission International (JCI) re-accreditation for all its hospitals, and in 2011 its ambulance service and home healthcare service also received JCI accreditation. The purpose of this project was to maintain a data warehouse and data mart layer that enables the home office to make corporate decisions.

Roles and Responsibilities:

  • Analyzed the business requirements and functional specifications.

  • Extracted data from the Oracle database and spreadsheets, staged it in a single location, and applied business logic to load it into the central Oracle database.

  • Used Informatica Power Center 10.1.1 for extraction, transformation, and loading (ETL) of data into the data warehouse.

  • Extensively used transformations such as Router, Aggregator, Joiner, Expression, Lookup, Update Strategy, Sequence Generator, and Stored Procedure.

  • Developed complex mappings in Informatica to load data from various sources. Implemented performance-tuning logic on targets, sources, mappings, and sessions to provide maximum efficiency and performance.

  • Parameterized the mappings to increase reusability.

  • Used the Informatica Power Center Workflow Manager to create sessions, workflows, and batches to run with the logic embedded in the mappings.

  • Used PL/SQL procedures with Informatica mappings for process control during incremental loads (one way to drive the incremental watermark is sketched after this list).

  • Created ETL exception reports and validation reports after the data was loaded into the warehouse database.

  • Wrote documentation describing program development, logic, coding, testing, changes, and corrections.

  • Followed Informatica recommendations, methodologies, and best practices.
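
For illustration only, here is one way the incremental-load process control above could be driven. The project itself used PL/SQL procedures; this is a hypothetical Python sketch that reads a high-water mark from an assumed Oracle control table (etl_process_control) and writes it into an Informatica parameter file as $$LAST_EXTRACT_DATE. All table, column, file, and connection names are assumptions, not the project's actual objects.

```python
# Hypothetical sketch: derive the incremental watermark for the next run
# from a control table and write it to an Informatica parameter file.
import cx_Oracle  # assumption: the cx_Oracle driver is available


def write_param_file(path, user, password, dsn):
    """Read the last successful load's high-water mark and emit a
    parameter file for the next incremental session run."""
    conn = cx_Oracle.connect(user=user, password=password, dsn=dsn)
    try:
        cur = conn.cursor()
        # etl_process_control is a made-up control table for this example.
        cur.execute(
            "SELECT TO_CHAR(MAX(load_end_ts), 'YYYY-MM-DD HH24:MI:SS') "
            "FROM etl_process_control WHERE status = 'SUCCESS'"
        )
        last_extract = cur.fetchone()[0] or "1900-01-01 00:00:00"
    finally:
        conn.close()

    # Power Center reads mapping parameters such as $$LAST_EXTRACT_DATE
    # from a plain-text parameter file.
    with open(path, "w") as fh:
        fh.write("[Global]\n")
        fh.write(f"$$LAST_EXTRACT_DATE={last_extract}\n")


if __name__ == "__main__":
    write_param_file("/infa/params/wf_daily_load.par", "etl_user", "secret", "HMCDW")
```

A scheduler would run something like this before the Power Center session so that each run extracts only the rows changed since the last successful load.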


Description

Details:

The purpose of this project was to help BI teams and stakeholders from different business units understand or find probable causes of defective units by analyzing vast amounts of data from various sources such as manufacturing locations, shipping companies, warehouses, and resellers. Bulk data was moved from the RDBMS to HDFS, with schema, using Sqoop, and live defective-return data from resellers was ingested using Flume.

The data landed in HDFS was stored in Hive tables and queried using HQL.
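
As a rough illustration of querying that Hive layer, the sketch below uses a Hive-enabled SparkSession to run HQL; the database, table, and column names are invented for the example and do not come from the project.

```python
# Illustrative only: run HQL against the Hive metastore from PySpark.
# warehouse.shipments, warehouse.manufacturing and warehouse.reseller_returns
# are hypothetical tables.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("defect-analysis")
    .enableHiveSupport()   # lets spark.sql() see the Hive metastore tables
    .getOrCreate()
)

# Defect rate per manufacturing plant, combining shipment and return data.
defect_rate = spark.sql("""
    SELECT m.plant_id,
           COUNT(r.return_id)                    AS defective_units,
           COUNT(r.return_id) / COUNT(s.unit_id) AS defect_ratio
    FROM   warehouse.shipments s
    JOIN   warehouse.manufacturing m ON s.unit_id = m.unit_id
    LEFT JOIN warehouse.reseller_returns r ON s.unit_id = r.unit_id
    GROUP  BY m.plant_id
""")
defect_rate.show()
```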

The project had three important Big Data components:

  • Moving data from the traditional RDBMS to HDFS (Sqoop)
  • Storing live data from resellers in HDFS (Kafka with Spark streaming; see the streaming sketch after this list)
  • Querying the data from HDFS using Hive to provide useful analytical information for the decision makers (Hive ultimately connected to QlikView)
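
The second component, sketched below with Spark Structured Streaming (rather than the older DStream API), would read reseller-return events from a Kafka topic and land them on HDFS. The broker address, topic, event schema, output format, and paths are all assumptions, and the spark-sql-kafka connector package must be on the classpath.

```python
# Hypothetical sketch: stream reseller return events from Kafka into HDFS.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("reseller-returns-stream").getOrCreate()

# Assumed shape of a return event published by the resellers.
schema = StructType([
    StructField("unit_id", StringType()),
    StructField("reseller_id", StringType()),
    StructField("defect_code", StringType()),
    StructField("returned_at", TimestampType()),
])

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "reseller_returns")
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers bytes; parse the JSON payload into columns.
events = (
    raw.select(from_json(col("value").cast("string"), schema).alias("e"))
       .select("e.*")
)

# Land the stream on HDFS (Parquet here is an assumption, not the project's format).
query = (
    events.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/reseller_returns")
    .option("checkpointLocation", "hdfs:///checkpoints/reseller_returns")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```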

Roles and Responsibilities:

  • Used Sqoop to import large volumes of data from the traditional RDBMS to HDFS.
  • Wrote the import query for incremental data on a scheduled basis (an illustrative incremental import is sketched after this list).
  • Used Oozie to automate the Sqoop jobs.
  • Worked on Hive queries, creating and querying Hive tables to retrieve useful analytical information.
  • Understood the end-to-end structure for capturing live data from resellers.
  • Monitored data pipelines to ensure complete transfer of data.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Wrote documentation describing program development, logic, coding, testing, changes, and corrections.
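
For reference, the incremental import that the Oozie-scheduled job wraps could look roughly like the sketch below. The JDBC URL, credentials, table, check column, and watermark value are placeholders; in practice a saved Sqoop job tracks --last-value automatically instead of it being hard-coded.

```python
# Hypothetical sketch of an incremental Sqoop import invoked from Python.
import subprocess

sqoop_cmd = [
    "sqoop", "import",
    "--connect", "jdbc:oracle:thin:@//dbhost:1521/ORDERS",  # placeholder source DB
    "--username", "etl_user",
    "--password-file", "hdfs:///user/etl/.pwd",
    "--table", "SHIPMENTS",
    "--target-dir", "/data/raw/shipments",
    # Only pull rows modified since the last recorded watermark.
    "--incremental", "lastmodified",
    "--check-column", "LAST_UPDATE_TS",
    "--last-value", "2019-01-01 00:00:00",
    "--merge-key", "SHIPMENT_ID",
    "--num-mappers", "4",
]
subprocess.run(sqoop_cmd, check=True)
```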


Description

The purpose of this project was to consolidate the daily transactions into a Hive partitioned table based on the maximum booking date, so that stakeholders could get the data they need for business purposes. The data stored in HDFS was kept in Hive partitioned tables in ORC format.
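
A hypothetical PySpark sketch of that consolidation step is shown below: it keeps the latest record per booking and writes the result to a partitioned, ORC-backed Hive table. The table, key, and partition column names are made up for the example.

```python
# Illustrative consolidation: latest record per booking_id by booking_date,
# written to a partitioned ORC Hive table. All names are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, row_number
from pyspark.sql.window import Window

spark = (
    SparkSession.builder
    .appName("daily-consolidation")
    .enableHiveSupport()
    .getOrCreate()
)

daily = spark.table("staging.daily_transactions")

# For each booking, keep only the row with the most recent booking date.
latest_first = Window.partitionBy("booking_id").orderBy(col("booking_date").desc())
consolidated = (
    daily.withColumn("rn", row_number().over(latest_first))
         .filter(col("rn") == 1)
         .drop("rn")
)

(
    consolidated.write
    .mode("overwrite")
    .format("orc")
    .partitionBy("booking_date")
    .saveAsTable("warehouse.consolidated_transactions")
)
```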

The flow had three steps:

  • Fetch data from the mainframes into daily Hive tables using a bash script (containing the Sqoop commands and the Hive commands that create the schema and other necessary details).
  • Run the PySpark consolidation flow on Airflow to consolidate the data.
  • Schedule the flow on Airflow to run daily and load a control table with statistics for each data load (a scheduling sketch follows this list).
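
A rough sketch of how the daily scheduling could be wired up in Airflow (assuming Airflow 2.x; the DAG id, script paths, and spark-submit arguments are placeholders, not the project's actual configuration):

```python
# Hypothetical Airflow DAG: ingest from the mainframe, consolidate with
# PySpark, then record load statistics in a control table.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_consolidation",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Bash script wrapping the Sqoop and Hive commands; the trailing space
    # stops Airflow from treating the .sh path as a Jinja template file.
    ingest = BashOperator(
        task_id="ingest_from_mainframe",
        bash_command="/opt/etl/bin/ingest_daily.sh ",
    )

    consolidate = BashOperator(
        task_id="consolidate_transactions",
        bash_command=(
            "spark-submit --master yarn --deploy-mode cluster "
            "/opt/etl/jobs/consolidate_transactions.py"
        ),
    )

    update_control = BashOperator(
        task_id="update_control_table",
        bash_command="/opt/etl/bin/load_control_stats.sh ",
    )

    ingest >> consolidate >> update_control
```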
