About Me
- 7+ years of application development experience, including 3+ years in big data development and 4+ years in data engineering, data warehousing, and business intelligence.
- Experience building systems to perform real-time data processing using Spark Streaming, Kafka, Spark SQL, and Cloudera.
- Worked extensively with dimensional modeling, data migration, data cleansing, data profiling, and ETL processes for data lakes and data warehouses.
- Designed and built ETL pipelines to automate ingestion of structured and unstructured data in batch and real-time modes using Kafka, Spark SQL, Spark Streaming, Hive, Impala, and various ETL tools.
- Hands-on experience importing and exporting data between HDFS and databases such as SQL, Oracle, and Teradata using Sqoop.
- Worked with multiple ETL tools, including Informatica Big Data Edition 10.2.2, Alteryx, and Kalido.
- Experience with Informatica BDM (Informatica 10.1.1 HotFix), a tool for data ingestion and integration on Hadoop.
Portfolio Projects
Customer Experience Management (CEM):
The business wishes to understand sentiment on social media (Facebook, Twitter, Instagram) and the interactions, usage, and habits of its customers.
Responsibilities:
• Extracted data from different sources and stored it in HDFS.
• Built a real-time streaming pipeline using Kafka and Spark Streaming (see the sketch below).
• Developed ETL logic to aggregate the data on an hourly and daily basis.
• Developed ETL mappings using Informatica BDM to run on the Hadoop cluster.
• Provided support for machine learning model deployment.
Tool Sets: Informatica BDM, Spark Streaming, Kafka, Cloudera, Spark, Spark SQL, Hive
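
Below is a minimal sketch of the Kafka-to-Spark streaming aggregation described above, written with PySpark Structured Streaming for brevity. The broker address, topic name, event schema, and HDFS paths are illustrative assumptions, not the project's actual values, and the Kafka connector package is assumed to be available on the cluster.

# Minimal sketch: hourly sentiment aggregation over a Kafka stream (assumed names/paths).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("cem-streaming").getOrCreate()

# Assumed shape of the social-media interaction events.
schema = StructType([
    StructField("customer_id", StringType()),
    StructField("channel", StringType()),      # facebook / twitter / instagram
    StructField("sentiment", StringType()),
    StructField("event_time", TimestampType()),
])

events = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker
    .option("subscribe", "cem_events")                 # assumed topic
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*"))

# Hourly sentiment counts per channel; a daily roll-up would use window("event_time", "1 day").
hourly = (events
    .withWatermark("event_time", "2 hours")
    .groupBy(F.window("event_time", "1 hour"), "channel", "sentiment")
    .count())

query = (hourly.writeStream
    .outputMode("append")
    .format("parquet")                                 # landing zone on HDFS
    .option("path", "/data/cem/hourly_sentiment")
    .option("checkpointLocation", "/checkpoints/cem_hourly")
    .start())
query.awaitTermination()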
Project Description:
Customer Churn Model:
The aim of this project is to build a machine learning model that accurately identifies customers likely to churn in the subsequent year, so that appropriate measures can be taken to prevent their churn.
Responsibilities:
• Extracted data from different sources and stored it in HDFS.
• Loaded all the required data into Hive tables.
• Developed an Alteryx workflow to create the analytical dataset used as the input data source for the machine learning model.
• Performed exploratory data analysis and feature engineering.
• Provided support for model building and validation (see the sketch below).
• Provided support for model deployment.
Tool Sets:
Machine Learning, Python, Alteryx Designer, Alteryx Gallery, MapR, Spark, Hive
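
A brief sketch of the model building and validation step, assuming scikit-learn and a hypothetical analytical dataset exported from the Alteryx workflow; the actual features, target column, and algorithm used in the project are not shown here.

# Illustrative churn-model sketch (assumed file, columns, and library choice).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Analytical dataset produced by the Alteryx workflow (hypothetical path and columns).
df = pd.read_csv("analytical_dataset.csv")
X = df.drop(columns=["customer_id", "churned_next_year"])  # assumed numeric features
y = df["churned_next_year"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

# Validate with AUC, a common choice when the churn class is imbalanced.
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))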
Project Description:
Adobe Analytics:
MHHE.com is the website used for purchasing MHE products online. MHE used Adobe products (Adobe Analytics, Adobe Target, and Adobe Experience Manager) to track online activity on MHHE.com. Adobe provides a Data Feed feature that exports data to FTP, so we needed to extract all user-activity data from the FTP site into Hadoop and identify how online activity affects sales.
Responsibilities:
• Developed a shell script to extract all the historical data from the FTP site into the MapR environment (a Python sketch of this step is shown below).
• Developed Alteryx workflows for data transformation, lookups, and data processing.
• Developed an Alteryx workflow to write the transformed data into HDFS.
• Created a shell script to load data from the Hive staging table into the final Hive ORC reporting table.
• Created the final scripts to schedule the workflow in production.
• Set up daily jobs to load data from the FTP site into the Hive reporting table.
• Resolved data-related issues in the daily jobs.
Tool Sets:
MapR, MapR-FS, Alteryx Designer, Alteryx Gallery, Hive, Shell Script, Tableau, AppWorx
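
A hedged Python equivalent of the FTP extraction and MapR-FS load steps (the project used a shell script); the host, credentials, directories, and file pattern below are illustrative assumptions.

# Pull the Adobe data-feed files from FTP and push them into MapR-FS (HDFS-compatible).
import ftplib
import subprocess
from pathlib import Path

FTP_HOST = "ftp.example.com"            # assumed host
REMOTE_DIR = "/datafeeds/mhhe"          # assumed drop directory
LOCAL_DIR = Path("/tmp/adobe_datafeed")
HDFS_DIR = "/data/adobe/raw"            # assumed staging location for the Hive table

LOCAL_DIR.mkdir(parents=True, exist_ok=True)

# 1. Download the daily data-feed files from the Adobe FTP drop.
with ftplib.FTP(FTP_HOST, "user", "password") as ftp:
    ftp.cwd(REMOTE_DIR)
    for name in ftp.nlst():
        if name.endswith(".tsv.gz"):
            with open(LOCAL_DIR / name, "wb") as fh:
                ftp.retrbinary(f"RETR {name}", fh.write)

# 2. Copy the files into the cluster for the Hive staging table to read.
for path in LOCAL_DIR.glob("*.tsv.gz"):
    subprocess.run(["hadoop", "fs", "-put", "-f", str(path), HDFS_DIR], check=True)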
Project Description:
Customer Complaint Analysis (Consumer Analysis)
McGraw Hill Education is a learning science company, and customer complaints regarding its digital products are received through JIRA.
Customer complaint analysis is about handling customer dissatisfaction, which can be a critical issue for the customer. In this project we investigate the current sources and causes of customer complaints and seek effective ways of handling them by examining different types of products and issues. To support this analysis, we use a big data platform to store the large volume of complaint data.
Responsibilities:
• Developed a Java application to call the JIRA API and retrieve the complaint data (a Python sketch of the same step is shown below).
• Developed a Java application to parse the JSON response from the JIRA API into CSV files.
• Created a shell script to move the data from the local system to HDFS and performed data transformations using HQL.
• Moved the data from the Hive staging table to the final ORC reporting table.
• Set up a daily job to extract data from the JIRA API and load it into the Hive reporting layer.
Tool Sets:
Java, Eclipse, Hive, Shell Script, AppWorx, MapR
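
A hedged Python sketch of the JIRA extraction and JSON-to-CSV steps (the project implemented these in Java); the base URL, credentials, JQL filter, and field names are assumptions.

# Page through the JIRA search API and flatten the issues into a CSV for the Hive staging table.
import csv
import requests

JIRA_SEARCH_URL = "https://jira.example.com/rest/api/2/search"  # assumed instance
AUTH = ("user", "api_token")                                    # assumed credentials

issues, start_at = [], 0
while True:
    resp = requests.get(
        JIRA_SEARCH_URL,
        params={"jql": "project = SUPPORT", "startAt": start_at, "maxResults": 100},
        auth=AUTH,
        timeout=30,
    )
    resp.raise_for_status()
    page = resp.json()
    issues.extend(page["issues"])
    start_at += len(page["issues"])
    if not page["issues"] or start_at >= page["total"]:
        break

# Write a flat CSV (key, created, status, summary) that downstream HQL can load.
with open("jira_complaints.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["key", "created", "status", "summary"])
    for issue in issues:
        fields = issue["fields"]
        writer.writerow([issue["key"], fields["created"], fields["status"]["name"], fields["summary"]])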
Project Description:
Migration of Salesforce data to the Hadoop ecosystem
The Key Account Manager (KAM) is responsible for managing key accounts, maintaining long-term relationships with them, and maximizing sales opportunities. The CRM application captured information about the different activities performed by sales reps, and the KAM needed reports on sales-rep activities across different dimensions.
Responsibilities:
• Extracted the metadata of Salesforce objects through a shell script and created the Hive tables dynamically.
• Replicated object data to the Informatica server through Informatica Cloud.
• Built a data pipeline from the Informatica server to the Hive presentation layer.
• Performed quality and duplication checks using HQL.
• Implemented SCD Type 2 using HQL and shell scripts (see the sketch below).
• Monitored daily job runs.
• Fixed data-related issues.
• Created unit test cases and performed unit testing.
Tool Sets:
Hive, Shell Script, HDP, Tableau, Informatica Developer, Oozie, Informatica Cloud
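
A hedged sketch of the SCD Type 2 pattern described above, with the HQL driven from Python (via beeline) rather than a shell script, to keep the sketches in one language. The table and column names (account_dim, account_stg, eff_start_dt, eff_end_dt, is_current) and the change-detection column are assumptions.

# Expire the current dimension row when the tracked attribute changes, and insert the new version.
import subprocess

HQL = """
INSERT OVERWRITE TABLE account_dim
SELECT * FROM (
  -- Existing rows: expire the current version if the staging row carries a changed name.
  SELECT d.account_id,
         d.account_name,
         d.eff_start_dt,
         CASE WHEN s.account_id IS NOT NULL AND d.is_current = 'Y'
              THEN current_date() ELSE d.eff_end_dt END AS eff_end_dt,
         CASE WHEN s.account_id IS NOT NULL AND d.is_current = 'Y'
              THEN 'N' ELSE d.is_current END AS is_current
  FROM account_dim d
  LEFT JOIN account_stg s
    ON d.account_id = s.account_id AND d.account_name <> s.account_name
  UNION ALL
  -- New and changed records become the current version.
  SELECT s.account_id,
         s.account_name,
         current_date() AS eff_start_dt,
         CAST('9999-12-31' AS date) AS eff_end_dt,
         'Y' AS is_current
  FROM account_stg s
  LEFT JOIN account_dim d
    ON s.account_id = d.account_id AND d.is_current = 'Y' AND s.account_name = d.account_name
  WHERE d.account_id IS NULL
) merged
"""

# The project ran its HQL from a shell script; beeline -e executes the statement against HiveServer2.
subprocess.run(
    ["beeline", "-u", "jdbc:hive2://hive-server:10000/default", "-e", HQL],
    check=True,
)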
KAM Analytics
The KAM Analytics implementation supports KAMs with account planning, prioritization, and product launches. Business users want to view data through detailed reporting in the Business Objects application to better understand and analyze information about KAMs, customers, buying groups, products, and sales in cash, quantity, and volume. For SAP BO reporting, they need a data warehouse that provides sales information across different dimensions (e.g. Customer, Product, Geography).
Responsibilities:
• Worked with business users and business analysts for requirements gathering and business analysis.
• Converted business requirements into high-level and low-level designs.
• Designed and customized data models for the data warehouse, supporting data from multiple sources in real time.
• Extracted data from flat files, MS Excel, and DS, transformed it based on user requirements using Informatica PowerCenter and Kalido ETL tools, and loaded it into the target by scheduling sessions.
• Supported daily loads and worked with business users to handle rejected data.
• Developed reusable Mapplets and Transformations.
• Reviewed high-level design specifications, ETL coding, and mapping standards.
• Performed unit testing and tuned for better performance.
• Created various documents such as the source-to-target data mapping document, unit testing document, and system testing document.
CRS (Central Repository System)
As the offshore team, we had to implement multiple enhancements to the product, primarily around various new interfaces that needed to be added.
Responsibilities:
• Involved in requirements gathering and performed the ETL process.
• Created dimension tables and fact tables based on the warehouse design.
• Developed and updated documentation of processes and systems.
• Analyzed sources, requirements, and the existing OLTP system, and identified the required dimensions and facts from the database.