Narendra mohan P.

Data engineer working with Spark

Bengaluru, India

Experience: 14 Years

76800 USD / Year

  • Start Date / Notice Period end date: 2020-12-01

About Me

I am working as a Data Engineer with Happiest Minds. I have good experience with PySpark, and I also have full stack web development experience with Java technologies.

...

Portfolio Projects

Description of the Project

The project focuses on building a cloud solution that provides a personalized customer experience, based on analytics, using data provided by retail customers. It enables these clients to acquire, retain, and engage their customers using tools that integrate with the client's ecosystem.

Responsibility

  • Designed, developed and managed data pipelines on the AWS cloud platform.
  • Worked with PySpark for data wrangling and transformations, leveraging distributed computing (see the PySpark sketch after this list).
  • Provided solution architecture to sync AWS Athena data to a PostgreSQL RDS instance.
  • Enhanced and managed AWS Step Functions workflows to integrate services such as AWS Lambda and Amazon ECS into feature-rich applications.
  • Designed and developed serverless data pipelines for ETL processing, leveraging AWS Glue and Lambda.
  • Worked on AWS Glue to schedule recurring ETL jobs and chain multiple jobs together.
  • Worked on AWS Lambda, since it extends other AWS services with custom logic while operating at AWS scale, performance, and security.
  • Worked on data refinement so that PII data is not exposed to the different stages of the pipeline.
  • Worked on developing and streamlining RESTful APIs to ingest data from third parties.
  • Handled CSV and JSON data ingestion.
  • Worked on creating performance metrics and data validation metrics.
  • Worked on enhancing data pipelines using AWS serverless services to achieve a microservice architecture.
  • Hands-on with Git for code base and version management.
  • Troubleshooting functional issues and data discrepancies.
  • Collaborated with stakeholders on requirements and planning.
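A minimal PySpark sketch of the kind of wrangling and PII refinement described above; the bucket names, columns, and masking rule are hypothetical placeholders, not details from the actual project.

```python
# Minimal PySpark wrangling sketch; bucket names, columns and the masking rule
# are hypothetical placeholders, not the actual project code.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("customer-wrangling").getOrCreate()

# Read raw CSV drops landed by the ingestion APIs.
raw = spark.read.option("header", "true").csv("s3://example-raw-bucket/customers/")

cleaned = (
    raw
    # Normalize text fields and drop records missing the join key.
    .withColumn("email", F.lower(F.trim(F.col("email"))))
    .filter(F.col("customer_id").isNotNull())
    # Refine PII so downstream stages of the pipeline never see the raw value.
    .withColumn("email", F.sha2(F.col("email"), 256))
    .withColumn("ingest_date", F.current_date())
    .dropDuplicates(["customer_id"])
)

# Write partitioned Parquet to the curated zone queried by Athena.
cleaned.write.mode("overwrite").partitionBy("ingest_date").parquet(
    "s3://example-curated-bucket/customers/"
)
```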

Description of the Project

The project was aimed at building analytical use cases for a retail client. The requirement was to build big data pipelines for the hot and cold data paths, and to visualize both real-time and batch data. The use cases involved evaluating sales across geographies and stores over different timelines for better decision making.

Responsibility

  • Working as an individual contributor.

  • Worked on creating a new data pipeline using a Glue job to ingest data into Snowflake for a feasibility test; the existing pipelines were using Redshift.

  • Worked on a feasibility check of whether Kinesis or Kafka could be used for the Couchbase streaming; the pipeline needed some custom changes while pushing data that were not feasible in Kinesis, hence Kafka was chosen.

  • Worked on migrating the existing Couchbase data pipeline from a custom Java application to Kafka. The custom Java application used the Couchbase delta sync feature, whose license cost was high; Kafka was adopted to save cost and add streaming capability to the ingestion.

  • Worked with AWS Lambda to do simple transformations on small JSON records, using Firehose to buffer them for 5 minutes and store them as bigger files (see the Lambda sketch after this list).

  • Worked with AWS Glue to create jobs for the ETL transformations.

  • Worked with AWS Glue and PySpark to create an end-to-end data pipeline validation framework.

  • Used Firehose to reliably buffer streaming data and load it to S3; the canvas data for analytics needed 5-minute buffered streaming.

  • Supported the onshore dev team for any requirement.

  • Coordinated with the customer on requirements and planning.
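A hedged sketch of the Lambda-plus-Firehose step above: the handler follows the standard Firehose transformation-Lambda record contract, but the field names and the reshaping logic are invented for illustration.

```python
# Hypothetical Firehose transformation Lambda: each small JSON record is lightly
# reshaped and handed back to Firehose, which buffers (~5 min) before writing
# larger files to S3. Field names are illustrative, not from the actual project.
import base64
import json


def handler(event, context):
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))

        # Simple transformation: keep a few fields under flattened keys.
        transformed = {
            "event_id": payload.get("id"),
            "event_type": payload.get("type"),
            "store_id": payload.get("store", {}).get("id"),
        }

        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            # Newline-terminate so buffered records land as line-delimited JSON.
            "data": base64.b64encode(
                (json.dumps(transformed) + "\n").encode("utf-8")
            ).decode("utf-8"),
        })

    return {"records": output}
```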

Description of the Project

The objective of the project was the continuous ingestion of digital assets (images, PDFs, and files in other formats) from an MDM data source, consuming a SOAP API to get the content. Initially we used an HTTP server to get the image data, which caused load issues on that server. To keep the HTTP server performant, the images needed to be ingested from the HTTP server into HDFS so that the HTTP server was not impacted; the data was synced daily. We built a distributed ingestion application using the capabilities of Hadoop and Spark/Beeline to store the digital assets in HDFS while also making them accessible online through a WebHDFS URL.

Responsibility

  • Requirement gathering and design

  • Worked with the HDFS command-line tools to create a data pipeline for ingesting image-related delta extracts from the source location to staging.

  • Worked with PySpark to get the image data and established a pipeline to ingest images from the HTTP server into local HDFS storage (see the sketch after this list).

  • Developed pre- and post-validation scripts for the ingested data.

  • Created the master table from staging data based on the pre-validation results and status codes.

  • Workflow automation using the Oozie framework.

  • Unit testing

  • Mentoring team
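An illustrative sketch of the PySpark image-ingestion step above; the NameNode address, HDFS paths, and the use of the `requests` and `hdfs` (WebHDFS) client libraries on the executors are assumptions, not details from the original project.

```python
# Illustrative image-ingestion sketch: a delta extract of image URLs is spread
# across Spark executors; each partition downloads its files from the HTTP
# server and writes them to HDFS over WebHDFS. Hosts and paths are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("digital-asset-ingestion").getOrCreate()

# Delta extract staged earlier by the HDFS CLI step: one image URL per line.
urls = spark.read.text("/staging/digital_assets/delta_urls").rdd.map(lambda r: r[0])


def ingest_partition(url_iter):
    import requests
    from hdfs import InsecureClient  # WebHDFS client, assumed available on executors

    client = InsecureClient("http://namenode.example.com:9870", user="etl")
    for url in url_iter:
        resp = requests.get(url, timeout=30)
        if resp.status_code != 200:
            yield (url, resp.status_code)  # feeds the post-validation step
            continue
        target = "/data/digital_assets/" + url.rsplit("/", 1)[-1]
        client.write(target, data=resp.content, overwrite=True)
        yield (url, 200)


# Per-URL status codes become input for the pre/post validation scripts.
status = urls.mapPartitions(ingest_partition)
status.toDF(["url", "status_code"]).write.mode("overwrite").parquet(
    "/staging/digital_assets/ingest_status"
)
```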

Description of the Project

Product Data Management is a complex function that is central to every retail business. Product data (attributes, text, images, price) feeds into every business function, including customer, marketing, merchandising, pricing, supply chain, store operations and digital. One of the key objectives of the Product MDM business is ensuring data quality. For large retailers like Lowe's, which deal in millions of SKUs, performing this activity manually is a big overhead and a complex process. Poor data quality means a bad customer experience, revenue loss, losing business to the competition, and legal implications for retailers (some errors can lead to millions of dollars in compensation). As part of this project we are planning to build a Data Quality Rule Engine, along with a business front-end application, which will automate the product data quality process (error identification, error validation and error fixing) leveraging advanced machine learning and cognitive techniques including NLP, text analytics, image recognition, deep learning, AI, and other statistical techniques. The entire platform will be built using open source Hadoop environment technologies and tools, including custom algorithm development.

Responsibility

  • Requirement gathering and design

  • Created a data pipeline for ingesting data from multiple databases into HDFS, using Sqoop for the ingestion.

  • Worked on text-based data quality (incorrect promotional bullets, and attributes not conforming to standards) using Spark with Scala (a rough Python sketch of one such rule follows this list).

  • Worked on image-based data quality (primary image contains incorrect text, or incorrect swatch usage) using PySpark.

  • Developed algorithms using Scala, PySpark and Hive scripts.

  • Developed code for image processing and text processing.

  • Responsible for data analysis and validating the metrics.

  • Unit testing
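The text-based rules above were implemented in Spark with Scala; the sketch below is only a rough Python equivalent of one such rule, with invented column names, paths, and thresholds.

```python
# Hedged PySpark sketch of a text-based data-quality rule; the rule thresholds,
# column names and paths are invented for illustration.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("product-dq-rules").getOrCreate()

products = spark.read.parquet("/data/product_mdm/attributes")  # hypothetical input

# Rule 1: promotional language is not allowed inside feature bullets.
promo_pattern = r"(?i)(sale|discount|limited time|best price)"

flagged = (
    products
    .withColumn("err_promo_in_bullet", F.col("feature_bullet").rlike(promo_pattern))
    # Rule 2: bullets must be non-empty and within a length standard.
    .withColumn(
        "err_bullet_length",
        (F.length(F.col("feature_bullet")) < 10) | (F.length(F.col("feature_bullet")) > 250),
    )
    .filter(F.col("err_promo_in_bullet") | F.col("err_bullet_length"))
)

# Flagged errors feed the business front-end for validation and fixing.
flagged.write.mode("overwrite").parquet("/data/product_mdm/dq_errors")
```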

Description of the Project

This project was considered a core innovation area for Lowe's as a company in 2016, with high visibility. The aim of the project was to develop custom algorithms in the Machine Learning (ML) and Data Science space, using the open source big data ecosystem (Hadoop stack), for product price linking and classification.

Responsibility

  • Requirement gathering and design

  • Created a data pipeline for ingesting data from multiple databases into HDFS, using Sqoop for the ingestion.

  • Responsible for creating the project landscape, designing project workflows, and identifying the different tools and technologies to be used.

  • Responsible for Data Modelling, Transformation and Preparation for Product Price Linking and Classifier Engine.

  • Developed Hive scripts and UDFs in Java for data DDL and DML operations.

  • Responsible for data analysis and validating the metrics

  • Explored and implemented text mining functions in Hive, Python and Java (see the sketch after this list).

  • Unit testing

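The price-linking preparation above was implemented with Hive scripts and Java UDFs; the sketch below only illustrates the kind of text normalization involved, as a PySpark equivalent with hypothetical inputs and column names.

```python
# Illustrative text-normalization step for price linking; the actual project
# used Hive scripts and Java UDFs, so this PySpark version is only an analogue.
import re

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("price-linking-prep").getOrCreate()


def normalize_title(title):
    """Lowercase, strip punctuation, and collapse whitespace so product titles
    from different sources can be compared token-for-token."""
    if title is None:
        return None
    title = re.sub(r"[^a-z0-9 ]", " ", title.lower())
    return re.sub(r"\s+", " ", title).strip()


normalize_udf = F.udf(normalize_title, StringType())

own = spark.read.parquet("/data/pricing/own_products")         # hypothetical inputs
comp = spark.read.parquet("/data/pricing/competitor_products")

# Normalized titles become the candidate-generation key for the linking engine.
own_n = own.withColumn("norm_title", normalize_udf(F.col("title")))
comp_n = comp.withColumn("norm_title", normalize_udf(F.col("title")))

candidates = own_n.join(comp_n, on="norm_title", how="inner")
candidates.write.mode("overwrite").parquet("/data/pricing/link_candidates")
```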
