Tapan B.

Expert data architect with Hadoop and Spark tech stack on Cloudera and AWS platforms

Hyderabad, India

Experience: 15 Years

Rate: 100000 USD / Year

Availability: Immediate

About Me

15 years of experience in conceptualization, design, effort estimation, development, architecture and maintenance of software applications, products, enterprise data warehouses (EDWs) and data lakes. 6 years of experience in developing distributed...

KEY ACHIEVEMENTS OF THE LAST 5 YEARS:

•    Helped one of the largest global banks build its firmwide data lake and high-quality curated datasets on top of it.
•    Created a Spark-based analytics solution on the AWS cloud within 6 months for a digital media analytics firm.
•    Tuned the performance of P2P processes for the world's largest RDBMS firm.

CAREER HIGHLIGHTS:

•    Proven track record of delivering Hadoop/Spark and batch/streaming solutions within quick timeframes of 3 to 6 months from kickoff to go-live.
•    6 years of Big Data, Hadoop and Spark experience on EMR and Cloudera environments
•    2 years of real-time stream processing experience using Spark Streaming
•    1 year of cloud computing experience using AWS services including EMR, EC2, Kinesis, S3, DynamoDB, IAM and CloudFormation
•    8 years of RDBMS experience (Oracle 9i, 10g, 11g and SQL Server 2005, 2008, MySQL and MariaDB) and 2 years of MPP experience (Teradata)
•    Programming language agnostic: 6 years of Java, 4 years of Scala and 2 years of Python programming experience
•    Full CI/CD lifecycle exposure to Git, Bitbucket, JIRA and IntelliJ
•    Domain knowledge of banking, financial services, insurance and digital measurement
•    4 years of experience leading teams of 4-5 developers.


Portfolio Projects


1.

Description: The CCB Multitenant Discovery platform serves the data needs of 8 different analytical teams, with usage ranging from analytics to machine learning.

Responsibility:

  • Enhanced the Python based ingestion framework to embed Spark based validation and added a new set of DQ checks (a sketch of such checks follows below)
  • Created Hadoop based datasets that serve as a data source cube for the Asset and Wealth Management team, containing customer-wise credit card spends, deposits and demographic data

Tech stack: Spark SQL, Impala, Hive, Sqoop, Teradata, Greenplum, IntelliJ IDEA
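
The DQ framework itself is internal, but as a minimal sketch of what embedded Spark validation of an ingested dataset can look like in Scala (the table name, column and thresholds here are hypothetical, not the actual checks):

    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.functions.col

    // Minimal sketch of embedded DQ checks; table and column names are hypothetical.
    object DqChecks {

      // Fail the load if the null rate of a mandatory column exceeds a threshold.
      def checkNullRate(df: DataFrame, column: String, maxNullRate: Double): Unit = {
        val total = df.count()
        val nulls = df.filter(col(column).isNull).count()
        val rate  = if (total == 0) 0.0 else nulls.toDouble / total
        require(rate <= maxNullRate, s"DQ failure: $column null rate $rate > $maxNullRate")
      }

      // Fail the load if the ingested row count drifts from the source-side count.
      def checkRowCount(df: DataFrame, expected: Long): Unit = {
        val actual = df.count()
        require(actual == expected, s"DQ failure: row count $actual != $expected")
      }

      def main(args: Array[String]): Unit = {
        val spark  = SparkSession.builder.appName("dq-checks").getOrCreate()
        val spends = spark.table("discovery.card_spend") // hypothetical curated table
        checkNullRate(spends, "customer_id", maxNullRate = 0.0)
        spark.stop()
      }
    }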

2.

Description: The CCB RFT Finance team at JP Morgan is building one of the largest data lakes, containing the bank's Auto, Mortgage, Cards and Deposits data.

Responsibility:

  • Worked as Enrichment Lead and pioneered high-performance Spark SQL based enrichment guidelines
  • Contributed to a Spark SQL based enrichment engine leveraging the most advanced features of Spark 2.x
  • Created a Scala based data obfuscation service which masks PII attributes for data requirements in lower environments
  • Made a pivotal contribution to cluster stability by designing the tools below (a sketch of the compaction approach follows after this project):
    • Compression service – compressed 90 TB of data within a matter of hours and gained 75% storage savings
    • Data compaction tool – merged small files effectively, which stabilized the cluster and reduced the occurrence of Impala daemon crashes
  • Created a health and job monitoring service for a 150-node Cloudera CDH cluster, one of the largest at JP Morgan Chase
  • Responsible for driving the design decisions of the Mortgage Data Mart

Tech stack: Spark SQL, Impala, Hive, Sqoop
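
The compaction tool itself is internal; a minimal sketch of the general approach (measure a partition on HDFS, then rewrite its many small files as a few large ones) might look like the following, where the partition path and target file size are hypothetical values:

    import org.apache.hadoop.fs.Path
    import org.apache.spark.sql.SparkSession

    // Sketch of a small-file compaction pass; path and sizing are hypothetical.
    object CompactPartition {
      val TargetFileBytes: Long = 256L * 1024 * 1024 // aim for ~256 MB output files

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("compact-partition").getOrCreate()
        val src   = "hdfs:///data/lake/mortgage/dt=2018-06-01" // hypothetical partition

        // Measure the partition on HDFS to decide how many output files we need.
        val fs         = new Path(src).getFileSystem(spark.sparkContext.hadoopConfiguration)
        val inputBytes = fs.getContentSummary(new Path(src)).getLength
        val numFiles   = math.max(1, (inputBytes / TargetFileBytes).toInt)

        // Rewrite the small files as a few large ones; the compacted directory
        // would be swapped in for the original only after validation.
        spark.read.parquet(src)
          .coalesce(numFiles)
          .write.mode("overwrite")
          .parquet(src + "_compacted")

        spark.stop()
      }
    }

Fewer, larger files reduce NameNode pressure and the per-file open/scan overhead that tends to destabilize Impala daemons.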


3.

Description: Harmonization is the initiative to move legacy Hadoop based products to AWS. Our team is building cloud based data collectors. The first product to go live is DCR (Digital Content Ratings), with a volume of 45 million pings per day.

Responsibility:

    • Designed a new messaging system based on Amazon Kinesis and Spark
    • Implemented Spark receivers to scan new data arriving in the S3 bucket in real time
    • Implemented real-time algorithms to detect bot activity based on User Agents and IP addresses (a simplified sketch follows below)
    • Engaged with AWS support to resolve critical issues in Kinesis/EMR.

Tech stack: AWS, EMR, EC2, S3, Kinesis, Spark
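
The production pipeline used Spark receivers over Kinesis; as a simplified sketch of the bot-filtering idea, here expressed with Structured Streaming over the S3 landing bucket (the schema, bucket paths and User-Agent heuristic are all hypothetical stand-ins, not the actual detection rules):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._
    import org.apache.spark.sql.types._

    // Simplified sketch of real-time bot filtering on pings landing in S3.
    object BotFilter {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("bot-filter").getOrCreate()

        // Hypothetical ping schema.
        val pingSchema = StructType(Seq(
          StructField("ts", TimestampType),
          StructField("user_agent", StringType),
          StructField("ip", StringType),
          StructField("campaign_id", StringType)))

        val pings = spark.readStream
          .schema(pingSchema)
          .json("s3://dcr-landing/pings/") // hypothetical landing bucket

        // Crude UA heuristic: known crawler keywords mark a ping as bot traffic.
        val isBot = lower(col("user_agent")).rlike("bot|crawler|spider|curl|wget")

        pings.filter(!isBot)
          .writeStream
          .format("parquet")
          .option("path", "s3://dcr-curated/pings/")
          .option("checkpointLocation", "s3://dcr-curated/_checkpoints/pings/")
          .start()
          .awaitTermination()
      }
    }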

4.

Description: Daily Ad Ratings analyzes global advertising campaigns - impression counts on a per-campaign, per-device and per-placement basis for around 600 Nielsen partners and affiliates, with a daily volume of 1.2 billion individual impressions.

Responsibility:

  • Architected the Hive warehouse and was involved in tuning HiveQL queries
  • Designed the NoSQL (HBase) schema to enable fast scans and high throughput (a sketch of the row key design follows below).
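
Scan speed in HBase comes largely from row key design. One common pattern that fits this access path is a salt byte (to avoid region hotspotting), the campaign id as the scan prefix, and a reversed timestamp so the newest impressions sort first; the exact field layout below is hypothetical, not the actual schema:

    import org.apache.hadoop.hbase.util.Bytes

    // Sketch of a salted composite row key for per-campaign impression scans.
    // The layout (salt | campaignId | reversed ts | deviceId) is hypothetical.
    object ImpressionRowKey {
      val SaltBuckets = 16 // spreads sequential writes across regions

      def build(campaignId: String, epochMillis: Long, deviceId: String): Array[Byte] = {
        val salt = (campaignId.hashCode & 0x7fffffff) % SaltBuckets
        // Reversed timestamp makes the newest rows sort first within a campaign.
        // In a real schema the campaign id would be fixed-width to keep keys comparable.
        val reversedTs = Long.MaxValue - epochMillis
        Bytes.add(
          Array(salt.toByte),
          Bytes.add(Bytes.toBytes(campaignId), Bytes.toBytes(reversedTs), Bytes.toBytes(deviceId)))
      }
    }

Reading one campaign then means issuing one short scan per salt bucket and merging the results, which keeps each scan inside a single region.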


5. Description: Oracle EBS Financials - Accounts Payable (AP) is an ERP application with which large enterprises run their day-to-day supplier payments and accounting activities. I was involved in the product development of AP.

Responsibility:

  • Managing the bug backlog at a comfortable level
  • Root cause analysis (RCA) of production-critical issues

Achievements:

  • Reduced the product bug backlog by 40% within a short span of 1 year
  • A proactive initiative resulted in a 30% reduction in bug inflow

Tech stack: Core Java, data structures, algorithms, design patterns, multi-threading, Oracle 11g, PL/SQL, performance tuning, TKPROF, AWR

6. Description: The Global IT team is responsible for managing Oracle's enterprise data warehouse (EDW). Our team maintains the financials hub.

Responsibility:

  • Planning and analyzing data requests from customers, leading development and fulfilling them
  • Mentoring and grooming junior talent, bringing them up to speed
  • Maintaining standards and guidelines across code artifacts
  • Reviewing code fixes and patches
  • Coordinating steering committee meetings with cross-functional teams


7. Description: VUE is an insurance-domain application which maintains agent performance measures such as Retention, Growth, Net Inforce, Disenrollment and Persistency, and calculates brokerage according to the targets achieved. My role was to migrate data from three different sources into the VUE warehouse and keep the measures accurate at all times.

Responsibility:

  • Designing and implementing the core rule-based brokerage calculation engine (a sketch of the idea follows below)
  • Leading development of ETL routines in SSIS and developing SSRS reports.

Tech stack: Microsoft BI tools (SSIS, SSAS, SSRS), SQL Server 2005, Oracle 10g
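
The actual engine was implemented in SSIS/SQL Server; as a sketch of the rule-based idea (tiered brokerage rates keyed on the ratio of achieved to target), here in Scala with entirely hypothetical measure names and tiers:

    // Sketch of a tiered, rule-based brokerage calculation; the rate tiers
    // below are hypothetical, not the actual VUE business rules.
    object BrokerageEngine {
      final case class AgentPerformance(agentId: String, target: Double, achieved: Double)

      // Each rule maps an achievement-ratio band to a brokerage rate.
      final case class RateRule(minRatio: Double, rate: Double)

      val rules: Seq[RateRule] = Seq(   // evaluated highest band first
        RateRule(minRatio = 1.25, rate = 0.050),
        RateRule(minRatio = 1.00, rate = 0.035),
        RateRule(minRatio = 0.80, rate = 0.020))

      def brokerage(p: AgentPerformance): Double = {
        val ratio = if (p.target == 0) 0.0 else p.achieved / p.target
        val rate  = rules.find(ratio >= _.minRatio).map(_.rate).getOrElse(0.0)
        p.achieved * rate
      }

      def main(args: Array[String]): Unit =
        println(brokerage(AgentPerformance("A-100", target = 200000, achieved = 230000)))
    }

Keeping the rules as data rather than hard-coded branches is what makes such an engine easy to re-tier without code changes.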
