About Me
Over 7 years of experience ranging from code development to production deployments in Data Warehousing (DW/BI) projects, of which 3 years are in the Big Data ecosystem. Currently working as Senior Data Engineer for the TDR (Tax Data Repository) project at Th...
Skills
Positions
Portfolio Projects
Description
TDR is a centralized repository that stores invoice data arriving from different sources in different formats. Incoming data is formatted and loaded into HDFS, transformed using Spark components, and loaded into Hive tables. Another Spark process extracts the required data from these tables and exports it into Oracle tables using Sqoop. Tax Research teams use this data to generate reports through Tableau dashboards, and all data entered in TDR is consumed by downstream applications. Recent improvements include loading DGFiP (France) files into TDR and processing files using Python.
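A minimal sketch of the load-and-transform step described above, assuming Spark with Hive support enabled; the HDFS landing paths, column set, and table name are hypothetical:

import org.apache.spark.sql.SparkSession

object TdrLoad {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark write directly to Hive-managed tables.
    val spark = SparkSession.builder()
      .appName("tdr-invoice-load")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical HDFS landing paths for the raw invoice feeds.
    val jsonInvoices = spark.read.json("hdfs:///data/tdr/landing/invoices_json/")
    val csvInvoices = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/tdr/landing/invoices_csv/")

    // Normalize both feeds to a common column set before combining them.
    val common = Seq("invoice_id", "country_code", "amount", "invoice_date")
    val unified = jsonInvoices.selectExpr(common: _*)
      .unionByName(csvInvoices.selectExpr(common: _*))

    // Load the combined data into a (hypothetical) Hive table.
    unified.write.mode("append").saveAsTable("tdr.invoices")

    spark.stop()
  }
}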
Responsibilities:
· Importing data from the existing Oracle database into HDFS using Sqoop.
· Developing Spark programs in Scala.
· Loading JSON and CSV data into Hive tables.
· Converting existing MapReduce programs into Spark transformations using Spark RDDs and Scala (see the sketch after this list).
· Developing DB objects and PL/SQL code using Toad and SQL Developer.
· Importing data from cloud sources such as AWS S3 into Spark RDDs.
· Writing automated scripts to maintain data integrity across schemas and downstream applications.
· Deploying jobs to environments such as Dev and QA using Git, SourceTree, and Redgate tools.
· Communicating extensively with the product owner to implement business requirements in TDR.
· Using PL/SQL for content processing.
· Processing XML and JSON datasets for loading into the database from the UI.
· Creating Tableau dashboards for business needs.
· Writing Python scripts for ETL.
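As referenced in the list above, a hedged sketch of rewriting a classic MapReduce job (a word count, for illustration) as Spark RDD transformations in Scala, also showing an RDD read from a hypothetical S3 bucket:

import org.apache.spark.{SparkConf, SparkContext}

object MapReduceToRdd {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("mr-to-rdd"))

    // Reading from a (hypothetical) S3 bucket into an RDD; the s3a://
    // connector must be on the classpath with valid credentials.
    val lines = sc.textFile("s3a://example-tdr-bucket/raw/tags/")

    // The map and reduce phases of the original MapReduce job become
    // two chained RDD transformations.
    val counts = lines
      .flatMap(_.split("\\s+"))   // mapper: emit one token per word
      .map(word => (word, 1))     // mapper: emit (word, 1) pairs
      .reduceByKey(_ + _)         // reducer: sum the counts per word

    counts.saveAsTextFile("hdfs:///data/tdr/derived/tag_counts/")
    sc.stop()
  }
}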
Contribute
• Migrating data from the Oracle database to HDFS using Sqoop and scheduling the jobs with Oozie.
• Deriving insights from Hive tables for analytics.
• Performing analytical queries on HDFS data using HiveQL (see the sketch below).
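A minimal sketch of the kind of HiveQL analytics mentioned above, issued through Spark's Hive integration; the tdr.invoices table and its columns are assumptions carried over from the earlier sketch:

import org.apache.spark.sql.SparkSession

object InvoiceAnalytics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("invoice-analytics")
      .enableHiveSupport()
      .getOrCreate()

    // A HiveQL aggregation over the (hypothetical) tdr.invoices Hive table.
    val byCountry = spark.sql(
      """SELECT country_code,
        |       COUNT(*)    AS invoice_count,
        |       SUM(amount) AS total_amount
        |FROM tdr.invoices
        |GROUP BY country_code
        |ORDER BY total_amount DESC""".stripMargin)

    byCountry.show(50)
    spark.stop()
  }
}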
Description
Facets RVDW receives millions of claims daily; these are processed and loaded into DW tables through various UNIX and DataStage jobs. As the database grew to 15 TB, the data was migrated to Hadoop clusters using Sqoop. Hundreds of extract reports are generated periodically from the loaded tables and sent to external vendors and other UHG applications. Approximately 500 BO users pull data every day for reporting, and more than 1,000 database users run queries as needed.
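Sqoop itself is driven from the command line rather than from Scala, so as a hedged stand-in for the migration described above, here is a Spark JDBC read that pulls an Oracle table into HDFS in parallel, much like Sqoop's split-by import; the connection details, table, and paths are hypothetical:

import org.apache.spark.sql.SparkSession

object ClaimsMigration {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("claims-migration").getOrCreate()

    // Parallel JDBC read from a (hypothetical) Oracle claims table,
    // partitioned on a numeric key much like Sqoop's --split-by option.
    val claims = spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/RVDW")
      .option("dbtable", "CLAIMS")
      .option("user", "etl_user")
      .option("password", sys.env("ORACLE_PWD"))
      .option("partitionColumn", "CLAIM_ID")
      .option("lowerBound", "1")
      .option("upperBound", "100000000")
      .option("numPartitions", "16")
      .load()

    // Land the table in HDFS as Parquet for downstream Hive access.
    claims.write.mode("overwrite").parquet("hdfs:///data/rvdw/claims/")
    spark.stop()
  }
}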
Contribute
• Designing and developing ETL jobs in DataStage using different stages.
• Analyzing DataStage code and fixing design issues along with business changes.
• Deploying DataStage jobs to production using IT
Description
The NHP DW supports healthcare services in Tennessee, Illinois, Iowa, and Florida. NHP receives data from multiple internal and external sources, in different file formats and from tables, with connections to several databases (Sybase, SQL Server, and Oracle). Designed and developed various DataStage stages for extracting, transforming, and loading raw data from these sources. Business logic and data conversions are applied across multiple stages before the data is loaded into target tables, which the BI team queries for generating various reports.