About Me
Over 7 years of experience ranging from code development to production deployments in Data Warehousing (DW/BI) projects, of which 3 years are in the Big Data ecosystem. Currently working as Senior Data Engineer for the TDR (Tax Data Repository) project at Th...
Skills
Positions
Portfolio Projects
Description
TDR is a centralized repository that stores invoice data arriving from different sources in different formats. Incoming data is formatted and loaded into HDFS, transformed using Spark components, and loaded into Hive tables. Another Spark process extracts the required data from these tables and exports it into Oracle tables using Sqoop. Tax Research teams use this data to generate reports through Tableau dashboards, and all data entered in TDR is consumed by downstream applications. Recent improvements include loading DGFiP (France) files into TDR and processing files using Python.
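A minimal sketch of the load-and-transform step described above, assuming Spark with Hive support enabled; the HDFS landing paths, column set, and table name are hypothetical:

import org.apache.spark.sql.SparkSession

object TdrLoad {
  def main(args: Array[String]): Unit = {
    // Hive support lets Spark write directly to Hive-managed tables.
    val spark = SparkSession.builder()
      .appName("tdr-invoice-load")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical HDFS landing paths for the raw invoice feeds.
    val jsonInvoices = spark.read.json("hdfs:///data/tdr/landing/invoices_json/")
    val csvInvoices = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/tdr/landing/invoices_csv/")

    // Normalize both feeds to a common column set before combining them.
    val common = Seq("invoice_id", "country_code", "amount", "invoice_date")
    val unified = jsonInvoices.selectExpr(common: _*)
      .unionByName(csvInvoices.selectExpr(common: _*))

    // Load the combined data into a (hypothetical) Hive table.
    unified.write.mode("append").saveAsTable("tdr.invoices")

    spark.stop()
  }
}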
Responsibilities:
· Importing data from the existing Oracle database into HDFS using Sqoop.
· Developing Spark programs in Scala.
· Loading JSON and CSV data into Hive tables.
· Converting existing MapReduce programs into Spark transformations using Spark RDDs and Scala (see the sketch after this list).
· Developing DB objects and PL/SQL code using Toad and SQL Developer.
· Importing data from cloud sources such as AWS S3 into Spark RDDs.
· Writing automated scripts to maintain data integrity across schemas and downstream applications.
· Deploying jobs to environments such as Dev and QA using Git, SourceTree, and Redgate tools.
· Communicating extensively with the product owner to implement business requirements in TDR.
· Using PL/SQL for content processing.
· Processing XML and JSON datasets for loading into the database from the UI.
· Creating Tableau dashboards for business needs.
· Writing Python scripts for ETL.
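As referenced in the list above, a hedged sketch of rewriting a classic MapReduce job (a word count, for illustration) as Spark RDD transformations in Scala, also showing an RDD read from a hypothetical S3 bucket:

import org.apache.spark.{SparkConf, SparkContext}

object MapReduceToRdd {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("mr-to-rdd"))

    // Reading from a (hypothetical) S3 bucket into an RDD; the s3a://
    // connector must be on the classpath with valid credentials.
    val lines = sc.textFile("s3a://example-tdr-bucket/raw/tags/")

    // The map and reduce phases of the original MapReduce job become
    // two chained RDD transformations.
    val counts = lines
      .flatMap(_.split("\\s+"))   // mapper: emit one token per word
      .map(word => (word, 1))     // mapper: emit (word, 1) pairs
      .reduceByKey(_ + _)         // reducer: sum the counts per word

    counts.saveAsTextFile("hdfs:///data/tdr/derived/tag_counts/")
    sc.stop()
  }
}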
Contribute
• Migrating data from the Oracle database to HDFS using Sqoop and scheduling the jobs with Oozie.
• Deriving insights from Hive tables for analytics.
• Performing analytical queries on HDFS data using HiveQL (see the sketch below).
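A minimal sketch of the kind of HiveQL analytics mentioned above, issued through Spark's Hive integration; the tdr.invoices table and its columns are assumptions carried over from the earlier sketch:

import org.apache.spark.sql.SparkSession

object InvoiceAnalytics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("invoice-analytics")
      .enableHiveSupport()
      .getOrCreate()

    // A HiveQL aggregation over the (hypothetical) tdr.invoices Hive table.
    val byCountry = spark.sql(
      """SELECT country_code,
        |       COUNT(*)    AS invoice_count,
        |       SUM(amount) AS total_amount
        |FROM tdr.invoices
        |GROUP BY country_code
        |ORDER BY total_amount DESC""".stripMargin)

    byCountry.show(50)
    spark.stop()
  }
}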
Description
Facets RVDW receives millions of claims daily; these are processed and loaded into DW tables through various UNIX and DataStage jobs. As the database grew to 15 TB, the data was migrated to Hadoop clusters using Sqoop. Hundreds of extract reports are generated periodically from the loaded tables and sent to external vendors and other UHG applications. Approximately 500 BO users pull data every day for reporting, and more than 1,000 database users run queries as needed.
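Sqoop itself is driven from the command line rather than from Scala, so as a hedged stand-in for the migration described above, here is a Spark JDBC read that pulls an Oracle table into HDFS in parallel, much like Sqoop's split-by import; the connection details, table, and paths are hypothetical:

import org.apache.spark.sql.SparkSession

object ClaimsMigration {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("claims-migration").getOrCreate()

    // Parallel JDBC read from a (hypothetical) Oracle claims table,
    // partitioned on a numeric key much like Sqoop's --split-by option.
    val claims = spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/RVDW")
      .option("dbtable", "CLAIMS")
      .option("user", "etl_user")
      .option("password", sys.env("ORACLE_PWD"))
      .option("partitionColumn", "CLAIM_ID")
      .option("lowerBound", "1")
      .option("upperBound", "100000000")
      .option("numPartitions", "16")
      .load()

    // Land the table in HDFS as Parquet for downstream Hive access.
    claims.write.mode("overwrite").parquet("hdfs:///data/rvdw/claims/")
    spark.stop()
  }
}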
Contribute
• Designing and developing ETL jobs in DataStage using different stages.
• Analyzing DataStage code and fixing design issues along with business changes.
• Deploying DataStage jobs to production using IT
Description
The NHP DW supports healthcare services in Tennessee, Illinois, Iowa, and Florida. NHP receives data from multiple internal and external sources, in different file formats and from tables, with connections to several databases (Sybase, SQL Server, and Oracle). Designed and developed various DataStage stages for extracting, transforming, and loading raw data from these sources. Business logic and data conversions are applied across multiple stages before the data is loaded into target tables, which the BI team queries for generating various reports.