About Me
Big Data Ecosystem: Spark, HDFS, Hadoop, Hive, Oozie, Sqoop, Kafka, Spark SQL, Python, Scala, NiFi, AWS.
RDBMS and ETL: Oracle 9i, 10g, 12c, PL/SQL, SQL*Plus, SQL*Loader, Toad, SQL Developer, Query Tuning, ...
Portfolio Projects
TDR – Centralized Repository with Thomson Reuters
Company
https://Thomsonreuters.com
Description
TDR is a centralized repository created to store invoice data coming from different data sources in different formats. Incoming data is formatted and loaded into HDFS, then transformed using Spark components and loaded into Hive tables. Another Spark process extracts the required data from these tables and Sqoops it into Oracle tables. Tax Research teams use this data to generate reports with Tableau dashboards. All data entered in TDR is consumed by downstream applications. Recent improvements include loading DGFiP files (France) into TDR and processing files using Python.
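A minimal sketch of the load step described above, written in Scala with Spark's Hive integration. The HDFS landing path, column name, and Hive table name are hypothetical placeholders, not the actual TDR schema:

import org.apache.spark.sql.{SparkSession, SaveMode}

object TdrInvoiceLoad {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport lets Spark write directly into Hive-managed tables.
    val spark = SparkSession.builder()
      .appName("TdrInvoiceLoad")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical landing path for one of the incoming CSV invoice feeds.
    val rawInvoices = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///tdr/landing/invoices/*.csv")

    // Normalize the feed before it reaches the repository tables:
    // drop rows without a key and de-duplicate on it.
    val cleaned = rawInvoices
      .filter(rawInvoices("invoice_id").isNotNull)
      .dropDuplicates("invoice_id")

    // Append into a (hypothetical) Hive table consumed downstream.
    cleaned.write
      .mode(SaveMode.Append)
      .saveAsTable("tdr.invoices")

    spark.stop()
  }
}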
Responsibilities:
· Sqooping data from the existing Oracle database into HDFS.
· Developing Spark programs using Scala.
· Loading JSON and CSV data into Hive tables.
· Converting existing MapReduce programs into Spark transformations using Spark RDDs and Scala (see the sketch after this list).
· Developing database objects and PL/SQL code using Toad and SQL Developer.
· Importing data from cloud sources such as AWS S3 into Spark RDDs.
· Writing automated scripts to maintain data integrity across schemas and downstream applications.
· Deploying jobs to environments such as Dev and QA using Git, SourceTree, and Redgate tools.
· Communicating extensively with the product owner to implement business requirements in TDR.
· Using PL/SQL hands-on for content processing.
· Working with XML and JSON to process datasets loaded into the DB from the UI.
· Creating Tableau dashboards for business needs.
· Writing Python scripts for ETL.
Skills
AWS, Hadoop, Hive, Python, Apache Spark, SQL, Apache Sqoop

FACETS RIVER VALLEY DATA WAREHOUSE (RVDW) – Healthcare Domain with UHG
Company
https://www.unitedhealthgroup.com
Contribute
• Migrating data from the Oracle database to HDFS using the Sqoop component and scheduling with Oozie.
• Deriving insights from Hive tables for analytics.
• Performing analytical queries on HDFS using HiveQL.
Description
Facets RVDW receives millions of claims daily; these are processed and loaded into DW tables through various UNIX and DataStage jobs. As the database grew to 15 TB, the data was migrated to Hadoop clusters using Sqoop. Hundreds of extract reports from the loaded tables are generated periodically and sent to external vendors and other UHG applications. Approximately 500 BO users pull data every day for reporting purposes, and more than 1,000 database users run queries as needed.
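To make the analytics step concrete: the report queries ran as HiveQL, and the same statement can be submitted through Spark's Hive integration, shown here in Scala for illustration only. The rvdw.claims table and its columns are hypothetical, standing in for the real claims schema:

import org.apache.spark.sql.SparkSession

object RvdwClaimsReport {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RvdwClaimsReport")
      .enableHiveSupport()
      .getOrCreate()

    // A typical extract-report aggregate: daily claim volume per plan.
    val dailyVolume = spark.sql(
      """SELECT plan_id, claim_date, COUNT(*) AS claim_count
        |FROM rvdw.claims
        |GROUP BY plan_id, claim_date""".stripMargin)

    dailyVolume.show(20)
    spark.stop()
  }
}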
Tools
SQL Developer, Toad

NEIGHBOURHOOD HEALTH PLAN (NHP) – Healthcare Domain with UHG
Company
https://www.optum.com
Role
Backend Developer
Contribute
• Designing and developing ETL jobs in DataStage using different stages.
• Analyzing DataStage code and fixing design issues along with business changes.
• Deploying DataStage jobs to production using IT
Description
NHP provides healthcare services in Tennessee, Illinois, Iowa, and Florida. The NHP DW receives data from multiple internal and external sources, in different file formats and from tables, over connections to different databases (Sybase, SQL Server, and Oracle). Various DataStage stages were designed and developed for extracting, transforming, and loading raw data from these sources. Business logic is applied and data conversions take place in multiple stages before loading into tables, which the BI team queries for various reports.
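The NHP jobs themselves were visual DataStage designs. As a rough sketch only, the same extract-filter-load step can be expressed with Spark's JDBC reader in Scala; the Oracle URL, table, status column, and credential variables below are placeholders, and an Oracle JDBC driver would be needed on the classpath:

import org.apache.spark.sql.SparkSession

object NhpSourceExtract {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("NhpSourceExtract")
      .getOrCreate()

    // Hypothetical Oracle source; credentials come from the environment.
    val members = spark.read
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//db-host:1521/NHPDB")
      .option("dbtable", "nhp.members")
      .option("user", sys.env("NHP_DB_USER"))
      .option("password", sys.env("NHP_DB_PASSWORD"))
      .load()

    // Transform-before-load step: keep active members, stage as Parquet.
    members.filter("status = 'ACTIVE'")
      .write.mode("overwrite")
      .parquet("hdfs:///nhp/staging/members")

    spark.stop()
  }
}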
Skills
Oracle, SQL, UNIX Shell Scripting

Tools
Toad