About Me
3 years of overall IT experience in Big Data, Hadoop, and Spark with Scala. Exclusive experience in Hadoop and its components such as HDFS, Hive, Sqoop, and Spark. Executed jobs in Spark local mode, pseudo-distributed mode, and Hadoop cluster mode for production...
Skills
Portfolio Projects
Description
· Analyzed source-system data before loading it into HDFS.
· Wrote FTP scripts to bring data from the NFS mount point into the Hadoop local environment.
· Involved in data loading and in validating data fields as part of the load.
· Developed MapReduce jobs to process different customers' personal data across various financial services.
· Developed custom InputFormat classes to work with XML and PDF feeds coming from some source systems.
· Worked on Hive script development, including partitioning (see the sketch after this list).
· Configured MySQL as an external metastore for Hive.
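A minimal Spark-with-Scala sketch of the kind of partitioned Hive load described above. The database, table, columns, and HDFS path are hypothetical, and enableHiveSupport assumes a hive-site.xml that points at the external MySQL-backed metastore mentioned in the last bullet.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object PartitionedHiveLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PartitionedHiveLoad")
      .enableHiveSupport()  // picks up the external (MySQL-backed) Hive metastore from hive-site.xml
      .getOrCreate()

    // Allow dynamic partition inserts into the Hive table.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // Hive table partitioned by transaction date (illustrative database, table, and schema).
    spark.sql("CREATE DATABASE IF NOT EXISTS finance")
    spark.sql(
      """CREATE TABLE IF NOT EXISTS finance.customer_txns (
        |  customer_id STRING,
        |  amount      DOUBLE
        |) PARTITIONED BY (txn_date STRING)
        |STORED AS PARQUET""".stripMargin)

    // Read raw files landed in HDFS and append them into the partitioned table.
    val raw = spark.read.option("header", "true").csv("hdfs:///landing/customer_txns/")
    raw.select(
        col("customer_id"),
        col("amount").cast("double"),
        col("txn_date"))  // partition column must come last for insertInto
      .write.mode("append")
      .insertInto("finance.customer_txns")

    spark.stop()
  }
}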
Description
Roles and Responsibilities:
· Implemented newer concepts such as Apache Spark and Scala programming
· Managed data coming from 200+ different sources
· Loaded unstructured data into the Hadoop Distributed File System (HDFS)
· Wrote validation and data quality scripts
· Implemented a Cloudera cluster with high-availability and standby solutions
· Designed and supported data ingestion, data migration, and data processing for BI and data analytics
· Developed data pipelines with Sqoop and Pig to extract data from weblogs and store it in HDFS
· Developed Pig scripts for change data capture and delta-record processing between newly arrived data and data already existing in HDFS
· Worked on Hive and Spark SQL integration scripts for performance enhancement
· Worked on DataFrame development as part of Spark SQL (see the sketch after this list)
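A hedged Scala sketch of the delta-record idea above, expressed with Spark SQL DataFrames rather than Pig; the table name, incoming path, and key columns are assumptions for illustration only.

import org.apache.spark.sql.SparkSession

object DeltaRecords {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DeltaRecords")
      .enableHiveSupport()
      .getOrCreate()

    // Data already loaded into Hive, and a newly arrived batch sitting in HDFS (illustrative names).
    val existing = spark.table("finance.customer_txns")
    val arrived  = spark.read.parquet("hdfs:///incoming/customer_txns/")

    // Delta records: rows in the new batch whose key is not present in the existing data.
    val delta = arrived.join(existing, Seq("customer_id", "txn_date"), "left_anti")

    // Spark SQL over the DataFrame, e.g. a per-partition count of new rows.
    delta.createOrReplaceTempView("delta_txns")
    spark.sql("SELECT txn_date, COUNT(*) AS new_rows FROM delta_txns GROUP BY txn_date").show()

    spark.stop()
  }
}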