Balakrishna G.

Big Data Hadoop Developer

Bengaluru, India

Experience: 6 Years

10285.7 USD / Year

  • Availability: Immediate

About Me

Having 6.2 years of IT experience as a Big Data Hadoop Developer, designing and developing enterprise applications on technologies such as Apache Spark and the Hadoop ecosystem. Working at Capgemini Technology Services from Dec 2018 till date. Wor...

Portfolio Projects

Description

Description:

The objective of this project is to create Hive external tables from the master tables, preprocess the data, and store it back into Hive external tables, which data scientists then use to train their models.

Responsibilities:

  • Worked with a CDH environment consisting of a 64-node production cluster and a 36-node development cluster.
  • Responsible for coordinating end-to-end project-related activities.
  • Involved in all phases of the SDLC, including development, testing, and deployment of the application in the Hadoop cluster.
  • Understood the business requirements from the data scientists; data is dumped from the master Hive external table into another Hive external table with the required partitions and time window (a sketch of this load follows the list).
  • After a successful load into the Hive table, the data is accessed through PySpark.
  • Developed PySpark scripts to preprocess the data and load it into Hive tables used to train the models.
  • Used sbt to compile and package the Scala code into a JAR and deployed it to the cluster using spark-submit.

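A minimal Scala sketch of the partitioned load described above, kept in Scala for consistency with the other sketches in this profile (the preprocessing step itself was written in PySpark). The database, table, and column names (db.master_table, db.feature_table, customer_id, load_date) and the date window are illustrative placeholders, not values from the original project.

    import org.apache.spark.sql.SparkSession

    object MasterToFeatureLoad {
      def main(args: Array[String]): Unit = {
        // Hive support lets Spark resolve the external tables registered in the metastore.
        val spark = SparkSession.builder()
          .appName("master-to-feature-load")
          .enableHiveSupport()
          .getOrCreate()

        // Allow dynamic partitioning so load_date partitions are created from the data itself.
        spark.sql("SET hive.exec.dynamic.partition=true")
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

        // Copy a specific time window from the master table into the partitioned external table.
        spark.sql(
          """
            |INSERT OVERWRITE TABLE db.feature_table PARTITION (load_date)
            |SELECT customer_id, feature_1, feature_2, load_date
            |FROM db.master_table
            |WHERE load_date BETWEEN '2021-01-01' AND '2021-03-31'
          """.stripMargin)

        spark.stop()
      }
    }

    // Packaged with sbt and deployed roughly as (jar path is hypothetical):
    //   sbt package
    //   spark-submit --master yarn --deploy-mode cluster --class MasterToFeatureLoad target/scala-2.11/app_2.11-0.1.jar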

Description

Description:

The objective of this project is to build a real-time streaming data pipeline that loads data into MongoDB, where data scientists use it for model prediction.

Responsibilities:

  • Worked with a CDH environment consisting of a 126-node production cluster and a 64-node development cluster.
  • Responsible for coordinating end-to-end project-related activities.
  • Involved in all phases of the SDLC, including development, testing, and deployment of the application in the Hadoop cluster.
  • Developed a Logstash filter to filter the data coming from Splunk Enterprise based on domain and service names.
  • Loaded the filtered data into a Kafka topic.
  • Developed a Scala script to subscribe to the Kafka topic and load the data into an HDFS location (see the sketch after this list).
  • Developed a Scala script to read the HDFS data and load it into MongoDB.
  • Captured failed Oozie-scheduled jobs in Ambari, covering both Sqoop ingestion and Spark jobs, and took the necessary action.
  • Used sbt to compile and package the Scala code into a JAR and deployed it to the cluster using spark-submit.
  • Performed end-to-end performance testing of the real-time data flow from Splunk to MongoDB.

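A hedged Scala sketch of the Kafka-to-HDFS step using Spark Structured Streaming; the broker address, topic name, and HDFS paths are placeholders rather than the project's actual values, and the MongoDB load is handled by the separate script mentioned above.

    import org.apache.spark.sql.SparkSession

    object KafkaToHdfs {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("kafka-to-hdfs")
          .getOrCreate()

        // Subscribe to the topic fed by the Logstash filter (broker and topic are placeholders).
        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "splunk-events")
          .load()
          .selectExpr("CAST(value AS STRING) AS value")

        // Land the raw messages on HDFS; a downstream job picks them up and loads MongoDB.
        val query = events.writeStream
          .format("json")
          .option("path", "hdfs:///data/splunk/raw")
          .option("checkpointLocation", "hdfs:///checkpoints/splunk-raw")
          .start()

        query.awaitTermination()
      }
    }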

Description

Description:

The objective of the project is to develop Spark applications that convert Informatica workflows and load the results into Hive ORC tables used by downstream systems. The end-to-end flow is scheduled using Talend.

Responsibilities:

  • Worked with an HDP environment consisting of a 64-node production cluster and a 36-node development cluster.
  • Responsible for coordinating end-to-end project-related activities.
  • Involved in all phases of the SDLC, including analysis, design, development, testing, and deployment of the application in the Hadoop cluster.
  • Developed shell scripts to pull files from the Informatica server into Hadoop.
  • Responsible for creating Hive external and internal (ORC) tables with zlib compression.
  • Developed a Spark ingestion framework to load the data from Hive external tables into internal tables in one shot (see the sketch after this list).
  • Analyzed the Informatica mappings and documented the business logic.
  • Developed Spark code for the consumption layer that implements the Informatica logic and loads the data into Hive fact and dimension tables.
  • Used sbt to compile and package the Scala code into a JAR and deployed it to the cluster using spark-submit.

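A minimal Scala sketch of the ingestion-framework idea, assuming hypothetical staging (stg) and managed (core) databases and a hard-coded table list; the real framework was presumably configuration-driven.

    import org.apache.spark.sql.SparkSession

    object ExternalToOrcIngestion {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("external-to-orc-ingestion")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical table list; one run moves every staging table into its ORC counterpart.
        val tables = Seq("customers", "orders", "payments")

        tables.foreach { table =>
          // stg.<table> is the external staging table, core.<table> the managed ORC table
          // created with TBLPROPERTIES ("orc.compress" = "ZLIB").
          spark.sql(s"INSERT OVERWRITE TABLE core.$table SELECT * FROM stg.$table")
        }

        spark.stop()
      }
    }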

Description

Description:

The objective of the project is to develop a Spark application that identifies change data capture (CDC) records from the Oracle source and loads them into Hive external tables for use by downstream systems.

Responsibilities:

  • Worked with an HDP environment consisting of a 32-node production cluster and a 16-node development cluster.
  • Responsible for coordinating end-to-end project-related activities.
  • Involved in all phases of the SDLC, including analysis, design, development, testing, and deployment of the application in the Hadoop cluster.
  • Involved in the design and development of technical specification documents.
  • Interacted with business teams to understand business problems and designed applications on the Hadoop ecosystem.
  • Sqooped data from Oracle to HDFS, covering both full and incremental loads.
  • Loaded the data captured from Oracle into landing-zone Hive tables.
  • From the landing zone, added create-date and last-modified-date columns for CDC and loaded the data into the raw data zone.
  • Applied business logic and transformations in Spark.
  • Captured changed records from the source based on the primary key field using Spark and loaded them into data lake Hive tables (see the sketch after this list).
  • Worked with RDDs, DataFrames, Spark joins, and Spark SQL in Scala.

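A hedged Scala sketch of the primary-key-based CDC comparison; the table names (raw.accounts, lake.accounts), the key column account_id, and the last_modified_date audit column are illustrative placeholders, not the project's actual schema.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    object CdcCapture {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("cdc-capture")
          .enableHiveSupport()
          .getOrCreate()

        // Latest Sqoop load (raw data zone) versus the current state of the data lake table.
        val incoming = spark.table("raw.accounts").alias("i")
        val existing = spark.table("lake.accounts").alias("e")

        // Inserts: primary keys present in the incoming load but not yet in the lake.
        val inserts = incoming.join(existing, col("i.account_id") === col("e.account_id"), "left_anti")

        // Updates: same primary key but a newer last-modified date on the source side.
        val updates = incoming
          .join(existing, col("i.account_id") === col("e.account_id"))
          .where(col("i.last_modified_date") > col("e.last_modified_date"))
          .select("i.*")

        // Append the captured delta to the data lake Hive table (merge strategy simplified here).
        inserts.union(updates).write.mode("append").insertInto("lake.accounts")

        spark.stop()
      }
    }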

Description

Description:

Team India Compliance works on CTS (Complaints Tracking System) to handle various complaints received from members and providers. Compliance analysts work on CTS intake cases to provide sufficient proof for the investigator (Team U.S.) to act on these cases. The analysts study each intake case, take the required screenshots from Facets, EAM, and Task Tracker, attach them as proof in CTS, and assign the case to an investigator (Team U.S.). Each type of case requires a fixed number of screenshots, which makes the process time consuming and tedious. The project is to develop an application that auto-populates these screenshots when a Subscriber ID or Claim ID is entered in a search box and saves them to a network folder.
