Balakrishna G.

Bigdata Hadoop Developer

  • Overall Experience: 10 Years
  • Agile Software Development
  • Algorithm Development
  • Amazon Relational Database Service
  • Data Lake
  • Apache Tomcat

Time zones ready to work

  • Eastern Daylight [UTC -4]
  • Australian EDT [UTC +11]
  • Dubai [UTC +4]
  • New Delhi [UTC +5:30]
  • China (West) [UTC +6]
  • Singapore [UTC +8]
  • Hong Kong (East China) [UTC +8]

Willing to travel to client location: Yes  

About Me 

6.2 years of IT experience as a Big Data Hadoop Developer, designing and developing enterprise applications on Apache Spark and the Hadoop ecosystem.

  • Working at Capgemini Technology Services from Dec 2018 to date.
  • Worked at Nextgen Healthcare India Pvt Ltd from Nov 2014 to Oct 2018.
  • Worked at XL Health Corporation India Pvt Ltd from Oct 2013 to Aug 2014.
  • Excellent understanding of the Apache Spark and Hadoop YARN architecture and ecosystem.
  • Experience with the major Hadoop distribution Cloudera (CDH).
  • Experience in developing business applications using Apache Spark, Hive, Scala, and PySpark.
  • Good experience with Linux shell scripting.
  • Team-based management style with excellent interpersonal, research/analysis, and communication skills.
  • Skilled in cross-functional activities with multi-functional teams such as vendors, the ordering team, and the operations & maintenance team.
  • Experience in all phases of the Sprint/Agile methodology.

Portfolios

Historical Data Processing Using Pyspark

Role:

Description:

The objective of this project is to create Hive external tables from master tables, preprocess the data, and store it back into Hive external tables, which data scientists then use to train their models.

Responsibilities:

  • Worked with CDH clusters: a 64-node production cluster and a 36-node development cluster.
  • Responsible for coordinating end-to-end project-related activities.
  • Involved in all phases of the SDLC, including development, testing, and deployment of the application on the Hadoop cluster.
  • Gathered business requirements from the data scientists; data is loaded from the master Hive external table into another partitioned Hive external table for the required duration.
  • After a successful load, the Hive table data is accessed via PySpark.
  • Developed a PySpark preprocessing script and loaded the results back into Hive tables for model training (a sketch of this flow follows the list below).
  • Used sbt to compile and package the Scala code into a JAR and deployed it to the cluster using spark-submit.
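
Below is a minimal, hypothetical Spark sketch of the flow described in the bullets above: read a time-windowed slice of a master Hive table, preprocess it, and write it back to a partitioned Hive table for model training. The database, table, and column names are illustrative only, and although the production script was written in PySpark, the equivalent DataFrame logic is shown here in Scala to match the sbt/spark-submit packaging used across these projects.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object HistoricalDataPreprocess {
      def main(args: Array[String]): Unit = {
        // Hive support is required to read and write the Hive tables.
        val spark = SparkSession.builder()
          .appName("HistoricalDataPreprocess")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical master table, restricted to the duration requested by the data scientists.
        val master = spark.table("analytics.master_events")
          .where(col("load_date").between("2019-01-01", "2019-06-30"))

        // Example preprocessing: drop rows with missing keys and derive a feature column.
        val prepared = master
          .na.drop(Seq("patient_id"))
          .withColumn("visit_month", date_format(col("visit_date"), "yyyy-MM"))
          .select("patient_id", "visit_date", "visit_month", "load_date")

        // Write back into the (pre-existing) partitioned table used for model training.
        prepared.write
          .mode("overwrite")
          .insertInto("analytics.training_events")

        spark.stop()
      }
    }

Packaged as a JAR with sbt, a job like this would be launched on the cluster with spark-submit, as noted in the final bullet.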

Skills: PySpark, SBT

Tools: IntelliJ IDEA

Realtime Streaming Data pipeline

Role:

Description:

The objective of this project is to build a real-time streaming data pipeline that loads data into MongoDB, which data scientists then use for model prediction.

Responsibilities:

  • Worked with CDH clusters: a 126-node production cluster and a 64-node development cluster.
  • Responsible for coordinating end-to-end project-related activities.
  • Involved in all phases of the SDLC, including development, testing, and deployment of the application on the Hadoop cluster.
  • Developed a Logstash filter to filter the data coming from Splunk Enterprise based on domain and service names.
  • Loaded the filtered data into a Kafka topic.
  • Developed a Scala script to subscribe to the Kafka topic and load the data into an HDFS location (a sketch of this step follows the list below).
  • Developed a Scala script to read the HDFS data and load it into MongoDB.
  • Captured Oozie-scheduled failed jobs (both Sqoop ingestion and Spark jobs) in Ambari and took the necessary action.
  • Used sbt to compile and package the Scala code into a JAR and deployed it to the cluster using spark-submit.
  • Performed performance testing of the end-to-end real-time data flow from Splunk to MongoDB.
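
Below is a minimal, hypothetical sketch of the Kafka-to-HDFS step from the bullets above, written as a Spark Structured Streaming job in Scala. The broker addresses, topic name, and HDFS paths are placeholders, and the spark-sql-kafka connector is assumed to be available on the classpath.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object SplunkKafkaToHdfs {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("SplunkKafkaToHdfs")
          .getOrCreate()

        // Subscribe to the Kafka topic fed by the Logstash filter (placeholder broker/topic names).
        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
          .option("subscribe", "splunk-filtered-events")
          .load()
          .select(col("value").cast("string").as("event"))

        // Land the raw events in HDFS; a separate job then loads them into MongoDB.
        events.writeStream
          .format("parquet")
          .option("path", "hdfs:///data/streaming/splunk_events")
          .option("checkpointLocation", "hdfs:///checkpoints/splunk_events")
          .start()
          .awaitTermination()
      }
    }

The HDFS-to-MongoDB step would be a separate batch job using the MongoDB Spark connector (for example, writing a DataFrame with the "com.mongodb.spark.sql.DefaultSource" format and a collection URI); that detail is an assumption rather than taken from the project description.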

Skills: Splunk, Logstash, Apache Kafka, MongoDB

Tools: IntelliJ IDEA

Migration Project

Role:

Description:

The objective of this project is to develop Spark applications that convert Informatica workflows, loading the results into Hive ORC tables for use by downstream systems. The end-to-end flow is scheduled using Talend.

Responsibilities:

  • Worked with HDP clusters: a 64-node production cluster and a 36-node development cluster.
  • Responsible for coordinating end-to-end project-related activities.
  • Involved in all phases of the SDLC, including analysis, design, development, testing, and deployment of the application on the Hadoop cluster.
  • Developed shell scripts to pull the files from the Informatica server to Hadoop.
  • Responsible for creating Hive external and internal (ORC) tables with ZLIB compression.
  • Developed a Spark ingestion framework to load the data from the Hive external tables into internal tables in one shot (a sketch of this step follows the list below).
  • Studied the Informatica mappings and documented the business logic.
  • Developed Spark code for the consumption layer, implementing the Informatica logic, and loaded the data into Hive fact and dimension tables.
  • Used sbt to compile and package the Scala code into a JAR and deployed it to the cluster using spark-submit.
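
Below is a minimal, hypothetical Scala sketch of the ingestion step from the bullets above: create an internal ORC table with ZLIB compression and load it in one shot from a Hive external staging table. The table names, columns, and filter condition stand in for the migrated Informatica logic and are illustrative only.

    import org.apache.spark.sql.SparkSession

    object HiveOrcIngestion {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HiveOrcIngestion")
          .enableHiveSupport()
          .getOrCreate()

        // Internal (managed) ORC table with ZLIB compression; names are placeholders.
        spark.sql(
          """CREATE TABLE IF NOT EXISTS curated.orders_orc (
            |  order_id BIGINT,
            |  customer_id BIGINT,
            |  amount DECIMAL(18,2),
            |  order_date DATE
            |) STORED AS ORC
            |TBLPROPERTIES ('orc.compress' = 'ZLIB')""".stripMargin)

        // One-shot load from the external (staging) table, applying a stand-in
        // for the transformation logic migrated from the Informatica mapping.
        spark.table("staging.orders_ext")
          .filter("order_status = 'COMPLETED'")
          .select("order_id", "customer_id", "amount", "order_date")
          .write
          .mode("overwrite")
          .insertInto("curated.orders_orc")

        spark.stop()
      }
    }

As described above, the end-to-end flow around jobs like this was scheduled with Talend.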

Skills: Apache Spark, Shell Scripting, Linux, Hadoop Distributed File System (HDFS), Apache Hive

Tools: SCALA IDE

#1

Role:

The objective of this project is to build a real-time streaming data pipeline that loads data into MongoDB, which data scientists then use for model prediction. Worked with CDH clusters: a 126-node production cluster and a 64-node development cluster. Responsible for coordinating end-to-end project-related activities. Involved in all phases of the SDLC, including development, testing, and deployment of the application on the Hadoop cluster. Developed a Logstash filter to filter the data coming from Splunk Enterprise based on domain and service names. Loaded the filtered data into a Kafka topic. Developed a Scala script to subscribe to the Kafka topic and load the data into an HDFS location. Developed a Scala script to read the HDFS data and load it into MongoDB. Used sbt to compile and package the Scala code into a JAR and deployed it to the cluster using spark-submit. Performed performance testing of the end-to-end real-time data flow from Splunk to MongoDB.

Skills: Apache Spark, SBT, Apache Hive, Hadoop Distributed File System (HDFS), Linux, Splunk, MongoDB, Apache Kafka, Shell Scripting, Logstash, Testing Framework

Tools:

#2

Role:

The objective of this project is to create Hive external tables from master tables, preprocess the data, and store it back into Hive external tables, which data scientists then use to train their models. Worked with CDH clusters: a 64-node production cluster and a 36-node development cluster. Responsible for coordinating end-to-end project-related activities. Involved in all phases of the SDLC, including development, testing, and deployment of the application on the Hadoop cluster. Gathered business requirements from the data scientists; data is loaded from the master Hive external table into another partitioned Hive external table for the required duration. After a successful load, the Hive table data is accessed via PySpark. Developed a PySpark preprocessing script and loaded the results back into Hive tables for model training. Used sbt to compile and package the Scala code into a JAR and deployed it to the cluster using spark-submit.

Skills: Apache Spark, SBT, Apache Hive, Hadoop Distributed File System (HDFS), Linux, PySpark

Tools:


Employment

Consultant

2018/12 - Present

Skills: Splunk, Logstash, Apache Kafka, Apache Spark, MongoDB, PySpark, Hadoop Distributed File System (HDFS), Apache Hive, Linux, Ambari, Oozie

Your Role and Responsibilities:

HADOOP ECOSYSTEM: HDFS, Hive, Sqoop

APACHE SPARK: Spark Core, Spark SQL, Spark Streaming

PROGRAMMING: Scala, PySpark

DATABASE: Oracle, SQL Server

OPERATING SYSTEMS: Linux and Windows

SCHEDULING: Oozie

MONITORING: Ambari


SQL Developer

2013/10 - 2014/08

Skills: T-SQL, SQL, Stored Procedures

Your Role and Responsibilities:


Education

2006 - 2010


2010


2003


Skills

Agile Software Development, Algorithm Development, Amazon Relational Database Service, Apache Spark, Apache Tomcat

Tools

Sublime Text, Notepad++ (Win/Mac), IntelliJ IDEA

Hobbies

Learning new things and playing sports

Preferred Languages

English - Hindi - Basic