About Me
- 10 years of IT experience in the development and enhancement of data warehouses and Big Data applications, using tools and technologies such as HDFS, Hive, Sqoop, Impala, Python, PySpark, Snowflake, Databricks, Informatica, Teradata, and Unix shell scripting, with good experience in AWS cloud services.
- Sound knowledge of data warehousing/Big Data architecture, data lakes, and related technologies.
- Good working exposure to AWS cloud services: EC2, S3, Lambda, Glue, Athena, SQS, SNS.
- Designed and implemented data ingestion pipelines from multiple sources using Apache Spark and/or Databricks.
- Integrated end-to-end data pipelines that take data from source systems to target data repositories while ensuring data quality and consistency are maintained at all times.
- Involved in the migration of data from Teradata to Snowflake.
- Hands-on experience loading data into the Snowflake data warehouse and managing data in Snowflake; created Snowflake external tables, stages, Snowpipes, streams, views, and stored procedures, and worked with Snowpark (a minimal loading sketch follows this list).
- Hands-on experience with major Hadoop ecosystem components such as HDFS, Hive, Impala, Sqoop, PySpark, and Parquet.
- Ingested Salesforce data into the data lake using StreamSets Community/Enterprise editions.
- Experience analyzing data using HiveQL and creating Impala views for business users.
- Experience importing and exporting data between RDBMS and HDFS using Sqoop.
- Pre-processed HDFS data using Hive and PySpark.
- Good at applying optimization techniques to avoid memory issues and improve Spark performance.
- Experience loading and manipulating large datasets using Spark SQL.
- Knowledge of CI/CD tools such as Jenkins, Artifactory, Bitbucket, Jira, and Ansible Tower.
- Knowledge of Terraform/CloudFormation templates for provisioning AWS services.
- Good exposure to data analysis, data cleansing, transformation, integration, data import, and data export, including use of ETL tools such as Informatica.
- Experience scheduling workflows with Airflow, Oozie, Autosys, UC4, and Control-M.
- Worked across domains including Healthcare, Retail, Banking, and Insurance.
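As a rough illustration of the Snowflake loading work mentioned above, a minimal Snowpark Python sketch might stage a Parquet extract and copy it into a table. The connection parameters, stage name, local file path, and target table below are placeholders for illustration, not actual project values.

```python
from snowflake.snowpark import Session

# Placeholder connection parameters; real values would come from a secrets manager.
connection_parameters = {
    "account": "<account_identifier>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}

session = Session.builder.configs(connection_parameters).create()

# Create a named stage for Parquet files (hypothetical stage name).
session.sql(
    "CREATE STAGE IF NOT EXISTS mortgage_stage FILE_FORMAT = (TYPE = PARQUET)"
).collect()

# Upload a local Parquet extract to the stage.
session.file.put("/tmp/loans.parquet", "@mortgage_stage", auto_compress=False)

# Copy the staged file into an existing target table with matching column names.
session.sql("""
    COPY INTO loans_raw
    FROM @mortgage_stage
    FILE_FORMAT = (TYPE = PARQUET)
    MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE
""").collect()
```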
Skills
Portfolio Projects
Description
- Ingested data from various RDBMS systems, plus Kafka streaming data from multiple producers, into the loading zone.
- Integrated Kafka and Spark to load the streaming data into Spark DataFrames (see the sketch after this list).
- Experience building a streaming/real-time framework using Kafka and Spark.
- Implemented reprocessing of failed messages in Kafka using offset IDs.
- Implemented Kafka producer and consumer applications on a Kafka cluster set up with the help of ZooKeeper.
- Performed data quality checks in the loading zone using PySpark.
- Loaded the data into the raw zone (Hadoop HDFS, Hive) after the data quality checks.
- Performed standard data cleansing and data validations and placed the data in the trusted zone.
- Developed PySpark programs for applying business rules to the data.
- Created Hive partitioned tables and loaded the Parquet data into them.
- Created Impala views for business users and analysts to consume the data in the refined zone.
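A minimal PySpark Structured Streaming sketch of the Kafka-to-raw-zone flow described above. The broker address, topic, event schema, and HDFS paths are illustrative assumptions rather than the project's actual configuration; the startingOffsets option is where specific offsets could be supplied to replay failed messages.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# Requires the spark-sql-kafka connector on the classpath.
spark = SparkSession.builder.appName("kafka-to-raw-zone").getOrCreate()

# Hypothetical event schema for the incoming JSON messages.
event_schema = StructType([
    StructField("order_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_ts", StringType()),
])

# Read the Kafka stream; startingOffsets can also be a JSON map of
# topic/partition offsets to reprocess from a known offset id.
raw_stream = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")
              .option("subscribe", "orders")
              .option("startingOffsets", "latest")
              .load())

# Parse the Kafka value bytes into typed columns.
parsed = (raw_stream
          .select(from_json(col("value").cast("string"), event_schema).alias("event"))
          .select("event.*"))

# Land the parsed events as Parquet in the raw zone on HDFS;
# the checkpoint location enables recovery after failures.
query = (parsed.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/raw/orders")
         .option("checkpointLocation", "hdfs:///checkpoints/orders")
         .outputMode("append")
         .start())

query.awaitTermination()
```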
Description
BAC Mortgage Data HUB is the enterprise-wide data warehouse that stores data for all mortgage loans that are non-ADS loans. The mortgage loan data needs to be processed and pushed into tables for end users.
Responsibilities:
- Able to assess business rules, collaborate with stakeholders, and perform source-to-target data mapping, design, and review.
- Created Hive and UNIX scripts to pull data from SOR files, push it into HDFS, and then process it in Hive.
- Developed MapReduce programs for applying business rules to the data.
- Created Hive partitioned tables and loaded the Parquet data into them.
- Developed PySpark code for applying business rules to the data.
- Worked on implementing CDC logic in Spark SQL (a minimal sketch follows this list).
- Streamed logs from the server and captured the important details in Hive tables.
- Worked on YARN logs to get a better understanding of logging in Spark jobs, which spared the team from having to go to the Cloudera link for logging requirements.
- Tested raw data and executed performance scripts.
- Supported code/design analysis, strategy development, and project planning.
- Developed a tool to sync the Hadoop application code from the production environment to lower environments as part of process improvement and automation.
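One common way to realize the CDC logic mentioned above with Spark SQL/PySpark is to union the existing snapshot with the incoming delta and keep only the latest version of each key. The database, table, key column (loan_id), and timestamp column (load_ts) below are hypothetical names used only for illustration.

```python
from pyspark.sql import SparkSession, Window
from pyspark.sql.functions import col, row_number

spark = (SparkSession.builder
         .appName("mortgage-cdc")
         .enableHiveSupport()
         .getOrCreate())

# Both tables are assumed to share the same schema.
existing = spark.table("mortgage_hub.loans")          # current trusted-zone snapshot
incoming = spark.table("mortgage_hub.loans_staging")  # latest delta from the SOR files

# Rank records per loan by load timestamp, newest first.
latest_per_loan = Window.partitionBy("loan_id").orderBy(col("load_ts").desc())

# Keep only the most recent version of each loan.
merged = (existing.unionByName(incoming)
          .withColumn("rn", row_number().over(latest_per_loan))
          .filter(col("rn") == 1)
          .drop("rn"))

# Write the reconciled view to a separate target table for end users.
merged.write.mode("overwrite").saveAsTable("mortgage_hub.loans_merged")
```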
Verifications
- Profile Verified
- Phone Verified
Preferred Language
- English - Fluent
Available Timezones