Suraj R.

Data Engineer with experience in PySpark, Hadoop, AWS, Airflow, SQL, SSIS, ETL, Azure, Databricks

Bengaluru, India

Experience: 7 Years

About Me

  • 7+ years of experience in data analytics with renowned MNCs
  • Experienced Data Engineer with a demonstrated history of working in the Telecom, Supply Chain, HCM, and Advertising domains
  • Excellent understanding of the big data stack, with expertise in Spark, Hadoop, and its ecosystem components
  • Experience with the AWS cloud platform and its services, including EC2, ECS, DynamoDB, SNS, RDS, Secrets Manager, S3, etc.
  • Implemented various types of Airflow DAGs for Telecom data
  • Worked on preprocessing of structured and unstructured data and implemented various database designs
  • Worked on identification of valuable data sources and automation of the collection processes
  • Experience with various projects migrating legacy systems to big data platforms (Hadoop, Spark, Azure Databricks, S3)
  • Proficient in Python and Scala programming, with experience in NoSQL databases (DynamoDB, HBase, MongoDB)

Portfolio Projects

Big Data

Company

Big Data

Description

  • Working in the Data Engineering group, extracting, ingesting, and transforming Ad Marketing, Telecom, and Logistics data for consumption by downstream teams
  • Developed a flow to migrate data from 50+ sources to AWS S3 using Python scripts
  • Responsible for building and supporting a Big Data-based ecosystem designed for enterprise-wide analysis of structured, semi-structured, and unstructured data.
  • Managing a team of 4 people and assigning their day-to-day tasks
  • Validating migrated data, performing data quality checks, and implementing CI/CD and orchestration using Jenkins, Terraform, Docker, etc.
  • Reviewing code and providing feedback on best practices, performance improvements, etc., and working in a pair-programming environment
  • Developed an NLP model for sentiment analysis on Twitter data (Logistic Regression)

Skills

Hadoop, AWS

Tools

PyCharm

Airflow

Company

Airflow

Description

  • Automated batch data transfer and loading into Redshift using Hive and shell scripts
  • Established strong working relationships with business stakeholders, teammates, and others within the organization

Tools

PyCharm

ETL

Company

ETL

Description

  • Developed a framework to migrate data from different legacy sources to HDFS, using SSIS and T-SQL for further processing
  • Led a 5-person project for a pan-India implementation of an invoicing tool for Reliance Communications, which improved invoicing efficiency by 70% and saved 2430 man-hours monthly
  • Introduced big data analytics to automate the generation of 250 reports per month for audiences ranging from engineers to the CEO
  • Assisted application development teams during application design and development for highly complex and critical data projects
  • Involved in POCs to adopt new technologies to improve data platform management at large scale with high throughput

Skills

SQL