About Me
5+ years of experience with Big Data and Hadoop ecosystem tools. Experience with cloud technologies such as AWS Lambda, SageMaker, and EC2. Hands-on experience with Spark and Scala code development. Derived predictions for bandwidth utilization data of links us...
Skills
Portfolio Projects
Description
This project was part of a Customer Engagement Platform (CEP) initiative for a healthcare client (an American multinational biopharmaceutical company), which was building a data lake to centralize all product, customer, professional, and sales data.
- Developed pipelines in the SnapLogic tool to process data stored in Hive.
- Implemented complex logic within SnapLogic's limited set of built-in transformations.
Description
This project was in the BFSI domain; the client was a Canadian multinational financial services organization. The project was migrating from an Oracle-based architecture to a Big Data platform.
- Developed Scala code to process data stored in Hive tables using Spark; the processed data fed the client's report generation in Tableau.
- Used Spark SQL functions in Scala code to process capital market data.
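The project's Spark SQL code is not shown in the source; as a stand-in, the same style of aggregation over a hypothetical trades table can be sketched in plain SQL (run here through Python's sqlite3, with made-up column names and data):

```python
import sqlite3

# Hypothetical capital-market trades table; the original ran similar
# aggregations with Spark SQL over Hive tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (symbol TEXT, price REAL, qty INTEGER)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [("ABC", 10.0, 100), ("ABC", 12.0, 300), ("XYZ", 50.0, 200)],
)

# Per-symbol traded volume and volume-weighted average price (VWAP).
rows = conn.execute(
    """
    SELECT symbol,
           SUM(qty)                    AS volume,
           SUM(price * qty) / SUM(qty) AS vwap
    FROM trades
    GROUP BY symbol
    ORDER BY symbol
    """
).fetchall()
print(rows)  # [('ABC', 400, 11.5), ('XYZ', 200, 50.0)]
```

In Spark the same query text could be passed to `spark.sql(...)` against a Hive table; only the engine changes, not the SQL shape.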
Description
Centralized employee-specific data so that all relevant information for an employee is available in a single place. Performed data ingestion into Hadoop from multiple data sources. Enabled insightful daily analysis by comparing multiple datasets for use cases such as asset tracking and associate allocation.
- Developed Sqoop scripts to import data from databases such as MySQL, Postgres, Oracle, and MS SQL into HDFS. Merged daily incremental data with existing data using sqoop merge.
- Created Hive queries to fine-tune the imported data and join it with other datasets, and executed them using Spark SQL.
- Provided design recommendations and thought leadership to other stakeholders, improving review processes and resolving technical problems.
- Shared responsibility for administering Hadoop, Hive, and Spark.
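Sqoop's merge tool keys records on an id column and lets the newer (incremental) row replace the existing one. A minimal Python sketch of that semantics, with hypothetical column names:

```python
# Sketch of Sqoop merge semantics: for each key, the incremental (newer)
# record replaces the existing one; unmatched rows from both sides survive.
existing = [
    {"id": 1, "host": "db01", "status": "active"},
    {"id": 2, "host": "db02", "status": "active"},
]
incremental = [
    {"id": 2, "host": "db02", "status": "retired"},  # updated row
    {"id": 3, "host": "db03", "status": "active"},   # new row
]

def sqoop_merge(existing, incremental, key="id"):
    merged = {row[key]: row for row in existing}
    merged.update({row[key]: row for row in incremental})  # newer row wins
    return sorted(merged.values(), key=lambda r: r[key])

merged = sqoop_merge(existing, incremental)
```

After the merge, id 2 carries the updated status and id 3 appears as a new row, which matches the daily incremental-plus-base pattern described above.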
Description
Deployed R to predict several server parameters such as CPU, memory, and disk utilization, so that server administrators could anticipate their servers' utilization for the upcoming month.
- Retrieved server utilization data from MySQL by integrating R with the MySQL database.
- After fine-tuning the data in R, fed it to an ARIMA model in R to obtain the predictions.
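The project used R's ARIMA; as a much simplified stand-in, the sketch below fits only the autoregressive part, an AR(1) model, to a made-up monthly CPU utilization series and produces a one-step-ahead forecast:

```python
# Hypothetical monthly CPU utilization (%); the real project pulled this
# from MySQL and used a full ARIMA model in R.
cpu = [41.0, 44.0, 43.0, 47.0, 46.0, 50.0, 49.0, 53.0]

def ar1_forecast(series):
    """One-step-ahead forecast from an AR(1) fit (simplified ARIMA stand-in)."""
    mean = sum(series) / len(series)
    dev = [x - mean for x in series]
    # Least-squares estimate of the lag-1 coefficient phi.
    num = sum(dev[t] * dev[t - 1] for t in range(1, len(dev)))
    den = sum(d * d for d in dev[:-1])
    phi = num / den
    # Forecast: revert toward the mean by factor phi from the last value.
    return mean + phi * (series[-1] - mean)

next_month = ar1_forecast(cpu)
```

A full ARIMA fit would also handle differencing and moving-average terms; this only illustrates the shape of the prediction step.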
Description
Contributed to enhancing business processes within the TCS organization by providing log analytics using Hadoop and other big data tools, saving significant effort and cost for the organization.
- Configured multiple log collectors (Flume, Logstash, Filebeat, NXLog) to collect logs from various sources.
- Developed Hive scripts to process the data stored in HDFS, applying optimization techniques to reduce data processing time.
- Used the Elasticsearch NoSQL database for real-time access to the data, with Kibana as the visualization layer on top of it.
- Configured the ElastAlert tool on top of Elasticsearch to notify a user after 3 consecutive failed login attempts on a server.
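The alert itself was configured declaratively in ElastAlert; the underlying rule it evaluates can be sketched in Python. The event format here is hypothetical: a stream of (host, outcome) pairs, alerting once a host reaches 3 consecutive failures:

```python
# Sketch of the "3 consecutive failed logins" rule behind the ElastAlert
# configuration. A success resets the streak for that host.
events = [
    ("web01", "fail"), ("web01", "fail"), ("web01", "ok"),
    ("db01", "fail"), ("db01", "fail"), ("db01", "fail"),
]

def consecutive_failure_alerts(events, threshold=3):
    streaks, alerts = {}, []
    for host, outcome in events:
        if outcome == "fail":
            streaks[host] = streaks.get(host, 0) + 1
            if streaks[host] == threshold:  # alert exactly once per streak
                alerts.append(host)
        else:
            streaks[host] = 0
    return alerts

alerts = consecutive_failure_alerts(events)
print(alerts)  # ['db01'] — web01's streak was broken by a success
```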
Description
Improved an MS Excel template for entering server data by adding validations implemented in VBA, forcing users to enter correct data and thereby smoothing ingestion into the MySQL database for further processing. Developed code making it mandatory to fill at least 5 of 15 columns when entering server details. Used color indicators to notify the user of an invalid IP address; validation ran as soon as the user entered data in a cell.
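The original checks lived in VBA inside the workbook; this Python sketch mirrors the two rules described above (minimum columns filled, valid IP address). The row layout and column names are hypothetical:

```python
import ipaddress

def validate_row(row, min_filled=5, total_cols=15):
    """Return a list of validation errors for one server-details row."""
    errors = []
    filled = sum(1 for v in row.values() if str(v).strip())
    if filled < min_filled:
        errors.append(f"only {filled} of {total_cols} columns filled "
                      f"(minimum {min_filled})")
    try:
        ipaddress.ip_address(row.get("ip_address", ""))
    except ValueError:
        errors.append("invalid IP address")  # the VBA used a color indicator
    return errors

good = {"hostname": "db01", "ip_address": "10.0.0.5",
        "os": "RHEL", "cpu": "8", "mem_gb": "32"}
bad = {"hostname": "db02", "ip_address": "999.1.1.1"}
```

`validate_row(good)` returns no errors; `validate_row(bad)` flags both the under-filled row and the out-of-range IP, analogous to the immediate cell-level feedback in the Excel template.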