About Me
5+ years of experience with Big Data and Hadoop ecosystem tools. Experience with cloud technologies such as AWS Lambda, SageMaker, and EC2. Hands-on experience with Spark and Scala code development. Derived predictions for bandwidth utilization data of links us...
Skills
Portfolio Projects
Description
This project was part of a Customer Engagement Platform (CEP) initiative for a healthcare client (an American multinational biopharmaceutical company), which was building a data lake to centralize all product, customer, professional, and sales data.
- Developed pipelines in the SnapLogic tool to process data stored in Hive.
- Implemented complex logic within SnapLogic's limited set of built-in transformations.
Description
This project was in the BFSI domain; the client was a Canadian multinational financial services organization. The project was migrating from an Oracle-based architecture to a Big Data platform.
- Developed Scala code to process data stored in Hive tables using Spark; the processed data fed the client's report generation in Tableau.
- Used Spark SQL functions in Scala code to process capital market data.
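The project's Spark SQL code is not shown in the source; as a stand-in, the same style of aggregation over a hypothetical trades table can be sketched in plain SQL (run here through Python's sqlite3, with made-up column names and data):

```python
import sqlite3

# Hypothetical capital-market trades table; the original ran similar
# aggregations with Spark SQL over Hive tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (symbol TEXT, price REAL, qty INTEGER)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [("ABC", 10.0, 100), ("ABC", 12.0, 300), ("XYZ", 50.0, 200)],
)

# Per-symbol traded volume and volume-weighted average price (VWAP).
rows = conn.execute(
    """
    SELECT symbol,
           SUM(qty)                    AS volume,
           SUM(price * qty) / SUM(qty) AS vwap
    FROM trades
    GROUP BY symbol
    ORDER BY symbol
    """
).fetchall()
print(rows)  # [('ABC', 400, 11.5), ('XYZ', 200, 50.0)]
```

In Spark the same query text could be passed to `spark.sql(...)` against a Hive table; only the engine changes, not the SQL shape.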
Description
Centralized employee-specific data so that all relevant information for an employee is available in a single place. Performed data ingestion into Hadoop from multiple data sources. Enabled insightful daily analysis by comparing multiple datasets for use cases such as asset tracking and associate allocation.
- Developed Sqoop scripts to import data from databases such as MySQL, Postgres, Oracle, and MS SQL into HDFS. Merged daily incremental data with existing data using sqoop merge.
- Created Hive queries to fine-tune the imported data and join it with other datasets, and executed them using Spark SQL.
- Provided design recommendations and thought leadership to other stakeholders, improving review processes and resolving technical problems.
- Shared responsibility for administering Hadoop, Hive, and Spark.
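Sqoop's merge tool keys records on an id column and lets the newer (incremental) row replace the existing one. A minimal Python sketch of that semantics, with hypothetical column names:

```python
# Sketch of Sqoop merge semantics: for each key, the incremental (newer)
# record replaces the existing one; unmatched rows from both sides survive.
existing = [
    {"id": 1, "host": "db01", "status": "active"},
    {"id": 2, "host": "db02", "status": "active"},
]
incremental = [
    {"id": 2, "host": "db02", "status": "retired"},  # updated row
    {"id": 3, "host": "db03", "status": "active"},   # new row
]

def sqoop_merge(existing, incremental, key="id"):
    merged = {row[key]: row for row in existing}
    merged.update({row[key]: row for row in incremental})  # newer row wins
    return sorted(merged.values(), key=lambda r: r[key])

merged = sqoop_merge(existing, incremental)
```

After the merge, id 2 carries the updated status and id 3 appears as a new row, which matches the daily incremental-plus-base pattern described above.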
Description
Deployed R to predict several server parameters such as CPU, memory, and disk utilization, so that server administrators could anticipate their servers' utilization for the upcoming month.
- Retrieved server utilization data from MySQL by integrating R with the MySQL database.
- After fine-tuning the data in R, fed it to an ARIMA model in R to obtain the predictions.
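The project used R's ARIMA; as a much simplified stand-in, the sketch below fits only the autoregressive part, an AR(1) model, to a made-up monthly CPU utilization series and produces a one-step-ahead forecast:

```python
# Hypothetical monthly CPU utilization (%); the real project pulled this
# from MySQL and used a full ARIMA model in R.
cpu = [41.0, 44.0, 43.0, 47.0, 46.0, 50.0, 49.0, 53.0]

def ar1_forecast(series):
    """One-step-ahead forecast from an AR(1) fit (simplified ARIMA stand-in)."""
    mean = sum(series) / len(series)
    dev = [x - mean for x in series]
    # Least-squares estimate of the lag-1 coefficient phi.
    num = sum(dev[t] * dev[t - 1] for t in range(1, len(dev)))
    den = sum(d * d for d in dev[:-1])
    phi = num / den
    # Forecast: revert toward the mean by factor phi from the last value.
    return mean + phi * (series[-1] - mean)

next_month = ar1_forecast(cpu)
```

A full ARIMA fit would also handle differencing and moving-average terms; this only illustrates the shape of the prediction step.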
Description
Contributed to enhancing business processes within the TCS organization by providing log analytics using Hadoop and other big data tools, saving significant effort and cost for the organization.
- Configured multiple log collectors (Flume, Logstash, Filebeat, NXLog) to collect logs from various sources.
- Developed Hive scripts to process the data stored in HDFS, applying optimization techniques to reduce data processing time.
- Used the Elasticsearch NoSQL database for real-time access to the data, with Kibana as the visualization layer on top of it.
- Configured the ElastAlert tool on top of Elasticsearch to notify a user after 3 consecutive failed login attempts on a server.
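The alert itself was configured declaratively in ElastAlert; the underlying rule it evaluates can be sketched in Python. The event format here is hypothetical: a stream of (host, outcome) pairs, alerting once a host reaches 3 consecutive failures:

```python
# Sketch of the "3 consecutive failed logins" rule behind the ElastAlert
# configuration. A success resets the streak for that host.
events = [
    ("web01", "fail"), ("web01", "fail"), ("web01", "ok"),
    ("db01", "fail"), ("db01", "fail"), ("db01", "fail"),
]

def consecutive_failure_alerts(events, threshold=3):
    streaks, alerts = {}, []
    for host, outcome in events:
        if outcome == "fail":
            streaks[host] = streaks.get(host, 0) + 1
            if streaks[host] == threshold:  # alert exactly once per streak
                alerts.append(host)
        else:
            streaks[host] = 0
    return alerts

alerts = consecutive_failure_alerts(events)
print(alerts)  # ['db01'] — web01's streak was broken by a success
```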
Description
Improved an MS Excel template for entering server data by adding validations implemented in VBA, forcing users to enter correct data and thereby smoothing ingestion into the MySQL database for further processing. Developed code making it mandatory to fill at least 5 of 15 columns when entering server details. Used color indicators to notify the user of an invalid IP address; validation ran as soon as the user entered data in a cell.
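The original checks lived in VBA inside the workbook; this Python sketch mirrors the two rules described above (minimum columns filled, valid IP address). The row layout and column names are hypothetical:

```python
import ipaddress

def validate_row(row, min_filled=5, total_cols=15):
    """Return a list of validation errors for one server-details row."""
    errors = []
    filled = sum(1 for v in row.values() if str(v).strip())
    if filled < min_filled:
        errors.append(f"only {filled} of {total_cols} columns filled "
                      f"(minimum {min_filled})")
    try:
        ipaddress.ip_address(row.get("ip_address", ""))
    except ValueError:
        errors.append("invalid IP address")  # the VBA used a color indicator
    return errors

good = {"hostname": "db01", "ip_address": "10.0.0.5",
        "os": "RHEL", "cpu": "8", "mem_gb": "32"}
bad = {"hostname": "db02", "ip_address": "999.1.1.1"}
```

`validate_row(good)` returns no errors; `validate_row(bad)` flags both the under-filled row and the out-of-range IP, analogous to the immediate cell-level feedback in the Excel template.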