Digvijay Y.

Machine Learning Engineer

Pune, India

Experience: 3 Years

60786.4 USD / Year

  • Notice Period: Days

About Me

An IIT B postgraduate with 4.8+ years of experience in deep learning and machine learning and a research background at TCS Research. Skilled in Python packages such as scikit-learn, NumPy, pandas, TensorFlow, Keras, Hugging Face, NLTK, spaCy, Gensim, M...

Portfolio Projects

Description

This project identified the most relevant influencers from a pool of 15k influencers.

The dataset is obtained from Instagram posts containing text and images.

The text and images are cleaned, and the posts are then labelled.

BERT and Inception models are used to extract features from the text and images, respectively.

These features are used as input to an attention layer, whose output is classified into various categories.

The category of each post is decided based on the topic present in it.

Description

+ This project helped more than 500 researchers extract summaries from research abstracts

+ The dataset used for this study has 200,000 paragraphs with their summaries

+ Formulated the summarization task as a multi-label deletion-based problem

+ The baseline is a Bi-LSTM model with GloVe embeddings as input

+ The pre-trained BERT model is fine-tuned with BERT WordPiece embeddings as input
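
One reading of the deletion-based formulation can be sketched as follows: every token receives a keep/delete label, and the summary is whatever survives. This is only an illustration. The function name `apply_deletion_labels` and the hand-written labels are hypothetical; in the project the labels would be predicted by the Bi-LSTM or BERT model.

```python
from typing import List


def apply_deletion_labels(tokens: List[str], keep: List[int]) -> str:
    """Build a summary by keeping tokens labelled 1 and deleting tokens labelled 0."""
    if len(tokens) != len(keep):
        raise ValueError("tokens and keep labels must have the same length")
    return " ".join(tok for tok, k in zip(tokens, keep) if k == 1)


# Hypothetical example: the labels are written by hand here,
# whereas a trained model would predict one label per token.
tokens = "we propose a simple and fast method for text summarization".split()
keep = [1, 1, 1, 0, 0, 1, 1, 1, 1, 1]
summary = apply_deletion_labels(tokens, keep)
print(summary)
```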

Description

The data for this task is obtained from various forums and platforms such as the dark web, Twitter and Telegram.

The first step is to get labels for this data, which we achieved using a rule-based approach.

In the rule-based approach, text related to cybersecurity is identified by the keywords present in it.

Once the data was ready, we fine-tuned the SecBERT model on it and achieved a higher accuracy.

The same model is integrated into the Cyble Vision platform and deployed in production.
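
A minimal sketch of keyword-based labelling of this kind is shown below. The keyword list and the function name `label_text` are assumptions made for illustration; the actual rule set used in the project is not given here.

```python
import re
from typing import Iterable, Set

# Hypothetical keyword list; the real rules are not part of this description.
CYBER_KEYWORDS: Set[str] = {"ransomware", "phishing", "exploit", "malware", "botnet"}


def label_text(text: str, keywords: Iterable[str] = CYBER_KEYWORDS) -> int:
    """Return 1 if any keyword appears as a whole word in the text, else 0."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return int(any(k in words for k in keywords))


posts = [
    "New ransomware strain for sale, DM for details",
    "Looking for a good pizza place downtown",
]
labels = [label_text(p) for p in posts]
print(labels)
```

Labels produced this way are noisy by construction, which is one reason to treat them as training data for a fine-tuned model rather than as the final classifier.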

Description

The main task of this project is to identify offensive and negative sentiment.

We used Twitter data related to our customers to identify these sentiments.

To build this analyzer we used the TweetEval RoBERTa model, which was already trained on tweet data.

During inference we used Ray to make predictions faster, and the same model is deployed in production.

Description

For this project we used the Presidio analyzer, which identifies many PII entities.

We added custom recognizers using regexes and improved some of the analyzer's existing regexes.

We compared the performance of the PII analyzer using Ray, Dask and pandarallel and found Ray to be the most efficient.

The analyzer is integrated with the Cyble Vision platform and deployed in production.
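
The idea behind a regex-based custom recognizer can be sketched with the standard library alone. The code below deliberately does not use Presidio's API: the `RegexRecognizer` class, the `IN_PHONE_NUMBER` entity name and the phone-number pattern are all hypothetical stand-ins chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Match:
    entity_type: str
    start: int
    end: int
    text: str


class RegexRecognizer:
    """Hypothetical stand-in for a custom recognizer: one entity type, one regex."""

    def __init__(self, entity_type: str, pattern: str) -> None:
        self.entity_type = entity_type
        self.regex = re.compile(pattern)

    def analyze(self, text: str) -> List[Match]:
        """Return one Match per non-overlapping regex hit, with character offsets."""
        return [
            Match(self.entity_type, m.start(), m.end(), m.group())
            for m in self.regex.finditer(text)
        ]


# Assumed pattern: a 10-digit mobile number starting with 6-9.
phone = RegexRecognizer("IN_PHONE_NUMBER", r"\b[6-9]\d{9}\b")
results = phone.analyze("Call me on 9876543210 after 6 pm.")
print(results)
```

Tightening a pattern like this (for example with word boundaries, as above) is the kind of change that reduces false positives such as the lone "6" in the sample sentence.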

Description

This project increased the efficiency of computations by 3x and reduced the training time.

This was a regression problem, for which a dataset given in the literature was used.

As part of data cleaning, outliers, NaN values and duplicates were removed from the data.

10-fold cross-validation and weight regularization were used to avoid overfitting.

A transfer learning setting was used to develop a multitask model.

The final results are being published in Nature Scientific Reports, along with the filing of a patent.

Description

The Twitter dataset used for this study was first cleaned.

TF-IDF is used to convert the corpus into feature vectors.

Topic modelling is applied to get the most relevant topics from the data.

From all the topics, Beauty, Parenting, Food, Sports and Fitness are used to label each text post.

A multiclass classification model is developed on these labels using an LSTM architecture.
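
For readers unfamiliar with TF-IDF, the sketch below computes the textbook variant (term frequency times log of inverse document frequency) in plain Python. It is not the project's code: the function name `tfidf` and the two sample posts are made up, and library implementations typically apply smoothing and normalization that this version omits.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Compute TF-IDF with tf = count / doc length and idf = log(N / df).

    Assumes every document contains at least one token.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical, already-tokenized posts.
docs = [
    "new lipstick shade review".split(),
    "marathon training plan review".split(),
]
vectors = tfidf(docs)
print(vectors[0])
```

A term that occurs in every document, such as "review" here, gets weight zero, which is why TF-IDF vectors emphasize the words that distinguish one post from another.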

Description

All data cleaning, visualization and preprocessing are done first for the cosmetic data.

The final dataset has 3617 rows and 63 labels, and each row is represented by 109 descriptors.

Random forest, support vector machine and XGBoost classifiers are used.

An LSTM-based deep learning model is used for building the classification model.
