Now you can Instantly Chat with Digvijay!
About Me
An IIT B postgraduate with 4.8+ years of experience in deep learning and machine learning, with a research background from TCS Research. Skilled in Python packages like scikit-learn, NumPy, pandas, TensorFlow, Keras, Hugging Face, NLTK, spaCy, Gensim, M...
Skills
Positions
Portfolio Projects
Description
This project identified the most relevant influencers among 15k influencers.
The dataset is obtained from Instagram posts containing text and images.
The text and images are cleaned, and the posts are then labelled.
BERT and Inception models are used to extract features from the text and images, respectively.
These features are fed to an attention layer, whose output is classified into various categories.
The category of a post is decided based on the topic present in the post.
Description
+ This project helped more than 500 researchers extract summaries from research abstracts
+ The dataset used for this study has 200,000 paragraphs with their summaries
+ Formulated the summarization task as a multi-label deletion-based problem
+ The baseline is a Bi-LSTM-based model with GloVe embeddings as input
+ The pre-trained BERT model is fine-tuned with BERT WordPiece embeddings as input
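A minimal sketch of the deletion-based formulation described above, assuming it means one keep/delete label per token. The function name and the 0/1 label convention are hypothetical, not taken from the project; in practice the labels would be predicted by the Bi-LSTM or BERT model rather than written by hand.

```python
from typing import List


def apply_deletion_labels(tokens: List[str], labels: List[int]) -> str:
    """Build a summary by keeping tokens labelled 1 and deleting tokens labelled 0."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    return " ".join(tok for tok, keep in zip(tokens, labels) if keep == 1)


# Hand-written labels stand in for model predictions (assumption for illustration).
tokens = "we propose a novel and very simple method".split()
labels = [1, 1, 1, 0, 0, 0, 1, 1]
summary = apply_deletion_labels(tokens, labels)
print(summary)
```

Framing summarization this way turns it into a sequence-labelling task, so the summary can only contain words that already appear in the input.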
Description
The data for this task is obtained from various sources such as dark web forums, Twitter and Telegram. The first step is to get labels for this data, which we achieved with a rule-based approach: text related to cybersecurity is identified by the keywords present in it. Once the data was ready, we fine-tuned the SecBERT model on it and achieved higher accuracy. The same model is integrated into the Cyble Vision Platform and deployed in production.
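The rule-based labelling step could look roughly like the sketch below. The keyword list and function name are hypothetical, since the actual rules are not given in the description; the idea is only that a post is labelled as cybersecurity-related when it contains at least one keyword.

```python
import re
from typing import Iterable

# Hypothetical keyword list; the real rules are not given in the description.
CYBER_KEYWORDS = {"malware", "ransomware", "phishing", "exploit", "botnet"}


def label_post(text: str, keywords: Iterable[str] = CYBER_KEYWORDS) -> int:
    """Return 1 if any keyword appears as a whole word in the text, else 0."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return int(any(kw in words for kw in keywords))


posts = [
    "New ransomware strain sold on a forum",
    "Selling concert tickets, DM me",
]
labels = [label_post(p) for p in posts]
print(labels)
```

Labels produced this way are noisy, which is one reason to fine-tune a model on them rather than use the rules directly.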
Description
The main task of this project is to identify offensive and negative sentiments. We used Twitter data related to our customers to identify these sentiments. To build this analyzer, we used the TweetEval RoBERTa model, which was already trained on tweet data. During inference, we used Ray to make predictions faster, and the same model is deployed in production.
Description
For this project we used the Presidio analyzer, which identifies many PII entities. We added custom recognizers using regexes and improved some of the analyzer's existing regexes. We compared the performance of the PII analyzer using Ray, Dask and pandarallel and found Ray to be the most efficient. The analyzer is integrated with the Cyble Vision Platform and deployed in production.
Description
This project increased computational efficiency by 3x and reduced training time. This was a regression problem that used a dataset from the literature. As part of data cleaning, outliers, NaN values and duplicates were removed. 10-fold cross-validation and weight regularization were used to avoid overfitting. A transfer learning setting was used to develop a multi-task model. The final results are being published in Nature Scientific Reports, along with a patent filing.
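To illustrate the 10-fold cross-validation mentioned above, here is a small sketch of how the fold indices can be generated. The function name is hypothetical and the folds are contiguous and unshuffled for simplicity; the project's actual splitting code is not described, and a library splitter would normally be used instead.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample each.
        size = base + (1 if fold < extra else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


folds = list(k_fold_indices(25, k=10))
print(len(folds), [len(val) for _, val in folds])
```

Each sample lands in the validation set exactly once, so the averaged validation error uses every row of the data.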
Description
The Twitter dataset used for this study was first cleaned. TF-IDF is used to convert the corpus into feature vectors. Topic modelling is applied to get the most relevant topics from the data. From all the topics, Beauty, Parenting, Food, Sports and Fitness are assigned as labels to the text posts. A multiclass classification model is developed on these labels using an LSTM architecture.
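As a sketch of the TF-IDF step, the snippet below computes the textbook weighting, with term frequency as count divided by document length and inverse document frequency as ln(N / df). The function name and the toy documents are made up for illustration; library implementations often use smoothed variants of the formula, so their numbers may differ.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Compute tf * idf per document, with tf = count / doc length and idf = ln(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


# Toy tokenized posts (assumption for illustration).
docs = [
    "healthy food recipes".split(),
    "sports and fitness tips".split(),
    "food for fitness".split(),
]
vectors = tfidf(docs)
print(vectors[0])
```

Terms that occur in fewer documents receive a higher weight, which is why "healthy" outweighs "food" in the first post.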
Description
Data cleaning, visualization and preprocessing are done first for the cosmetic data. The final dataset has 3617 rows and 63 labels, and each row is represented by 109 descriptors. Random forest, support vector machine and XGBoost classifiers are used. An LSTM-based deep learning model is used for building the classification model.