About Me
LANGUAGES AND TECHNOLOGIES
Proficient: Python, R, SQL, Java, Spark, Pandas, Keras, Scikit-learn, PyTorch, Hadoop, Hive, Matplotlib, NumPy, Dask, Git, Docker, ggplot2, Shiny, Flask
Exposure: JavaScript, C++, Scala, Clojure, TensorFlow, Gluon, MXNet, Cassandra, Airflow, Kafka, Kubernetes

PROFESSIONAL EXPERIENCE

IQVIA, Sr Machine Learning Engineer, Montreal, QC, Apr 2018 - Current
Transitioned a SAS and Oracle based patient analytics system to an automated analytics pipeline. Used PySpark for ETL and spark-ml for the recommendation engine that predicts patient population, and automated the workflow using Airflow. Reduced analysis time from 6+ hours to 30 minutes.
Created a model to find rare disease in patients from EHR data. Built the model on an LSTM architecture with PyTorch, built a REST API in Flask to serve it, and used Docker to package it as a microservice for production. Improved ROC to 0.73 from the 0.61 achieved by previous models.
Analyzing data and creating models for an internal tool that helps predict project turnaround time and budget. Analyzed data with pandas and Matplotlib to understand why human forecasts do not work well, then created ensemble models with Random Forest, XGBoost and a feed-forward neural network to predict total project hours. Models improved MSE by over 150% compared to the human-generated forecast.

Temenos, Data Science Intern, Surrey, BC, Aug 2015 - Dec 2016
Developed a visualization tool for business analysts using R, Shiny and JavaScript, which sped up their analysis and improved team productivity.
Tested and validated the capabilities of SQL Server 2016's R integration. SQL Server 2016 introduced the ability to run R code inside SQL statements, which eliminated the need to bring data out using ODBC. Tested the R code base and rewrote components for use with SQL Server R. This resulted in a smooth transition to SQL Server with R integration.
Created real-time analytics tools that help find unusual transactions for retail customers. Used Apache Kafka and Spark Streaming to feed and process streaming data coming from mobile banking clients. Built an online local outlier factor algorithm using scikit-learn to isolate unusual transactions. The system was able to identify unusual transactions in less than 5 seconds with improved accuracy of 78%.

Tata Consultancy Services, Analytics Developer, Kolkata, India and Singapore, Dec 2009 - Jun 2014
Developed statistical report generation tools for a treasury application. Used R and SQL to generate reports and created a web app in Django for stakeholders to view and access them. Reduced the client's cost of acquiring visualization and report generation tools.
Developed a proof-of-concept tool for portfolio optimization. Using R and portfolio analytics, built scripts that help treasury product users optimize their portfolios. The POC was successful: the team built on it to create a fully fledged app and integrated it into the treasury product.
Created an automated treasury valuation tool. Using Python, Tkinter and SQL, created a standalone desktop app that automates the previously manual, error-prone valuation process. Reduced valuation time with an easy-to-use single-click tool.
Designed and developed a Hive-based data warehouse to migrate the data warehouse from Oracle. Designed the warehouse in Hive, ingested data using Sqoop, and used HCatalog and MapReduce jobs to feed data warehouse extracts to downstream systems.

EDUCATION
Master of Science in Computing Science, Simon Fraser University, Canada, Dec 2017

PROJECT WORK

Vancouver Crime Data Visualization Tool, Data Scientist, 2017
An interactive tool to visualize Vancouver crime data across the last decade.
Used Vancouver crime data to produce a visualization tool that uses animation to showcase crime across the city.
Built the visualization and animation with R, Shiny, plotly and the Leaflet JS library to show crimes and different statistics on a map of Vancouver. Showed where crime happens often and how crime patterns change over time in different districts of the city.

NLP Machine Translation System, Data Scientist, 2016
Chinese-to-English statistical machine translation system with emphasis on feedback exchange between decoder and reranker.
Used a future-cost-based beam search decoder and the PRO reranking algorithm with ordinal regression to build the decoder-reranker system.
Used Python, NumPy and pandas to develop this algorithm from scratch. The purpose was to build a statistical machine translation tool to translate Chinese text to English text for an MSc NLP course project. Achieved more than 96% accuracy on the training corpus and 93% accuracy on the testing corpus.

Skin Lesion Analysis Towards Melanoma Detection using Deep Learning, Data Scientist, 2017
Developed image analysis tools to enable automated diagnosis of melanoma from dermoscopic images and recommend medications from medical texts.
Developed an algorithm for automated skin lesion segmentation in the form of binary masks using the U-Net architecture.
Also worked towards meaningful automatic classification of lesion images into melanoma, seborrheic keratosis and nevus using VGG-16.
Built models using TensorFlow and Python; achieved 95.6% accuracy on the testing data set.
Combined with the analysis, built a recommendation engine that extracted first-line treatment for the skin lesion from PubMed texts.

LEADERSHIP + AWARDS
Star of Month, Tata Consultancy Services, 2011, 2014