Srikanth D.

Data science and Machine learning professional

United States

Experience: 7 Years

115200 USD / Year

  • Immediate: Available
About Me

Data science and Machine learning professional with working knowledge of advanced statistical analysis and experience manipulating and mining meaningful insights from large datasets. Proven experience in Mathematical, Data Science and Machine Learning concepts.

 

PROFESSIONAL SUMMARY:

  • Over 7 years of experience in all phases of diverse technology projects specializing in Healthcare, Technology, Retail & Supply Chain, and E-commerce domains, involving core Data Science, high-data-volume training and Machine Learning.
  • Proven expertise across SDLC stages (analysis, requirements gathering, design), including writing/documenting Technical Design Documents (TDD), Functional Specification Documents (FSD), Test Plans, GAP Analysis for E2E data processing pipelines, and Source-to-Target mapping documents.
  • Proficient in transforming business requirements into analytical models, designing algorithms, building models, developing data mining and reporting solutions that scale across a massive volume of structured and unstructured data.
  • Takes constant measures to protect data integrity and accuracy; experienced in performing Root Cause Analysis of issues that hinder data quality and in working with data source owners to consistently improve the accuracy of source data.
  • Sound understanding of, and research into, current processes and emerging technologies that require analytic models, data inputs and outputs, analytic metrics, and user-interface needs.
  • Strong experience as a Data Scientist, Machine Learning Engineer and AI Developer, with experience manipulating and mining meaningful data from large sets of structured, semi-structured and unstructured data.
  • Great knowledge of mathematical, data science and machine learning concepts. Able to formulate a solution strategy for data science problems, apply exploratory analysis to identify abnormalities in data, and utilize the appropriate set of algorithms (regression, SVM, decision trees, clustering and deep learning).
  • Proven expertise in employing techniques for Data Mining, Information Retrieval, Supervised and Unsupervised (Clustering, Classification, PCA, Decision trees, KNN, SVM) learning, Predictive Analytics, Optimization Methods and Natural Language Processing (NLP), Time Series Analysis, Deep Learning.
  • Experienced with Machine Learning Regression Algorithms like Simple, Multiple, Polynomial, SVR (Support Vector Regression), Decision Tree Regression, Random Forest Regression.
  • Experienced with Machine Learning Classification Algorithms like Logistic Regression, K-NN, SVM, Kernel SVM, Naive Bayes, Decision Tree & Random Forest classification coupled with Ensemble learning methods like Bagging, Boosting & Random forests.
  • Equipped with experience in statistical techniques including correlation, hypothesis testing and inferential statistics, as well as data mining and modeling techniques using linear and logistic regression, decision trees, and k-means clustering.
  • Expertise in using Linear & Logistic Regression and Classification Modeling, Decision-trees, Principal Component Analysis (PCA), Cluster and Segmentation analyses.
  • Hands-on experience with Data Science libraries in Python such as Pandas, NumPy, SciPy, Scikit-learn, BeautifulSoup, NLTK, Theano, PyTorch, Keras and TensorFlow, including RNN and CNN architectures for object identification.
  • Solid understanding of installing, configuring and using AWS (Amazon Web Services) S3, EC2, RDS, Kinesis and SageMaker, plus Apache Spark, PySpark and Spark MLlib for pipelining, processing, and model deployment in the cloud.
  • Good command of web services with SOAP and REST protocols; experience writing REST APIs in Python for large-scale applications.
  • Experience working on Data visualization tools and creating interactive dashboards (Tableau, QlikView, Python Seaborn, and Matplotlib).
  • Strong background and hands-on experience with Microsoft SQL Server and PostgreSQL; used complex SQL queries for data integration and manipulation, and implemented approaches to performance improvement.
  • Knowledge in Text mining, Topic Modelling, Sentiment Analysis, Recommendation systems, Named-entity recognition, and Hidden Markov Models.
  • Agile, communicative, performance-focused and goal-oriented professional, handling dynamic daily duties and tasks with the ability to support multiple teams across channels and deliver data and business solutions.
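The classifier families listed above (logistic regression, K-NN, SVM, random forests) are typically compared with cross-validation before one is selected. A minimal sketch of that workflow, using scikit-learn on a synthetic dataset (all data and model choices here are illustrative, not from any client project):

```python
# Illustrative sketch: cross-validated comparison of several classifiers
# named in the summary above, on synthetic data (scikit-learn assumed).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary-classification dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(),
    "svm": SVC(),
    "random_forest": RandomForestClassifier(random_state=42),
}

# Mean 5-fold cross-validation accuracy per model.
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} {score:.3f}")
```

The highest-scoring model would then be taken forward for hyperparameter tuning and evaluation on a held-out set.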

Skills

Software Testing

Mobile Apps

Networking & Security

Graphic Design

Portfolio Projects

Company

Cigna

Role

Data Scientist

Description

Scope: Accountable for processing Cigna PPO and Cigna RX data to produce analyses, reports and applications that enhance the development and analysis of healthcare cost data and trends. Developed data analysis solutions and algorithms to maintain and continually update the Cigna directory of doctors and services.

Responsibilities:

 

  • Data acquisition of the relevant medical cost data, involving data ingestion from sources like Azure Data Lake Storage and web scraping of HTML and XML files using BeautifulSoup into a Python environment for EDA.
  • Tracked patients flow in Greater Los Angeles Area, ranked top 10 critical medical institutions to maximize investment influence for controlling hospital-onset infection, analyzed the relationship between health care policy change and sepsis admission rate, decreased incidence rate of hospital-onset infection by 3.2%.
  • Performed Raw Data Cleaning, Imputation, Wrangling, Feature Engineering techniques for handling Imbalanced data (Random Undersampling and SMOTE Oversampling), Noise reduction, Normalization and Visualization techniques with extensive use of Python libraries NumPy, Pandas, Sklearn, Matplotlib and Seaborn for effective Exploratory data analysis.
  • Implemented a rule-based expert system from the results of exploratory analysis and information gathered from people across different departments.
  • Handled categorical features using dummy variables with one-hot encoding and label encoding; replaced NaN values in the DataFrames with suitable measures of central tendency.
  • Reduced outliers using standard-deviation techniques to avoid skewness and biased inference over the results, for enhanced predictions and model accuracy.
  • Handled the curse of dimensionality using PCA and LDA for linear dimensionality reduction, and multi-dimensional scaling and t-SNE for non-linear cases. Used Pearson correlation values, heatmaps, Chi-Square and ANOVA for feature selection, and applied dimensionality reduction to the standardized data so that maximum variance is preserved along with the relevant features.
  • Built a model using NLP concepts like tokenization, stemming, lemmatization, stop words and phrase matching, with libraries like spaCy and NLTK, computing cosine similarity with the TF-IDF technique to analyze text patterns of online assessment data and produce a Cigna ‘Wellness Score’ based on scikit-learn and MLlib.
  • Configured EC2 instances and created S3 data pipes using the Boto API to load data from internal data sources; deployed ML models using AWS SageMaker and Comprehend.
  • Ran containerized deployments on Azure Container Service (ACS) using Azure Container Instances (ACI) and Azure Kubernetes Service (AKS), managing compute capacity by running ACI and AKS side by side to handle spikes in demand.
  • Built and maintained Docker container clusters managed by Kubernetes on GCP, using Linux, Bash, Git and Docker; utilized Kubernetes and Docker for the runtime environment of the CI/CD system to build, test and deploy.
  • Used Python 3.x/Spark to extract data models via data profiling, classifying driving performances and predicting the parameter values in the Loess algorithm. Applied Bayesian statistics and K-Means to automatically select parameter values as a machine-learning method for large data processes, implementing the machine-learning pipeline with MLlib and TensorFlow library functions.
  • Worked on Bayesian hypothesis tests to verify that the insurance prices and discounts offered fit the risk and reliability scores; researched statistical inference and neural networks for the price changes associated with the risk factors using statistical tools including R, Alteryx and TensorFlow; tuned models by finding the best parameters using grid search and Bayesian optimization.
  • Managed data-quality scripts using SQL and Hive to validate successful data loads and the quality of the data, and configured various database objects like tables, stored procedures, functions and triggers using SQL and PL/SQL to maintain the Cigna directory of doctors and services.
  • Developed various dashboards in Tableau, using context filters and sets while dealing with huge volumes of data.
  • Worked closely with other analysts and data engineers to develop data infrastructure (data pipelines, reports, dashboards, etc.) and other tools to make analytics more effective.
  • Used Agile approaches (CI/CD), including Extreme Programming, Test-Driven Development, and Agile Scrum.
  • Provided knowledge and understanding of current and emerging trends within the analytics industry.
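The TF-IDF/cosine-similarity scoring mentioned above can be sketched in a few lines. This is an illustrative toy only: the reference text and responses below are invented, and the "wellness" framing is hypothetical, not Cigna's actual scoring logic:

```python
# Illustrative sketch: score short texts against a reference profile
# with TF-IDF vectors and cosine similarity (scikit-learn assumed).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical reference profile and assessment responses.
reference = "regular exercise balanced diet good sleep"
responses = [
    "I exercise regularly and keep a balanced diet",
    "I rarely sleep well and skip meals",
]

# Fit one vocabulary over all texts, then compare each response
# to the reference row.
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform([reference] + responses)
scores = cosine_similarity(matrix[0], matrix[1:])[0]
print(scores)
```

In a real pipeline the raw text would first pass through the tokenization, stop-word removal and lemmatization steps described above (e.g. with spaCy or NLTK) before vectorization.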


Tools

Git

Company

Walmart

Role

Data Scientist

Description

Scope: Improving the effectiveness of supply chain operations at Walmart through extensive research on big data and building ML models, following the six-phase iterative CRISP-DM methodology.

Responsibilities:

  • Collaborated with business leaders for defining product requirements, scope, vision, and roadmap for forecasting and provided actionable insights.
  • Implemented an automatic anomaly detection algorithm to detect global and local anomalies on different metrics for millions of items of Walmart fresh food data to effectively save millions of dollars from markdowns.
  • Decreased the running time of algorithms from days to hours by implementing DataFrame transformations over about 1.8 billion item-store combinations using GPUs. [Used: RAPIDS, Dask, Python].
  • For Walmart’s Fresh distribution centers, modeled the data using regression analysis to predict the number of items that distribution centers need to work on every day, with 90% accuracy.
  • Loaded data into Amazon Redshift and used AWS CloudWatch to collect and monitor AWS RDS instances within Confidential.
  • Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources like S3 (ORC/Parquet/Text files) into AWS Redshift.
  • Fine-tuned the model using hyperparameter tuning with GridSearchCV, increasing accuracy from 78% on the training data to 83% on validation data as measured by F1-score.
  • Evaluated the performance of the model using ROC curves, AUC, gains charts, confusion matrix and K-fold cross-validation to test the models with different samples of data to optimize the models. Selected logistic regression with the highest AUC score of 0.8 compared to SVM, random forest and Bayesian network.
  • Built an in-house demand forecasting tool for effective markdowns across 1M+ items, using complex machine-learning boosting algorithms, traditional and non-traditional modeling techniques, GPUs, Hadoop clusters, etc., for multiple international markets.
  • Improved the model’s performance through feature engineering, and its runtime through distributed computing using PySpark and Hive.
  • Built demand forecasting models to predict the trend and future volumes of import/export products using ARIMA time-series modeling, including a 3-month demand forecast. Performed demand and discount variation analysis across different channels and conducted exploratory data analysis on variables for building model constraints using Python.
  • Built a framework to show insights and evaluation metrics, and to track forecast performance while explaining variance.
  • Confidential designs, manufactures and markets innovative, high-quality, high-performance motorized products for recreation and utility use, sold to the international market through global distribution channels.
  • Responsible for modeling complex business problems, discovering business insights and identifying opportunities through the use of statistical, algorithmic, data mining, and visualization techniques.
  • Managed contractors and a team who provided a GUI design tool and integration with the Tesseract OCR engine applied to KYC and business documents; experience with OpenCV and Pillow. Built a custom text-detector model using the TensorFlow Object Detection API.
  • Also involved in writing REST APIs using the Django framework for data exchange and business-logic implementation.
  • Performed data visualization and developed presentation material utilizing Tableau.

Responsible for defining key business problems to be solved while developing and maintaining relationships with stakeholders, SMEs, and cross-functional teams.


Tools

OpenCV

Knoah Solutions

https://www.knoah.com

Company

Knoah Solutions

Role

Data Scientist

Description

Responsibilities:

  • Tackled highly imbalanced Fraud dataset using under-sampling, oversampling with SMOTE and cost-sensitive algorithms with Python Scikit-learn.
  • Wrote complex Spark SQL queries for data analysis, also involving Hive and PySQL, to meet business requirements.
  • Developed MapReduce/Spark Python modules for predictive analytics & machine learning in Hadoop on AWS.
  • Worked on data cleaning and ensured data quality, consistency, integrity using Pandas, NumPy.
  • Participated in feature engineering such as feature-intersection generation, feature normalization and label encoding with scikit-learn preprocessing.
  • Applied various ML algorithms and statistical models such as decision trees, regression models, random forest, SVM and clustering to identify volume, using different packages in Python.
  • Improved fraud prediction performance by using random forest and gradient boosting for feature selection with Python Scikit-learn.
  • Created multiple automated anomaly detection systems to expose outliers and monitor their performance.
  • Performed Naïve Bayes, KNN, Logistic Regression, Random Forest, SVM, XGBoost and GLM/Regression to identify whether a loan will default or not.
  • Implemented an ensemble of Ridge and Lasso Regression and XGBoost to predict the potential loan default loss.
  • Used various metrics (RMSE, MAE, F-Score, ROC and AUC) to evaluate the performance of each model.
  • Used big data tools in Spark (PySpark, Spark SQL, MLlib) to conduct real-time analysis of loan defaults on AWS.
  • Designed and developed computer vision and deep learning AI object detection/classification (OpenCV).
  • Performed Markov chain analysis to create first-touch, last-touch and heuristic models.
  • Created multiple custom SQL queries in Teradata SQL Workbench to prepare the right datasets for Tableau dashboards. Queries involved retrieving data from multiple tables using various join conditions, enabling efficiently optimized data extracts for Tableau workbooks.
  • Participated in middle-tier design and development; created, consumed and updated web services (including SOAP and REST) on a service-oriented architecture (SOA).


Tools

AWS