Srikanth D.

Data Scientist

Nashville, United States

Experience: 7 Years

115200 USD / Year

  • Availability: Immediate

About Me

Data science and machine learning professional with working knowledge of advanced statistical analysis and of manipulating and mining meaningful insights from large datasets. Proven experience with mathematical, data science, and machine learning concepts. A...

Portfolio Projects

Description

Scope: Accountable for processing Cigna PPO and Cigna RX data to produce analyses, reports, and applications that enhance the development and analysis of healthcare cost data and trends. Developed data-analysis solutions and algorithms to maintain and continually update the Cigna directory of doctors and services.

Responsibilities:

  • Acquired the relevant medical cost data, ingesting it from sources such as Azure Data Lake Storage and web-scraping HTML and XML files with Beautiful Soup into a Python environment for EDA.
  • Tracked patient flow in the Greater Los Angeles Area, ranked the top 10 critical medical institutions to maximize investment influence for controlling hospital-onset infection, analyzed the relationship between health-care policy changes and the sepsis admission rate, and decreased the incidence rate of hospital-onset infection by 3.2%.
  • Performed raw-data cleaning, imputation, wrangling, and feature engineering, including techniques for handling imbalanced data (random undersampling and SMOTE oversampling), noise reduction, normalization, and visualization, with extensive use of the Python libraries NumPy, Pandas, scikit-learn, Matplotlib, and Seaborn for effective exploratory data analysis (see the illustrative resampling sketch after this list).
  • Implemented a rule-based expert system from the results of the exploratory analysis and information gathered from people in different departments.
  • Handled categorical variables using dummy variables with one-hot encoding and label encoding; replaced NaN values in the dataframes with suitable measures of central tendency.
  • Reduced outliers using standard-deviation techniques to avoid skewness and biased inference over the results, improving predictions and model accuracy.
  • Handled the curse of dimensionality using PCA and LDA for linear dimensionality reduction and multi-dimensional scaling and t-SNE for non-linear cases; used Pearson correlation values, heatmaps, Chi-Square, and ANOVA for feature selection, reducing correlation and dimensionality of the standardized data so that maximum variance is preserved along with the relevant features (see the PCA sketch after this list).
  • Built a model using NLP concepts such as tokenization, stemming, lemmatization, stop words, and phrase matching with libraries including spaCy and NLTK, computing cosine similarity over TF-IDF vectors to analyze text patterns in online assessment data and produce a Cigna ‘Wellness Score’ based on scikit-learn and MLlib (see the TF-IDF similarity sketch after this list).
  • Configured EC2 instances and created S3 data pipes using the Boto API to load data from internal data sources; deployed ML models using AWS SageMaker and Comprehend.
  • Ran containerized deployments on Azure Container Service (ACS) using Azure Container Instances and Azure Kubernetes Service, managing compute capacity by operating ACI and AKS side by side to handle spikes in demand.
  • Built and maintained Docker container clusters managed by Kubernetes on GCP with Linux, Bash, and Git; utilized Kubernetes and Docker for the runtime environment of the CI/CD system to build, test, and deploy.
  • Used Python 3.x and Spark to extract data models and perform data profiling, classifying driving performance and predicting parameter values in the LOESS algorithm; applied Bayesian statistics and K-Means to select parameter values automatically as a machine-learning method for large data processes; applied MLlib and TensorFlow library functions to implement the machine-learning process.
  • Worked on Bayesian hypothesis tests to verify whether the insurance prices and discounts offered fit the risk and reliability scores, and researched statistical inference and neural networks for the price changes associated with the risk factors using statistical tools including R, Alteryx, and TensorFlow; tuned models by finding the best parameters with grid search and Bayesian optimization.
  • Managed data-quality scripts using SQL and Hive to validate successful data loads and the quality of the data, and configured database objects such as tables, stored procedures, functions, and triggers using SQL and PL/SQL to maintain the Cigna directory of doctors and services.
  • Developed various dashboards in Tableau, using context filters and sets while dealing with huge volumes of data.
  • Worked closely with other analysts and data engineers to develop data infrastructure (data pipelines, reports, dashboards, etc.) and other tools to make analytics more effective.
  • Used Agile approaches (CI/CD), including Extreme Programming, Test-Driven Development, and Agile Scrum.
  • Provided knowledge and understanding of current and emerging trends within the analytics industry.
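A minimal, hypothetical sketch of the resampling step referenced in the list above, using imbalanced-learn on synthetic data; the sampling ratios and the generated feature set are illustrative assumptions, not values from the Cigna project.

```python
# Hypothetical sketch: SMOTE oversampling of the minority class followed by
# random undersampling of the majority class (imbalanced-learn).
# Ratios and the synthetic data are illustrative assumptions.
from collections import Counter
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification

# Synthetic stand-in for a numeric feature table; class 1 is the rare outcome.
X, y = make_classification(n_samples=10000, n_features=20,
                           weights=[0.95, 0.05], random_state=42)

resampler = Pipeline(steps=[
    ("smote", SMOTE(sampling_strategy=0.3, random_state=42)),              # grow minority class
    ("under", RandomUnderSampler(sampling_strategy=0.6, random_state=42)), # shrink majority class
])
X_res, y_res = resampler.fit_resample(X, y)
print(Counter(y), Counter(y_res))
```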
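A brief sketch of the variance-preserving reduction mentioned above: standardize the features, then keep enough principal components to retain about 95% of the variance. The synthetic matrix and the 95% threshold are illustrative assumptions.

```python
# Hypothetical sketch: standardize features, then apply PCA keeping ~95% of
# the explained variance (scikit-learn). Data here is synthetic.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = np.random.default_rng(0).normal(size=(500, 40))   # stand-in feature matrix
X_std = StandardScaler().fit_transform(X)

pca = PCA(n_components=0.95)          # float -> keep 95% of explained variance
X_reduced = pca.fit_transform(X_std)
print(X_reduced.shape, pca.explained_variance_ratio_.sum())
```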
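A small sketch of the TF-IDF/cosine-similarity step, assuming already-preprocessed text; the example documents and the idea of scoring each response against a single reference document are illustrative assumptions, and the actual ‘Wellness Score’ logic is not reproduced here.

```python
# Hypothetical sketch: TF-IDF vectors plus cosine similarity over assessment
# text (scikit-learn). Documents below are invented examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

responses = [
    "exercise three times a week and sleep well",
    "rarely exercise, high stress, poor sleep",
]
reference = ["regular exercise, balanced diet, low stress, good sleep"]

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(reference + responses)

# Similarity of each response to the reference "healthy profile" document.
scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
print(scores)  # higher score -> closer to the reference profile
```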


Description

Scope: Improved the effectiveness of supply-chain operations at Walmart through extensive big-data research and ML model building, following the six-phase iterative CRISP-DM methodology.

Responsibilities:

  • Collaborated with business leaders for defining product requirements, scope, vision, and roadmap for forecasting and provided actionable insights.
  • Implemented an automatic anomaly detection algorithm to detect global and local anomalies on different metrics for millions of items of Walmart fresh food data to effectively save millions of dollars from markdowns.
  • Decreased the running time of algorithms from days to hours by implementing dataframe transformations over about 1.8 billion item-store combinations using GPUs. [Used: RAPIDS, Dask, Python].
  • For Walmart’s fresh distribution centers, modeled the data using regression analysis to predict the number of items that distribution centers need to work on every day, with 90% accuracy.
  • Loaded data into Amazon Redshift and used AWS CloudWatch to collect and monitor AWS RDS instances within Confidential.
  • Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift.
  • Fine-tuned the model with hyperparameter tuning using GridSearchCV, increasing accuracy from 78% to 83% on the validation data as measured by F1-score (see the tuning sketch after this section).
  • Evaluated the performance of the model using ROC curves, AUC, gains charts, confusion matrix and K-fold cross-validation to test the models with different samples of data to optimize the models. Selected logistic regression with the highest AUC score of 0.8 compared to SVM, random forest and Bayesian network.
  • Built an in-house demand-forecasting tool for effective markdowns across 1M+ items using complex machine-learning boosting algorithms, traditional and non-traditional modeling techniques, GPUs, Hadoop clusters, etc., for multiple international markets.
  • Improved model performance through feature engineering and model runtime through distributed computing with PySpark and Hive.
  • Built demand-forecasting models to predict the trend and future volumes of import/export products using ARIMA time-series modeling, including a 3-month demand forecast; performed demand and discount variation analysis across different channels and conducted exploratory data analysis on variables for building model constraints using Python (see the ARIMA sketch after this section).
  • Built framework to show insights, evaluation metrics and track forecast performance while explaining variance.
  • Confidential designs, manufactures, and markets innovative, high-quality, high-performance motorized products for recreation and utility use, sold to the international market through global distribution channels.
  • Responsible for modeling complex business problems, discovering business insights and identifying opportunities through the use of statistical, algorithmic, data mining, and visualization techniques.
  • Managed contractors and a team that provided a GUI design tool and integration with the Tesseract OCR engine applied to KYC and business documents; experienced with OpenCV and Pillow. The final text detector was a custom model built with the TensorFlow Object Detection API.
  • Wrote REST APIs using the Django framework for data exchange and business-logic implementation.
  • Performed data visualization and developed presentation material utilizing Tableau.

Responsible for defining key business problems to be solved while developing and maintaining relationships with stakeholders, SMEs, and cross-functional teams.
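A minimal sketch of the GridSearchCV tuning step referenced above, scored by F1 on cross-validation folds; the estimator choice, parameter grid, and synthetic data are illustrative assumptions, not the project's actual configuration.

```python
# Hypothetical sketch: grid search over a gradient-boosting classifier with
# F1 scoring (scikit-learn). Grid values and data are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=2000, weights=[0.8, 0.2], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)

grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300],
                "max_depth": [2, 3],
                "learning_rate": [0.05, 0.1]},
    scoring="f1",
    cv=5,
)
grid.fit(X_tr, y_tr)
print(grid.best_params_, grid.score(X_val, y_val))  # F1 on held-out validation data
```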
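A minimal sketch of a 3-month demand forecast with a monthly ARIMA model (statsmodels), as referenced above; the synthetic series and the (1, 1, 1) order are illustrative assumptions.

```python
# Hypothetical sketch: fit an ARIMA model on a monthly demand series and
# forecast the next three months (statsmodels). Series is synthetic.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

idx = pd.date_range("2018-01-01", periods=36, freq="MS")
demand = pd.Series(1000 + np.arange(36) * 12
                   + np.random.default_rng(1).normal(0, 40, 36), index=idx)

model = ARIMA(demand, order=(1, 1, 1)).fit()
forecast = model.forecast(steps=3)   # next three months of predicted demand
print(forecast)
```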


Description

Responsibilities:

  • Tackled a highly imbalanced fraud dataset using under-sampling, SMOTE oversampling, and cost-sensitive algorithms with Python scikit-learn.
  • Wrote complex Spark SQL queries for data analysis, also involving Hive and PySQL, to meet business requirements.
  • Developed MapReduce/Spark Python modules for predictive analytics & machine learning in Hadoop on AWS.
  • Worked on data cleaning and ensured data quality, consistency, and integrity using Pandas and NumPy.
  • Participated in feature engineering such as feature-intersection generation, feature normalization, and label encoding with scikit-learn preprocessing.
  • Applied various ML algorithms and statistical models such as decision trees, regression models, random forest, SVM, and clustering to identify volume, using different packages in Python.
  • Improved fraud-prediction performance by using random forest and gradient boosting for feature selection with Python scikit-learn (see the feature-selection sketch after this list).
  • Created multiple automated anomaly-detection systems to expose outliers and to monitor their performance.
  • Performed Naïve Bayes, KNN, logistic regression, random forest, SVM, XGBoost, and GLM/regression to identify whether a loan will default.
  • Implemented an ensemble of Ridge regression, Lasso regression, and XGBoost to predict the potential loan-default loss.
  • Used various metrics (RMSE, MAE, F-score, ROC, and AUC) to evaluate the performance of each model (see the cross-validation sketch after this list).
  • Used the big-data tools Spark (PySpark, Spark SQL, MLlib) to conduct real-time analysis of loan defaults on AWS.
  • Designed and developed computer-vision and deep-learning AI object detection/classification (OpenCV).
  • Performed Markov chain analysis to create first-touch, last-touch, and heuristic models.
  • Created multiple custom SQL queries in Teradata SQL Workbench to prepare the right data sets for Tableau dashboards; queries involved retrieving data from multiple tables using various join conditions, enabling efficiently optimized data extracts for Tableau workbooks.
  • Participated in middle-tier design and development; created, consumed, and updated web services (including SOAP and REST) on a service-oriented architecture (SOA).
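A minimal sketch of tree-based feature selection for a fraud model, as referenced in the list above; the synthetic data and the median importance threshold are illustrative assumptions rather than project settings.

```python
# Hypothetical sketch: rank features by random-forest importance and keep the
# stronger half via SelectFromModel (scikit-learn). Data is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=5000, n_features=30, n_informative=8,
                           weights=[0.97, 0.03], random_state=0)

selector = SelectFromModel(
    RandomForestClassifier(n_estimators=200, random_state=0),
    threshold="median",            # keep features above the median importance
)
X_sel = selector.fit_transform(X, y)
print(X.shape, "->", X_sel.shape)
```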
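A minimal sketch of comparing candidate classifiers on the same stratified folds by ROC-AUC, as referenced above; the specific models and synthetic data are illustrative assumptions.

```python
# Hypothetical sketch: stratified K-fold cross-validation comparing models by
# ROC-AUC (scikit-learn). Models and data are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=3000, weights=[0.9, 0.1], random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "svm": SVC(probability=True, random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    print(f"{name}: mean AUC = {scores.mean():.3f}")
```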
