Harry Z.

Data scientist with 10 years of experience in Python, Spark, SQL, machine learning, and advanced statistics

New York, United States

Experience: 15 Years

144000 USD / Year

  • Immediate: Available

About Me

  • Proficient in Python and PySpark for machine learning, with Amazon Web Services (AWS), GitHub, Databricks, TensorFlow, PyTorch, scikit-learn, H2O, LIME, Keras, pandas, and NLP.
  • Proficient in SQL, SAS, Python, and Spark with advanced statistical techniques.
  • Proficient in database-driven Excel workbooks, pivot tables, design, macros, Business Intelligence, and VBA.
  • Proficient in Teradata, DB2, and Omniture, with strong UNIX experience. ETL/OLAP experience handling datasets of hundreds of millions of records. Experience with deep natural language processing, Tableau, Spark, and Hadoop.
  • Excellent communication skills, presenting technical analysis to non-technical audiences in simple, understandable terms.
  • Work experience with CVS Health Corporate, SunTrust Bank, Home Depot, Pfizer, and Philip Morris.

Portfolio Projects

Developed predictive models for E-trade sentiment analysis with Natural Language Processing (NLP)

Role

Data Scientist

Description

Developed several predictive models for E-trade sentiment analysis using deep learning for NLP (Natural Language Processing) with GloVe, WordNet, and Punkt, including Keras LSTM (long short-term memory), CNN (convolutional neural network), CNN + GRU (gated recurrent unit), and bidirectional GRU models with GloVe word embeddings. Also applied spaCy (industrial-strength NLP) with SMOTE for the imbalanced data to build models using H2O and LIME. Set callback functions to stop training early and select the best model for the application.
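
The early-stopping step can be illustrated without a deep learning framework. The sketch below is a minimal, framework-free version of patience-based early stopping, similar in spirit to what a Keras early-stopping callback does. The function name `early_stop_epoch`, the `patience` and `min_delta` defaults, and the loss values are hypothetical and are not taken from the project.

```python
def early_stop_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve by more than
    `min_delta` for `patience` consecutive epochs. Epochs are 0-indexed.
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best model: remember it and reset the patience counter
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch
    # Ran out of epochs before patience was exhausted
    return best_epoch, len(val_losses) - 1


# Hypothetical validation losses, one per epoch
losses = [0.70, 0.55, 0.48, 0.50, 0.49, 0.47]
best, stopped = early_stop_epoch(losses, patience=2)
```

With these made-up losses, training stops at epoch 4 and the model from epoch 2 is kept. In Keras itself, the comparable behavior would typically come from `keras.callbacks.EarlyStopping` with `restore_best_weights=True`; the exact callbacks used in the project are not stated.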

Developed a Random Forest classification model of priority-drug prescribing for marketing purposes

Role

Data Scientist

Description

Performed data engineering, including data validation and feature generation with Spark SQL, and mounted an AWS S3 bucket through Databricks to pull datasets: physicians' 3-year information, mainly from LAAD data, and patients' 10-year information, mainly from Optum data. Developed a Random Forest model using GridSearchCV to understand the drivers of physician prescribing for a priority drug, considering commercial actions, physician and patient characteristics, and other market forces, using various internal and external data sources such as LAAD and Optum. Applied SHAP values to explain the features driving physicians' prescriptions.

Target drugs: HUMIRA, SIMPONI, TALTZ, SILIQ, ENBREL, TREMFYA, STELARA, ILUMYA, CIMZIA, COSENTYX, OTEZLA, and REMICADE
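
As a rough illustration of the modeling step, the sketch below tunes a Random Forest with scikit-learn's `GridSearchCV`. Everything specific in it is an assumption: the data is synthetic (generated with `make_classification`, not LAAD or Optum data), and the parameter grid, the 3-fold cross-validation, and the ROC AUC scoring were chosen only to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the physician/patient feature table (not real data)
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Small illustrative grid; the grid used in the project is not stated
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="roc_auc",
)
search.fit(X_train, y_train)

best_model = search.best_estimator_
test_accuracy = best_model.score(X_test, y_test)

# Impurity-based importances give a first, global look at the drivers.
# SHAP values, as named in the project, would instead attribute each
# individual prediction to the features.
importances = best_model.feature_importances_
```

The impurity-based importances here are a stand-in for the SHAP analysis, which needs the separate `shap` package and is left out to keep the sketch dependent on scikit-learn only.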

Analyzed data to generate monthly sales reports for different segments with visualization

Role

Data Scientist

Description

Created dashboards, KPIs, and visualization reports, focusing on measuring how the loyalty programs improved total sales (year over year and month over month) across customer segments; ran cohort analysis across years to build customer insight into the best customer groups; and performed analysis of variance (ANOVA) by fitting linear regression models to measure how multiple factors in the historical transactional data affected total margin.
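
The ANOVA idea can be shown in its simplest form. The sketch below computes a one-way ANOVA F statistic with the standard library only, which is equivalent to regressing margin on a single categorical factor. It is a simplification of the multi-factor analysis described above, and the segment names and margin figures are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA over lists of values."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)

    # Variation explained by group membership (between groups)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Residual variation inside each group (within groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical monthly margins for three customer segments
margins = {
    "loyalty_gold": [12.0, 14.0, 13.0],
    "loyalty_basic": [9.0, 10.0, 11.0],
    "non_member": [6.0, 7.0, 8.0],
}
f_stat = one_way_anova_f(list(margins.values()))
```

A large F statistic means the segment means differ by more than the within-segment noise would suggest. Handling several factors at once, as in the project, would normally be done by fitting a regression model with a statistics package rather than by hand.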

Built a mortgage refinance cash-out model with logistic regression for marketing

Role

Data Scientist

Description

Collected customer data from the company website across channels such as email, direct mail, social media, paid_agents, and other_ad. Applied fuzzy-logic matching of customers' names, addresses, phone numbers, or emails against existing databases to find 5 years of customer history, such as checking or savings accounts, account balances, account transactions, retirement accounts, income, liquid assets, and any other loan information. Created and validated datasets with SQL, setting the target to yes when a customer cashed out (took extra money when refinancing their mortgage) within 5 years and to zero when a customer did not cash out when refinancing within 5 years. Created customer features from business logic. Randomly sampled 60% of the data to build a logistic regression model that generates a score for every customer indicating the cash-out probability, with 20% of the data for validation and another 20% for testing. Ranked customers by score, created decile groups, and estimated KS values to evaluate model performance. The KS values from the three data groups (modeling, validation, and testing) should be quite close and all exceed 35% before the model can be applied to the mortgage campaign. Customers in the top 3 decile groups by score who have not cashed out should be listed for marketing or campaign purposes.
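
The evaluation steps above can be sketched in plain Python. The example below shows a seeded 60/20/20 split, a KS statistic computed from scores and 0/1 outcomes, and the selection of top-decile customers who have not cashed out. All function names, the seed, and the scores are hypothetical; the scores stand in for the output of a fitted logistic regression, which is not reproduced here.

```python
import random


def split_60_20_20(rows, seed=42):
    """Shuffle rows and split them into modeling, validation, and testing sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_model = int(len(rows) * 0.6)
    n_valid = int(len(rows) * 0.2)
    return rows[:n_model], rows[n_model:n_model + n_valid], rows[n_model + n_valid:]


def ks_statistic(scores, labels):
    """Largest gap between the cumulative shares of positives and negatives
    when customers are ranked from highest to lowest score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    cum_pos = cum_neg = 0
    ks = 0.0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            cum_pos += 1
        else:
            cum_neg += 1
        # Only compare at the end of a run of tied scores
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            ks = max(ks, abs(cum_pos / n_pos - cum_neg / n_neg))
    return ks


def top_decile_prospects(customers, top_n_deciles=3):
    """customers: (customer_id, score, cashed_out) tuples. Return the ids in
    the top deciles by score that have not cashed out."""
    ranked = sorted(customers, key=lambda c: c[1], reverse=True)
    cutoff = len(ranked) * top_n_deciles // 10
    return [cid for cid, _, cashed_out in ranked[:cutoff] if not cashed_out]


# Hypothetical model scores and observed outcomes (1 = cashed out)
scores = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

ks = ks_statistic(scores, labels)
passes_threshold = ks > 0.35  # the 35% bar described above

customers = [(f"c{i}", s, l) for i, (s, l) in enumerate(zip(scores, labels))]
prospects = top_decile_prospects(customers)

modeling, validation, testing = split_60_20_20(range(100))
```

In practice the KS value would be computed separately on the modeling, validation, and testing sets and the three values compared, as the description requires.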
