Michael H.

Data Scientist / Machine Learning Engineer

United States

Experience: 15 Years

162,000 USD / year

Availability: Immediate

About Me

Hands-on, coding big data architect: produces readable, performant code to deliver projects; leads teams as an effective mentor and coach; drives adoption of standards through working examples
Develops algorithms for fraud scoring using results discovered from applied machine learning, model construction and back-testing
Designs features combining external data with transactional attributes to yield improved predictive power

Portfolio Projects

Description

As a data scientist, I designed and delivered a proof-of-concept Consumer Profiler application using credit bureau data. I developed a data model using Cassandra as the operational data store, which achieved split-second lookups of consumer history and validation of consumer identity on a table of 200+ million rows. The POC demonstrated the ability to assign a likelihood that a submitted credit request is fraudulent. I wrote additional Spark workflows to demonstrate analytic capabilities that could extend the solution. The project was completed on time and under budget.

Description

As lead architect of a Tableau reporting solution, I designed all required intermediate data flows and implemented Spark workflows on Hadoop to update summary datasets. I used Hive tables to present the data to Tableau and implemented the connection. Data aggregates were updated hourly, and a dashboard showed the number of recordings stored, the number of customers, and disk utilization for the company's video recording service.

Description

I developed data workflows to prepare lists of recommended content titles for customers based on their viewing history. Using several customer features, I created approximately 200 generic viewer profiles with a clustering algorithm and assigned each customer a generic profile (using the scikit-learn module deployed with conda and submitted to the cluster with PySpark). I used the generic profile to filter current content offerings and combined features from the individual customer's viewing history to narrow down the recommended titles. I prepared and scheduled workflow runs on an AWS Hadoop cluster using PySpark to update customer recommendations.
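
A minimal sketch of the assign-then-filter step described above, written in plain Python rather than the scikit-learn/PySpark stack the project used. It assumes the generic profile centroids already exist from a prior clustering run (nearest-centroid assignment is how k-means labels new points). Every name and data shape here (`assign_profile`, `recommend`, a title-to-genre catalog, genre-count ranking) is hypothetical and only illustrates the idea, not the original workflow.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical inputs: centroids from a prior clustering step, each profile's
# associated titles, and a catalog of current offerings mapping title -> genre.

def assign_profile(features: Sequence[float], centroids: Dict[str, Sequence[float]]) -> str:
    """Return the nearest profile centroid (Euclidean distance), as k-means does at prediction time."""
    return min(centroids, key=lambda pid: math.dist(features, centroids[pid]))

def recommend(
    features: Sequence[float],
    history: List[str],
    centroids: Dict[str, Sequence[float]],
    profile_titles: Dict[str, List[str]],
    catalog: Dict[str, str],
    top_n: int = 3,
) -> List[str]:
    profile = assign_profile(features, centroids)
    # Step 1: keep current offerings tied to the generic profile, minus titles already watched.
    candidates = [t for t in profile_titles[profile] if t in catalog and t not in history]
    # Step 2: rank by how often this customer has watched each candidate's genre
    # (stable sort, so ties keep the profile's original order).
    genre_counts = Counter(catalog[t] for t in history if t in catalog)
    candidates.sort(key=lambda t: genre_counts[catalog[t]], reverse=True)
    return candidates[:top_n]
```

The split mirrors the description: the shared profile does the coarse filtering cheaply for many customers, and only the final ranking looks at each individual's history.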

Description

I was the lead developer and architect on this Hadoop analytics project using customer viewing data. I created a data model and implemented intermediate and summary tables using Hive. I wrote Hive queries for daily ETL jobs that curate viewership summaries. Key requirements delivered included the number of customers viewing by content title, by content type, and by viewing hour. I was responsible for data availability to the reporting platform (MicroStrategy).
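
Each summary listed above reduces to counting distinct customers per grouping key. The project did this in Hive; the plain-Python sketch below is a hypothetical stand-in that shows the same aggregation on an in-memory list of viewing events. The `ViewEvent` schema and its field names are assumptions, since the source tables are not described.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set

class ViewEvent(NamedTuple):
    # Hypothetical event schema; the real viewing tables are not described.
    customer_id: str
    title: str
    content_type: str
    start_hour: int  # 0-23

def distinct_viewers(events: Iterable[ViewEvent], key: str) -> Dict[object, int]:
    """Count distinct customers per value of `key`, analogous to
    SELECT key, COUNT(DISTINCT customer_id) ... GROUP BY key in Hive."""
    seen: Dict[object, Set[str]] = defaultdict(set)
    for event in events:
        seen[getattr(event, key)].add(event.customer_id)
    return {value: len(customers) for value, customers in seen.items()}
```

Calling it once per key (`"title"`, `"content_type"`, `"start_hour"`) yields the three summaries; a customer who watches the same title in two different hours counts once for that title but once in each hour.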

Description

As a data scientist, I developed forensics to identify malicious user behavior in advance, using known fraud cases from the company's online store. Fraud cases ranged from benign (downloading content for free) to revenue-threatening (obtaining and accumulating credit card refunds). Most fraud incidents stemmed from fictitious user accounts. I developed features from transaction data, including source IP address, source domain name, ID name, and ID age, and created models to predict the likelihood that a user account was fraudulent. The models were deployed to run in real time and make quick classifications. I also had to update the model to control the false positive rate and minimize the number of legitimate users locked out of the online store.
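
One common way to control the false positive rate mentioned above is to choose the score cutoff from the scores of known-legitimate accounts, so that at most a target fraction of them would be flagged. The sketch below assumes a model that outputs a fraud score per account (higher means more suspicious); the function names and example numbers are illustrative, not details of the original project.

```python
from typing import List, Sequence

def threshold_for_fpr(legit_scores: Sequence[float], max_fpr: float) -> float:
    """Return a cutoff such that at most max_fpr of legitimate scores lie strictly above it."""
    if not legit_scores:
        raise ValueError("need at least one legitimate score")
    ranked = sorted(legit_scores, reverse=True)
    # Number of legitimate accounts we can afford to flag (rounded down).
    allowed = int(max_fpr * len(ranked))
    # Flag only scores strictly above the (allowed + 1)-th highest legitimate score.
    return ranked[min(allowed, len(ranked) - 1)]

def false_positive_rate(legit_scores: Sequence[float], cutoff: float) -> float:
    """Fraction of legitimate accounts that would be flagged at this cutoff."""
    return sum(score > cutoff for score in legit_scores) / len(legit_scores)

def flag(scores: Sequence[float], cutoff: float) -> List[bool]:
    """Classify accounts as suspicious (True) when their score exceeds the cutoff."""
    return [score > cutoff for score in scores]
```

For example, with ten legitimate scores `[0.05, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.95]` and `max_fpr=0.1`, the cutoff is `0.7` and exactly one legitimate account (the `0.95`) is flagged. Raising the cutoff locks out fewer legitimate users at the cost of missing more fraud, so the legitimate scores should come from a held-out or back-test set rather than the training data.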

Description

I am the lead data scientist on an effort to develop a trading algorithm. I write tensor operations to implement an RNN that learns buy/sell signals from market data. Inputs to the RNN include market price data for securities, trading volume, and other available data sources. Keras proved a bit too high-level for our purposes, so the focus has been on using TensorFlow directly; we are also evaluating PyTorch as an implementation option. This project continues to draw interest from investors and fellow data scientists, and the resulting models are being tested side by side in "paper trade" mode. Thus far, we haven't found a model rich enough in signal to commit capital to, but the pursuit still holds promise!