Sayan M.

ML Architect

Kolkata, India

Experience: 13 Years

Rate: 48110.7 USD / Year

Availability: Immediate

About Me

A quant and a poet: a multifaceted technology career with a 15-year track record of innovation and success in machine learning, big data, cloud computing, and data-driven product and solution development, extending the horizon towar...

Skills

Portfolio Projects

Description

Technology: Python, AWS, Redis, Named Entity Recognition, Stanford NLP, NLTK, MongoDB, Java, Scala

Responsibilities: Built the product from scratch, from the REST API to the ML model.

Description

Designation: Chief Software Architect – Java, Scala

Duration: Jul 2018 – Aug 2018

Synopsis: Attended two paid boot camps for Chief Software Architect, one lasting one week and another lasting one month.

Description

Future Today Recommender is updatable: in each training iteration it updates the stored model rather than replacing it, which is possible because the model is a function of counts. The model is exposed as a REST API for integration with other systems. To keep REST responses fast, all information required for a recommendation is stored in a shared in-RAM data structure, so the API serves requests without querying anything outside RAM; the queries within a single response are also executed in parallel. The training process scales with the size of the raw input log data because it is implemented in Spark, and the training system can handle input larger than its RAM. It was implemented for a system with 40 million user views per month and responds to prediction requests in under 150 ms.
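
A minimal Python sketch of the count-based, updatable idea described above (class and field names are hypothetical, not the production code; the real system kept these counts in shared RAM and trained at scale with Spark):

    # Sketch of an updatable count-based recommender: training adds to the
    # stored counts instead of replacing them.
    from collections import defaultdict

    class CountRecommender:
        def __init__(self):
            # co_counts[a][b] = number of sessions in which items a and b co-occur
            self.co_counts = defaultdict(lambda: defaultdict(int))

        def update(self, session_items):
            # Incremental training step: the model is a function of counts,
            # so updating it is just adding the new counts.
            for a in session_items:
                for b in session_items:
                    if a != b:
                        self.co_counts[a][b] += 1

        def recommend(self, item, k=10):
            # Serving touches only the in-memory counts; no external query.
            neighbours = self.co_counts[item]
            return sorted(neighbours, key=neighbours.get, reverse=True)[:k]

    model = CountRecommender()
    model.update(["video_a", "video_b", "video_c"])  # first training batch
    model.update(["video_a", "video_c"])             # later batch: model updated, not replaced
    print(model.recommend("video_a"))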

Description

Technology: Java, Hadoop, MapReduce, Perl, MySQL, Least Squares Method, Correlation Analysis, Scala

Responsibilities: Engaged in Requirement Analysis, Application Architecture Design, Database Schema Design, Unit Testing, Integration Testing, Release Strategy Planning, POC

Description

Developed a virtual order management system and simulated a high-transaction environment. Instrumented it with the ARM, MY-ARM and Kronos APIs and compared performance characteristics such as accuracy and instrumentation overhead in execution time, memory, and processor usage. Parsed log files during trading hours, then designed and implemented a distributed, divide-and-conquer architecture for a real-time log-file parsing framework: data of potential interest is transmitted to a central server through a robust and highly tunable transport layer, and the heavyweight parsing is completed on the central server, separate from the production host. Divided order latency into two phases (inside the box / outside the box) to enable further analysis; inside-the-box latency is measured by a kernel program, while outside-the-box latency is collected via the Corvil API using WSDL (a sketch of the divide-and-conquer filtering step follows this entry).

Responsibilities:

Design, Development, Performance Testing and Production Release of socat-based shell scripts and C++-based log processing.
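
A minimal sketch of the divide-and-conquer filtering step mentioned above (the log format and names are hypothetical; in the real framework only the selected records are shipped to the central server over the transport layer):

    # Sketch: cheap filtering is divided across worker processes; only
    # records of potential interest would be forwarded for heavyweight
    # parsing on the central server, off the production host.
    from multiprocessing import Pool

    SAMPLE_LOG = [
        "09:30:00.001 ORDER id=1 ack_us=120",
        "09:30:00.002 HEARTBEAT",
        "09:30:00.003 ORDER id=2 ack_us=95",
        "09:30:00.004 HEARTBEAT",
    ]

    def parse_chunk(lines):
        # Lightweight local filter: keep only order records.
        return [line for line in lines if "ORDER" in line]

    def split(lines, n):
        step = max(1, len(lines) // n)
        return [lines[i:i + step] for i in range(0, len(lines), step)]

    if __name__ == "__main__":
        with Pool(2) as pool:
            parts = pool.map(parse_chunk, split(SAMPLE_LOG, 2))
        selected = [rec for part in parts for rec in part]
        print(len(selected), "records selected for central parsing")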

Description

Project to develop a trader's infrastructure application network monitoring tool, integrating network, server, and application alerts into a common dashboard for all devices under a particular prop trader. The front end is a grid view in a WPF application; the back end consists of different adapters running in parallel threads. Network alerts are fetched via the SevOne API through a WSDL client, server alerts via the Ganglia API through a telnet client, and application alerts from a web server that stores the alerts produced by the log-file parsing project. When a device name is entered in the dashboard, the names of connected devices are also retrieved by querying the configuration database. It is an in-house enterprise product of Credit Suisse (a sketch of the parallel-adapter pattern follows this entry).

Responsibilities:

Design, Development, Unit Testing and Profiling of the data adapter parts (Ganglia, SevOne, application alerts); front-end development of an explorer view bound to a database through an XML document.
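
A minimal sketch of the parallel-adapter pattern (Python threads stand in for the real adapters, which used a WSDL client for SevOne, a telnet client for Ganglia, and a web server for application alerts):

    # Sketch: each adapter polls one alert source in its own thread and
    # pushes alerts onto a shared queue consumed by the dashboard.
    import queue
    import threading

    alerts = queue.Queue()

    def adapter(source_name, fetch):
        # The real adapters loop continuously; one pass is enough for a sketch.
        for alert in fetch():
            alerts.put((source_name, alert))

    fake_sources = {
        "network (SevOne)": lambda: ["link down on sw-07"],
        "server (Ganglia)": lambda: ["load > 8 on host-42"],
        "application": lambda: ["order flow stalled"],
    }

    threads = [threading.Thread(target=adapter, args=(name, fetch))
               for name, fetch in fake_sources.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Dashboard side: drain the common queue.
    while not alerts.empty():
        print(*alerts.get())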

Description

The system has three major components: a collector, a real-time trainer, and a predictor. It listens to programmatic ad-selling information for each impression through the collector and sends a predicted floor price for each impression to the ad server through the predictor. The real-time trainer builds a model on recently collected data and saves it, and the predictor uses that model for prediction. The Sulvo ad server is hosted in the AWS cloud, while the real-time trainer and predictor are hosted on Google Cloud Platform. The collector, hosted in AWS, exposes a REST API that receives information from the ad server and pushes it to a Redis message queue; on dequeue, each message is pushed to a Google BigQuery instance. The real-time trainer, hosted on a Google Compute Engine instance, fetches the latest information from BigQuery, builds a multi-layer CNN model using TensorFlow, and saves the model in Google Cloud Datastore. The predictor, a Falcon-based REST API hosted on Google App Engine, receives prediction requests from the Sulvo ad server and responds with a floor value calculated from the stored model. The system is deep-learning based, able to train the model on real-time data, and high performance: it is assured to respond to the ad server within 300 ms and auto-scales to serve 22 million predictions daily.
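
A minimal sketch of a Falcon-based prediction endpoint of the kind described (Falcon 3 naming; the route and the model stub are hypothetical, standing in for the TensorFlow CNN loaded from the Datastore):

    # Sketch of the Falcon REST predictor; predict_floor() is a stub.
    import falcon

    def predict_floor(features):
        # Stand-in for inference with the stored CNN model.
        return 0.25 + 0.01 * len(features)

    class FloorPriceResource:
        def on_post(self, req, resp):
            impression = req.media                     # JSON body from the ad server
            resp.media = {"floor": predict_floor(impression)}

    app = falcon.App()
    app.add_route("/predict", FloorPriceResource())

    if __name__ == "__main__":
        # Development server only; App Engine provides the real WSGI host.
        from wsgiref.simple_server import make_server
        with make_server("", 8000, app) as httpd:
            httpd.serve_forever()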

Description

Details:

Project for a real estate client to automate manual reporting processes detailing data for high-net-worth clients. Developed a two-step approach: gathering URLs with relevant information via topical crawling, then extracting name, organisation, and location from the output URLs via a named entity recognition algorithm. For the topical crawling, two sets of URLs were used as input: seed URLs, where crawling begins, and target URLs, a known subset of relevant URLs used as references. Jaccard distance was used as a similarity measure to compare the content of new URLs with the targets and select the most similar. This step was repeated with each selected URL becoming the new seed, considering all output URLs from previous steps and excluding those already selected. The search space was reduced by establishing a threshold on the similarity measure, set at the average similarity among the target URLs. A set of keywords (e.g. 'profile', 'biography') was created for matching against the URL string. A named entity recognition algorithm then extracted name, organisation, and location from the selected output URL contents; the BeautifulSoup library was used to parse the HTML pages and the Stanford NER interface in NLTK for named entity recognition (a Jaccard-similarity sketch follows this entry).

Responsibilities:

Problem Formulation, Design Solution Architecture, Mentor Implementation, Client Liaison
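
A minimal sketch of the Jaccard-based page selection (toy token sets; similarity = 1 − Jaccard distance, so selecting the most similar candidate is equivalent to selecting the smallest distance):

    # Sketch: compare a candidate page against the target pages and keep it
    # only if its average similarity beats the threshold, which is set at
    # the average similarity among the targets themselves.
    def jaccard(a: set, b: set) -> float:
        return len(a & b) / len(a | b) if a | b else 0.0

    def tokens(text: str) -> set:
        return set(text.lower().split())

    targets = [tokens("profile of fund manager"), tokens("biography of investor")]
    threshold = (sum(jaccard(a, b) for a in targets for b in targets if a is not b)
                 / (len(targets) * (len(targets) - 1)))

    candidate = tokens("profile and biography of a private investor")
    avg_sim = sum(jaccard(candidate, t) for t in targets) / len(targets)
    if avg_sim >= threshold:
        print("crawl this URL next; average similarity", round(avg_sim, 3))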

Description

Project to recommend the best vendors for a particular product ID using purchase order and invoice information. The problem was divided into two sub-areas. The first involved predicting the probability of a successful purchase; three approaches were used: a regression-based approach using Logistic Regression, a rule-based approach using the Random Forest algorithm, and a Bayesian approach using a Naive Bayes classifier. The second phase grouped the data by VEN_ID, calculated the average probability of a successful purchase for each vendor, and sorted vendors by that probability. Rule-based engines were then applied, such as preferring the vendor with the minimum delivery time or the maximum number of deliveries, and the most suitable vendor was recommended. Rules were prioritised by predicting aspects of the user query using classification (a sketch of the ranking step follows this entry).

Responsibilities:

Problem Formulation, Design Solution Architecture, Mentor Implementation, Client Liaison
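
A minimal sketch of the ranking step using the Logistic Regression variant (toy data; apart from VEN_ID, the column names are hypothetical):

    # Sketch: predict purchase-success probability per order, then group
    # by vendor and rank vendors by average predicted probability.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    orders = pd.DataFrame({
        "VEN_ID":        [1, 1, 2, 2, 3, 3],
        "delivery_days": [5, 7, 2, 3, 9, 8],
        "unit_price":    [10.0, 9.5, 11.0, 10.5, 8.0, 8.5],
        "success":       [1, 0, 1, 1, 0, 0],
    })

    features = orders[["delivery_days", "unit_price"]]
    clf = LogisticRegression().fit(features, orders["success"])
    orders["p_success"] = clf.predict_proba(features)[:, 1]

    # Phase two: average probability per vendor, sorted best-first; the
    # rule-based engines (e.g. minimum delivery time) would break ties.
    ranking = orders.groupby("VEN_ID")["p_success"].mean().sort_values(ascending=False)
    print(ranking)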

Description

Applied a text classification algorithm to brand reputation and promotion management, taking input from various textual data sources (web, CRM, mail server) through crawler scripts or third-party APIs and storing the data in HDFS. Classified sentences by degree of sentiment (Positive / Negative / Neutral) as well as by brand dimension (Food / Staff / Entertainment), providing trend summaries and KPI information across individual brands to brand owners and market analysts. Selected the Naive Bayes algorithm as the text classifier following a literature survey, implemented it via Mahout, and visualised the big data via Kibana, keeping the data in Elasticsearch (a Naive Bayes sketch follows this entry).

Responsibilities:

Requirement Analysis, Application Architecture Design, Database Schema Design, Unit Testing, Integration Testing, Planning for Release Strategy, POC.
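
A minimal sketch of the Naive Bayes sentence classifier (scikit-learn stands in here for the Mahout implementation the project actually used; training data is toy):

    # Sketch: bag-of-words Naive Bayes for sentiment; the same pipeline,
    # trained on Food / Staff / Entertainment labels, covers the brand
    # dimension axis.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    sentences = [
        "the food was wonderful",
        "staff were rude and slow",
        "the show was fine",
    ]
    sentiment = ["Positive", "Negative", "Neutral"]

    model = make_pipeline(CountVectorizer(), MultinomialNB())
    model.fit(sentences, sentiment)
    print(model.predict(["wonderful staff", "slow service"]))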

Description

Generic framework for estimation and forecasting of quantitative features from a huge historical data set: 100M orders per day, with the project supporting 3 months of historical data. A proprietary implementation of machine learning methods enabled modelling the quantitative target features (number of impressions, eCPM) as a linear regression over a set of base features (Geo, Site, AdSize, Frequency). Applied linear regression implemented via the least squares method. Overall average accuracy improved by more than 60%, and selected premium sites with smaller data sizes gave 80% accuracy.
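
A minimal sketch of the least-squares step (toy numbers; the encoding of Geo/Site/AdSize/Frequency into numeric columns is a hypothetical stand-in):

    # Sketch: a quantitative target (e.g. number of impressions) modelled
    # as a linear regression over encoded base features, fitted by
    # ordinary least squares: minimise ||Xw - y||^2.
    import numpy as np

    # Columns: one-hot(Geo=US), one-hot(Site=premium), AdSize area, Frequency.
    X = np.array([
        [1, 1, 0.3, 2],
        [1, 0, 0.3, 5],
        [0, 1, 0.1, 1],
        [0, 0, 0.1, 4],
    ], dtype=float)
    y = np.array([120.0, 80.0, 60.0, 30.0])   # impressions (toy)

    w, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    print("weights:", w)
    print("prediction for first row:", X[0] @ w)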

Description

Application Support project to modify and enhance an in-house position management, portfolio valuation, and risk summary product. The application pulls market data through the Reuters API and trade information from Fidessa, calculates various portfolio valuations and risk-analysis parameters, and visualises the information for traders (a beta-calculation sketch follows this entry).

Responsibilities:

Front- and Back-end Design, Development, Unit Testing & Profiling; Beta Calculation, Average Volume Calculation, Industry-wise Index Decomposition, Exception Management; Risk Analysis Group Member (Imagine) of CSFB Prop-IT.
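
One of the listed responsibilities, beta calculation, follows the standard definition beta = Cov(r_asset, r_market) / Var(r_market); a minimal sketch with toy return series:

    # Sketch: beta of an asset against the market from daily returns.
    import numpy as np

    r_asset  = np.array([0.011, -0.004, 0.008, 0.002, -0.006])
    r_market = np.array([0.009, -0.003, 0.006, 0.001, -0.005])

    beta = np.cov(r_asset, r_market, ddof=1)[0, 1] / np.var(r_market, ddof=1)
    print("beta:", round(beta, 3))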
