About Me
Energetic AI/ML/Java/Python/R developer with 4.2 years of experience developing robust code for high-volume businesses. I understand the business problem, identify the key challenges, formulate the machine learning problem, and prototype solutions.
Skills
Positions
Portfolio Projects
Description: Log Classification
- The original goal for log classification was to develop an automated means of notifying users when problems occur with their applications, based on the information contained in their application logs. Unfortunately, logs are full of messages that contain warnings or even errors that are safe to ignore, so simple "find-keyword" methods are insufficient. In addition, the number of logs is increasing constantly, and no human will, or can, monitor them all. In short, the log classification project employed natural language processing tools for text encoding and machine learning methods for automated anomaly detection, to construct a tool that could help developers perform root cause analysis more quickly on failing applications by highlighting the logs most likely to provide insight into the problem, or by generating an alert if an application starts to produce a high frequency of anomalous logs.
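A minimal sketch of the idea, assuming a TF-IDF encoder and an Isolation Forest detector as stand-ins for whichever text encoder and anomaly detector the project actually used:

```python
# Minimal sketch: encode log lines as TF-IDF vectors, then flag anomalous
# lines with an unsupervised detector. TfidfVectorizer and IsolationForest
# are illustrative stand-ins, not the project's actual stack.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import IsolationForest

logs = [
    "INFO  request served in 12ms",
    "INFO  request served in 15ms",
    "WARN  cache miss for key user:42",    # common, safe to ignore
    "ERROR OutOfMemoryError in worker-3",  # rare, worth surfacing
]

vectorizer = TfidfVectorizer(token_pattern=r"[A-Za-z][A-Za-z0-9:_-]+")
X = vectorizer.fit_transform(logs)

# contamination = expected fraction of anomalous logs (an assumption)
detector = IsolationForest(contamination=0.25, random_state=0)
labels = detector.fit_predict(X.toarray())  # -1 = anomalous, 1 = normal

for line, label in zip(logs, labels):
    print("ANOMALY" if label == -1 else "normal ", line)
```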
Description: NLP-Based Search Engine
NLP search technology is much more than keyword lookups from a dictionary. It is a real-time parser that examines the search query to understand meaning, intent, and context. In seconds, it then produces highly efficient queries, accurate results, and powerful visualizations (a minimal sketch of the parsing idea follows the list below).
- Natural language processing engine enables plain-English search.
- Automatically generates highly optimized queries.
- Intuitive search interface and powerful search suggestions.
- Creates multiple reports and visualizations from a single search.
- Rich search results are returned in real-time.
- Enables correlations across multiple data sources.
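A minimal sketch of the query-parsing step. The real engine's grammar and query format are not described in this profile, so the regexes, field names, and stopword list below are purely illustrative assumptions:

```python
# Turn a plain-English search into a structured query (illustrative only).
import re

def parse_query(text: str) -> dict:
    """Extract a severity filter, time range, and keywords from a search."""
    query = {"keywords": [], "severity": None, "last_hours": None}

    severity = re.search(r"\b(error|warn(?:ing)?|info)s?\b", text, re.I)
    if severity:
        query["severity"] = severity.group(1).upper()

    window = re.search(r"last\s+(\d+)\s+hour", text, re.I)
    if window:
        query["last_hours"] = int(window.group(1))

    stopwords = {"show", "me", "the", "in", "last", "from", "all",
                 "hour", "hours"}
    if severity:
        stopwords.add(severity.group(0).lower())
    query["keywords"] = [w for w in re.findall(r"[a-z]\w+", text.lower())
                         if w not in stopwords]
    return query

print(parse_query("show me all errors from checkout in the last 24 hours"))
# -> {'keywords': ['checkout'], 'severity': 'ERROR', 'last_hours': 24}
```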
Description: Incremental K-Means Clustering
- This incremental clustering is designed using the clusters' metadata captured from the K-means results. In the incremental approach, the K-means clustering algorithm is applied to a dynamic database where the data may be frequently updated: instead of rerunning K-means, the new cluster centers are computed directly from the new data together with the means of the existing clusters. Incremental clustering outperformed standard K-means as the number of clusters increased, the number of objects increased, the cluster radius decreased, and new data objects were inserted into the existing database. The project characterizes the percentage of delta change in the original database up to which incremental K-means behaves better than rerunning the actual K-means.
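A minimal sketch of the centroid update that avoids rerunning K-means, assuming each cluster stores its current mean and point count (the actual project's bookkeeping is not reproduced here):

```python
import numpy as np

def incremental_update(centers, counts, new_points, assign):
    """Update K-means centers with new points without a full rerun.

    centers: (k, d) current cluster means
    counts:  (k,)   number of points already in each cluster
    assign:  index of the nearest center for each new point
    """
    centers = centers.copy()
    counts = counts.copy()
    for x, c in zip(new_points, assign):
        counts[c] += 1
        # Running-mean update: new_mean = old_mean + (x - old_mean) / n
        centers[c] += (x - centers[c]) / counts[c]
    return centers, counts

centers = np.array([[0.0, 0.0], [10.0, 10.0]])
counts = np.array([100, 100])
new_points = np.array([[0.5, 0.5], [9.0, 11.0]])
# Assign each new point to its nearest existing center.
assign = np.argmin(
    ((new_points[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
centers, counts = incremental_update(centers, counts, new_points, assign)
print(centers)
```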
Description: K-tree Document Clustering
- Document clustering analyses written language in unstructured text to place documents into topically related groups or clusters. Documents such as web pages are automatically grouped together so that pages talking about the same concepts are in the same cluster and those talking about different concepts are in different clusters. This is performed in an unsupervised manner where there is no manual labeling of the documents for these concepts, topics or other semantic information. All semantic information is derived from the documents themselves. The core concept that allows this to happen is the definition of a similarity between two documents. An algorithm uses this similarity measure and optimizes it so that the most similar documents are placed together.
- The K-tree algorithm uses the k-means algorithm to perform splits in its tree structure.
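A minimal sketch of a document-similarity measure that can drive such clustering. Cosine similarity over TF-IDF vectors is assumed here, since the description does not name the exact measure used:

```python
# Cosine similarity between TF-IDF document vectors -- a common choice of
# similarity measure for document clustering (an assumption, not the
# project's stated measure).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "k-means clustering groups documents by topic",
    "document clustering places similar pages together",
    "disk usage forecasting predicts when storage fills up",
]
X = TfidfVectorizer(stop_words="english").fit_transform(docs)
sim = cosine_similarity(X)
print(sim.round(2))  # docs 0 and 1 score higher with each other than with 2
```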
Description: Anomaly Detection
- Anomaly detection is an algorithmic feature that identifies when a metric is behaving differently than it has in the past, taking into account trends, seasonal day-of-week, and time-of-day patterns. It is well-suited for metrics with strong trends and recurring patterns that are hard to monitor with threshold-based alerting.
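A minimal sketch of the idea using simple robust statistics (the product's actual models are not disclosed in this profile): compare each observation against the historical distribution for its time-of-day slot and flag large deviations.

```python
import numpy as np

rng = np.random.default_rng(0)
hours = np.arange(24 * 14)            # two weeks of hourly samples
# Daily (time-of-day) pattern plus noise, with one injected spike.
values = (100 + 20 * np.sin(2 * np.pi * hours / 24)
          + rng.normal(0, 2, hours.size))
values[-5] += 60

hour_of_day = hours % 24
for h in range(24):
    in_slot = hour_of_day == h
    slot = values[in_slot]
    med = np.median(slot)
    mad = np.median(np.abs(slot - med)) + 1e-9   # robust spread estimate
    robust_z = 0.6745 * np.abs(values - med) / mad
    for idx in np.where(in_slot & (robust_z > 6))[0]:
        print(f"anomaly at t={idx} (hour {h}): value {values[idx]:.1f}")
```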
Description: Outlier Detection
- Outlier detection is an algorithmic feature that allows you to detect when a specific group is behaving differently from its peers. For example, you could detect that one web server in a pool is processing an unusual number of requests, or that significantly more 500 errors are happening in one AWS availability zone than in the others.
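A minimal sketch of peer-group outlier detection using a robust z-score (the host names and numbers are illustrative):

```python
import numpy as np

# Requests per second for each web server in one pool (illustrative data).
rps = {"web-1": 510, "web-2": 495, "web-3": 2100, "web-4": 523, "web-5": 488}

values = np.array(list(rps.values()))
med = np.median(values)
mad = np.median(np.abs(values - med)) + 1e-9   # robust spread estimate

for host, v in rps.items():
    if 0.6745 * abs(v - med) / mad > 3.5:      # common robust-z cutoff
        print(f"{host} is an outlier vs its peers: {v} rps (median {med:.0f})")
```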
Description: Forecasting
- Forecasting is an algorithmic feature that allows you to predict where a metric is heading in the future. It is well-suited for metrics with strong trends or recurring patterns. For example, if your application starts logging at a faster rate, forecasts can alert you a week before a disk fills up, giving you adequate time to update your log rotation policy. Or, you can forecast business metrics, such as user sign-ups, to track progress against your quarterly targets.
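A minimal sketch of the disk-fill example using a simple linear-trend forecast (the actual forecasting models are not specified here, and the numbers are illustrative):

```python
import numpy as np

# Daily disk usage (GB) over two weeks; the numbers are illustrative.
days = np.arange(14)
used_gb = 400 + 25 * days + np.random.default_rng(1).normal(0, 5, 14)
capacity_gb = 800

# Fit a linear trend and extrapolate to estimate when the disk fills up.
slope, intercept = np.polyfit(days, used_gb, 1)
days_until_full = (capacity_gb - (slope * days[-1] + intercept)) / slope
print(f"growing ~{slope:.1f} GB/day; full in ~{days_until_full:.1f} days")
if days_until_full < 7:
    print("ALERT: disk projected to fill within a week")
```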
Description
- Responsible for working on a range of projects, designing appealing websites, and interacting on a daily basis with graphic designers and back-end developers.
- Developing and maintaining the front-end functionality of websites.
- Participating in discussions with clients to clarify what they want.
- Simultaneously managing several databases and reporting tools.
- Contacting external webmasters to confirm link placements.
- Handling Java development, including design and troubleshooting of applications, and conducting gap analysis, including validation of needs, in conjunction with onsite and offsite teams.
- Improving data processing and storage throughput by using Hadoop framework for distributed computing across a cluster of up to twenty-five nodes.
- Building customized in-memory indexes for high-performance information retrieval using Apache Lucene and Apache Solr, as well as an optimized graph database with up to 10 billion edges.
- Applying machine learning algorithms in order to identify the most significant features across different datasets.
- Creating Proof of Concepts from scratch illustrating how these data integration techniques can meet specific business requirements reducing cost and time to market.
- Primarily used Scala to write cloud computing applications.
- Worked with cutting-edge cloud technology using Heroku and Hadoop.
- Utilized Java, Scala, and Python for cloud engineering.
- Configured web servers (IIS, nginx) to enable caching, CDN application servers, and load balancers.
- Deployed and supported Memcached on AWS ElastiCache.
- Involved in the maintenance and performance tuning of Amazon EC2 instances.
- Diagnosed issues with Java applications running in Tomcat or JBoss.
- Involved in designing and developing Amazon EC2, Amazon S3, Amazon SimpleDB, Amazon RDS, Amazon Elastic Load Balancing, Amazon SQS, and other services of the AWS infrastructure.
- Applied AWS data backup techniques (snapshots, AMI creation), along with data-at-rest security within AWS.
- Developed a Python-based RESTful API for a CRM system using Flask, SQLAlchemy, and PostgreSQL (a minimal sketch appears after this list).
- Translated designer mock-ups and wireframes into an AngularJS front end.
- Knowledge of Node.js and the frameworks available for it, such as Express and StrongLoop.
- Good understanding of server-side templating languages such as Jade and EJS.
- Implemented gRPC to connect Java and Python for transferring data.
- Implemented Vert.x to connect Java and R for transferring data.
- Knowledge of network protocols such as TCP/IP and UDP.
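A minimal sketch of the Flask + SQLAlchemy REST pattern mentioned above. The Customer model and routes are illustrative, and SQLite stands in for PostgreSQL so the example runs self-contained:

```python
# Minimal Flask + SQLAlchemy REST sketch. Model and route names are
# illustrative; SQLite stands in for PostgreSQL.
from flask import Flask, jsonify, request
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:///crm.db"
db = SQLAlchemy(app)

class Customer(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(80), nullable=False)
    email = db.Column(db.String(120), unique=True)

@app.route("/customers", methods=["POST"])
def create_customer():
    data = request.get_json()
    customer = Customer(name=data["name"], email=data.get("email"))
    db.session.add(customer)
    db.session.commit()
    return jsonify(id=customer.id, name=customer.name), 201

@app.route("/customers/<int:customer_id>")
def get_customer(customer_id):
    customer = db.session.get(Customer, customer_id)  # SQLAlchemy 1.4+ style
    if customer is None:
        return jsonify(error="not found"), 404
    return jsonify(id=customer.id, name=customer.name, email=customer.email)

if __name__ == "__main__":
    with app.app_context():
        db.create_all()
    app.run()
```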
IT Skills
- Java Frameworks: Spring, Spring Boot, Hibernate, Play, Groovy and Grails, Apache Ant, EJB, JasperReports, JavaFX, Servlets, JSP.
- Python: Django, Flask, Falcon, Pyramid.
- Big Data Analysis: Hadoop, Apache Spark, Heroku, HBase, Cassandra, Hive, Highcharts, R, Sqoop, ZooKeeper.
- Cloud Computing: AWS
- Databases: Oracle, MySQL, PostgreSQL, MongoDB, SQLite, Memcached, MariaDB, H2.
- Scala Framework: Play
- Ruby on Rails
- Docker
- Machine Learning: Python, R, MATLAB
- Natural Language Processing: NLTK, OpenNLP
- Artificial Intelligence: TensorFlow, PyTorch, Deeplearning4j
Description: Additional Projects
Log Reduce:
- Log Reduce groups messages with similar structures and common repeated text strings into signatures, providing a quick investigative view, or snapshot, for the keywords or time range provided.
AI-Based Alert Correlation:
- Automated root-cause analysis without thresholds or baselines. Instead of relying on events and thresholds, suspicious metric behavior is detected by analyzing the value distribution of metrics. If the current metric measurement distribution deviates significantly from the observed historic metric behavior, the respective component is marked as unhealthy, even if no threshold has been reached.
Incremental Cluster Updating Using Gaussian Mixture Models:
- The proposed incremental approach preserves comprehensive statistical information about the clusters in the form of GMMs. As each GMM needs the number of Gaussian components as an input parameter, we proposed a method to determine that number automatically by introducing the concept of core points. In the updating phase, instead of processing each new sample individually, we collect the new incoming samples and cluster them. By employing the concepts of core points and GMMs, we build a number of GMMs for the new samples and label the new GMMs based on their similarity to the already existing GMMs.
Fuzzy Clustering:
- Fuzzy clustering (also referred to as soft clustering or soft k-means) is a form of clustering in which each data point can belong to more than one cluster. The unsupervised k-means algorithm assigns each point to exactly one cluster, with a membership of either 0 or 1, whereas fuzzy logic gives each data point a fractional degree of membership in every cluster. In fuzzy c-means clustering, we compute the centroids of the data points and then recalculate each point's membership from its distance to the centroids, iterating until the clusters become stable (a minimal sketch follows at the end of this section).
Auto Smoothing:
- Some metrics are inherently so noisy that the graphs become unreadable (the dreaded spaghettification problem), and you lose the ability to extract essential information about trends and large-scale deviations. Auto Smoother automatically removes the noise of a time series while preserving its shape.
Auto Threshold (Baseline):
- Machine learning algorithms automatically learn the statistical characteristics of response times, failure rates, and throughput.
Log Parsing:
- Log parsing is used to extract the important fields from logs.
Plugins:
- Implemented HipChat, FTP, Domain Certificate, Azure Monitoring, and AWS Monitoring (EC2 and S3) plugins in Java.
- Created a topology view that shows all the connected network devices and the status of each device.
- Implemented a flow diagram that shows the overall flow of traffic from source to destination.
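A minimal sketch of the fuzzy c-means algorithm described above, with the standard fuzzifier m = 2 and illustrative data:

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=50, seed=0):
    """Standard fuzzy c-means: every point gets a membership in every cluster."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)        # memberships sum to 1 per point
    for _ in range(iters):
        W = U ** m                           # fuzzified memberships
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-9
        inv = dist ** (-2.0 / (m - 1.0))     # inverse-distance weights
        U = inv / inv.sum(axis=1, keepdims=True)
    return centers, U

X = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9], [2.5, 2.5]])
centers, U = fuzzy_c_means(X)
print(np.round(U, 2))  # the middle point [2.5, 2.5] belongs partly to both
```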