Ashok S.

Data Engineer/Data Scientist

India

Experience: 9 Years

40036.8 USD / Year

  • Notice Period: Days

About Me

  • Around 9 years of experience in the IT industry, having worked in the Healthcare, Banking and Insurance domains
  • Hands-on experience with Big Data and its ecosystem, including Hadoop, Sqoop, Hive, HDFS and real-time processing frameworks
  • Strong understanding of Kafka architecture and concepts such as topics, producers, consumers, brokers, partition leaders and ZooKeeper
  • Good understanding of Spark Streaming, transformations and window operations
  • Excelled in RDBMSs like Oracle/Sybase, with a basic understanding of NoSQL technologies like Cassandra and HBase
  • Hands-on in the functional programming languages Scala and Python, with a strong understanding of higher-order functions and Scala collections such as List, Map, Tuple and Array (see the short sketch after this list)
  • DevOps tools like Docker, GitLab, CI/CD etc.
  • Developed Unix shell/Perl scripts and scheduled jobs using tools like Autosys
  • Cloud: AWS and Azure
  • Constantly learning and leveraging emerging technologies
  • Strong leadership, mentoring and interpersonal skills; believe in leading by doing
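A minimal, self-contained sketch of the Scala collections and higher-order-function style referred to above. The records and field names here are made up purely for illustration; they are not taken from any of the projects below.

import scala.collection.immutable.List

object CollectionsSketch extends App {
  // Hypothetical claim records: (claimId, domain, amount) -- sample data for illustration only.
  val claims: List[(Int, String, Double)] = List(
    (1, "Healthcare", 1200.0),
    (2, "Banking", 450.5),
    (3, "Healthcare", 300.0)
  )

  // Higher-order functions over a List: filter, map, then sum to aggregate.
  val healthcareTotal: Double = claims
    .filter { case (_, domain, _) => domain == "Healthcare" }
    .map { case (_, _, amount) => amount }
    .sum

  // Group into a Map keyed by domain, then total each group's amounts.
  val totalsByDomain: Map[String, Double] =
    claims.groupBy(_._2).map { case (domain, rows) => domain -> rows.map(_._3).sum }

  println(s"Healthcare total: $healthcareTotal")
  println(s"Totals by domain: $totalsByDomain")
}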


Portfolio Projects

Description

BioDW (Bio-statistics): Bio-statistics (BioS) data is critical for multiple types of cross-trial analytics, wherever unblinded study results, adjudicated endpoints, or evaluation of other merged and cleaned data are required. The BioS data was maintained on separate servers by individual BioS teams globally and was obtained via very slow and time-consuming manual identification and assimilation on an analysis-by-analysis basis. The purpose of the BioS data warehouse is to consolidate all BioS data into a single repository under CDR and to offer search functionality to various end users from different parts of the company.

Technologies used: Apache Spark, HDFS, Spark SQL, Hue, Manager, Oracle, Python, Scala.

Responsibilities:

  • Wrote a Spark Scala script to process SAS7BDAT files from the source and a PySpark script to read XPT files (a minimal sketch follows this list).
  • Used StreamSets to run, manage and monitor the Spark scripts.
  • Used GitLab as the code repository and built a CI/CD pipeline for continuous integration and deployment.
  • Solved UAT defects raised by users; created deployment scripts and coordinated with the RLM team for smooth deployments.
  • Took part in all sprint meetings, scrums, retrospectives etc.
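A minimal Spark Scala sketch of the SAS7BDAT ingestion step mentioned above. It assumes the third-party spark-sas7bdat connector is on the classpath, and the input path and output location are made up; the actual pipeline details are not shown in this profile.

import org.apache.spark.sql.SparkSession

object SasIngestSketch extends App {
  val spark = SparkSession.builder()
    .appName("BioS SAS7BDAT ingest (sketch)")
    .getOrCreate()

  // Assumption: the spark-sas7bdat connector (com.github.saurfang) is available;
  // it exposes a data source that reads SAS7BDAT files into a DataFrame.
  val raw = spark.read
    .format("com.github.saurfang.sas.spark")
    .load("/data/bios/incoming/study_results.sas7bdat")   // hypothetical path

  // Light cleanup before landing the data in the consolidated repository.
  val cleaned = raw.na.drop("all")

  // Hypothetical target: a Parquet dataset under the CDR landing zone on HDFS.
  cleaned.write.mode("overwrite").parquet("/cdr/bios/study_results")

  spark.stop()
}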


Description

  • Implemented batch processing of data sources using Apache Spark, Spark SQL, Scala etc. and automated the tasks using the Autosys scheduling tool.
  • Developed Spark applications, using the Scala shell for early-stage development and building JARs with IntelliJ IDEA, SBT 0.13 and Scala 2.10 against Spark 2.2 and Hadoop 2.7; monitored the cluster using Cloudera Manager.
  • Created DataFrames and various DataFrame transformations and actions to mirror the existing business logic, and validated the results (a short sketch follows this list).
  • Ensured optimization using Spark features such as persisting, DataFrames and broadcasting, and implemented several pieces of existing business logic as Spark SQL queries.
  • Analyzed large data sets by writing Spark queries, and developed Scala scripts and UDFs using DataFrames/SQL and RDDs in Spark 2.2 for data aggregation and querying.
  • Imported and exported data between relational databases and HDFS using Sqoop, and pushed the processed data to downstream systems for further processing.
  • Worked on different assignments, managing multiple functional projects to understand data usage and its implications.
  • Spearheaded a team of junior developers, resolving issues and making improvements to databases, ensuring all work met the necessary requirements, and coordinating with management to prioritize business and information needs.
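A minimal Spark Scala sketch of the DataFrame work referred to above: a broadcast join, a persisted intermediate DataFrame, and a UDF registered for Spark SQL. The table names, columns, paths and the business rule are made up for illustration only.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{broadcast, col}

object BatchTransformSketch extends App {
  val spark = SparkSession.builder()
    .appName("Batch transform (sketch)")
    .getOrCreate()

  // Hypothetical inputs: a large fact table and a small reference table.
  val transactions = spark.read.parquet("/data/warehouse/transactions")   // made-up path
  val accountTypes = spark.read.parquet("/data/warehouse/account_types")  // made-up path

  // Broadcast the small dimension so the join avoids shuffling the large side.
  val joined = transactions.join(broadcast(accountTypes), Seq("account_type_id"))

  // Persist an intermediate result that is reused by several downstream actions.
  val enriched = joined.filter(col("amount") > 0).persist()

  // A UDF mirroring a made-up business rule, registered for use from Spark SQL.
  spark.udf.register("risk_band", (amount: Double) => if (amount > 10000.0) "HIGH" else "LOW")

  enriched.createOrReplaceTempView("enriched_txn")
  val summary = spark.sql(
    "SELECT account_type, risk_band(amount) AS band, COUNT(*) AS cnt " +
    "FROM enriched_txn GROUP BY account_type, risk_band(amount)")

  summary.write.mode("overwrite").parquet("/data/warehouse/txn_summary")  // made-up path
  spark.stop()
}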


Description

Inquiry Framework: Inquiry Framework is an internal reporting tool used by Citibank NA. The tool was developed generically to support data from various sources such as Oracle, Hadoop, Sybase etc., in several formats, with entitlements. The reports were used to analyse revenue, assets, the balance sheet etc., based on the different types of accounts.

Technologies used: Hadoop, Hive, HDFS, FreeMarker, Groovy.

Responsibilities:

  • Migrated the existing SQL queries to HQL as per the client's agenda of moving the reports onto the Hadoop ecosystem (a small sketch of this pattern follows this list).
  • Developed Hive SQL scripts to implement the business logic on the huge volume of data received from BI.
  • Imported the data received from the DB into HDFS using Sqoop and applied a Hive schema on the imported data for analytics.
  • Solved UAT and PROD defects raised by users; created deployment scripts and coordinated with the RLM team for smooth deployments.
  • Attended daily status calls with the client to discuss open issues, challenges faced in the work, solutions etc.
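The project itself ran these reports directly on Hive, but the underlying pattern (an external Hive table defined over Sqoop-imported HDFS files, queried in HQL) can be sketched briefly. To stay in one language here, the sketch drives the HQL from Spark with Hive support enabled; the table name, path and columns are made up for illustration.

import org.apache.spark.sql.SparkSession

object HiveSchemaSketch extends App {
  // Hive support lets spark.sql() run HQL against the Hive metastore
  // (assumes Hive libraries and a metastore are available on the cluster).
  val spark = SparkSession.builder()
    .appName("Inquiry reporting (sketch)")
    .enableHiveSupport()
    .getOrCreate()

  // External table over files Sqoop has already landed in HDFS (made-up layout).
  spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS accounts_raw (
      account_id STRING,
      account_type STRING,
      balance DOUBLE
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/data/inquiry/accounts'
  """)

  // A report query migrated from the source RDBMS SQL into HQL (illustrative only).
  val report = spark.sql("""
    SELECT account_type, SUM(balance) AS total_balance
    FROM accounts_raw
    GROUP BY account_type
  """)

  report.show()
  spark.stop()
}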


Description

MDM Migration: Master Data Management (IMDM) is a platform that manages enterprise-wide master data entities based on data sourced from various operational systems. Each master entity represents a single unified view of a key business object across the enterprise (e.g. Investigator, Research Staff, Study, Study Site, Health Care Organization). Initially, SVN was the code repository and Jenkins was the continuous integration and deployment tool. As part of the MDM migration, all the code repositories were moved to GitLab using a CI/CD pipeline.
