- Languages: R, Python, Scala.
- Tools: SciPy, Spark, Pandas, Apache HBase, Hive, HDFS, Hadoop MapReduce, SAS, Machine Learning, Tableau.
- Test Management: HP ALM.
- Testing Concepts: STLC, Testing Levels, Testing Types.
- Database: SQL, Excel.
- Others: Life & Annuity Product Testing, DevOps & Agile Familiarity, JIRA.
- 2.8 yrs of experience as a Programmer Analyst at Cognizant Technology Solutions.
- 1.5 yrs of experience as a Data Scientist.
Data & Analytics
California Housing Price Prediction
The purpose of the project is to predict median house values in Californian districts from the features recorded for those districts. The model is built on California census data, which has metrics such as the population, median income, and median housing price for each block group in California. The model should learn from this data and be able to predict the median housing price in any district, given all the other metrics. Districts, or block groups, are the smallest geographical units for which the US Census Bureau publishes sample data (a block group typically has a population of 600 to 3,000 people). There are 20,640 districts in the project dataset. Bonus Exercise: Predict housing prices based on median_income and plot the regression chart.
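The bonus exercise (predicting house value from median_income alone) amounts to a one-variable least-squares fit. The sketch below is only an illustration of that idea using the Python standard library; the function names and the five sample values are made up for the example and are not taken from the census dataset. Plotting the regression chart is left out.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Predicted y for a single x under the fitted line."""
    return intercept + slope * x


# Hypothetical sample values, NOT real census rows.
median_income = [1.5, 2.0, 3.0, 4.5, 6.0]
median_house_value = [95000.0, 120000.0, 170000.0, 245000.0, 320000.0]

slope, intercept = fit_line(median_income, median_house_value)
estimate = predict(slope, intercept, 5.0)
```

On the real data the same fit would be run over all 20,640 districts, typically after holding out a test split to check how well the line generalizes.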
NYC 311 Service Request Analysis
Perform a service request data analysis of New York City 311 calls. You will focus on the data wrangling techniques to understand the pattern in the data and also visualize the major complaint types.
- Import the NYC 311 service request dataset
- Basic data exploratory analysis
  - Explore data.
  - Find patterns.
  - Display the complaint type and city together.
- Find major complaint types
  - Find the top 10 complaint types
  - Plot a bar graph of count vs. complaint types
- Visualize the complaint types
- Display the major complaint types and their count
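The "top 10 complaint types" step can be sketched as a simple frequency count. This is a minimal illustration using only the standard library; the column names "Complaint Type" and "City" are assumed to match the dataset's headers, the helper names are hypothetical, and the sample rows are made up.

```python
from collections import Counter


def top_complaint_types(records, n=10):
    """Return the n most frequent complaint types as (type, count) pairs."""
    counts = Counter(
        r["Complaint Type"] for r in records if r.get("Complaint Type")
    )
    return counts.most_common(n)


def text_bar_chart(ranked, width=30):
    """Render (type, count) pairs as text bars scaled to the largest count."""
    if not ranked:
        return []
    largest = ranked[0][1]
    lines = []
    for name, count in ranked:
        bar = "#" * max(1, round(count / largest * width))
        lines.append(f"{name:<20} {bar} {count}")
    return lines


# Hypothetical sample rows, NOT real 311 records.
sample_requests = [
    {"Complaint Type": "Noise", "City": "BROOKLYN"},
    {"Complaint Type": "Noise", "City": "BRONX"},
    {"Complaint Type": "Noise", "City": "NEW YORK"},
    {"Complaint Type": "Heating", "City": "BRONX"},
    {"Complaint Type": "Heating", "City": "BROOKLYN"},
    {"Complaint Type": "Blocked Driveway", "City": "BROOKLYN"},
]

for line in text_bar_chart(top_complaint_types(sample_requests)):
    print(line)
```

With the full file, the rows could be read with csv.DictReader and passed to the same function; a plotting library would replace the text bars for the actual bar graph.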
The web analytics team of www.datadb.com wants to understand the web
activity of the site and the sources used to access the website.
They have a database that records time on page, source group,
bounces, exits, unique page views, and visits. The team is targeting the following issues:
- The team wants to analyse each variable of the data collected through data
summarization to get a basic understanding of the dataset and to prepare
for further analysis.
- As mentioned earlier, a unique page view represents the number of
sessions during which that page was viewed one or more times. A visit
counts all instances, no matter how many times the same visitor may have
been to your site. So the team needs to know whether the unique page
view value depends on visits.
- Find out the probable factors from the dataset, which could affect the exits.
Exit Page Analysis is usually required to get an idea about why a user leaves
the website for a session and moves on to another one. Please keep in
mind that exits should not be confused with bounces.
- Every site wants to increase the time on page for a visitor. This increases
the chances of the visitor understanding the site content better and hence
there are more chances of a transaction taking place. Find the variables
which possibly have an effect on the time on page.
- A high bounce rate is a cause for alarm for websites which depend on visitor
engagement. Help the team in determining the factors that are impacting
the bounce rate.
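One way to start on the question of whether unique page views depend on visits is to compute the Pearson correlation between the two columns. The sketch below is only an illustration with the standard library; the function name and the sample values are made up, and a strong correlation on its own would not establish dependence, so a regression or significance test would normally follow.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical sample values, NOT taken from the team's database.
visits = [1, 2, 3, 4, 5]
unique_page_views = [2, 4, 6, 8, 10]

r = pearson_r(visits, unique_page_views)
```

The same function could be applied pairwise to exits, bounces, and time on page to shortlist the variables worth modelling for the other questions.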
Forecast the sales based on the independent variables such as Profit, Quantity,
Marketing cost, and Expenses using the regression model.
The dataset is maintained for the Retail Analysis, and it has records of both
independent and dependent variables.
- Import the required dataset.
- Perform descriptive statistics for the dataset.
- Check the significance of independent variables.
- Create a new data set with exponential, cube, squared, and log values for each variable.
- Perform regression test.
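The step that creates exponential, cube, squared, and log values for each variable can be sketched as below. This is a minimal illustration with the standard library; the function name, the column suffixes, and the sample row are assumptions made for the example. Values where a transform is undefined (log of a non-positive number) or overflows (exp of a large number) are stored as None, which is one possible convention rather than the project's stated one.

```python
import math


def add_transforms(rows, columns):
    """Return copies of rows with squared, cube, exp, and log columns added."""
    out = []
    for row in rows:
        new_row = dict(row)
        for col in columns:
            value = row[col]
            new_row[f"{col}_sq"] = value ** 2
            new_row[f"{col}_cube"] = value ** 3
            try:
                new_row[f"{col}_exp"] = math.exp(value)
            except OverflowError:
                # exp() overflows for large inputs; rescale the column first
                # if this transform is needed for such values.
                new_row[f"{col}_exp"] = None
            # log is only defined for strictly positive values.
            new_row[f"{col}_log"] = math.log(value) if value > 0 else None
        out.append(new_row)
    return out


# Hypothetical sample row, NOT taken from the retail dataset.
sample = [{"Profit": 2.0, "Quantity": 1.0, "Sales": 10.0}]
transformed = add_transforms(sample, ["Profit", "Quantity"])
```

The transformed columns can then be fed to the regression alongside the original variables to check whether a non-linear term improves the fit.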