12 Data Science projects for 12 days of Christmas

This post was originally published by Terence S at Towards Data Science

As you approach the holidays, you’ll finally have time to sip on some hot chocolate, sit by the fireplace, and watch hours and hours of Christmas movies.

Or… you can spend your time doing some awesome data science projects! After all, what better way to spend your time than adding another personal project to your GitHub repo? 😉

And so, I give you 12 data science projects for 12 days of Christmas! Each one can be finished in a day, and each is guaranteed to teach you valuable, applicable skills.

With that said, let’s dive into it:

1. Python Simulations

Building simulations is not only really cool but also quite relevant given the pandemic! Python simulations benefit your coding fluency and your understanding of data science, and they are also fun and addictive to play around with.

There is a myriad of scenarios and factors you can simulate, often in fewer than a couple hundred lines of code. For example, I have articles about simulating a basic pandemic and predicting population control, both of which include code that you’re free to look at!

Difficulty: Anywhere from trivial to super complex!

Where to start:

Skills you’ll learn:

  • Object-oriented programming
  • Simulating randomness in Python
  • Modelling real-life scenarios
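
To give a feel for the three skills above, here is a minimal sketch of a toy outbreak simulation. It is not the code from the articles mentioned earlier; the Person class, the simulate function, and every parameter value (population size, contact rate, transmission probability, recovery time) are assumptions made up for illustration.

```python
import random


class Person:
    """One member of the simulated population."""

    def __init__(self):
        self.state = "S"  # S = susceptible, I = infected, R = recovered
        self.days_infected = 0


def simulate(population=500, initial_infected=5, contacts_per_day=8,
             p_transmit=0.05, days_to_recover=10, days=60, seed=42):
    """Return one (susceptible, infected, recovered) tuple per simulated day."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    people = [Person() for _ in range(population)]
    for person in people[:initial_infected]:
        person.state = "I"

    history = []
    for _ in range(days):
        infected_today = [p for p in people if p.state == "I"]
        for sick in infected_today:
            # Each infected person meets a few random people per day
            # (all parameter values here are illustrative assumptions).
            for other in rng.sample(people, contacts_per_day):
                if other.state == "S" and rng.random() < p_transmit:
                    other.state = "I"
            sick.days_infected += 1
            if sick.days_infected >= days_to_recover:
                sick.state = "R"
        history.append(tuple(sum(p.state == s for p in people) for s in "SIR"))
    return history


history = simulate()
peak_day = max(range(len(history)), key=lambda d: history[d][1])
print("peak infections:", history[peak_day][1], "on day", peak_day + 1)
```

Try changing contacts_per_day or p_transmit and watch how the peak moves; that kind of tinkering is where most of the learning happens.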

2. Retail Analytics

Although it is interesting to simulate the spread of disease or social dynamics, we can find uses of data science and programming in business, too.

Forecasting sales for holidays, like Christmas, is incredibly important for determining how much to produce. Too much and there’s stale inventory. Too little and you’ve lost out on potential revenue.

Below are several resources for you to learn and practice retail sales forecasting.

Difficulty: Intermediate

Where to start:

Skills you’ll learn:

  • Predictive modelling, possibly time-series forecasting as well
  • Understanding business statistics
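
Before reaching for a full time-series model, it can help to have a simple baseline to beat. The sketch below is one possible baseline, a seasonal-naive forecast scaled by year-over-year growth. The function name seasonal_naive_forecast and the monthly sales figures are made up for illustration and do not come from any real dataset.

```python
def seasonal_naive_forecast(sales, season_length=12):
    """Repeat the last season, scaled by growth between the last two seasons."""
    if len(sales) < 2 * season_length:
        raise ValueError("need at least two full seasons of history")
    last = sales[-season_length:]
    previous = sales[-2 * season_length:-season_length]
    growth = sum(last) / sum(previous)  # year-over-year growth factor
    return [value * growth for value in last]


# Made-up monthly unit sales (Jan..Dec) with a holiday spike in December.
year_1 = [100, 90, 95, 100, 105, 110, 108, 104, 112, 120, 160, 240]
year_2 = [110, 99, 104, 110, 116, 121, 119, 114, 123, 132, 176, 264]

forecast = seasonal_naive_forecast(year_1 + year_2)
print("forecast for next December:", round(forecast[-1], 1))
```

If a fancier model cannot beat this baseline on held-out months, that is a useful finding in itself.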

3. Covid-19’s Impact on Airport Traffic

In a similar vein, let’s bring together science and business to improve our data science skills with a crucial, real-life scenario. Over the past nine months, Covid-19 has hugely changed the way we live our lives; in particular, it has had a massive impact on worldwide travel. With the dataset below, explore the data, create visualizations, and even see if you can build a prediction model for airport traffic.

Difficulty: Easy

Where to start:

  • Get this dataset here.
  • Learn how to create data visualizations with Plotly here

Skills you’ll learn:

  • Exploratory Data Analysis
  • Data Visualizations

4. Tweetdeck Replica

If you already use Tweetdeck, this project is for you! Tweetdeck is a tool for Twitter that allows you to track your Twitter engagement and a variety of insights in real time. Using the Twitter API and a visualization tool like Dash or Streamlit, you can build a simple web application that serves as your own analytics platform for Twitter!

Difficulty: Intermediate

Where to start:

  • Get familiar with Tweetdeck
  • Learn how to engage with APIs and request an API key from Twitter
  • Learn about a visualization tool to deploy your visualizations, like Dash or Streamlit

Skills you’ll learn:

  • Working with APIs
  • Creating interactive insights and analytics dashboards

5. A/B Testing: Click-Through Rates

Arguably one of the most practical data science concepts in the workplace is A/B Testing. And yet, it is a concept that is quite misunderstood because there are a lot of intricacies to it.

More specifically, click-through rate is an extremely important metric for any company with a marketing team. By properly measuring click-through rates, you can optimize the appearance, the messaging, and anything else related to your online advertisements.

Difficulty: Intermediate

Where to start:

  • Sample dataset here.
  • Follow my step-by-step walkthrough here.

Skills you’ll learn:

  • Exploratory Data Analysis
  • How to conduct a proper A/B test for click-through rates
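
One common way to compare two click-through rates is a two-proportion z-test. The sketch below is a minimal version of that test, not the walkthrough linked above; the function name and the click and view counts are made up for illustration.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-sided z-test for a difference between two click-through rates."""
    ctr_a = clicks_a / views_a
    ctr_b = clicks_b / views_b
    # Pooled rate under the null hypothesis that both variants perform equally.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (ctr_b - ctr_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return ctr_a, ctr_b, z, p_value


# Made-up counts: variant A is the control, variant B is the new ad.
ctr_a, ctr_b, z, p_value = two_proportion_z_test(200, 10_000, 260, 10_000)
print(f"CTR A = {ctr_a:.2%}, CTR B = {ctr_b:.2%}, z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value only tells you the difference is unlikely to be noise. Deciding the sample size before the test starts, and not peeking early, are among the intricacies that make A/B testing easy to get wrong.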

6. Recommendation System

The recommendation algorithms used by modern social media platforms and content aggregators are extremely complex and constantly developing. What better way to understand how they work and improve than by building one yourself?

Difficulty: Intermediate-Advanced

Where to start:

Skills you’ll learn:

  • Building Recommendation Systems
  • SVD, matrix factorization
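
To make matrix factorization concrete, here is a minimal sketch that learns user and item factors with stochastic gradient descent. Note that this is an SGD-based factorization in the spirit of SVD-style recommenders, not a true singular value decomposition. The function names, hyperparameters, and the tiny ratings table are all assumptions made up for illustration.

```python
import random


def predict(user_vec, item_vec):
    """Predicted rating is the dot product of the two factor vectors."""
    return sum(u * i for u, i in zip(user_vec, item_vec))


def rmse(ratings, users, items):
    """Root mean squared error over the known (user, item, rating) triples."""
    errors = [(r - predict(users[u], items[i])) ** 2 for u, i, r in ratings]
    return (sum(errors) / len(errors)) ** 0.5


def factorize(ratings, n_users, n_items, n_factors=2, lr=0.05,
              reg=0.02, epochs=500, seed=0):
    """Learn user and item factors by SGD on the squared rating error."""
    rng = random.Random(seed)
    users = [[rng.uniform(0, 0.5) for _ in range(n_factors)] for _ in range(n_users)]
    items = [[rng.uniform(0, 0.5) for _ in range(n_factors)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(users[u], items[i])
            for f in range(n_factors):
                pu, qi = users[u][f], items[i][f]
                # Gradient step with a small L2 penalty (reg) on both factors.
                users[u][f] += lr * (err * qi - reg * pu)
                items[i][f] += lr * (err * pu - reg * qi)
    return users, items


# Made-up ratings: users 0-1 prefer items 0-1, users 2-3 prefer items 2-3.
ratings = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 1, 5), (1, 3, 1),
    (2, 2, 5), (2, 3, 4), (2, 0, 1),
    (3, 2, 4), (3, 3, 5), (3, 1, 2),
]

rmse_before = rmse(ratings, *factorize(ratings, 4, 4, epochs=0))
users, items = factorize(ratings, 4, 4)
rmse_after = rmse(ratings, users, items)
print("RMSE before / after training:", round(rmse_before, 2), round(rmse_after, 2))
print("predicted rating for user 0 on unseen item 3:", round(predict(users[0], items[3]), 2))
```

The interesting part is the last line: the model produces a rating for a user-item pair it never saw, which is exactly what a recommender needs.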

7. Trustpilot Webscraper

Web scraping is simple to learn and extremely useful! Scraping a customer review website, like Trustpilot, is valuable for a company because it reveals review trends (whether ratings are getting better or worse) and, via NLP, what customers are saying.

Difficulty: Easy

Where to start:

Skills you’ll learn:

  • Webscraping data
  • Analyzing customer reviews
  • Take it further and apply NLP to extract insights from reviews.
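
The parsing half of a scraper can be sketched with nothing but the standard library. The HTML snippet below is hypothetical: the class names review and review-text and the data-rating attribute are assumptions for illustration and are not Trustpilot's actual markup, so expect to inspect the real page and adjust the selectors.

```python
from html.parser import HTMLParser

# Hypothetical markup; a real review site will be structured differently.
SAMPLE_HTML = """
<div class="review" data-rating="5"><p class="review-text">Fast delivery, great support.</p></div>
<div class="review" data-rating="2"><p class="review-text">Item arrived late.</p></div>
<div class="review" data-rating="4"><p class="review-text">Good value overall.</p></div>
"""


class ReviewParser(HTMLParser):
    """Collect (rating, text) pairs from the hypothetical markup above."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._rating = None
        self._in_text = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("class") == "review":
            self._rating = int(attrs["data-rating"])
        elif tag == "p" and attrs.get("class") == "review-text":
            self._in_text = True

    def handle_data(self, data):
        # Only keep text that sits inside a review-text paragraph.
        if self._in_text:
            self.reviews.append((self._rating, data.strip()))

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_text = False


parser = ReviewParser()
parser.feed(SAMPLE_HTML)
average = sum(rating for rating, _ in parser.reviews) / len(parser.reviews)
print(len(parser.reviews), "reviews, average rating", round(average, 2))
```

Once the reviews are in a list of tuples like this, tracking the average rating over time or feeding the text into an NLP step is a small extra hop.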

8. Customer Segmentation

What do you know, we’ve come full circle back to our challenge on retail analytics! In this problem, however, the goal is to use statistics to cluster customers into similar groups so that you can identify the customer segments you want to market your business to!

Difficulty: Intermediate-Advanced

Where to start:

Skills you’ll learn:

  • Clustering techniques
  • Dimensionality reduction
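
K-means is one common clustering technique and a reasonable first thing to try. Below is a minimal from-scratch sketch; the customer tuples (annual spend, visits per month) are made up for illustration, and the features are left unscaled to keep the example short, which you would normally fix before clustering real data.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: assign points to the nearest centroid, then re-average."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda c: squared_distance(point, centroids[c]))
            clusters[nearest].append(point)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[c]
            for c, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Made-up customers: (annual spend in dollars, store visits per month).
customers = [
    (200, 1), (250, 2), (220, 1), (180, 2),
    (900, 8), (950, 9), (1000, 10), (880, 7),
]

centroids, clusters = kmeans(customers, k=2)
for centroid, cluster in zip(centroids, clusters):
    print("segment centre:", centroid, "size:", len(cluster))
```

With real data you would also need to choose k, for example by comparing the within-cluster distances for several values.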

9. Time Series Forecast on Energy Consumption

This dataset is composed of power consumption data from PJM’s website. PJM is a regional transmission organization in the United States. Using this dataset, see if you can build a time series model to predict energy consumption. In addition, see if you can find patterns around hours of the day, holiday energy usage, and long-term trends!

Difficulty: Intermediate-Advanced

Where to start:

  • Link to dataset here.
  • YouTube tutorial on Time Series Analysis in Python here.
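
A good first step before modelling is to look at the average load for each hour of the day. The sketch below does that on synthetic data; the readings are generated from a sine wave purely for illustration and are not PJM figures, and the hourly_profile helper is a hypothetical name.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta

# Synthetic hourly load in megawatts: a daily cycle that peaks at 18:00.
start = datetime(2020, 1, 1)
readings = [
    (start + timedelta(hours=h),
     1000 + 200 * math.sin(2 * math.pi * ((h % 24) - 12) / 24))
    for h in range(24 * 14)  # two weeks of hourly readings
]


def hourly_profile(rows):
    """Average load for each hour of the day (0-23)."""
    by_hour = defaultdict(list)
    for timestamp, load in rows:
        by_hour[timestamp.hour].append(load)
    return {hour: sum(loads) / len(loads) for hour, loads in sorted(by_hour.items())}


profile = hourly_profile(readings)
peak_hour = max(profile, key=profile.get)
print("peak hour:", peak_hour, "average load:", round(profile[peak_hour], 1))
```

The same grouping idea works for day of week or for holidays versus ordinary days; swap timestamp.hour for whatever key you want to compare.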

10. Stocks Predictions

What if you want to predict whether Tesla stock will shoot to the mooooon? With time series forecasting, you can try to predict the trajectory of a stock. To make it easier, you can use Facebook’s time-series library called Prophet, which does a lot of the heavy lifting for you.

Difficulty: Intermediate

Where to start:

Skills you’ll learn:

  • More time series knowledge
  • Prophet — Facebook’s Time-Series package

11. Instagram — Likes Prediction

Do you have some pictures you want to post to Instagram, but you are not sure which one will get you the most likes or comments? Well, data science can help you with that! You can create a predictive model based around various factors, such as the hashtags you use, the length of your post description, the number of pictures in a carousel, and throw it all together. From there you can test your ideas against this model, observe the outputs, and find the image format that is most likely to get you the most likes! This is a great project to work on if you are interested in machine learning, too.

Difficulty: Difficult!

Where to start:

  • Don’t push yourself too far on your first version. Just take factors like brightness of image, length of post description, etc., which can be collected through web scraping or Instagram’s API.
  • Format these values and use a machine learning or predictive model to map them to how many likes each post got.
  • From here, scale up by adding hashtags, time of posting, etc., and by analysing thousands (or hundreds of thousands) of posts automatically to grow your data set.
  • This is a difficult task which can be scaled up indefinitely, so don’t be upset if you struggle on your first attempt. It’s why I put this one near the end of the list.

Skills you’ll learn:

  • Collecting, cleaning, and manipulating data
  • Predictive modelling using machine learning models
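
For a first version, a one-feature linear regression is enough to get the pipeline working end to end. The sketch below fits likes against hashtag count with ordinary least squares; the fit_line helper and the numbers are made up for illustration and were not collected from Instagram.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up posts: number of hashtags used and likes received.
hashtags = [0, 2, 4, 6, 8, 10]
likes = [50, 62, 71, 85, 88, 104]

slope, intercept = fit_line(hashtags, likes)
print(f"each extra hashtag is associated with about {slope:.1f} more likes")
print(f"predicted likes for 5 hashtags: {slope * 5 + intercept:.0f}")
```

Once this works, adding more features (caption length, image brightness, posting time) means moving to a multi-feature model, which is where a machine learning library starts to earn its keep.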

12. Resume — Job Application Matcher

The last topic, which I wanted to leave a little more open-ended, is creating a resume-job description matcher. By using NLP techniques like latent semantic analysis, see if you can determine how closely a resume matches a job description.

Where to start:

  • Learn more about latent semantic analysis here
  • Check out a similar idea related to resumes and job descriptions here.

Skills you’ll learn:

  • NLP techniques like latent semantic analysis and/or cosine similarity
  • Potentially linear algebra and SVD (singular value decomposition)
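
As a starting point, cosine similarity on raw word counts already gives a usable match score. The sketch below is plain bag-of-words, not latent semantic analysis (LSA would add an SVD step on top of a term-document matrix), and the job description and resumes are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the two word-count vectors."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up texts for illustration.
job = "Data analyst with Python, SQL and statistics experience"
resume_1 = "Analyst experienced in Python, SQL, and statistics"
resume_2 = "Pastry chef experienced in baking and cake decoration"

score_1 = cosine_similarity(job, resume_1)
score_2 = cosine_similarity(job, resume_2)
print("resume 1:", round(score_1, 2), "resume 2:", round(score_2, 2))
```

Notice that "experience" and "experienced" count as different words here. Handling that kind of near-match is one reason to move on to stemming, TF-IDF weighting, or LSA.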

Thanks for Reading!

Well, I hope you enjoyed this article about the twelve best data science projects that you can complete in a day or less, to keep you occupied over the winter holidays! If you were inspired by any of these, I strongly recommend that you attempt at least one.
