This post was originally published by Benjamin Obi Tayo, Ph.D. at KDnuggets
There are two basic pathways to data science: the traditional college degree pathway and the self-study pathway.
Traditional College Degree Pathway: Several top universities offer traditional graduate-level programs in data science. Because these are graduate-level programs, most require an undergraduate degree in an analytical field such as physics, mathematics, accounting, business, computer science, or engineering. These programs typically last 3 to 4 semesters for full-time students. Traditional programs come in different flavors, such as Data Science Master’s, Data Analytics Master’s, or Business Analytics Master’s. Tuition for traditional face-to-face programs can range from $15,000 to $40,000, not including living expenses. For online data science master’s programs, the cost can range from $12,000 to $40,000. The links below examine top MS degrees, online, in Europe, and US/Canada, with ranking, tuition, and more.
Self-Study Pathway: If you can, spend four years at a college (or more at a graduate school); this will give you a deeper understanding of the field of data science. If your circumstances don’t allow you to pursue a college degree, you can (with some passion and dedication) teach yourself data science through self-study. There are many excellent online data science courses on platforms such as edX, Coursera, DataCamp, Udacity, and Udemy. By dedicating some time, you can learn the fundamentals of data science from these courses. The self-study pathway is also very affordable compared to the college degree pathway.
In this blog, we discuss 10 resources for data science self-study. These resources are grouped into 3 main categories: (A) Resources for building fundamental knowledge; (B) Resources for data science practice; and (C) Resources for Networking and Continuous Studies.
A. Resources for Building Fundamental Knowledge
The first step in your journey to data science is to build a strong knowledge of fundamental data science concepts. The resources discussed below are helpful for learning the basic concepts.
1. Massive Open Online Courses (MOOCs)
The rising demand for data science practitioners has led to a proliferation of massive open online courses (MOOCs). The most popular providers of MOOCs include the following:
- edX: https://www.edx.org/
- Coursera: https://www.coursera.org/
- DataCamp: https://www.datacamp.com/
- Udemy: https://www.udemy.com/
- Udacity: https://www.udacity.com/
- Lynda: https://www.lynda.com/
If you plan to take one of these courses, note that some MOOCs are 100% free, while others require you to pay a subscription fee (which could range anywhere from $50 to $200 per course or more, and varies from platform to platform). Gaining expertise in any discipline requires an enormous amount of time and energy, so do not rush. If you decide to enroll in a course, be ready to complete the entire course, including all assignments and homework problems. Some of the quizzes and homework assignments will be quite challenging. However, if you don’t challenge yourself, you won’t be able to grow in your knowledge and skills.
Having completed many data science MOOCs myself, I list 3 of my favorite data science specializations below.
Professional Certificate in Data Science (HarvardX, through edX)
Includes the following courses, all taught using R (you can audit courses for free or purchase a verified certificate):
- Data Science: R Basics;
- Data Science: Visualization;
- Data Science: Probability;
- Data Science: Inference and Modeling;
- Data Science: Productivity Tools;
- Data Science: Wrangling;
- Data Science: Linear Regression;
- Data Science: Machine Learning;
- Data Science: Capstone
Analytics: Essential Tools and Methods (Georgia TechX, through edX)
Includes the following courses, all taught using R, Python, and SQL (you can audit for free or purchase a verified certificate):
- Introduction to Analytics Modeling;
- Introduction to Computing for Data Analysis;
- Data Analytics for Business.
Applied Data Science with Python Specialization (the University of Michigan, through Coursera)
Includes the following courses, all taught using Python (you can audit most courses for free; some require the purchase of a verified certificate):
- Introduction to Data Science in Python;
- Applied Plotting, Charting & Data Representation in Python;
- Applied Machine Learning in Python;
- Applied Text Mining in Python;
- Applied Social Network Analysis in Python
2. Learning from a Textbook
Learning from a textbook provides a more refined and in-depth knowledge beyond what you get from online courses. This book provides a great introduction to data science and machine learning, with code included: “Python Machine Learning” by Sebastian Raschka. https://github.com/rasbt/python-machine-learning-book-3rd-edition
The author explains fundamental concepts in machine learning in a way that is very easy to follow. Also, the code is included, so you can actually use the code provided to practice and build your own models. I have personally found this book to be very useful in my journey as a data scientist. I would recommend this book to any data science aspirant. All that you need is basic linear algebra and programming skills to be able to understand the book.
There are lots of other excellent data science textbooks out there such as “Python for Data Analysis” by Wes McKinney, “Applied Predictive Modeling” by Kuhn & Johnson, and “Data Mining: Practical Machine Learning Tools and Techniques” by Ian H. Witten, Eibe Frank & Mark A. Hall.
3. YouTube
YouTube contains several educational videos and tutorials that can teach you the essential math and programming skills required in data science, as well as several data science tutorials for beginners. A simple search will turn up many video tutorials and lectures. Three of my favorite courses on YouTube are:
- Linear Algebra by Gilbert Strang
- Introduction to Deep Learning by Alexander Amini
- Introduction to Computational Thinking and Data Science by John Guttag and Eric Grimson
4. Khan Academy
Khan Academy is also a great website for learning the basic math, statistics, calculus, and linear algebra skills required in data science. It is a great resource for individuals who are interested in data science but don’t have the required quantitative background.
B. Resources for Data Science Practice
As data science is a practical field, academic knowledge obtained from coursework alone will not make you a data scientist. You need to apply your knowledge to real-world data science projects in order to qualify as a data science practitioner. The following platforms will enable you to hone your data science skills by applying your knowledge to practical problems.
5. Kaggle
Kaggle is the world’s largest data science community, with powerful tools and resources to help you achieve your data science goals. Kaggle allows users to find and publish datasets, explore and build models in a web-based data science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges. On this platform, you have access to datasets, courses, notebooks, and competitions. As you participate in Kaggle projects and competitions, your knowledge and experience in data science will continue to grow. Kaggle is also a great platform for networking with other data science practitioners and enthusiasts.
6. Internships
Internships provide an excellent opportunity to work on data science projects. Several companies offer students internships that can last from a few months to a year. Data science internships are typically advertised on platforms such as indeed.com or LinkedIn Jobs.
C. Resources for Networking and Continuous Studies
As data science is continuously evolving due to technological developments, continuous study is essential. Building a network of collaborators among other data scientists will help you stay on top of the game. The following platforms are great resources for networking and continuous studies.
7. Medium
Medium is now considered one of the fastest-growing platforms for learning about data science and networking. If you are interested in using this platform for data science self-study, the first step is to create a Medium account. You can create a free account or a member account. With a free account, there are limits on the number of member articles that you can access per month. A member account requires a subscription fee of $5 per month or $50 per year. Find out more about becoming a Medium member here: https://medium.com/membership.
With a member account, you will have unlimited access to Medium articles and publications. Medium has several data science publications that can help you learn about new developments in the field as well as network with other data scientists or aspirants. The 2 top data science publications on Medium are Towards Data Science and Towards AI. Every day, new articles are published on Medium covering topics such as data science, machine learning, data visualization, programming, and artificial intelligence. Using the search tool on the Medium website, you can access many articles and tutorials covering a wide variety of topics in data science, from basic to advanced concepts.
8. LinkedIn
LinkedIn is an excellent platform for networking. There are several data science groups and organizations on LinkedIn that you can join, such as Towards AI, DataScienceHub, Towards Data Science, and KDnuggets. You can also follow top leaders in the field on this platform.
9. KDnuggets
KDnuggets is a leading site on AI, Analytics, Big Data, Data Mining, Data Science, and Machine Learning. On this website, you can find important educational tools and resources in data science as well as tools for professional development:
- Top stories
- Events (online)
10. GitHub
GitHub contains several tutorials and projects on data science and machine learning. Besides being an excellent resource for data science education, GitHub is also an excellent platform for project organization and portfolio building. For more information on creating a data science portfolio on GitHub, please see the following article: Building a Portfolio That Stands Out Using These Platforms.