What is the risk that a new borrower will default on their loan? What is the “risk” of watching a certain movie on Netflix given a certain viewing history? What is my risk of dying given a certain diagnosis, and how is this affected by a certain treatment? Risks are all around us, and quantifying these risks is becoming increasingly popular. Providing the right kind of analysis for these key questions is crucial to making the right decisions.
Tag: statistics
15 Free Data Science, Machine Learning & Statistics eBooks for 2021
We present a curated list of 15 free eBooks compiled in a single location to close out the year. Among other articles highlighting such materials, I have written a series of posts since the pandemic erupted, on the chance that people spending more time at home may have more time for reading.
Simulating the FIFA World Cup 2022
Who does the data choose to win the largest international football tournament yet? The grandest and most exciting of all football tournaments is still a ways off (2022), but in times like these I find solace in the fact that there are better things (like the next World Cup) that are rapidly approaching with every day that passes. The question on everyone’s mind is always: who wins? My mission is to see what the data says.
The danger of overfitting a Model
An explanation of splitting data into training and testing sets and the danger of overfitting.
A Quick Introduction to Time Series Analysis
In my first article on Time Series, I hope to introduce the ideas and definitions required to understand basic Time Series analysis. We will start with the essential mathematical definitions, which are required to implement more advanced models. The material is introduced in much the same way as in a McGill graduate course on the subject, and follows the style of the textbook by Brockwell and Davis.
Top 5 tech Startup Trends in 2020
According to Statista, the global startup economy generates nearly $3 trillion. Startups have always been the driving force behind the world market, bringing new ideas, and transforming familiar business systems. Unfortunately, the economic crisis associated with Covid-19 has not spared this area.
Ockham’s Spatula
The science and the art of model deployment. Model building is like climbing a mountain. It’s what you spend so much time planning for. It’s what everybody wants to talk about. It’s what gives you that euphoric feeling of accomplishment when you’re finished.
An introduction to possibility theory
Possibility theory was introduced by Zadeh [1] and further developed by Dubois and Prade [2] with the motivation to offer a well-defined and formal mathematical representation for linguistic statements that permits handling imprecise or vague information.
The evolution of Data Science … as I remember it
History isn’t always clear-cut. It’s written by anyone with the will to write it down and the forum to distribute it. It’s valuable to understand different perspectives and the contexts that created them. The evolution of the term Data Science is a good example.
Data Science Life Cycle: A disciplined approach to Data Science
Considering the Data Science Life Cycle as a life cycle enables a natural consideration of crucial overarching factors such as reproducibility, documentation and meta data, ethics, and archiving of research artefacts such as data and code.
The importance of Hypothesis Testing
Hypothesis tests are significant for evaluating answers to questions concerning samples of data. What is the value of hypothesis testing to AI models?
Data analysis without programming
Data analytics does not always require complicated programming. Some applications can be built in a simpler way.
The Data Scientist’s Toolbox (part 1)
Data is the ingredient of data science processes, and understanding what data is can help you be more efficient and appreciate what data science is all about.
An overview of the global market for chatbot solutions in 2020
Drawing on my ongoing monitoring of the conversational assistant market, I share some statistics and information on the almost 550 solutions I studied. To conclude, I give my view of the future.
The last Machine & Deep-Learning Compendium you’ll ever need
In the last 3 years, I have been curating everything related, directly or indirectly, to machine-learning (ML), deep-learning (DL), Statistics, Probability, NLP, NLU, deep-vision, etc. I started curating a compendium because I wanted to expand the scope of my knowledge. I believe that every researcher and data scientist (DS) should strive to learn more on a daily basis, not by hitting task-related walls and solving them, but as a lifelong learning practice.
Principal Components Analysis explained
In real-world data sets, many of our variables are unimportant or correlated with each other. If we are doing a supervised machine learning task, leaving in variables unrelated to our target variable lowers the signal-to-noise ratio in our training data, making it harder to construct a good model.
Predicting e-Commerce Sales with a Random Forest Regression
To demonstrate a random forest regression, a data set of e-commerce sales from the popular online retailer Wish will be used. The data comes from Kaggle and only features sales information on summer clothing.
How do you make statistical inferences from data?
A quick Google search revealed that the average age of a person in the US is 38. Have you ever wondered how statisticians at the Census Bureau came up with that number? Do you think they would go up and ask everyone in person or by mail?
Customer Behavior modeling: The problem with Aggregate Statistics
Using Aggregate Statistics to judge the strength of a Customer Behavior Model is a bad idea. Here’s why.
Optimization Techniques — Tabu Search
Tabu Search is a popular algorithm used to optimize a multi-parameter model that can yield exceptional results. Although the implementation is not trivial and requires tuning, it is capable of solving a wide variety of problems once it is created.