Stratified splitting of grouped Datasets using optimization

One of the most frequent steps in a machine learning pipeline is splitting data into training and validation sets. It is one of the skills every practitioner must master before tackling any problem. The splitting process requires a random shuffle of the data followed by a partition at a preset threshold. For classification problems, you may want to use stratification to ensure the same distribution of classes in both sets. When handling time-series data, you might want to skip shuffling and keep the earliest observations in the training set.
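The basic recipe above can be sketched with scikit-learn's `train_test_split`; the toy data below is invented for illustration. Passing `stratify=y` preserves the class ratio in both partitions, while `shuffle=False` would instead keep the earliest rows in the training set for time series.

```python
# Minimal sketch of a stratified train/validation split (toy data).
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)                # 10 samples, 2 features
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])   # imbalanced 60/40 classes

# stratify=y keeps the 60/40 class ratio in both partitions
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=42
)

print(sorted(y_train.tolist()))  # 3 zeros, 2 ones
print(sorted(y_val.tolist()))    # 3 zeros, 2 ones
```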

Read More

Time Series Analysis: The components that define it

Whenever data, observations, or other information is recorded at regular time intervals, you are looking at time-series data. Time Series Analysis is all about analyzing data over time to forecast what will happen in the future based on past patterns, so that the recurrence of a trend can be predicted. It is said that history often repeats itself, so events that happened in the past are likely to occur again in the future.

Read More

End to End Deep Learning: A different perspective

Whenever there is an article on an end-to-end deep learning project, it consists of training a deep learning model, deploying a Flask API, and making sure it works, or it consists of building a web demo using Streamlit or something similar. The problem with this approach is that it follows a straightforward, typical path that has been tried and tested. It merely takes replacing a single piece of the puzzle with an equivalent, such as swapping a sentiment analysis model for a classification model, and a new project can be created, but the wireframe remains mostly the same.

Read More

The Human Bias-Accuracy Trade-off

“Bias” is a tendency or inclination to favor or disfavor one group over another. All humans have a certain degree of bias because we are inherently programmed to perceive anyone different as a threat. Due to this implicit bias, we tend to unconsciously ascribe traits and qualities to groups that are usually stigmatized in society, including groups based on sex, race, nationality, or geographic heritage. According to the Stanford Encyclopedia of Philosophy:

Read More

Unravelling Transfer Learning to make Machines more advanced

Advanced machines never fail to leave people in awe. But only the researchers behind those machines know how much time, money, and data it took to reach that stage. Training an algorithm that powers the many features of a machine is quite nerve-wracking, but tech geeks have found a solution in transfer learning. Companies are also combining technologies such as deep learning neural networks and machine learning to come up with futuristic machines.
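The core transfer-learning recipe the blurb alludes to — reuse a pretrained model's features and train only a small new head — can be sketched in PyTorch. The tiny "backbone" below is a stand-in for illustration; in practice you would load a genuinely pretrained network (e.g. a torchvision ResNet).

```python
# Hedged sketch: freeze a (pretend-)pretrained backbone, train only a new head.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(16, 32), nn.ReLU())  # stand-in for a pretrained model
for p in backbone.parameters():
    p.requires_grad = False             # freeze: reuse learned features as-is

head = nn.Linear(32, 3)                 # new head for a 3-class target task
model = nn.Sequential(backbone, head)

# Only the head's parameters are handed to the optimizer
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)
print(len(trainable))                   # just the head's weight and bias
```

Because gradients flow only into the head, far less data and compute are needed than training the whole network from scratch — which is the point of the technique.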

Read More

Principal Component Analysis (PCA)

This is the first post in a two-part series on Principal Component Analysis (PCA) and Independent Component Analysis (ICA). Although they have similarities (such as their names), they each achieve different tasks. In this post, I will describe what PCA is, how it works, and, as an example, use it to define an S&P 500 index fund. Example code and other related resources can be found in the last section of this post.
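As a minimal illustration of what PCA does, here is a scikit-learn sketch on made-up 2-D data whose variance lies mostly along one direction; the first principal component recovers that direction.

```python
# Minimal PCA sketch on synthetic, strongly correlated 2-D data.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
# Two features that are nearly proportional, plus a little noise
X = np.hstack([t, 0.5 * t]) + rng.normal(scale=0.05, size=(200, 2))

pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_)  # first component captures almost all variance
```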

Read More

How to encode Time-Series into Images for Financial Forecasting using Convolutional Neural Networks

Within forecasting there’s an age-old question: ‘is what I am looking at a trend?’ The realm of statistics offers many tools that answer this question with varying degrees of success. Yet no approach has been able to achieve what started the field of data forecasting in the first place: looking at a graph derived from the data and drawing conclusions from it. However, thanks to Deep Learning and Artificial Neural Networks, that is about to change. Welcome, Computer Vision!
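One common way to encode a 1-D series as an image, as the title describes, is the Gramian Angular Field from the time-series-imaging literature. The sketch below is an illustration of that general idea under my own assumptions, not necessarily the exact encoding the article uses.

```python
# Hedged sketch: Gramian Angular Summation Field (GASF) encoding of a series.
import numpy as np

def gasf(series: np.ndarray) -> np.ndarray:
    # Rescale to [-1, 1] so arccos is defined
    lo, hi = series.min(), series.max()
    x = 2 * (series - lo) / (hi - lo) - 1
    phi = np.arccos(np.clip(x, -1.0, 1.0))     # polar-coordinate angle
    # G[i, j] = cos(phi_i + phi_j): an n x n "image" of pairwise relations
    return np.cos(phi[:, None] + phi[None, :])

img = gasf(np.sin(np.linspace(0.0, 6.28, 64)))
print(img.shape)  # (64, 64) — a single-channel image a 2-D CNN can consume
```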

Read More

The dark side of Data Science

As I discuss in my article “Myths of Modelling: Data Speak”, Positivism — and, by association, its mythical beliefs — had been pretty thoroughly discredited by the 1960s. Unfortunately, as is often the case in the history of ideas, the counter-revolution over-compensated. Where the early revolutionaries would loosen the chains of narrow empiricism and open up a more enlightened dialogue between hypotheses and the data that inspire and regulate them, the next generation would throw empiricism out altogether. In the ensuing vacuity of common sense, practitioners had little choice but to crawl back to frameworks steeped in positivism.

Read More

Multidimensional multi-sensor time-series data analysis framework

In this blog post, I will take you through my package “msda”, which is useful for time-series sensor data analysis. A quick introduction to time-series data is also provided. The demo notebook can be found here. One specific use case of the package, “Unsupervised Feature Selection”, is covered in a separate blog post.

Read More

Hands-on to ReAgent: End-to-End Platform for Applied Reinforcement Learning

Facebook ReAgent, previously known as Horizon, is an end-to-end platform for applying Reinforcement Learning to industrial problems. The main purpose of the framework is to make the development and experimentation of deep reinforcement learning algorithms fast. ReAgent is built in Python. It uses the PyTorch framework for data modelling and training, and TorchScript for serving.

Read More

Bias in Machine Learning Algorithms

The progress made in the field of machine learning and its capabilities in solving practical problems have heralded a new era in the broader domains of artificial intelligence (AI) and technology. Machine learning algorithms can now identify groups of cancerous cells in radiographs, write persuasive advertising copy and ensure the safety of self-driving cars.

Read More

Understanding Categorical Data

Feature engineering is a crucial step in building a performant machine learning model. Understanding categorical variables and encoding them with the right techniques is paramount during the data cleaning and preparation stage. A survey published in Forbes reports that data preparation accounts for about 80% of data scientists’ work, with 60% of their time spent cleaning and organizing data.
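Two of the most common encodings can be sketched with pandas; the toy column names below are invented for illustration. One-hot encoding suits nominal variables (no order), while ordered integer codes suit ordinal ones.

```python
# Minimal sketch: one-hot vs. ordinal encoding of categorical columns.
import pandas as pd

df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],      # nominal: no order
    "size":  ["small", "large", "medium", "small"],  # ordinal: has order
})

# Nominal: one-hot encoding, one indicator column per category
onehot = pd.get_dummies(df["color"], prefix="color")

# Ordinal: map categories to ordered integer codes
order = pd.CategoricalDtype(["small", "medium", "large"], ordered=True)
df["size_code"] = df["size"].astype(order).cat.codes

print(onehot.columns.tolist())       # one column per color
print(df["size_code"].tolist())      # small=0, medium=1, large=2
```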

Read More

AI: The Algorithmic Doctor

Ever wondered what it takes for machines to act like humans? AI functioning as a doctor is much more technical than it looks, so here we are with our latest blog on the algorithms used for AI in Psychiatry. Do give it a read to learn what goes on behind the scenes to achieve the accuracy and advanced features needed to support mental health!

Read More

Why Is Object Detection So Messy?

Those working with Neural Networks know how complicated Object Detection techniques can be. It is no wonder there is no straightforward resource for training them. You are always required to convert your data to a COCO-like JSON or some other unwanted format. It is never a plug-and-play experience. Moreover, no diagram explains Faster R-CNN or YOLO as thoroughly as those that exist for U-Net or ResNet. There are just too many details.
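For readers who have not met it, here is a minimal, hand-written sketch of the COCO-style JSON layout the blurb complains about converting to. The field names follow the public COCO detection format; the single image and bounding box are made up.

```python
# Minimal sketch of a COCO-style detection annotation file (toy content).
import json

coco = {
    "images": [
        {"id": 1, "file_name": "car.jpg", "width": 640, "height": 480}
    ],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,           # links the box to an entry in "images"
            "category_id": 3,        # links to an entry in "categories"
            "bbox": [100, 120, 200, 150],  # [x, y, width, height] in pixels
            "area": 200 * 150,
            "iscrowd": 0,
        }
    ],
    "categories": [{"id": 3, "name": "car"}],
}

with open("annotations.json", "w") as f:
    json.dump(coco, f)
```

The indirection through `image_id` and `category_id` is exactly what makes hand-converting other formats tedious: every box must be re-linked by numeric id rather than stored alongside its image.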

Read More

Detecting whether a car is a TESLA or NOT using Deep Learning with Fast.AI

Hey there, hope you are doing well. Recently I started going through the Fast.AI deep learning curriculum, where two brilliant people, Jeremy Howard and Rachel Thomas, teach Deep Learning. One is a very experienced programmer and the other a mathematician, so what could be a better combination?

Read More