Microsoft releases Unadversarial examples: Designing objects for robust vision – a complete hands-on guide

In recent work from Microsoft Research, a new framework is introduced for creating "unadversarial objects": inputs that are optimized specifically for more robust model performance. The proposed approach helps image recognition and classification models predict better under unforeseen corruptions or distribution shifts.

Read More

15 Free Data Science, Machine Learning & Statistics eBooks for 2021

We present a curated list of 15 free eBooks compiled in a single location to close out the year. I have written a series of posts highlighting such materials since the pandemic erupted, on the theory that people spending more time at home may have more time for reading.

Read More

How to Train Computer Vision Models Efficiently

alwaysAI (@alwaysai): a developer platform for creating and deploying computer vision applications on the edge.

The starting point of building a successful computer vision application is the model. Computer vision model training can be time-consuming and challenging if one doesn’t have a background in data science. Nonetheless, it is a requirement for customized applications.

Many developers who build computer vision applications have experience with Python but not with data science, which poses a challenge when they want to customize a model.

There are tools that you can use to train your own computer vision model, such as TensorFlow, PyTorch and Keras. alwaysAI also offers a Model Training Toolkit which leverages transfer learning in order to modify an existing model instead of training a model from scratch. This article will go over some basic concepts of model training and then give an example of how to train a custom model. 

Leveraging transfer learning is the key to efficiently customizing a computer vision model. Transfer learning essentially takes the knowledge gained from training on a dataset, such as the COCO dataset, and improves upon it or adapts it to your specific task and dataset. This saves the time the model would otherwise spend learning fundamentals like edge detection and feature primitives.
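To make the idea concrete, here is a minimal, hedged sketch of transfer learning in Keras. It uses an ImageNet-pretrained MobileNetV2 classification backbone for brevity rather than the alwaysAI toolkit's COCO-based MobileNet SSD workflow, but the principle of freezing pretrained features and training a small new head is the same.

```python
# A minimal sketch of the transfer-learning idea, NOT the alwaysAI toolkit:
# reuse a pretrained backbone and train only a small new head on your data.
import tensorflow as tf

# ImageNet-pretrained classification backbone (illustrative stand-in for the
# toolkit's COCO-based MobileNet SSD detector).
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # keep the learned edges/textures/features frozen

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(3, activation="softmax"),  # e.g. 3 custom classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # your own dataset
```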

The COCO dataset is a popular open source dataset that we use as the basis of our transfer learning. It is computer vision specific, and due to its wide variety of classes and large size it is the dataset commonly used for open-source computer vision models. With alwaysAI you can get all of the power of the COCO dataset tuned to your specific task or application.

alwaysAI provides access to pre-trained computer vision models as well as the ability to leverage transfer learning to train your own models. Using a pre-trained model can certainly expedite computer vision app development. However, pre-trained models can also impose limitations on what applications you can build. In order to build a unique, custom application, you are going to need to train your own model. Using transfer-learning enabled model re-training not only requires less training data but can also significantly cut down the time it takes to train a new model. 

There are three major components involved in training a model using the alwaysAI Model Training Toolkit:

Data Collection: Our Image Capture application helps you with your data collection, so that you can collect videos and images directly from your edge device. See this GitHub repo for more on this. 
Data Annotation: We have CVAT, a data annotation tool, built right into our Model Training Toolkit so that you can easily annotate your data. 
Transfer Learning: As mentioned earlier, train a custom MobileNet SSD model using transfer learning to take the knowledge of the COCO dataset and improve upon it or modify it to your unique dataset. 

After training a model, you can upload it and use it to build your own custom computer vision application with our easy-to-use Python API. With alwaysAI, you can customize a computer vision model with no coding and little experience. Model training is no longer relegated to the domain of data scientists.

You can try our Model Training Toolkit for free and gain access to our support team on Discord, our free Webinars, and our weekly Hacky Hours. 

Here is a quick example of how to train a new model that detects specific hand gestures using our Model Training Toolkit. This example also highlights the benefit of using two models in conjunction in an application.

By using the alwaysAI hand model in our application to first detect hands, we can then feed ‘hand detections’ as an input into our gesture model. Since our hand model was trained on a wide variety of data, it will detect hands of various sizes, with different backgrounds, in images with different lighting and from different distances. 

We then use the alwaysAI Python library to cut out the portion of the original image that contains the hand, and feed this new image into the gesture model. The result is that the input to the gesture model is always the area that tightly surrounds the hand, meaning we don't have to vary the training data in terms of background, hand size, or distance. Therefore, we need a lot less training data and less training time.
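The sketch below illustrates this two-model pipeline. Note that hand_detector and gesture_model are hypothetical stand-ins for the models described above; their detect()/classify() methods are assumptions for illustration, not a specific alwaysAI API, and the cropping itself is plain NumPy slicing.

```python
import numpy as np

def classify_gestures(frame: np.ndarray, hand_detector, gesture_model):
    """Detect hands, crop each detection, and classify the gesture in it."""
    gestures = []
    for (x1, y1, x2, y2) in hand_detector.detect(frame):  # hand bounding boxes
        hand_crop = frame[y1:y2, x1:x2]      # cut out just the hand region
        gestures.append(gesture_model.classify(hand_crop))
    return gestures
```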

As mentioned above, the first step in training your own model is data collection. Since we’re training specific gestures, we can just take videos of a single gesture moving around in the video to get different lighting and some variation in sizes. By having each video contain just one gesture, we’ll make the annotation step, which we’ll cover next, much easier.

The second step to training your own model is data annotation. alwaysAI makes this easy by integrating the Computer Vision Annotation Tool directly into the alwaysAI software. Using the tracking feature in CVAT along with the ‘fast-forward’ annotation, we can have CVAT interpolate some of the bounding boxes so we can easily annotate an entire video containing a single gesture in 20 or so minutes.

Once we have the data collection and annotation complete, we can train our model! With alwaysAI, you can train using the command line interface, or a Jupyter Notebook. The latter option is helpful if you want to see a plot of your training and testing validation.

Finally, once our model is trained, we can prototype our application. Using the gesture model alone only works when the hand is very close to the camera.

To make the model work when the hand is farther away, we could collect images in which the hand is farther from the camera, annotate this data, and include it in training; the model would then be more robust.

However, another option, which is a bit quicker, is to feed the gesture model with input generated by first running our hand model. By using the hand model first, we can detect hands that are farther away and then use that portion of the image as input for the gesture model. Now we can detect gestures from far away!

Another issue we discover when we test out our gesture model in the prototype is that we don't pick up gestures when the hand is rotated (e.g. sideways). We could do another iteration of data collection, annotation, and training in this case as well, using training data where hands displaying different gestures are in different orientations.

While this is doable, it might be better to first test our hypothesis that rotation is the issue. We can use alwaysAI's API to rotate the hand image produced by the hand model's bounding box and then feed that into the gesture model.
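A hedged sketch of that quick test follows. The rotation constants are standard OpenCV; the gesture_model.classify() call and its (label, confidence) return value are assumptions for illustration.

```python
import cv2

def classify_with_rotations(hand_crop, gesture_model, threshold=0.5):
    """Try the hand crop at several orientations; keep the first confident hit."""
    rotations = [None, cv2.ROTATE_90_CLOCKWISE,
                 cv2.ROTATE_90_COUNTERCLOCKWISE, cv2.ROTATE_180]
    for rotation in rotations:
        image = hand_crop if rotation is None else cv2.rotate(hand_crop, rotation)
        label, confidence = gesture_model.classify(image)  # hypothetical call
        if confidence > threshold:
            return label
    return None
```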

Now we have a working model that detects unique gestures!

This tutorial was covered in one of our previous Hacky Hours focusing on Data Collection and Annotation, and a recording of that session is available.

If you would like to see the alwaysAI Model Training Toolkit in action and expand on this application, please join our next free Hacky Hour. It takes place on Thursday, November 5 and will focus on transfer learning. This virtual workshop will cover building out our prototype application to interact with external devices and turning gestures into actionable responses in our computer vision application.

You can sign up for our free Hacky Hours here.

Read More

Using Data Science Skills Now: Text Readability Analysis

How to identify reading level scores using Python. When marketing effectiveness analytics are being developed, the content reading level is often overlooked. Should this be a part of the analysis or your model? If you decide it is, how would you easily tag your different documents or creatives with their current reading level? Today I will review how to use Python to do reading-level analysis and create a report.
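As an illustration of the kind of analysis described (not necessarily the exact method from the article), the textstat package exposes standard readability formulas:

```python
# Illustration only: common readability metrics via the textstat package.
import textstat

text = "Marketing copy should be easy to read. Short sentences help a lot."
print(textstat.flesch_reading_ease(text))   # higher score = easier to read
print(textstat.flesch_kincaid_grade(text))  # approximate US school grade level
```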

Read More

Getting Started Guide — Anaconda

With the ever-increasing demand for the Python programming language, the first task any beginner struggles with is setting up the right development environment.
This tutorial aims to introduce you to the Anaconda Platform, a free and open-source distribution of the Python and R programming languages for scientific computing that simplifies package management and deployment.

Read More

PyCaret 2.2 is here — What’s new?

PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It is an end-to-end machine learning and model management tool that speeds up the machine learning experiment cycle and makes you more productive. PyCaret 2.2 is now available for download, so what’s new?
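For a flavour of the low-code workflow, here is a minimal hedged sketch using PyCaret's classification module and a bundled sample dataset; the dataset and target column names follow PyCaret's documented examples and are illustrative only.

```python
# A hedged sketch of PyCaret's low-code workflow (illustrative only).
from pycaret.datasets import get_data
from pycaret.classification import setup, compare_models

data = get_data("juice")         # bundled sample dataset from the docs
setup(data, target="Purchase")   # configure the experiment in one call
best_model = compare_models()    # train, cross-validate and rank many models
```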

Read More

Kite expands its AI code completions from 2 to 13 programming languages

Kite, which suggests code snippets for developers in real time, today added support for 11 more programming languages, bringing its total to 13. In addition to Python and JavaScript, Kite’s AI-powered code completions now support TypeScript, Java, HTML, CSS, Go, C, C#, C++, Objective C, Kotlin, and Scala.

Read More

A practical guide to RNN and LSTM in Keras

After going through a lot of theoretical articles on recurrent layers, I just wanted to build my first LSTM model and train it on some texts! But the huge list of exposed parameters for the layer and the subtleties of the layer structures were too complicated for me. This meant I had to spend a lot of time going through StackOverflow and API definitions to get a clearer picture. This article is an attempt to consolidate the notes that can accelerate the transition from theory to practice. The goal of this guide is to develop a practical understanding of using recurrent layers like RNN and LSTM, rather than a theoretical one. For a more in-depth understanding, I suggest this and this, which are worth going through before reading this article. If you are ready, let's get started!

The complete RNN layer is exposed as the SimpleRNN class in Keras. The Keras implementation differs from the architecture suggested in many articles, but it is simple: each RNN cell takes one data input and one hidden state, which is passed from one time-step to the next. The RNN cell looks as follows,

[Figure: The flow of data and hidden state inside the RNN cell implementation in Keras. Image by Author.]
The complete formulation of an RNN cell is,
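h_t = g(W_{hh} h_{t-1} + W_{hx} x_t + b_h),    y_t = h_t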

Here, h_t and h_{t-1} are the hidden states at times t and t-1, x_t is the input at time t, and y_t is the output at time t. The important thing to notice is that there are two weight matrices, W_{hh} and W_{hx}, and one bias term b_h. Each of these matrices can be thought of as a single-layer neural network with output size defined by the units parameter; the bias has the same size. y_t is simply the raw h_t; contrary to what many articles suggest, no additional weight matrix is applied to the output. This describes one individual RNN cell; a sequential chain of such cells (one per time-step in the data) makes up the complete RNN layer. Remember that the same weight matrices and bias are shared across all the cells. Finally, we can compute the number of parameters required to train the RNN layer as follows,
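num_params = units × units (recurrent weights) + features × units (input weights) + units (bias) = units × (units + features + 1)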

Notice that the input is a tuple of the form (time-steps, features) and that the parameter count does not depend on the number of time-steps, since the same weights are shared across all of them. This can be checked by displaying the summary of a sample model with an RNN layer in Keras.
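A sketch of such a check, assuming 128 units, 10 input features and an arbitrary 50 time-steps; these assumed values were chosen so the counts match the totals quoted below.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

TIME_STEPS, FEATURES = 50, 10  # assumed values for illustration

model = Sequential([
    SimpleRNN(128, input_shape=(TIME_STEPS, FEATURES)),  # 128*(128+10+1) = 17,792
    Dense(1),                                            # 128 + 1        = 129
])
model.summary()
```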

Check out the Params of the SimpleRNN layer in the summary (named simple_rnn_2 in the original output); it is equal to what we calculated above. The additional 129 parameters that take the total count to 17,921 come from the Dense layer added after the RNN.
We can also fetch the exact matrices and print their names and shapes by,
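```python
# Print the name and shape of every variable in the RNN layer.
for weight in model.layers[0].weights:
    print(weight.name, weight.shape)
# kernel           -> (10, 128)
# recurrent_kernel -> (128, 128)
# bias             -> (128,)
```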

Points to note: Keras calls the input weight matrix kernel, the hidden-state matrix recurrent_kernel, and the bias bias. Now let's go through the parameters exposed by Keras. While the complete list is available in the Keras documentation, we will briefly look at some of the relevant ones.
The first and foremost is units, which is the output size of both kernel and recurrent_kernel. It is also the size of the bias term and of the hidden state.
Next, we have activation, which defines the g() function in our formulation. The default is "tanh".
Then we have the {*}_initializer, {*}_regularizer and {*}_constraint parameters, one set each for kernel, recurrent_kernel and bias. These can be ignored if you are not sure about them, as the default values are good enough.
use_bias is a boolean parameter which turns the bias term on or off.
dropout and recurrent_dropout apply a dropout probability to the kernel and recurrent_kernel transformations, respectively.
return_sequences is a boolean parameter. When it is True, the output shape of the RNN layer is (time-steps, units); when it is False, the output is only (units). In other words, if it is turned on, the layer returns y_t from every time-step, and if it is off, only the y_t from the last time-step. An additional caveat: don't forget to add a TimeDistributed or Flatten layer after an RNN with return_sequences turned on before you add a Dense layer.
go_backwards is a boolean and, when True, the RNN processes the data in reverse order. The default is False.
return_state is a boolean and, when True, the layer returns the last state in addition to the output. The default is False.
stateful is an important parameter. When set to True, Keras reuses the hidden state across batches for samples at the same index. Understand it like this: we train our model for multiple epochs, where one epoch is one pass over the complete data. Each epoch contains multiple batches, which in turn contain multiple samples, i.e. the individual data points. Usually, after running on each sample in a batch, the state of the RNN cell is reset. But if we have prepared the data so that, across consecutive batches, the samples at a particular index are continuations of the same sentence, we can set stateful to True and it will be equivalent to training on the whole sentence at once (as one sample). We may do this due to memory constraints, i.e. if we cannot load the complete data in one go. The default is False.
With the basics of RNN clear, let's look at one architecture that is frequently built with RNNs.

Stacking multiple recurrent layers on top of each other has been suggested to work better for many applications. This leads to a mesh-like structure: the horizontal depth (visualize an unrolled RNN) comes from the time-steps, while the vertical copies (stacking) come from the added RNN layers. Stacked recurrent layers are a common building block in Seq2Seq modelling, used for applications like language translation, entity tagging and speech recognition, where we have a sequence as both input and output. That said, we can also stack multiple RNNs before finally applying a fully connected Dense layer; this is an example of a sequence as input with a flattened output. A sample code is,
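A sketch of the stacked variant, reusing the assumed 50 time-steps and 10 features from the earlier snippet.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense

TIME_STEPS, FEATURES = 50, 10  # same assumed values as before

model = Sequential([
    SimpleRNN(128, return_sequences=True, input_shape=(TIME_STEPS, FEATURES)),
    SimpleRNN(128, return_sequences=True),
    SimpleRNN(128),  # last recurrent layer returns only the final hidden state
    Dense(1),
])
model.summary()
```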

It's pretty simple: we have just added two new RNN layers to the previous code. But notice that we set return_sequences to True on an RNN layer if we want to stack another RNN on top of it. This is because the next RNN expects time-distributed input: the output of each time-step of the previous RNN becomes the input to the upper RNN for the same time-step. Here, while the trainable parameters of the 1st RNN remain the same as computed before, the 2nd and 3rd RNNs have a different count because their input size is 128. This makes the number of training parameters for each of the next two RNNs equal to,
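units × (units + units + 1) = 128 × (128 + 128 + 1) = 32,896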

Moving on to LSTMs, there are a bunch of very good articles on them, like this and this; I would suggest having a look at them before moving further. Similar to the RNN case, the Keras implementation of LSTM is a little different from what is presented in most articles. The main difference is that, instead of concatenating the input and the previous hidden state, separate weight matrices are applied to each before they are passed to the 4 internal neural networks in the LSTM cell. This means we have doubled the number of matrices required (in reality it doubles the dimensions, but more on this later). The 4 matrices that multiply the input are called kernel and the 4 that multiply the previous hidden state are called recurrent_kernel. To understand this better, let's look at the formulation,
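i_t = σ(W_{xi} x_t + W_{hi} h_{t-1} + b_i)        (input gate)
f_t = σ(W_{xf} x_t + W_{hf} h_{t-1} + b_f)        (forget gate)
o_t = σ(W_{xo} x_t + W_{ho} h_{t-1} + b_o)        (output gate)
c̃_t = tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c)     (candidate cell state)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t
h_t = o_t ⊙ tanh(c_t)

The four W_{x·} matrices form the kernel and the four W_{h·} matrices the recurrent_kernel; σ is the sigmoid recurrent_activation and tanh the activation, both discussed further below.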

Here, if you observe, we have a total of 8 weight matrices, and assuming each has the same size, we are in a sense doing the same operations we did in the RNN, but 4 times over. Hence the number of trainable parameters can now be calculated by,
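num_params = 4 × (units × units + features × units + units) = 4 × units × (units + features + 1)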

And switching from RNN to LSTM is as easy as replacing the respective function call, as can be seen in the following code,
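The same sketch as before with SimpleRNN swapped for LSTM; note that the layer name printed in the summary (e.g. lstm_1) depends on how many layers the session has already created.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

TIME_STEPS, FEATURES = 50, 10  # same assumed values as before

model = Sequential([
    LSTM(128, input_shape=(TIME_STEPS, FEATURES)),  # 4*128*(128+10+1) = 71,168
    Dense(1),
])
model.summary()
```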

Match the Params mentioned in lstm_1 with what we computed.
We can again extract all of the weights from the model by,
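```python
# Inspect the LSTM layer's variables; the four gate matrices are concatenated.
for weight in model.layers[0].weights:
    print(weight.name, weight.shape)
# kernel           -> (10, 512)    four input matrices, 128*4 = 512 columns
# recurrent_kernel -> (128, 512)   four hidden-state matrices concatenated
# bias             -> (512,)
```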

Note that all 4 kernel matrices and all 4 recurrent_kernel matrices are each stored in a single monolithic matrix (concatenated along the column axis), hence the dimension 128 × 4 = 512. The same is true for the bias term. Also, nearly all of the parameters discussed for the RNN apply here as well. One additional caveat is the recurrent_activation parameter, which has a default value of "sigmoid" and is applied to the input, forget and output gates, as shown in the formulation above. This leaves activation, which is applied to the cell state and hidden state (with a default value of "tanh"), also as shown in the formulation.

We have tried to cover some of the basics required to connect the theory and practice of recurrent layers in Keras. A complete guide with all the intrinsic details would be too much for a single article, and there are already a lot of materials out there that explain the theory very well. What I really missed were notes connecting the formulae I saw in those articles with what is actually implemented in Keras, along with some additional practical details. Hope this was of help!
Cheers.
All of the code from the article has been uploaded here.
For more articles like this visit my website and connect with me @ linkedin.

Read More

Managing software dependencies for Data Science projects

A step-by-step guide for keeping project dependencies clean and reproducible. Virtual environments are a must when developing software projects. They allow you to create self-contained, isolated Python installations that prevent your projects from clashing with each other and let other people reproduce your setup.

Read More