Machine learning is becoming more and more popular in everyday applications: intelligent YouTube and Netflix recommendations, live text translation in Google Translate. Combining the power of mobile devices with artificial intelligence and machine learning leads to a great user experience. However, since training models is a very computationally expensive process and smartphones are low-power devices, machine learning for mobile will inevitably require training on a local computer or a server.
Accurate modern object recognition models may contain millions of parameters. For example, Google's Inception-v3 model, shown in [Fig. 1], where one block represents one layer, is able to distinguish between a spotted salamander and a fire salamander [Fig. 2].
Fig. 1: Inception-v3 diagram
Fig. 2: Photos of spotted and fire salamander
Unfortunately, the training process of such complex models requires huge computing power: Inception-v3 takes two weeks to train on 8 NVIDIA Tesla K40 graphics cards. To speed things up, Google has released a version of a pre-trained Inception model that can be adapted to a new task. This process is called transfer learning: it keeps the existing weights and retrains only the last layers to recognize new objects. It's not as effective as training from scratch, but surprisingly effective for many applications. Best of all, it can achieve satisfactory results in approximately 30 minutes on a laptop, without requiring a GPU.
Inception-v3 is a great model, but too slow and bulky for mobile devices. It occupies a lot of disk space and memory (almost 100 MB), and processing a single 224×224 input image takes 200-300 ms on a decent phone (Nexus 5). Fortunately, Google has also released a family of models optimized for mobile: MobileNet.
Each MobileNet variant is described by two key numbers:
MACs (multiply-accumulate operations) – proportional to the required computing power,
parameters – proportional to memory usage.
Additionally, every model comes with normal and quantized weights. The quantized version uses 8-bit weights instead of 32-bit ones. As a result, the model size shrinks by up to 75% (at the cost of slightly lower accuracy), and thanks to the 8-bit computation the processing time decreases as well.
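The 75% figure follows directly from the weight width: moving from 32-bit to 8-bit weights removes three quarters of the bytes spent on weights (a rough upper bound, since the file also stores the graph structure):

```shell
# Size reduction from 32-bit to 8-bit weights, in percent
echo $(( (32 - 8) * 100 / 32 ))
```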
GATHER TRAINING DATA
To get started, we need training data for the objects we want to recognize: at least 1000 images of every object. To make this process faster, we can record a video of each object and split it into frames. To make it happen, I will use FFmpeg.
If the movie's resolution is high, we should reduce it first. With FFmpeg we can call the command below:
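A minimal sketch of that command, with input.mp4 and output.mp4 as placeholder file names and 500 standing in for desired_width:

```shell
# Scale the video to the desired width; a height of -1 preserves the aspect ratio
ffmpeg -i input.mp4 -vf scale=500:-1 output.mp4
```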
If we pass desired_width as 500, it will scale the width down to 500 px, and because the height is passed as -1, FFmpeg will automatically adjust it to maintain the aspect ratio.
Finally, we can split it with:
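A sketch of the splitting command, again with placeholder file names and an fps value of 15 for illustration:

```shell
# Extract frames at the given rate and save them as numbered JPEG files
mkdir -p frames
ffmpeg -i output.mp4 -vf fps=15 frames/img_%04d.jpg
```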
If the movie is recorded at 30 fps and we pass an fps value of:
30 – it will return images of every frame,
15 – it will return every second frame,
1 – it will return one frame every second of the movie.
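As a quick sanity check against the 1000-image target mentioned earlier: splitting at an fps value of 15 yields 15 frames per second of footage, so roughly 67 seconds of video per object are enough (the numbers here are just that worked example):

```shell
# Ceiling of 1000 required images / 15 frames per second ≈ 67 seconds of video
echo $(( (1000 + 15 - 1) / 15 ))
```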
This process should be repeated for every object we want to recognize.
To start retraining, execute the retrain.py script:
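A sketch of the invocation, assuming retrain.py is in the current directory; the flag values below are purely illustrative, and the parameters themselves are described next:

```shell
# Retrain the final layer on our own images (flag values are examples only)
python retrain.py \
  --image_dir ~/training_images \
  --learning_rate 0.01 \
  --testing_percentage 10 \
  --validation_percentage 10 \
  --how_many_training_steps 4000
```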
image_dir – a path to a folder containing one subfolder per label, with each subfolder holding the training images of that object,
learning_rate – controls the size of the updates to the final layer during training,
testing_percentage – what percentage of images to use as a test set,
validation_percentage – what percentage of images to use as a validation set,
train_batch_size – how many images to train on at a time,
validation_batch_size – how many images to use in an evaluation batch. This validation set is used much more often than the test set, and is an early indicator of how accurate the model is during the training. A value of -1 causes the entire validation set to be used, which leads to more stable results across training iterations, but may be slower on large training sets,
flip_left_right – whether to randomly flip half of the training images horizontally,
random_scale – percentage determining how much to randomly scale up the size of the training images,
random_brightness – percentage determining how much to randomly multiply the training image input pixels up or down,
eval_step_interval – how often to evaluate the training results,
how_many_training_steps – how many training steps to run before ending,
architecture – name of a model architecture (which will be automatically downloaded).
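For reference, the image_dir layout with one subfolder per label might look like this (folder and file names here are purely illustrative):

```
training_images/
├── spotted_salamander/
│   ├── image_0001.jpg
│   ├── image_0002.jpg
│   └── ...
└── fire_salamander/
    ├── image_0001.jpg
    └── ...
```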
At first, I recommend leaving the architecture parameter blank, so the Inception-v3 model will be selected. This will verify whether the quality of your training data is sufficient. If the accuracy is satisfactory, you can try the smaller MobileNet architectures.
We can observe the learning process in the console window, or graphically in the form of graphs [Fig. 4], by calling:
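Assuming the default log location used by retrain.py (/tmp/retrain_logs; adjust the path if you changed the summaries directory), TensorBoard can be started with:

```shell
tensorboard --logdir /tmp/retrain_logs
```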
On completion of the learning process, the model will be saved to /tmp/output_graph.pb and the labels file to /tmp/output_labels.txt.
As you can see, retraining a model to recognize custom objects is pretty easy and takes less than an hour on a decent laptop, including learning time. In the next article, I will show how to use the generated model to visualize the results of object recognition.
Tomek Antkowiak Android Developer, specialized in Java & Kotlin programming languages, with the knowledge of a few main game engines such as Unity, Play Canvas or Cocos2D.