Porting Neural Network Models onto Mobile Devices

In our previous blog, we looked at how deep learning model compression and optimization accelerate business operations and improve user experience. Now, we’ll examine how mobile deep learning frameworks can speed up computations by running them on the device rather than on a server.

Computations are commonly run in the cloud, but running them on devices such as PCs, mobile phones, or edge devices can make an application noticeably more responsive. For example, you might take a picture of an animal that an algorithm recognizes as a cat. Running this computation on your device instead of on a server removes the network round trip, which saves time and improves user experience.

Neural networks that run on devices must keep memory use to a minimum, and on-device computing requires specialized software, specifically a deep learning framework. Deep learning frameworks implement all parts of the neural network and use the device’s hardware to train the model and perform inference. There are several deep learning frameworks for mobile devices, the most common being TensorFlow Lite, PyTorch Mobile, and CoreML.

TensorFlow Lite is a multi-platform framework for on-device machine learning. It can be used on Android, iOS, Linux, and microcontrollers. Its features include compression during model conversion from the TensorFlow to the TensorFlow Lite format and support for GPU/NPU acceleration. One drawback of this framework is that, at the time of writing, on-device training is unsupported, so only model inference is available.

PyTorch Mobile is another multi-platform framework, used to run PyTorch models on Android, iOS, and Linux. Though this tool is in the beta stage at the time of writing, it already features 8-bit quantization during conversion from the PyTorch to the PyTorch Mobile format and support for GPU/NPU acceleration.

CoreML is a machine learning framework specific to Apple platforms, including iOS. It leverages Apple hardware, including the CPU, GPU, and NPU, to maximize model performance while minimizing memory and power consumption. It also supports key features such as quantization from 16 bits down to 1 bit and on-device fine-tuning with local data, which can help personalize application behavior for each user.
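To make the idea of quantizing weights to a given number of bits more concrete, here is a minimal sketch of a generic min/max linear quantization scheme in plain Python. It is an illustration only: the function names and the sample weights are hypothetical, and the sketch does not claim to reproduce what any of the three frameworks does internally.

```python
def quantize_linear(weights, nbits):
    """Map floats to integer codes in [0, 2**nbits - 1] with a min/max linear scheme."""
    levels = 2 ** nbits - 1
    lo, hi = min(weights), max(weights)
    # Guard against a zero step when all weights are identical.
    step = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / step) for w in weights]
    return codes, lo, step


def dequantize_linear(codes, lo, step):
    """Recover approximate float weights from integer codes."""
    return [lo + c * step for c in codes]


# Hypothetical weights, used only to show how the error grows as bits shrink.
weights = [-0.50, -0.12, 0.0, 0.27, 0.50]

for nbits in (8, 4, 1):
    codes, lo, step = quantize_linear(weights, nbits)
    restored = dequantize_linear(codes, lo, step)
    max_error = max(abs(w - r) for w, r in zip(weights, restored))
    print(f"{nbits}-bit codes: {codes}, max error: {max_error:.4f}")
```

Fewer bits mean fewer distinct codes, so the model file shrinks while the reconstruction error (at most half a quantization step in this scheme) grows.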

As an example of how to implement one of these common frameworks, let’s say we are porting a model to the iOS platform only. In this case, the best choice is CoreML, since it is optimized for Apple’s hardware.

Conversion to CoreML format

To begin, you must take your existing model (for example, a TensorFlow or PyTorch model) and convert it to the appropriate format in order to use it with CoreML. CoreML uses the .mlmodel file format, which describes the neural network architecture, the input and output formats, and any necessary data preprocessing.

Python’s coremltools package can be used to convert your pretrained deep learning model to CoreML format. This package can convert models from such frameworks as TensorFlow, PyTorch, Keras, ONNX, Caffe, scikit-learn, and others, depending on the type of model you need to convert.

In addition to the conversion, Python’s coremltools allows you to change the properties of the converted CoreML model and run inference from Python, though inference is only supported on macOS. In our example, we converted pretrained PyTorch models to the CoreML format.
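As a rough sketch of what editing model properties and running inference from Python can look like, the function below loads an already converted model, sets a few metadata fields, and makes one prediction. The function name, the input name "input", the 224x224 image size, and the metadata values are all assumptions made for illustration; use the names and sizes of your own model. The imports happen inside the function so the file can be loaded on machines without coremltools or Pillow installed.

```python
def annotate_and_predict(model_path, image_path, input_name="input"):
    """Load a converted CoreML model, set metadata, and run one prediction."""
    # Imported lazily; both packages must be installed to call this function.
    import coremltools as ct
    from PIL import Image

    model = ct.models.MLModel(model_path)

    # Metadata properties of the model (placeholder values).
    model.author = "Example Author"
    model.short_description = "Image model converted from PyTorch"
    model.version = "1.0"

    # The target size is an assumption; it must match the model's input.
    image = Image.open(image_path).convert("RGB").resize((224, 224))

    # predict() only works when this script runs on macOS.
    return model.predict({input_name: image})
```

The returned value is a dictionary keyed by the model’s output names, so the exact keys depend on how the model was converted.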

For more details on how to convert the PyTorch models to CoreML, please refer to the official documentation. Here is a broad look at the overall conversion process:

  • Set the PyTorch model to eval mode
  • Trace it with a data sample
  • Convert it with the coremltools package, specifying the input image example and model configurations
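The three steps above can be sketched as a single helper. This is a minimal sketch rather than our exact conversion script: the function name, the input name "input", the scale and bias values, and the output path are assumptions, and the `convert_to="neuralnetwork"` argument assumes a coremltools version (5 or later) where it is needed to produce an .mlmodel file. The imports are placed inside the function so the file loads even where PyTorch and coremltools are not installed.

```python
def convert_to_coreml(model, example_input, output_path="Model.mlmodel"):
    """Convert a PyTorch image model to the CoreML .mlmodel format."""
    # Imported lazily; both packages must be installed to call this function.
    import torch
    import coremltools as ct

    # Step 1: switch layers such as dropout and batch norm to inference behavior.
    model.eval()

    # Step 2: trace the model with a sample input to record its operations.
    traced = torch.jit.trace(model, example_input)

    # Step 3: describe the image input and convert.
    image_input = ct.ImageType(
        name="input",                # assumed input name
        shape=example_input.shape,   # e.g. (1, 3, 224, 224)
        scale=1.0 / 255.0,           # single scalar applied to every pixel
        bias=[0.0, 0.0, 0.0],        # per-channel bias added after scaling
        color_layout="RGB",
    )
    mlmodel = ct.convert(
        traced,
        inputs=[image_input],
        convert_to="neuralnetwork",  # assumption: target the .mlmodel format
    )
    mlmodel.save(output_path)
    return mlmodel
```

A typical call would pass any `torch.nn.Module` that accepts an image batch together with a random tensor of the expected shape, for example `torch.rand(1, 3, 224, 224)`.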

It is also possible to add data preprocessing to your CoreML model by passing the corresponding arguments during the conversion process. For instance, if you are working with visual data, you can scale all images by a single factor (channel-wise scaling is not available here), subtract a bias, and instruct your model to expect pixels in RGB or BGR format.

In addition to passing arguments to the converter function, it is sometimes necessary to add layers to the existing CoreML model. Take image normalization as an example. To normalize an image, you must subtract the mean value and divide the result by the standard deviation. You can subtract the mean by passing the means for all 3 RGB channels as biases into the converter. To divide by the standard deviation, you can use the scale parameter.

However, since CoreML’s built-in image preprocessing doesn’t support channel-wise scaling, you need to add a channel-wise scaling layer at the beginning of the model. See an example of adding such a scaling layer for the CoreML model. After conversion, you can insert the CoreML model into your iOS application and use it via the CoreML API.
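The split between converter-level preprocessing and an extra channel-wise scaling layer can be checked with plain arithmetic. The sketch below does not call CoreML at all: it mimics a preprocessing step that allows one scalar scale plus a per-channel bias, followed by a separate per-channel multiplication, and compares the result with the usual normalization. The mean and standard deviation values are the commonly used ImageNet statistics, taken here as an assumption, and all function names are hypothetical.

```python
# Assumed per-channel statistics (commonly used ImageNet values), RGB order.
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def normalize_reference(pixel):
    """Usual normalization: (p / 255 - mean) / std for each channel."""
    return [(p / 255.0 - m) / s for p, m, s in zip(pixel, MEAN, STD)]


def converter_preprocessing(pixel, scale, bias):
    """What converter arguments allow: one scalar scale, per-channel bias."""
    return [scale * p + b for p, b in zip(pixel, bias)]


def channelwise_scale(values, factors):
    """The extra layer: multiply each channel by its own factor."""
    return [v * f for v, f in zip(values, factors)]


scale = 1.0 / 255.0                  # scalar, shared by all channels
bias = [-m for m in MEAN]            # subtracts the per-channel mean
factors = [1.0 / s for s in STD]     # divides by the per-channel std

pixel = [255, 128, 0]                # one RGB pixel with 8-bit values
expected = normalize_reference(pixel)
actual = channelwise_scale(converter_preprocessing(pixel, scale, bias), factors)

print(expected)
print(actual)
```

Because the standard deviation differs per channel, it cannot be folded into the single scalar scale, which is why the division is moved into a separate layer at the start of the model.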

Ready to take the next step and transform your applications with deep neural networks and machine learning? Let’s talk about how SoftServe experts can help you improve your mobile user experience.

Read the next blog in our mobile AI series to learn more about neural network inference on mobile devices.
