In our previous blog, we explored how compressing and optimizing deep learning models can accelerate business operations and improve user experience. Now, we’ll focus on mobile deep learning frameworks: tools that run computations directly on devices instead of in the cloud, improving both performance and responsiveness.
WHY RUN AI ON DEVICES
Most AI computations happen in the cloud, but executing them locally on PCs, smartphones, and other edge devices can significantly reduce latency and improve the user experience.
For instance, imagine taking a photo of an animal that an algorithm identifies as a cat. Running this computation on your device instead of a server reduces wait times and makes the interaction feel immediate.
HOW AI RUNS ON MOBILE DEVICES
Neural networks can run efficiently on devices with limited memory and power, but doing so requires specialized software: a deep learning framework. These frameworks handle every part of executing a neural network, using the device’s hardware to perform inference (making predictions) and, in some cases, training.
Popular mobile frameworks include:
- TensorFlow Lite
- PyTorch Mobile
- CoreML
TensorFlow Lite is a multi-platform framework for running machine learning models directly on devices. It works across Android, iOS, Linux, and microcontrollers. TensorFlow Lite includes useful features like model compression during conversion from TensorFlow to TensorFlow Lite, as well as GPU and NPU acceleration to speed up inference. One limitation is that it does not currently support on-device training, so it can only be used for inference.
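For illustration, a minimal conversion sketch, assuming a model exported in TensorFlow’s SavedModel format (the directory and file names are placeholders):

```python
import tensorflow as tf

# Convert a SavedModel to TensorFlow Lite; "saved_model_dir" is a placeholder.
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")

# Enable the converter's default optimizations (weight compression).
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```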
PyTorch Mobile is another multi-platform framework that lets you run PyTorch models on Android, iOS, and Linux devices. Although it is still in beta, it already supports 8-bit quantization during the conversion from PyTorch to PyTorch Mobile, as well as GPU and NPU acceleration.
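A comparable sketch for PyTorch Mobile, using a pretrained torchvision model as a stand-in for your own network: dynamic 8-bit quantization is applied to the linear layers, then the model is scripted, optimized for mobile, and saved for the Lite interpreter.

```python
import torch
import torchvision
from torch.utils.mobile_optimizer import optimize_for_mobile

# Pretrained model as a stand-in; replace with your own PyTorch model.
model = torchvision.models.mobilenet_v2(pretrained=True).eval()

# 8-bit dynamic quantization of the linear layers.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8)

# Script, optimize for mobile, and save for the PyTorch Lite interpreter.
scripted = torch.jit.script(quantized)
mobile = optimize_for_mobile(scripted)
mobile._save_for_lite_interpreter("mobilenet_v2.ptl")
```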
CoreML is an iOS-specific machine learning framework that is deeply integrated with Apple hardware. By leveraging the CPU, GPU, and NPU, CoreML maximizes model performance while minimizing memory usage and power consumption. It also offers weight quantization from 16 bits all the way down to 1 bit, plus on-device fine-tuning with local data, which enables developers to personalize app behavior for individual users.
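To illustrate the quantization feature, a minimal sketch using coremltools’ quantization utilities (the file names are placeholders, and exact behavior varies across coremltools versions and platforms):

```python
import coremltools as ct
from coremltools.models.neural_network import quantization_utils

# Load an already-converted CoreML model (placeholder file name).
model = ct.models.MLModel("MyModel.mlmodel")

# Quantize weights to 8 bits; nbits can go as low as 1 for some models.
quantized = quantization_utils.quantize_weights(model, nbits=8)
quantized.save("MyModel_quant8.mlmodel")
```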
EXAMPLE: USING COREML ON IOS
As an example of how to implement one of these frameworks, let’s consider porting a model exclusively to the iOS platform. In this case, CoreML is the ideal choice, as it is fully optimized for Apple hardware.
To get started, you first need to take your existing model, whether from TensorFlow, PyTorch, or another supported framework, and convert it to the CoreML format. CoreML uses the .mlmodel file format, which defines the neural network architecture, input and output specifications, and any necessary data preprocessing.
Python’s coremltools package is the standard way to convert pretrained deep learning models to CoreML. It supports models from TensorFlow, PyTorch, Keras, ONNX, Caffe, scikit-learn, and more, depending on the type of model you need to convert. Beyond just conversion, coremltools also lets you adjust properties of the CoreML model and perform inference directly in Python, although this is only available on macOS. In our example, we focused on converting pretrained PyTorch models to CoreML.
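For instance, a quick sanity check on a converted model might look like the sketch below; the input name and shape are placeholders that must match your model’s spec, and prediction only works on macOS:

```python
import numpy as np
import coremltools as ct

# Load the converted model (placeholder file name).
model = ct.models.MLModel("MyModel.mlmodel")

# Run a prediction with random data; the input key must match the
# name declared in the model spec (see model.get_spec()).
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
print(model.predict({"input": x}))
```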
For detailed instructions, consult the official documentation on converting PyTorch models to CoreML. Broadly, the conversion involves three steps, sketched in code after the list:
- Setting the PyTorch model to evaluation mode
- Tracing it with a sample input
- Converting it using the coremltools package while specifying the input example and model configuration
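A minimal sketch of these three steps, using a pretrained torchvision model as a stand-in for your own network:

```python
import torch
import torchvision
import coremltools as ct

# 1. Set the PyTorch model to evaluation mode.
model = torchvision.models.mobilenet_v2(pretrained=True).eval()

# 2. Trace it with a sample input.
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)

# 3. Convert with coremltools, specifying the input example.
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="input", shape=example.shape)],
)
mlmodel.save("MobileNetV2.mlmodel")
```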
You can also fold data preprocessing into your CoreML model during conversion. For visual data, this might involve applying a global scale to pixel values (channel-wise scaling is not supported), adding a per-channel bias, and specifying the pixel format (RGB or BGR) that your model expects.
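As a sketch, the same conversion with image preprocessing folded in, assuming the traced model from above and ImageNet channel means (CoreML applies scale * pixel + bias, so the means are passed as negative biases):

```python
import coremltools as ct

# Global scale maps 0-255 pixels to [0, 1]; per-channel biases
# subtract the ImageNet means (biases are added, hence negative).
image_input = ct.ImageType(
    name="input",
    shape=(1, 3, 224, 224),
    scale=1.0 / 255.0,
    bias=[-0.485, -0.456, -0.406],
    color_layout="RGB",
)

mlmodel = ct.convert(traced, inputs=[image_input])
mlmodel.save("MobileNetV2_preprocessed.mlmodel")
```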
In addition to passing arguments to the converter function, you sometimes need to add extra layers to the CoreML model. For example, image normalization requires subtracting the per-channel mean and dividing by the per-channel standard deviation. You can pass the means for the three RGB channels as biases to the converter, and in principle use the scale parameter to divide by the standard deviation; however, because CoreML’s preprocessing scale is global rather than channel-wise, you need to insert a channel-wise scaling layer at the beginning of the model. After the conversion and any necessary adjustments, the CoreML model can be embedded into your iOS application and accessed through the CoreML API for fast, on-device inference. An example of adding such a scaling layer to the CoreML model is available at the link.
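For illustration, here is a hedged sketch of that idea: it edits the converted model’s NeuralNetwork spec directly to prepend a channel-wise scale layer (1/std per RGB channel). The field names follow the CoreML protobuf, the file names are placeholders, and depending on your coremltools version you may also need to reorder the layer list so it stays topologically sorted:

```python
import numpy as np
import coremltools as ct

model = ct.models.MLModel("MobileNetV2_preprocessed.mlmodel")
spec = model.get_spec()
nn = spec.neuralNetwork  # use neuralNetworkClassifier/-Regressor for those types

input_name = spec.description.input[0].name

# Append a scale layer that multiplies each channel by 1 / std.
layer = nn.layers.add()
layer.name = "channelwise_std_scale"
layer.input.append(input_name)
layer.output.append("scaled_input")
layer.scale.shapeScale.extend([3, 1, 1])  # C x H x W broadcast shape
layer.scale.scale.floatValue.extend(
    (1.0 / np.array([0.229, 0.224, 0.225])).tolist()
)

# Rewire the original first layer to consume the scaled input.
nn.layers[0].input[0] = "scaled_input"

ct.models.MLModel(spec).save("MobileNetV2_normalized.mlmodel")
```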
WHAT’S NEXT
If you’re ready to take the next step and transform your applications with deep neural networks and machine learning, let’s talk about how SoftServe experts can help you improve mobile user experience.
Read the next blog in our mobile AI series to learn more about neural network inference on mobile devices.