clip.cpp is a runtime, based on GGML and llama.cpp, that performs inference with the popular CLIP model. It includes examples written in C/C++ and bindings that can be used from a Python package. I found clip.cpp while searching for an efficient way to run CLIP inference in an Android app, and also discovered an issue on the repository, created by the author, suggesting the addition of JNI bindings along with an Android sample.
For an Android app to use the C++ routines of clip.cpp, JNI bindings are necessary: they allow the JVM to bind Java/Kotlin methods to their native (C/C++) counterparts present in the object code of shared libraries. Here are the three main contributions I made to the clip.cpp repository in response to the issue:
JNI bindings for clip.h
A Java wrapper class, CLIPAndroid, that provides an intuitive API over the native methods
An Android app built on top of the JNI bindings that allows users to select an image and enter a textual description, then computes their semantic similarity using the CLIP model (stored on the mobile device)
Implementation
1. Setup
First, we clone the clip.cpp repository and the dependent sub-modules,
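Assuming the upstream repository URL, the clone step looks something like this (`--recurse-submodules` pulls in ggml as well):

```shell
# Clone clip.cpp along with its sub-modules (e.g. ggml)
git clone --recurse-submodules https://github.com/monatis/clip.cpp
cd clip.cpp
```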
Next, create a new directory, clip.android, in the examples directory; it will house the Android project (demo app). Using Android Studio, create a new empty Compose project in the examples/clip.android directory.
The default project which gets created will only contain the app module. By navigating to File > New > New Module and selecting Android Library in Android Studio, we create another module named clip in the project.
The clip module will contain the JNI bindings and a Java wrapper class that encapsulates the native interfaces we’re going to write. To start with the JNI bindings, create a new directory named cpp in clip/src/main. Add two files to the cpp directory,
clip_android.cpp: JNI bindings
CMakeLists.txt: CMake script that describes the build process of clip_android.cpp
We also need to make sure that Gradle compiles clip_android.cpp when building the module/project. To do this, we modify clip/build.gradle.kts and add the path to the CMakeLists.txt we just created,
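A sketch of the relevant additions to clip/build.gradle.kts (block placement may differ slightly across Android Gradle Plugin versions):

```kotlin
android {
    // ...
    externalNativeBuild {
        cmake {
            // Path to the CMake script we created in clip/src/main/cpp
            path = file("src/main/cpp/CMakeLists.txt")
        }
    }
    defaultConfig {
        // ...
        externalNativeBuild {
            cmake {
                // -march=native is not supported by the NDK's clang
                arguments += listOf("-DCLIP_NATIVE=Off")
            }
        }
    }
}
```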
Setting CLIP_NATIVE to Off ensures that -march=native is not passed to clang (from the Android NDK), as the flag is not supported there.
Before we build the clip module, we need to configure the build process by modifying CMakeLists.txt,
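A minimal sketch of such a CMakeLists.txt; the relative path to the clip.cpp root is an assumption and depends on where the project lives inside the repository:

```cmake
cmake_minimum_required(VERSION 3.22)
project(clip_android)

# Disable -march=native, which the NDK's clang does not support
set(CLIP_NATIVE Off)

# Pull in the main clip.cpp project; it provides the `clip` and `ggml` targets.
# The relative path below assumes this file lives in
# examples/clip.android/clip/src/main/cpp within the clip.cpp repository.
add_subdirectory("../../../../../.." clip_cpp_build)

# Our JNI bindings, compiled into a shared library loadable from Java
add_library(clip_android SHARED clip_android.cpp)

# `android` and `log` are provided by the NDK's linker
target_link_libraries(clip_android clip ggml android log)
```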
In the script above, add_subdirectory() adds the main CMake project, i.e. the CMakeLists.txt present in the clip.cpp directory, to the build process. This is important, as the libraries on which clip_android.cpp depends, i.e. clip and ggml, come from the main clip.cpp project; their compilation is defined in clip.cpp/CMakeLists.txt and ggml/CMakeLists.txt. The android and log libraries are made available by the linker provided by the Android NDK.
Also, we add the clip module as a dependency of the app module, so that it can access the wrapper class we’ll be writing in the clip module. To do so, in app/build.gradle.kts, add the following to the dependencies block,
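The addition is a single project dependency:

```kotlin
dependencies {
    // Makes the CLIPAndroid wrapper class visible to the app module
    implementation(project(":clip"))
    // ...
}
```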
NOTE
In a production setting, the clip module can be packaged as a JAR/AAR and distributed through Maven Central.
Click Build > Make Project to build the clip module now.
2. Writing JNI bindings
We’ll create bindings for the declarations (function prototypes) present in clip.h. We also create a Java class named CLIPAndroid.java in clip/src/main/java/android/example/clip.
For the following prototype in clip.h,
We create a method in CLIPAndroid.java,
and the corresponding JNI binding in clip_android.cpp,
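As an illustration, taking clip_model_load as the prototype (its exact signature in clip.h may differ from this sketch), the three pieces fit together like this:

```cpp
// clip.h (prototype; signature assumed):
//   struct clip_ctx * clip_model_load(const char * fname, const int verbosity);
//
// CLIPAndroid.java (native method; the returned long holds the clip_ctx*):
//   private native long clipModelLoad(String filePath, int verbosity);

// clip_android.cpp -- the JNI binding (sketch)
#include <jni.h>
#include "clip.h"

extern "C" JNIEXPORT jlong JNICALL
Java_android_example_clip_CLIPAndroid_clipModelLoad(
        JNIEnv *env, jobject /* thiz */, jstring file_path, jint verbosity) {
    const char *path = env->GetStringUTFChars(file_path, nullptr);
    clip_ctx *ctx = clip_model_load(path, verbosity);
    env->ReleaseStringUTFChars(file_path, path);
    // The pointer is returned as a jlong and kept as an opaque handle in Java
    return reinterpret_cast<jlong>(ctx);
}
```

Note the mangled function name: JNI resolves `Java_<package>_<class>_<method>`, so the binding's name encodes the android.example.clip package and the CLIPAndroid class.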
Similarly, we go on to write bindings for the following functions in clip.h,
clip_free
clip_image_encode
clip_image_batch_encode
clip_text_encode
clip_text_batch_encode
clip_get_vision_hparams
clip_get_text_hparams
For the last two functions, clip_get_vision_hparams and clip_get_text_hparams, we return Java objects instantiated at their bindings in clip_android.cpp,
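A sketch of how such a binding can instantiate and return a Java object, assuming clip_get_vision_hparams returns a pointer to a struct with the fields shown (field names and the constructor descriptor are assumptions):

```cpp
// clip_android.cpp -- returning a Java object from a binding (sketch)
#include <jni.h>
#include "clip.h"

extern "C" JNIEXPORT jobject JNICALL
Java_android_example_clip_CLIPAndroid_clipGetVisionHparams(
        JNIEnv *env, jobject /* thiz */, jlong ctx_ptr) {
    auto *ctx = reinterpret_cast<clip_ctx *>(ctx_ptr);
    const auto *hparams = clip_get_vision_hparams(ctx);

    // Locate the Java class and its constructor; the "(IIII)V" descriptor
    // assumes a constructor taking four ints and returning void
    jclass cls = env->FindClass(
        "android/example/clip/CLIPAndroid$CLIPVisionHyperParameters");
    jmethodID ctor = env->GetMethodID(cls, "<init>", "(IIII)V");

    // Instantiate the Java object directly from the native struct's fields
    return env->NewObject(cls, ctor,
                          (jint)hparams->image_size,
                          (jint)hparams->hidden_size,
                          (jint)hparams->n_layer,
                          (jint)hparams->projection_dim);
}
```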
The Java classes in CLIPAndroid.java are,
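For illustration, the vision variant might look like the following; it is shown as a top-level class for brevity (in the module it lives inside CLIPAndroid.java), and the field set is an assumption mirroring the vision hyperparameters in clip.h:

```java
// A plain, immutable data holder returned by the clipGetVisionHparams binding
public class CLIPVisionHyperParameters {
    public final int imageSize;
    public final int hiddenSize;
    public final int numLayers;
    public final int projectionDim;

    public CLIPVisionHyperParameters(
            int imageSize, int hiddenSize, int numLayers, int projectionDim) {
        this.imageSize = imageSize;
        this.hiddenSize = hiddenSize;
        this.numLayers = numLayers;
        this.projectionDim = projectionDim;
    }
}
```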
To pass an image from Java to the binding, we use a java.nio.ByteBuffer and pass it to the corresponding native method along with the width and height as jint values. Then, in clip_android.cpp, we use env->GetDirectBufferAddress(img_buffer) to get a pointer to the buffer’s data.
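A sketch of the resulting binding; clip_image_u8, clip_image_f32, clip_image_preprocess, and clip_image_encode come from clip.h, but their exact signatures and field layouts here are assumptions:

```cpp
// clip_android.cpp -- sketch of the image-encoding binding
#include <jni.h>
#include <vector>
#include "clip.h"

extern "C" JNIEXPORT jfloatArray JNICALL
Java_android_example_clip_CLIPAndroid_clipImageEncode(
        JNIEnv *env, jobject /* thiz */, jlong ctx_ptr,
        jobject img_buffer, jint width, jint height, jint n_threads) {
    auto *ctx = reinterpret_cast<clip_ctx *>(ctx_ptr);

    // 1. Wrap the ByteBuffer's data in a clip_image_u8 (field layout assumed,
    //    pixels assumed to be tightly packed RGB)
    auto *data = static_cast<uint8_t *>(env->GetDirectBufferAddress(img_buffer));
    clip_image_u8 img;
    img.nx = width;
    img.ny = height;
    img.data = data;
    img.size = (size_t)width * height * 3;

    // 2. Preprocess (resize + normalize) into a clip_image_f32
    clip_image_f32 img_res;
    clip_image_preprocess(ctx, &img, &img_res);

    // 3. Encode the preprocessed image into an embedding; 512 is a placeholder
    //    for the projection dim, which in practice is read from the model's hparams
    const int vec_dim = 512;
    std::vector<float> vec(vec_dim);
    clip_image_encode(ctx, n_threads, &img_res, vec.data(), /* normalize = */ true);

    // Copy the embedding into a Java float[] and return it
    jfloatArray result = env->NewFloatArray(vec_dim);
    env->SetFloatArrayRegion(result, 0, vec_dim, vec.data());
    return result;
}
```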
In the snippet above, we first create an instance of clip_image_u8 using the provided buffer’s data. Next, we preprocess it with clip_image_preprocess, which yields a clip_image_f32 instance. This clip_image_f32 instance is then passed to clip_image_encode to get the embedding.
3. Writing the Java wrapper class
CLIPAndroid.java is our wrapper class, which (so far) contains only the native methods connecting to our JNI bindings. But these methods work with a clip_ctx* represented as a long in Java, which is poor abstraction. To hide these internals and provide an intuitive API to the user, we write some helper functions, completing CLIPAndroid.java,
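A sketch of what the completed wrapper can look like; the public method names and the exact set of native methods are assumptions, not the upstream API:

```java
import java.nio.ByteBuffer;

// CLIPAndroid.java -- public API hiding the clip_ctx* handle (sketch)
public class CLIPAndroid {

    static {
        // Name matches the shared library built from clip_android.cpp
        System.loadLibrary("clip_android");
    }

    private long ctx = 0L; // opaque handle to the native clip_ctx*

    public void load(String modelPath, int verbosity) {
        ctx = clipModelLoad(modelPath, verbosity);
        if (ctx == 0L) {
            throw new RuntimeException("Could not load model from " + modelPath);
        }
    }

    public float[] encodeImage(ByteBuffer image, int width, int height, int numThreads) {
        checkLoaded();
        return clipImageEncode(ctx, image, width, height, numThreads);
    }

    public float[] encodeText(String text, int numThreads) {
        checkLoaded();
        return clipTextEncode(ctx, text, numThreads);
    }

    public void close() {
        checkLoaded();
        clipFree(ctx);
        ctx = 0L;
    }

    private void checkLoaded() {
        if (ctx == 0L) throw new IllegalStateException("Model not loaded");
    }

    private native long clipModelLoad(String filePath, int verbosity);
    private native float[] clipImageEncode(long ctx, ByteBuffer image,
                                           int width, int height, int numThreads);
    private native float[] clipTextEncode(long ctx, String text, int numThreads);
    private native void clipFree(long ctx);
}
```

Keeping the handle private and guarding every call with checkLoaded() means callers never see the raw pointer, and a use-after-free turns into an IllegalStateException instead of a native crash.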
With some more Compose code in the app module, specifically in MainActivity and MainActivityViewModel, the demo app is ready.
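As a sketch of the similarity computation in the ViewModel (the CLIPAndroid method names used here are assumptions, matching the wrapper described above):

```kotlin
import java.nio.ByteBuffer
import kotlin.math.sqrt
import androidx.lifecycle.ViewModel

// MainActivityViewModel.kt (sketch)
class MainActivityViewModel : ViewModel() {

    private val clip = CLIPAndroid()

    fun computeSimilarity(image: ByteBuffer, width: Int, height: Int, text: String): Float {
        val imageEmb = clip.encodeImage(image, width, height, /* numThreads = */ 4)
        val textEmb = clip.encodeText(text, /* numThreads = */ 4)
        return cosineSimilarity(imageEmb, textEmb)
    }

    // Cosine similarity between the two embeddings
    private fun cosineSimilarity(a: FloatArray, b: FloatArray): Float {
        require(a.size == b.size)
        var dot = 0f; var na = 0f; var nb = 0f
        for (i in a.indices) {
            dot += a[i] * b[i]
            na += a[i] * a[i]
            nb += b[i] * b[i]
        }
        return dot / (sqrt(na) * sqrt(nb))
    }
}
```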