- Android Native Development Kit Cookbook
- Feipeng Liu
- 2021-07-27 18:07:26
Manipulating strings in JNI
Strings are somewhat complicated in JNI, mainly because Java strings and C strings are internally different. This recipe will cover the most commonly used JNI string features.
Getting ready
Understanding the basics of character encoding is essential to understanding the differences between Java strings and C strings. We'll give a brief introduction to Unicode.
According to the Unicode Consortium, the Unicode Standard is defined as follows:
The Unicode Standard is a character coding system designed to support the worldwide interchange, processing, and display of the written texts of the diverse languages and technical disciplines of the modern world. In addition, it supports classical and historical texts of many written languages.
Unicode assigns a unique number to each character it defines, called a code point. There are two main categories of encoding methods that support the entire Unicode character set, or a subset of it.
The first one is the Unicode Transformation Format (UTF), which encodes a Unicode code point into a variable number of code values. UTF-8, UTF-16, UTF-32, and a few others belong to this category. The numbers 8, 16, and 32 refer to the number of bits in one code value. The second category is the Universal Character Set (UCS) encodings, which encode a Unicode code point into a single code value. UCS2 and UCS4 belong to this category. The numbers 2 and 4 refer to the number of bytes in one code value.
Note
Unicode defines more characters than what two bytes can possibly represent. Therefore, UCS2 can only represent a subset of Unicode characters. Because Unicode defines fewer characters than what four bytes can represent, multiple code values of UTF-32 are never needed. Therefore, UTF-32 and UCS4 are functionally identical.
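The relationship between code points and code values can be made concrete with a few lines of plain C. The following is only a minimal sketch: the function names (utf8_units, utf16_units, utf32_units, ucs2_can_represent) are hypothetical helpers invented for illustration, and the input is assumed to be a valid code point in the range U+0000 to U+10FFFF.

```c
#include <stdint.h>

/* Number of 8-bit code values (bytes) standard UTF-8 needs for one code point.
 * Assumption: cp is a valid code point, at most U+10FFFF. */
int utf8_units(uint32_t cp) {
    if (cp < 0x80)    return 1;   /* ASCII range */
    if (cp < 0x800)   return 2;
    if (cp < 0x10000) return 3;
    return 4;                     /* U+10000 .. U+10FFFF */
}

/* Number of 16-bit code values UTF-16 needs; 2 means a surrogate pair. */
int utf16_units(uint32_t cp) {
    return cp < 0x10000 ? 1 : 2;
}

/* UTF-32 (and UCS4) always use exactly one 32-bit code value. */
int utf32_units(uint32_t cp) {
    (void)cp;
    return 1;
}

/* UCS2 has a single 16-bit code value, so it cannot go beyond U+FFFF. */
int ucs2_can_represent(uint32_t cp) {
    return cp <= 0xFFFF;
}
```

For example, the Chinese character U+5B89 takes three code values in UTF-8 but one in UTF-16, while a code point such as U+1F600 takes four in UTF-8, two in UTF-16, and cannot be represented in UCS2 at all.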
The Java programming language uses UTF-16 to represent strings. If a character cannot fit in a 16-bit code value, a pair of code values called a surrogate pair is used. C strings are simply arrays of bytes terminated by a null character; the actual encoding and decoding is largely left to the developer and the underlying system. JNI uses a modified version of UTF-8 to represent strings, including class, field, and method names, in the native code. There are two differences between modified UTF-8 and standard UTF-8. Firstly, the null character is encoded using two bytes (0xC0 0x80), so an encoded string never contains an embedded zero byte. Secondly, only the one-byte, two-byte, and three-byte formats of standard UTF-8 are supported by JNI; the four-byte format is not recognized properly. A character that would need four bytes in standard UTF-8 is instead represented as a surrogate pair, with each surrogate encoded in three bytes, for six bytes in total.
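The two differences between modified UTF-8 and standard UTF-8 are easier to see in code. The sketch below is not JNI's actual implementation. It is a small stand-alone encoder written for illustration, and the names to_surrogates, encode_unit_mutf8, and encode_cp_mutf8 are hypothetical. It assumes the caller supplies a buffer of at least six bytes and a valid code point.

```c
#include <stddef.h>
#include <stdint.h>

/* Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16
 * surrogate pair. */
void to_surrogates(uint32_t cp, uint16_t *high, uint16_t *low) {
    uint32_t v = cp - 0x10000;
    *high = (uint16_t)(0xD800 + (v >> 10));
    *low  = (uint16_t)(0xDC00 + (v & 0x3FF));
}

/* Encode one UTF-16 code unit as modified UTF-8.
 * Returns the number of bytes written (1 to 3). */
size_t encode_unit_mutf8(uint16_t u, unsigned char *out) {
    if (u != 0 && u < 0x80) {          /* plain ASCII, except the null character */
        out[0] = (unsigned char)u;
        return 1;
    }
    if (u < 0x800) {                   /* includes U+0000, which becomes C0 80 */
        out[0] = (unsigned char)(0xC0 | (u >> 6));
        out[1] = (unsigned char)(0x80 | (u & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (u >> 12));
    out[1] = (unsigned char)(0x80 | ((u >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (u & 0x3F));
    return 3;
}

/* Encode a whole code point. Supplementary characters are written as two
 * three-byte sequences (one per surrogate), so out needs room for 6 bytes. */
size_t encode_cp_mutf8(uint32_t cp, unsigned char *out) {
    if (cp < 0x10000) {
        return encode_unit_mutf8((uint16_t)cp, out);
    }
    uint16_t hi, lo;
    to_surrogates(cp, &hi, &lo);
    size_t n = encode_unit_mutf8(hi, out);
    return n + encode_unit_mutf8(lo, out + n);
}
```

With this sketch, U+0000 encodes to the two bytes C0 80, U+5B89 encodes to E5 AE 89 exactly as in standard UTF-8, and U+1F600 encodes to the six bytes ED A0 BD ED B8 80 instead of the four-byte standard form.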
How to do it
The following steps show you how to create a sample Android project that illustrates string manipulation in JNI:
- Create a project named StringManipulation. Set the package name as cookbook.chapter2. Create an activity named StringManipulationActivity. Under the project, create a folder named jni. Refer to the Loading native libraries and registering native methods recipe in this chapter if you want more detailed instructions.
- Create a file named stringtest.c under the jni folder, then implement the passStringReturnString method as follows:

    #include <jni.h>
    #include <string.h>
    #include <android/log.h>

    JNIEXPORT jstring JNICALL Java_cookbook_chapter2_StringManipulationActivity_passStringReturnString(JNIEnv *pEnv, jobject pObj, jstring pStringP) {
        // Deliberately wrong: a jstring is an object reference, not a C string,
        // so this logs unrecognizable characters (see How it works).
        __android_log_print(ANDROID_LOG_INFO, "native", "print jstring: %s", pStringP);

        // Convert to a modified UTF-8 C string. isCopy reports whether the VM made a copy.
        jboolean isCopy;
        const char *str = (*pEnv)->GetStringUTFChars(pEnv, pStringP, &isCopy);
        if (str == NULL) {
            return NULL;  // an OutOfMemoryError is already pending
        }
        __android_log_print(ANDROID_LOG_INFO, "native", "print UTF-8 string: %s, %d", str, isCopy);
        jsize length = (*pEnv)->GetStringUTFLength(pEnv, pStringP);
        __android_log_print(ANDROID_LOG_INFO, "native", "UTF-8 string length (number of bytes): %d == %d", length, (int)strlen(str));
        // str[length] is the terminating null byte; reading past it would be out of bounds.
        __android_log_print(ANDROID_LOG_INFO, "native", "UTF-8 string ends with: %d", str[length]);
        (*pEnv)->ReleaseStringUTFChars(pEnv, pStringP, str);

        // Copy into a pre-allocated buffer. The len argument counts UTF-16
        // characters, so use GetStringLength rather than the UTF-8 byte length.
        char nativeStr[100] = {0};
        jsize utf16Length = (*pEnv)->GetStringLength(pEnv, pStringP);
        if (length < (jsize)sizeof(nativeStr)) {
            (*pEnv)->GetStringUTFRegion(pEnv, pStringP, 0, utf16Length, nativeStr);
            __android_log_print(ANDROID_LOG_INFO, "native", "jstring converted to UTF-8 string and copied to native buffer: %s", nativeStr);
        }

        // Build a new Java string from a UTF-8 C string.
        const char *newStr = "hello 安卓";
        jstring ret = (*pEnv)->NewStringUTF(pEnv, newStr);
        if (ret == NULL) {
            return NULL;  // an OutOfMemoryError is already pending
        }
        jsize newStrLen = (*pEnv)->GetStringUTFLength(pEnv, ret);
        __android_log_print(ANDROID_LOG_INFO, "native", "UTF-8 string with Chinese characters: %s, string length (number of bytes) %d=%d", newStr, newStrLen, (int)strlen(newStr));
        return ret;
    }

- In the StringManipulationActivity.java Java code, add the code to load the native library, declare the native method, and invoke it. Refer to the downloaded code for the source code details.
- Modify the res/layout/activity_passing_primitive.xml file according to step 8 of the Loading native libraries and registering native methods recipe in this chapter or the downloaded project code.
- Create a file called Android.mk under the jni folder. Refer to step 9 of the Loading native libraries and registering native methods recipe in this chapter or the downloaded code for details.
- Start a terminal, go to the jni folder, and type ndk-build to build the native library.
- Run the project on an Android device or emulator. We should see something similar to the following screenshot:
[Screenshot: the running application]
The following should be seen at the logcat output:
[Screenshot: the logcat output]
How it works…
This recipe discusses string manipulation in JNI.
- Character encoding: Android uses UTF-8 as its default charset, which our program shows by calling the Charset.defaultCharset().name() method. This means that the default encoding in the native code is UTF-8. As mentioned before, Java uses the UTF-16 charset. This implies that an encoding conversion is needed when we pass a string from Java to the native code and vice versa. Failing to do so will cause unwanted results. In our example, we tried printing jstring directly in the native code, but the result was some unrecognizable characters. Fortunately, JNI comes with a few pre-defined functions that do the conversion.
- Java string to native string: When a native method is called with an input parameter of string type, the string received needs to be converted to a native string first. Two JNI functions can be used for different cases.
  The first function is GetStringUTFChars, which has the following prototype:
      const char * GetStringUTFChars(JNIEnv *env, jstring string, jboolean *isCopy);
  This function converts the Java string into an array of UTF-8 characters. If a new copy of the Java string content is made, isCopy is set to true when the function returns; otherwise isCopy is set to false and the returned pointer points to the same characters as the original Java string.
  Tip
  It is not predictable whether the VM will return a new copy of the Java string. Therefore, we must be careful when converting a large string, as the possible memory allocation and copy may affect performance and even cause "out of memory" issues. Also note that if isCopy is set to false, we cannot modify the returned UTF-8 native string, because doing so would modify the Java string content and break the immutability of the Java string.
  Once we've finished all the operations with the converted native string, we should call ReleaseStringUTFChars to inform the VM that we don't need to access the UTF-8 native string anymore. The function has the following prototype, with the second parameter being the Java string and the third parameter being the UTF-8 native string:
      void ReleaseStringUTFChars(JNIEnv *env, jstring string, const char *utf);
  The second function for conversion is GetStringUTFRegion, with the following prototype:
      void GetStringUTFRegion(JNIEnv *env, jstring str, jsize start, jsize len, char *buf);
  The start and len parameters indicate the start position in the Java UTF-16 string and the number of UTF-16 characters to convert. The buf argument points to the location where the converted native UTF-8 char array is stored.
  Let's compare the two methods. The first method may or may not require allocation of new memory for the converted UTF-8 string, depending on whether the VM decides to make a new copy, whereas the second method makes use of a pre-allocated buffer to store the converted content. In addition, the second method allows us to specify the position and length of the conversion source. Therefore, the following rules can be followed:
  - To modify the converted UTF-8 native string, the JNI method GetStringUTFRegion should be used
  - If we only need a substring of the original Java string, and the substring is not large, GetStringUTFRegion should be used
  - If we're dealing with a large string, and we're not going to modify the converted UTF-8 native string, GetStringUTFChars should be used
- String length: The JNI function GetStringUTFLength can be used to get the length of a jstring in its UTF-8 representation. Note that it returns the number of bytes and not the number of characters, as shown in our example.
- Native string to Java string: We also need to return string data from the native code to Java code at times. The returned string should be UTF-16 encoded. The JNI function NewStringUTF constructs a jstring from a UTF-8 native string. It has the following prototype:
      jstring NewStringUTF(JNIEnv *env, const char *bytes);
- Conversion failure: GetStringUTFChars and NewStringUTF require allocation of memory space to store the converted string. If you run out of memory, these methods will throw an OutOfMemoryError exception and return NULL. We'll cover more about exception handling in the Checking errors and handling exceptions in JNI recipe.
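The difference between byte length and character count noted under String length can be checked without a VM. The following plain C sketch uses hypothetical helper names (utf8_code_points, utf16_units_for_utf8, mutf8_buffer_size) and assumes well-formed, null-terminated UTF-8 input. The string literal spells out "hello 安卓" as explicit bytes so that the result does not depend on the source file encoding.

```c
#include <stddef.h>
#include <string.h>

/* Count characters (code points) in a null-terminated UTF-8 string by
 * skipping continuation bytes, which all have the bit pattern 10xxxxxx.
 * Assumption: the input is well-formed UTF-8. */
size_t utf8_code_points(const char *s) {
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

/* Count the UTF-16 code units the same text needs on the Java side.
 * A four-byte UTF-8 sequence (lead byte 0xF0 or above) becomes a surrogate pair. */
size_t utf16_units_for_utf8(const char *s) {
    size_t n = 0;
    for (; *s != '\0'; ++s) {
        unsigned char c = (unsigned char)*s;
        if ((c & 0xC0) == 0x80) {
            continue;              /* continuation byte, already counted */
        }
        n += (c >= 0xF0) ? 2 : 1;
    }
    return n;
}

/* Worst-case buffer size, in bytes, for utf16_len UTF-16 code units converted
 * to modified UTF-8: at most 3 bytes per unit, plus a terminating null. */
size_t mutf8_buffer_size(size_t utf16_len) {
    return utf16_len * 3 + 1;
}

/* "hello 安卓" written as explicit UTF-8 bytes. */
const char *const SAMPLE = "hello \xE5\xAE\x89\xE5\x8D\x93";
```

Here strlen(SAMPLE) is 12 while utf8_code_points(SAMPLE) is 8, which matches the byte count the recipe logs for the string with Chinese characters. A helper like mutf8_buffer_size is one conservative way to size the buffer passed to GetStringUTFRegion; calling GetStringUTFLength first and adding one byte for the terminator gives an exact size instead.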
There's more…
More about character encoding in JNI: JNI character encoding is much more complicated than what we covered here. Besides UTF-8, it also supports UTF-16 conversion functions. It is also possible to call Java string methods in the native code to encode/decode characters in other formats. Since Android uses UTF-8 as its platform charset, we only cover how to deal with conversions between Java UTF-16 and UTF-8 native string here.