Convert Byte Array to String in Java
Overview
A byte array in Java is a mutable object that stores values from -128 to 127. Inside a computer, everything is represented as 0s and 1s; all data is stored in binary format. Yet on the screen we see characters, text, images, and more. How is this possible? Character encoding is what turns a byte array into a String: byte values are just numbers, and a character encoding is a map that assigns a character to a given byte or sequence of bytes. In short, we convert binary data into a string to make it meaningful and human-readable.
Introduction
First, let us understand what a byte array is. Each byte is 8 bits of data, and a byte array is a contiguous sequence of such bytes. Every byte in the array is an integer value. Because a byte has 8 bits, it can take 2^8 = 256 distinct values: 0 to 2^8 - 1 (0 to 255) when read as unsigned. Java's byte type is signed, so the same bit patterns are exposed as -128 to 127. A character encoding maps these values, individually or in sequences, to characters.
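Because Java's byte type is signed, a value above 127 shows up as a negative number until it is masked back into the 0 to 255 range. The following is a minimal sketch of that arithmetic; the class name ByteRangeDemo and the helper toUnsigned are illustrative names, not part of any library.

```java
class ByteRangeDemo {
    // Reinterprets a signed Java byte (-128..127) as an unsigned value (0..255).
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = (byte) 200;               // 200 does not fit in a signed byte
        System.out.println(b);             // prints -56 (200 - 256)
        System.out.println(toUnsigned(b)); // prints 200
        System.out.println(Byte.MIN_VALUE + " to " + Byte.MAX_VALUE); // prints -128 to 127
    }
}
```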
If we simply display these integer values on the screen, it is difficult for humans to extract meaning from them, which is why we convert the byte array into a string. Let us look at an example. Suppose there is a string x = “Hello World!”. The byte array for x is 72 101 108 108 111 32 87 111 114 108 100 33. Each number is the ASCII value of the corresponding character, but reading the text from these values is hard. So, we need to convert the byte array to a string.
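The mapping in this example can be checked with a short program. This is a sketch that assumes the ASCII charset is used for both directions; HelloBytesDemo, encode, and decode are illustrative names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class HelloBytesDemo {
    // Encodes text into bytes using ASCII (assumed here to match the example).
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    // Decodes ASCII bytes back into a String.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] bytes = encode("Hello World!");
        // prints [72, 101, 108, 108, 111, 32, 87, 111, 114, 108, 100, 33]
        System.out.println(Arrays.toString(bytes));
        // prints Hello World!
        System.out.println(decode(bytes));
    }
}
```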
We will discuss the two most commonly used techniques to convert a byte array to a string:
- String class constructor.
- UTF-8 Encoding.
Java Program to Convert Byte[] Array to String Using String Class Constructor
The first technique uses a String class constructor and is one of the simplest. We pass the byte[] as an argument to the constructor, which decodes the bytes using the platform's default charset. The following example shows the use of the String class constructor.
Code
Output
Explanation
In the above code, we convert the string “Learn with Scalar” into a byte array. The byte array is: 76 101 97 114 110 32 119 105 116 104 32 83 99 97 108 97 114. To convert that byte array back to a string, we use the String class constructor, which accepts a byte array as an argument and creates a new string from it. In the final line of the code, we print the converted string.
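The steps described in this explanation can be sketched as follows. This is a minimal illustration under stated assumptions, not the original listing: ConstructorDemo and fromBytes are assumed names, and because no charset is passed, both getBytes() and the constructor fall back to the platform's default charset.

```java
class ConstructorDemo {
    // Decodes the bytes with the platform's default charset.
    static String fromBytes(byte[] bytes) {
        return new String(bytes);
    }

    public static void main(String[] args) {
        String original = "Learn with Scalar";
        byte[] bytes = original.getBytes();  // encode with the default charset
        String converted = fromBytes(bytes); // decode with the same charset
        System.out.println(converted);       // prints Learn with Scalar
    }
}
```

Relying on the default charset works here because the same charset is used to encode and decode. When the bytes come from another system, naming the charset explicitly is the safer choice.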
Java Program to Convert Byte[] Array to String Using UTF-8 Encoding
UTF-8 is a variable-width character encoding used in electronic communication. The name derives from Unicode Transformation Format - 8-bit, and the encoding is defined by the Unicode Standard. UTF-8 can encode all 1,112,064 valid Unicode code points using one to four one-byte code units.
The ASCII encoding defines only 128 characters (single-byte extensions such as ISO-8859-1 reach 256). Text containing characters outside that set cannot be represented in ASCII. For that reason, we use the UTF-8 encoding.
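The difference shows up as soon as the text contains a character outside ASCII. The sketch below assumes the sample text "café" (written with a Unicode escape so the source file's own encoding does not matter); NonAsciiDemo and roundTrip are illustrative names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class NonAsciiDemo {
    // Encodes the text with the given charset, then decodes it with the same charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café"

        // prints 5: 'c', 'a', 'f' take one byte each, 'é' takes two in UTF-8
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length);

        // prints true: UTF-8 preserves every character
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text));

        // prints false: ASCII cannot represent 'é', so it is replaced with '?'
        System.out.println(roundTrip(text, StandardCharsets.US_ASCII).equals(text));
    }
}
```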
UTF-8 is the most widely used encoding, and specifying the character encoding explicitly is a best practice. Passing the StandardCharsets.UTF_8 constant, rather than a charset name such as "UTF-8", also avoids having to handle the checked UnsupportedEncodingException. To use it, import StandardCharsets from java.nio.charset. When converting the bytes to a string with the String class constructor, pass StandardCharsets.UTF_8 as the second argument. Following is an example of UTF-8 encoding.
Code
Output
Explanation
In the above code, we convert the string “Welcome to Scalar” into a byte array using the UTF-8 charset. The byte array is: 87 101 108 99 111 109 101 32 116 111 32 83 99 97 108 97 114. Then we convert the byte array to a string using the String class constructor. The first argument is the byte array (UTF-8 encoded), and the second argument is the charset, StandardCharsets.UTF_8, imported from java.nio.charset. This gives us the string s, which we finally print.
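A minimal sketch of the conversion described in this explanation is shown below. The input string "Welcome to Scalar" is assumed from the byte values listed above, and Utf8Demo and fromUtf8 are illustrative names rather than the original listing.

```java
import java.nio.charset.StandardCharsets;

class Utf8Demo {
    // Decodes the bytes as UTF-8; the Charset overload declares no checked exception.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Welcome to Scalar";
        byte[] bytes = original.getBytes(StandardCharsets.UTF_8); // encode as UTF-8
        String s = fromUtf8(bytes);                               // decode as UTF-8
        System.out.println(s); // prints Welcome to Scalar
    }
}
```

The constructor overload that takes a charset name as a String, such as new String(bytes, "UTF-8"), declares the checked UnsupportedEncodingException, so the caller has to catch or declare it. The StandardCharsets constant avoids that.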
Conclusion
- A byte array consists only of numbers, which are hard for humans to read, so we convert the byte array to a string.
- ASCII can encode only 128 distinct characters.
- UTF-8 is the most widely used encoding, and passing StandardCharsets.UTF_8 avoids handling UnsupportedEncodingException.