Fastest way of converting uppercase to lowercase and lowercase to uppercase in Java

Ask Time: 2018-03-09T05:03:10 Author: Jaime Montoya

This is a question about performance. I can convert from uppercase to lowercase and vice versa by using this code:

From lowercase to uppercase:

// Uppercase letters.
class UpperCase {
  public static void main(String[] args) {
    char ch;
    for (int i = 0; i < 10; i++) {
      ch = (char) ('a' + i);
      System.out.print(ch);

      // This statement turns off the 6th bit (bit 5, value 32); 65503 is 0xFFDF.
      ch = (char) (ch & 65503); // ch is now uppercase (only valid for 'a' to 'z')
      System.out.print(ch + " ");
    }
    System.out.println();
  }
}

From uppercase to lowercase:

// Lowercase letters.
class LowerCase {
  public static void main(String[] args) {
    char ch;
    for (int i = 0; i < 10; i++) {
      ch = (char) ('A' + i);
      System.out.print(ch);

      // This statement turns on the 6th bit (bit 5, value 32).
      ch = (char) (ch | 32); // ch is now lowercase (only valid for 'A' to 'Z')
      System.out.print(ch + " ");
    }
    System.out.println();
  }
}
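Both masks assume the input is already an ASCII letter. The minimal sketch below shows what the same two operations do to characters outside A-Z and a-z. The class name MaskPitfalls and its helper methods are illustrative and do not come from the question.

```java
// Illustrative only: applies the question's two masks to characters that are
// not ASCII letters, to show that the bit trick never checks its input.
public class MaskPitfalls {

    // Same operation as the question's UpperCase class: clear bit 5 (value 32).
    static char clearBit5(char ch) {
        return (char) (ch & 65503);
    }

    // Same operation as the question's LowerCase class: set bit 5 (value 32).
    static char setBit5(char ch) {
        return (char) (ch | 32);
    }

    public static void main(String[] args) {
        // Letters behave as expected.
        System.out.println(clearBit5('q') + " " + setBit5('Q')); // Q q
        // '[' (0x5B) gains bit 5 and becomes '{' (0x7B).
        System.out.println(setBit5('['));                         // {
        // '@' (0x40) becomes '`' (0x60).
        System.out.println(setBit5('@'));                         // `
        // '5' (0x35) loses bit 5 and becomes the control character 0x15.
        System.out.println((int) clearBit5('5'));                 // 21
        // 'Z' is already uppercase; clearing bit 5 leaves it unchanged.
        System.out.println(clearBit5('Z'));                       // Z
    }
}
```

A range check (or a library method) is what separates "fast" from "fast and correct" here.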

I know that Java provides the methods .toUpperCase() and .toLowerCase(). In terms of performance, which is the faster way to do this conversion: bitwise operations as in the code above, or the .toUpperCase() and .toLowerCase() methods? Thank you.

Edit 1: Notice that I am using decimal 65503, which is binary 1111111111011111. I am using 16 bits, not 8. According to the currently highest-voted answer to "How many bits or bytes are there in a character?":

A Unicode character in UTF-16 encoding is between 16 (2 bytes) and 32 bits (4 bytes), though most of the common characters take 16 bits. This is the encoding used by Windows internally.

The code in my question assumes UTF-16.
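A Java char is always a single 16-bit UTF-16 code unit (characters outside the Basic Multilingual Plane take two chars), so a 16-bit mask covers every bit of a char. The sketch below, using a hypothetical class name MaskCheck, checks that 65503, 0xFFDF and the low 16 bits of ~32 are the same value, and that masking with ~32 gives the same result as masking with 65503 for every possible char.

```java
public class MaskCheck {

    // The mask from the question, written three equivalent ways.
    static final int DECIMAL = 65503;
    static final int HEX = 0xFFDF;
    static final int COMPLEMENT = ~32 & 0xFFFF; // ~32 is 0xFFFFFFDF as an int; keep the low 16 bits

    // True if (ch & ~32) and (ch & 65503) give the same int for every char value.
    static boolean masksAgree() {
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            char ch = (char) i;
            // ch widens to an int whose upper 16 bits are zero, so the upper
            // 16 bits of ~32 (all ones) can never affect the result.
            if ((ch & ~32) != (ch & 65503)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(Integer.toBinaryString(DECIMAL));     // 1111111111011111
        System.out.println(DECIMAL == HEX && HEX == COMPLEMENT); // true
        System.out.println(masksAgree());                        // true
    }
}
```

In other words, writing `ch & ~32` expresses the same intent without having to spell out the 16-bit constant.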

Author: Jaime Montoya, reproduced under the CC 4.0 BY-SA copyright license with a link to the original source and this disclaimer.
Link to original article: https://stackoverflow.com/questions/49182390/fastest-way-of-converting-uppercase-to-lowercase-and-lowercase-to-uppercase-in-j

Kayaman :

Yes, a method you write yourself will be slightly faster if it performs the case conversion with a simple bitwise operation, whereas Java's methods contain more complex logic to support Unicode characters, not just the ASCII character set.

If you look at String.toLowerCase(), you'll notice that there's a lot of logic in there. So if you were working on software that needed to process huge amounts of ASCII-only text, and nothing else, you might actually see some benefit from a more direct approach.

But unless you are writing a program that spends most of its time converting ASCII, you won't be able to notice any difference, even with a profiler (and if you are writing that kind of program... you should look for another job).
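As a rough sketch of such a "more direct approach" for ASCII-only data, the bit trick can be guarded with a range check so that only A-Z or a-z are modified. The class AsciiCase and its methods are hypothetical, and the sketch assumes any non-ASCII characters should simply be copied unchanged.

```java
public class AsciiCase {

    // Lowercases only 'A'..'Z'; every other char, including non-ASCII, is copied unchanged.
    static String toLowerAscii(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            char ch = chars[i];
            if (ch >= 'A' && ch <= 'Z') {
                chars[i] = (char) (ch | 32); // set bit 5
            }
        }
        return new String(chars);
    }

    // Uppercases only 'a'..'z'; every other char is copied unchanged.
    static String toUpperAscii(String s) {
        char[] chars = s.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            char ch = chars[i];
            if (ch >= 'a' && ch <= 'z') {
                chars[i] = (char) (ch & 65503); // clear bit 5
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(toLowerAscii("Hello, World [42]!")); // hello, world [42]!
        System.out.println(toUpperAscii("Hello, World [42]!")); // HELLO, WORLD [42]!
    }
}
```

The range check adds a comparison per character, so whether this beats the library methods on a given JVM is something to measure rather than assume.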

2018-03-08T22:13:28

Jacob G. :

As promised, here are two JMH benchmarks: one comparing Character#toUpperCase to your bitwise method, and the other comparing Character#toLowerCase to your other bitwise method. Note that only characters within the English alphabet were tested.

First Benchmark (to uppercase):

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Fork(3)
public class Test {

  @Param({"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
          "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"})
  public char c;

  @Benchmark
  public char toUpperCaseNormal() {
    return Character.toUpperCase(c);
  }

  @Benchmark
  public char toUpperCaseBitwise() {
    return (char) (c & 65503);
  }
}

Output:

Benchmark               (c)  Mode  Cnt  Score    Error  Units
Test.toUpperCaseNormal    a  avgt   30  2.447  ± 0.028  ns/op
Test.toUpperCaseNormal    b  avgt   30  2.438  ± 0.035  ns/op
Test.toUpperCaseNormal    c  avgt   30  2.506  ± 0.083  ns/op
Test.toUpperCaseNormal    d  avgt   30  2.411  ± 0.010  ns/op
Test.toUpperCaseNormal    e  avgt   30  2.417  ± 0.010  ns/op
Test.toUpperCaseNormal    f  avgt   30  2.412  ± 0.005  ns/op
Test.toUpperCaseNormal    g  avgt   30  2.410  ± 0.004  ns/op

Test.toUpperCaseBitwise   a  avgt   30  1.758  ± 0.007  ns/op
Test.toUpperCaseBitwise   b  avgt   30  1.789  ± 0.032  ns/op
Test.toUpperCaseBitwise   c  avgt   30  1.763  ± 0.005  ns/op
Test.toUpperCaseBitwise   d  avgt   30  1.763  ± 0.012  ns/op
Test.toUpperCaseBitwise   e  avgt   30  1.757  ± 0.003  ns/op
Test.toUpperCaseBitwise   f  avgt   30  1.755  ± 0.003  ns/op
Test.toUpperCaseBitwise   g  avgt   30  1.759  ± 0.003  ns/op

Second Benchmark (to lowercase):

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
@Fork(3)
public class Test {

  @Param({"A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M",
          "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"})
  public char c;

  @Benchmark
  public char toLowerCaseNormal() {
    return Character.toLowerCase(c);
  }

  @Benchmark
  public char toLowerCaseBitwise() {
    return (char) (c | 32);
  }
}

Output:

Benchmark               (c)  Mode  Cnt  Score    Error  Units
Test.toLowerCaseNormal    A  avgt   30  2.084  ± 0.007  ns/op
Test.toLowerCaseNormal    B  avgt   30  2.079  ± 0.006  ns/op
Test.toLowerCaseNormal    C  avgt   30  2.081  ± 0.005  ns/op
Test.toLowerCaseNormal    D  avgt   30  2.083  ± 0.010  ns/op
Test.toLowerCaseNormal    E  avgt   30  2.080  ± 0.005  ns/op
Test.toLowerCaseNormal    F  avgt   30  2.091  ± 0.020  ns/op
Test.toLowerCaseNormal    G  avgt   30  2.116  ± 0.061  ns/op

Test.toLowerCaseBitwise   A  avgt   30  1.708  ± 0.006  ns/op
Test.toLowerCaseBitwise   B  avgt   30  1.705  ± 0.018  ns/op
Test.toLowerCaseBitwise   C  avgt   30  1.721  ± 0.022  ns/op
Test.toLowerCaseBitwise   D  avgt   30  1.718  ± 0.010  ns/op
Test.toLowerCaseBitwise   E  avgt   30  1.706  ± 0.009  ns/op
Test.toLowerCaseBitwise   F  avgt   30  1.704  ± 0.004  ns/op
Test.toLowerCaseBitwise   G  avgt   30  1.711  ± 0.007  ns/op

I've only included a few different letters (even though all were tested), as they all produce similar results.

It's clear that your bitwise methods are faster, mainly because Character#toUpperCase and Character#toLowerCase perform additional logical checks (as I mentioned earlier today in my comment).
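A timing comparison like this is only meaningful on inputs where both approaches return the same answer. The minimal sketch below (the class name AgreementCheck is hypothetical) checks that the bitwise versions agree with Character.toUpperCase and Character.toLowerCase on exactly the 52 English letters covered by the benchmarks.

```java
public class AgreementCheck {

    // Counts letters where the bitwise result differs from the Character method.
    static int countMismatches() {
        int mismatches = 0;
        for (char c = 'a'; c <= 'z'; c++) {
            if ((char) (c & 65503) != Character.toUpperCase(c)) {
                mismatches++;
            }
        }
        for (char c = 'A'; c <= 'Z'; c++) {
            if ((char) (c | 32) != Character.toLowerCase(c)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches()); // mismatches: 0
    }
}
```

Outside these 52 characters the two approaches can disagree (digits and punctuation, for example), so the timings above say nothing about mixed or non-ASCII input.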

2018-03-09T00:50:46

Karol Dowbecki :

Your code only works for ASCII characters. What about languages where no clear conversion between lowercase and uppercase exists, e.g. German ß (please correct me if I'm wrong; my German is horrible), or when a letter/symbol is written using a multi-byte UTF-8 code point? Correctness comes before performance, and the problem is not so simple if you have to handle UTF-8, as is evident in the String.toLowerCase(Locale) method.
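A few concrete cases behind this point, as a sketch (the class name UnicodeCases is hypothetical and the characters were picked for illustration): ß has no single-char uppercase form, so Character.toUpperCase leaves it unchanged while String.toUpperCase expands it to "SS"; ÿ (U+00FF) uppercases to Ÿ (U+0178), but clearing bit 5 turns it into ß (U+00DF); and with a Turkish locale, uppercase I lowercases to dotless ı (U+0131).

```java
import java.util.Locale;

public class UnicodeCases {

    public static void main(String[] args) {
        char sharpS = '\u00DF'; // ß

        // No single-char uppercase mapping exists, so the char API returns it unchanged...
        System.out.println(Character.toUpperCase(sharpS) == sharpS);           // true
        // ...while the String API applies the full mapping ß -> "SS".
        System.out.println("\u00DF".toUpperCase(Locale.ROOT));                 // SS

        // ÿ (U+00FF) uppercases to Ÿ (U+0178), but clearing bit 5 gives ß (U+00DF).
        char yDiaeresis = '\u00FF';
        System.out.printf("%04X%n", (int) Character.toUpperCase(yDiaeresis));  // 0178
        System.out.printf("%04X%n", (int) (char) (yDiaeresis & 65503));        // 00DF

        // Lowercasing depends on locale: Turkish maps 'I' to dotless U+0131.
        String turkish = "I".toLowerCase(Locale.forLanguageTag("tr"));
        System.out.printf("%04X%n", (int) turkish.charAt(0));                  // 0131
    }
}
```

The String methods take a Locale precisely because of cases like the last one; passing Locale.ROOT gives locale-independent results.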

2018-03-08T21:59:28
