String comparison with SSE4.2

Is it The Flash? Nope, it's SIMD instructions! - Date: 01/02/2016

Art: Kieran Yanner

The story

Hello ladies and gentlemen, royal readers of my gitbook! No more jokes, so I wrote this post in English. Last week, while studying search algorithms to try to gain some performance in my private projects, I came across “SSE4.2”. When I saw the possibility of using “xmm0” (a 128-bit register), I thought: “Oh my god! I want to use it! This is awesome!” I spent some days studying it with my friend João Victorino, aka “Pl4kt0n”, and after studying the concepts around SSE4.2, I ended up writing a program. Relax, folks! I don’t have a karate trick at this point!

The benchmark

To explain, I wrote two functions: one that simply calls “strcmp()”, and another with my own implementation using SSE4.2 in Assembly. I changed the syntax from AT&T to Intel (“AT&T” is very boring) because I find it easier to follow the examples in Intel’s manual that way. I tested my “strcmp()” function against an array of words and collected results such as CPU cycles for the benchmark. With that, we have some data to compare, shown as a simple Cartesian bar plot made with “Gnuplot”. You can view the results here, and the gnuplot commands here!
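As a rough illustration of this kind of measurement, here is a minimal sketch in C. It is not the harness from the post: the names `bench_compare`, `print_row` and `READ_TICKS` are hypothetical, and it assumes GCC or Clang, reading the time-stamp counter with `__rdtsc()` on x86 and falling back to `clock()` elsewhere. On modern CPUs the time-stamp counter ticks at a constant reference rate, so treat the numbers as a relative measure between the two functions rather than exact core cycles.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>                     /* __rdtsc(): read the time-stamp counter */
#define READ_TICKS() ((uint64_t)__rdtsc())
#else
#define READ_TICKS() ((uint64_t)clock())   /* fallback on non-x86 targets */
#endif

/* Any function with the strcmp() signature can be measured. */
typedef int (*cmp_fn)(const char *, const char *);

/* Hypothetical helper: compare `key` against every word, return the
 * elapsed ticks and store the number of exact matches in *matches. */
uint64_t bench_compare(cmp_fn cmp, const char *const *words, size_t n,
                       const char *key, size_t *matches)
{
    size_t hits = 0;
    uint64_t start = READ_TICKS();
    for (size_t i = 0; i < n; i++)
        if (cmp(words[i], key) == 0)
            hits++;
    uint64_t end = READ_TICKS();
    *matches = hits;   /* using the result keeps the loop from being optimized away */
    return end - start;
}

/* Hypothetical helper: print one "label value" row, a layout that
 * gnuplot can read as input for a bar chart. */
void print_row(const char *label, uint64_t ticks)
{
    printf("%s %llu\n", label, (unsigned long long)ticks);
}
```

Calling `bench_compare(strcmp, words, n, key, &hits)` and then the same with the SSE4.2 function gives two rows to plot. Running each measurement several times and keeping the lowest value reduces noise from the scheduler and caches.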

Ok, Mr CoolerVoid, what’s the trick?

So, there is no trick. The generic approach gives a typical result, so I followed another path to look for an uncommon gain. This code doesn’t have a scheme.

pcmpistri

I use the “pcmpistri” instruction (Packed Compare Implicit Length Strings, Return Index), together with the “movdqu” instruction (move unaligned double quadword), which transfers the string data from memory into an XMM register. With these instructions, you can do many things with “strings”. Take a look at the following:
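A minimal sketch of the idea in C, using the compiler intrinsics for these instructions rather than the hand-written assembly of the post. `_mm_loadu_si128` compiles to “movdqu” and `_mm_cmpistri` to “pcmpistri”. The function names are hypothetical, GCC or Clang is assumed, and the sketch assumes both buffers can be read in whole 16-byte chunks (for example, padded arrays), because each load may read past the terminating NUL.

```c
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <nmmintrin.h>   /* SSE4.2 intrinsics: _mm_cmpistri, _mm_cmpistrz */

/* Assumption: s1 and s2 are readable in whole 16-byte chunks up to and
 * including the chunk that holds the terminating NUL. */
__attribute__((target("sse4.2")))
static int strcmp_sse42_impl(const char *s1, const char *s2)
{
    /* imm8 = 0x18: unsigned bytes, compare each position for equality,
     * negate the result (so set bits mark mismatches), return lowest index. */
    enum { MODE = _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_EACH |
                  _SIDD_NEGATIVE_POLARITY | _SIDD_LEAST_SIGNIFICANT };

    for (;;) {
        __m128i a = _mm_loadu_si128((const __m128i *)s1);   /* movdqu */
        __m128i b = _mm_loadu_si128((const __m128i *)s2);   /* movdqu */
        int idx = _mm_cmpistri(a, b, MODE);                 /* pcmpistri */
        if (idx < 16)        /* first differing byte is inside this chunk */
            return (unsigned char)s1[idx] - (unsigned char)s2[idx];
        if (_mm_cmpistrz(a, b, MODE))   /* NUL seen and no mismatch: equal */
            return 0;
        s1 += 16;
        s2 += 16;
    }
}
#endif

/* Hypothetical wrapper: use SSE4.2 when the CPU has it, else plain strcmp(). */
int strcmp_sse42(const char *s1, const char *s2)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("sse4.2"))
        return strcmp_sse42_impl(s1, s2);
#endif
    return strcmp(s1, s2);
}
```

Like `strcmp()`, the function returns zero for equal strings and a negative or positive value depending on the first differing byte. A production version would also guard against loads that cross into an unmapped page instead of relying on padded buffers.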

So, using it to hook the functions, in 32-bit and 64-bit versions:

Before hooking it up, we need to check whether or not your machine has SSE4.2 support. There are many ways of doing it. However, for the sake of simplicity, let’s go with the following one:
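One possible way to do that check, as a minimal sketch: ask the CPUID instruction for leaf 1 and test bit 20 of ECX, which is the SSE4.2 feature flag. The function name `has_sse42` is hypothetical, and the `<cpuid.h>` helper `__get_cpuid` assumes GCC or Clang; this is not necessarily the check used in the post.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* Hypothetical helper: true when CPUID leaf 1 reports SSE4.2 (ECX bit 20). */
bool has_sse42(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;                    /* leaf 1 is not available */
    return (ecx & (1u << 20)) != 0;      /* SSE4.2 feature flag */
#else
    return false;                        /* not an x86 CPU */
#endif
}
```

With a check like this, the program can pick the SSE4.2 routine only when the flag is set and fall back to the plain `strcmp()` otherwise.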

Look at all the source code here!

Other cool stuff

SSE is very common in image processing, and game developers use it too. Take a look at the following:

https://software.intel.com/en-us/articles/using-intel-streaming-simd-extensions-and-intel-integrated-performance-primitives-to-accelerate-algorithms

Do you like CPU features? Look at this!

Thank you for reading, cheers!
