String comparison with SSE4.2
Is it The Flash? Nope, it's SIMD instructions! - Date: 01/02/2016

The story
Hello ladies and gentlemen, royal readers of my gitbook! No more jokes, so I wrote this post in English. Last week, while following search algorithms in an attempt to gain some performance in my private projects, I read something about "SSE4.2". When I saw the possibility of using "xmm0" (a 128-bit register), I thought "Oh my god! I want to use it! This is awesome!" After some days studying the concepts around SSE4.2 with my friend João Victorino, aka "Pl4kt0n", I ended up writing a program. Relax folks! I don't have a karate trick at this point!
The benchmark
To explain, I wrote two functions: one uses the plain "strcmp()" function, the other is my implementation using SSE4.2 in Assembly (I changed from AT&T to Intel syntax, because "AT&T" is very boring and I find it easier to follow the examples in Intel's manual). I also test my "strcmp()" function with an array of words and collect results such as CPU cycles for the benchmark. That gives us some conditions to compare, just a Cartesian view, shown as a simple bar plot with "Gnuplot". You can view the results here! and gnuplot cmd here!
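To make the idea of counting cycles over an array of words more concrete, here is a minimal sketch, assuming GCC or Clang on x86. The names "cmp_fn" and "count_cycles" are hypothetical and not taken from the original source. It reads the time stamp counter with "__rdtsc()", which counts reference cycles rather than exact core cycles, so treat the numbers as rough.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <x86intrin.h>

/* Hypothetical signature shared by strcmp() and any replacement. */
typedef int (*cmp_fn)(const char *, const char *);

/*
 * Compare every word against `needle` with `cmp` and return the number
 * of time stamp counter ticks spent in the loop. The number of exact
 * matches is stored in *matches so the work cannot be optimized away.
 */
uint64_t count_cycles(cmp_fn cmp, const char *const *words, size_t n,
                      const char *needle, size_t *matches)
{
    assert(cmp != NULL && matches != NULL);

    size_t hits = 0;
    uint64_t start = __rdtsc();
    for (size_t i = 0; i < n; i++) {
        if (cmp(words[i], needle) == 0)
            hits++;
    }
    uint64_t end = __rdtsc();

    *matches = hits;
    return end - start;
}
```

Calling it once with "strcmp" and once with the SSE4.2 version on the same word list gives two numbers to put side by side in a bar plot. Running each case several times and keeping the minimum usually reduces noise.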

So there is no trick. The generic case gives a typical result; you would have to follow another path to find an uncommon impact. This code doesn't have a scheme.
pcmpistri
I use the "pcmpistri" instruction (Packed Compare Implicit Length Strings, Return Index), and the "movdqu" instruction (Move Unaligned Double Quadword) is used to transfer the string data from memory into an XMM register. With these instructions you can do many things with strings, take a look at the following:
So, using it to hook functions, in 32-bit and 64-bit versions:
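As a rough illustration of how "pcmpistri" can drive a strcmp-like loop, here is a minimal sketch that uses the compiler intrinsics from "nmmintrin.h" instead of hand-written assembly, so the same source builds for 32-bit and 64-bit targets. It is not the author's code: the name "sse42_strcmp" is hypothetical, GCC or Clang on x86 is assumed, and the function assumes that at least 16 bytes are readable starting at every chunk it loads (for example, strings stored in padded buffers).

```c
#include <assert.h>
#include <nmmintrin.h>   /* SSE4.2 intrinsics: _mm_cmpistri and friends */
#include <stddef.h>

/* Byte-wise "equal each" compare, inverted so that set bits mark the
 * positions that differ; return the lowest such index. */
#define CMP_MODE (_SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_EACH | \
                  _SIDD_NEGATIVE_POLARITY | _SIDD_LEAST_SIGNIFICANT)

/*
 * strcmp-like comparison, 16 bytes per step.
 * ASSUMPTION: 16 bytes are readable at s1 + off and s2 + off for every
 * chunk visited, i.e. the caller pads the buffers. A production version
 * would have to guard against reading across a page boundary.
 */
__attribute__((target("sse4.2")))
int sse42_strcmp(const char *s1, const char *s2)
{
    assert(s1 != NULL && s2 != NULL);

    for (size_t off = 0;; off += 16) {
        /* Unaligned 128-bit loads (movdqu). */
        __m128i a = _mm_loadu_si128((const __m128i *)(s1 + off));
        __m128i b = _mm_loadu_si128((const __m128i *)(s2 + off));

        /* Carry flag set: some byte differs, or one string ended first. */
        if (_mm_cmpistrc(a, b, CMP_MODE)) {
            int idx = _mm_cmpistri(a, b, CMP_MODE);
            return (unsigned char)s1[off + idx] -
                   (unsigned char)s2[off + idx];
        }

        /* Zero flag set: a terminator was seen and nothing differed. */
        if (_mm_cmpistrz(a, b, CMP_MODE))
            return 0;
    }
}
```

The instruction treats the NUL terminator and everything after it as invalid, so both strings ending at the same offset compares as equal, while one ending early is reported as a difference at that index.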
Before hooking it up, we need to check whether or not your machine has SSE4.2 support. There are many ways of doing it. However, for the sake of simplicity, let's go with the following one:
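One possible way to do this check, sketched here under the assumption of GCC or Clang on x86, is to query CPUID leaf 1 through the "__get_cpuid" helper from "cpuid.h" and test bit 20 of ECX, which is the SSE4.2 feature flag. The function name "cpu_has_sse42" is hypothetical, and this may differ from the check used in the original source.

```c
#include <cpuid.h>
#include <stdbool.h>

/* CPUID leaf 1 reports SSE4.2 support in bit 20 of ECX. */
#define SSE42_ECX_BIT (1u << 20)

/* Returns true when the running CPU advertises SSE4.2. */
bool cpu_has_sse42(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 if the requested leaf is not available. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;

    return (ecx & SSE42_ECX_BIT) != 0;
}
```

A caller could then fall back to the plain "strcmp()" when this returns false. On GCC and Clang, "__builtin_cpu_supports("sse4.2")" is a shorter alternative.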
Look at all the source code here!
Other cool stuff
SSE is very common in image processing, and game developers use it too. Take a look at the following:
https://software.intel.com/en-us/articles/using-intel-streaming-simd-extensions-and-intel-integrated-performance-primitives-to-accelerate-algorithms
Do you like CPU features? Look at this!

Thank you for reading, cheers!