Implementing state-machines: An evaluation

Posted: Sunday, 2015-10-18 10:15 | Tags: Programming, C++

In the previous article I raised the question on whether -- from a performance standpoint -- one should implement state machines using switch-statements or function pointers.

We saw that switch-statements are compiled to a number of jump/compare instructions that grow at least logarithmically with the number of case-labels. I had to close with the observation that a too contrived example however leads to our compiler actually analyzing our state machine good enough to be able to rewrite our code into unrolled instructions and loops.

In this post I want to discuss a slightly less contrived example that is a (stupid and useless) "parser" for XML data. We will observe that even when the state logic is more complex and transitions depend on unforeseeable input data, the compiler can do smart transformations with switch-statements.

Here goes the switch-implementation:

#include <iostream>

struct SwitchStateMachine {

  enum State {
    OUTSIDE, INSIDE, INSIDE_BEGINTAG, INSIDE_ENDTAG, INSIDE_ARG, ERROR
  };

  SwitchStateMachine() : current_state(0) {
  }

  void run_state(char c) {
    switch(current_state) {
      case OUTSIDE:
        if(c == '<') {
          current_state = INSIDE;
        }
        break;

      case INSIDE:
        if(c == '/') {
          current_state = INSIDE_ENDTAG;
        }
        else {
          current_state = INSIDE_BEGINTAG;
        }
        break;

      case INSIDE_ENDTAG:
        if(c == '>') {
          current_state = OUTSIDE;
        }
        if(c == '<' || c == '/') {
          current_state = ERROR;
        }
        break;

      case INSIDE_BEGINTAG:
        if(c == '>') {
          current_state = OUTSIDE;
        }
        else if(c == '<') {
          current_state = ERROR;
        }
        else if(c == '"') {
          current_state = INSIDE_ARG;
        }
        break;

      case INSIDE_ARG:
        if(c == '"') {
          current_state = INSIDE_BEGINTAG;
        }
        break;

      case ERROR:
        exit(1);
        break;
    }
  }

  int current_state;
};


int main() {
  SwitchStateMachine ss;

  char c;
  while(std::cin.get(c)) {
    ss.run_state(c);
  }

  return 0;
}

As I warned you before, this parser doesn't do much useful. It scans through an XML document and tracks whether it is inside a tag or an argument value within a tag and so on. And of course its probably far from being complete in the sense that it would handle all valid XML documents correctly. But it just happens to be good enough to provide us with a non-trivial finite state machine.

Again, lets compile and disassemble:

0000000000400680 <main>:
    400680:   48 83 ec 18             sub    $0x18,%rsp

begin0:
    400684:   48 8d 74 24 0f          lea    0xf(%rsp),%rsi
    400689:   bf 40 0d 60 00          mov    $0x600d40,%edi

begin:
  ret = std::cin.get(c);

    40068e:   e8 dd ff ff ff          callq  400670 <_ZNSi3getERc@plt>
    400693:   48 8b 10                mov    (%rax),%rdx
    400696:   48 8b 52 e8             mov    -0x18(%rdx),%rdx

  if(!ret) { goto end; }

    40069a:   f6 44 10 20 05          testb  $0x5,0x20(%rax,%rdx,1)
    40069f:   0f 85 8d 00 00 00       jne    400732 <main+0xb2>

  if(c != '<') { goto begin; }

    4006a5:   0f b6 44 24 0f          movzbl 0xf(%rsp),%eax
    4006aa:   48 8d 74 24 0f          lea    0xf(%rsp),%rsi
    4006af:   bf 40 0d 60 00          mov    $0x600d40,%edi
    4006b4:   3c 3c                   cmp    $0x3c,%al
    4006b6:   75 d6                   jne    40068e <main+0xe>

  ret = std::cin.get(c);

    4006b8:   e8 b3 ff ff ff          callq  400670 <_ZNSi3getERc@plt>
    4006bd:   48 8b 10                mov    (%rax),%rdx
    4006c0:   48 8b 52 e8             mov    -0x18(%rdx),%rdx

  if(!ret) { goto end; }

    4006c4:   f6 44 10 20 05          testb  $0x5,0x20(%rax,%rdx,1)
    4006c9:   75 67                   jne    400732 <main+0xb2>

  if(c == '/') { goto end_tag; }

    4006cb:   80 7c 24 0f 2f          cmpb   $0x2f,0xf(%rsp)
    4006d0:   74 67                   je     400739 <main+0xb9>
    4006d2:   48 8d 74 24 0f          lea    0xf(%rsp),%rsi
    4006d7:   bf 40 0d 60 00          mov    $0x600d40,%edi

in_begin_tag:
  ret = std::cin.get(c);

    4006dc:   e8 8f ff ff ff          callq  400670 <_ZNSi3getERc@plt>

  if(!ret) { goto end; }

    4006e1:   48 8b 10                mov    (%rax),%rdx
    4006e4:   48 8b 52 e8             mov    -0x18(%rdx),%rdx
    4006e8:   f6 44 10 20 05          testb  $0x5,0x20(%rax,%rdx,1)
    4006ed:   75 43                   jne    400732 <main+0xb2>

  if(c == '>') { goto begin0; }

    4006ef:   0f b6 44 24 0f          movzbl 0xf(%rsp),%eax
    4006f4:   3c 3e                   cmp    $0x3e,%al
    4006f6:   74 8c                   je     400684 <main+0x4>

  if(c == '<') { goto opening; }

    4006f8:   3c 3c                   cmp    $0x3c,%al
    4006fa:   48 8d 74 24 0f          lea    0xf(%rsp),%rsi
    4006ff:   bf 40 0d 60 00          mov    $0x600d40,%edi
    400704:   74 6f                   je     400775 <main+0xf5>

  if(c != '"') { goto in_begin_tag; }

    400706:   3c 22                   cmp    $0x22,%al
    400708:   75 d2                   jne    4006dc <main+0x5c>

in_argument:
  ret = std::cin.get(c);

    40070a:   e8 61 ff ff ff          callq  400670 <_ZNSi3getERc@plt>
    40070f:   48 8b 10                mov    (%rax),%rdx
    400712:   48 8b 52 e8             mov    -0x18(%rdx),%rdx

  if(!ret) { goto end; }

    400716:   f6 44 10 20 05          testb  $0x5,0x20(%rax,%rdx,1)
    40071b:   75 15                   jne    400732 <main+0xb2>

  if(c != '"') { goto in_argument; }

    40071d:   0f b6 44 24 0f          movzbl 0xf(%rsp),%eax
    400722:   48 8d 74 24 0f          lea    0xf(%rsp),%rsi
    400727:   bf 40 0d 60 00          mov    $0x600d40,%edi
    40072c:   3c 22                   cmp    $0x22,%al
    40072e:   75 da                   jne    40070a <main+0x8a>

  goto in_begin_tag

    400730:   eb aa                   jmp    4006dc <main+0x5c>

end:
  return 0;

    400732:   31 c0                   xor    %eax,%eax
    400734:   48 83 c4 18             add    $0x18,%rsp
    400738:   c3                      retq

end_tag:
  ret = std::cin.get(c);

    400739:   48 8d 74 24 0f          lea    0xf(%rsp),%rsi
    40073e:   bf 40 0d 60 00          mov    $0x600d40,%edi
    400743:   e8 28 ff ff ff          callq  400670 <_ZNSi3getERc@plt>

  if(!ret) { goto end; }

    400748:   48 8b 10                mov    (%rax),%rdx
    40074b:   48 8b 52 e8             mov    -0x18(%rdx),%rdx
    40074f:   f6 44 10 20 05          testb  $0x5,0x20(%rax,%rdx,1)
    400754:   75 dc                   jne    400732 <main+0xb2>

  if(c == '>') { goto begin0; }

    400756:   0f b6 44 24 0f          movzbl 0xf(%rsp),%eax
    40075b:   3c 3e                   cmp    $0x3e,%al
    40075d:   0f 84 21 ff ff ff       je     400684 <main+0x4>

  if(c == '<') { goto error; }

    400763:   3c 3c                   cmp    $0x3c,%al
    400765:   48 8d 74 24 0f          lea    0xf(%rsp),%rsi
    40076a:   bf 40 0d 60 00          mov    $0x600d40,%edi
    40076f:   74 04                   je     400775 <main+0xf5>

  if(c == '/') { goto error; }

    400771:   3c 2f                   cmp    $0x2f,%al
    400773:   75 ce                   jne    400743 <main+0xc3>

error:
  ret = std::cin.get(c);

    400775:   e8 f6 fe ff ff          callq  400670 <_ZNSi3getERc@plt>
    40077a:   48 8b 10                mov    (%rax),%rdx
    40077d:   48 8b 52 e8             mov    -0x18(%rdx),%rdx

  if(!ret) { goto end; }

    400781:   f6 44 10 20 05          testb  $0x5,0x20(%rax,%rdx,1)
    400786:   75 aa                   jne    400732 <main+0xb2>

  return 1;
    400788:   bf 01 00 00 00          mov    $0x1,%edi
    40078d:   e8 8e fe ff ff          callq  400620 <exit@plt>
    400792:   0f 1f 40 00             nopl   0x0(%rax)
    400796:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
    40079d:   00 00 00

I took the liberty to annotate this compiler output with a little pseudo-c++ to see whats going on. The observations are (at least to me) astonishing:

  • This is all the main function, all calls have been inlined
  • The compiler unrolled our state-machine completely, there is no trace of the current_state variable left, its all a number of loops now, that's it!
  • I made a mistake and read in one superfluous character in the error state which may lead to wrongfully return an error code of 0 ;)

Of course I also wrote a function-pointer based solution, and I did a performance-comparison. However due to this awesome optimization, the pointer-based solution behaves a little worse (the compiler has no chance to inline functions that are accessed through a pointer).

So lets conclude:

  • Compilers today are horribly good at optimization and might unroll your state machine into loops such that no trace of your switch statement is left over.
  • If that happens, you can't improve with function pointers, in fact I doubt there is anything you could do better at that point.
  • However, it might not always be the case that the compiler can do this (or finds it useful in terms of performance). One reason might be you do lots of other stuff between states, or disallow inlining for whatever reason.
  • I know I repeat what several C++ gurus have been telling us over and over here when I say: The only way to actually optimize is to take your code and test different options in your setting. Compilers are way too smart and complex beasts to give generalized answers these days.