Data Oriented Finite State Machines

Finite state machines (FSMs) are probably one of the most widespread techniques used for game AI.

For my own engine i thought about ways to implement FSMs with performance in mind. The fundamental assumption guiding the design was: “There will be many AI entities using the same state-machine.”1

The design itself is pretty simple and i’m sure that i’m not the first person to come up with this. The structure for a FSM is shown here:

#define XE_AI_FSM_MAX_STATES 16
#define XE_AI_FSM_STATE_FUNC(name) uint32_t name(xeAIEntity entity, uint32_t previousState)
typedef XE_AI_FSM_STATE_FUNC(xeAIFSMStateFunc);
 
typedef struct
{
    uint32_t maxEntities;
    uint32_t stateCount;
    uint32_t entityCounts[XE_AI_FSM_MAX_STATES];
    xeAIFSMStateEntity* stateEntities[XE_AI_FSM_MAX_STATES];
    xeAIFSMStateEntity* nextStateEntities[XE_AI_FSM_MAX_STATES];
    xeAIFSMStateFunc* stateFuncs[XE_AI_FSM_MAX_STATES];
} xeAIFSM;

I arbitrarily limited the number of states to 16 because this saves me from some memory management, but it would be trivial to remove this limit. For every state (which is identified by an unsigned integer between 0 and stateCount), the FSM saves the AI entities in that state (stateEntities). The type xeAIFSMStateEntity is shown below. It contains an reference to the entity data and the previous state of the entity, which can be used by the state function to determine if the entity just entered the state (for example, if a monster just entered the “chase player” state, we could play a special sound effect).

typedef struct
{
    xeAIEntity entity;
    uint32_t previousState;
} xeAIFSMStateEntity;

Every frame, all state functions need to be executed. For that, we iterate over all states and call the function for that state (stateFuncs[stateIndex]) for every entity in stateEntities[stateIndex]. This can be done in parallel, for example via a job system (which is what i do in my engine). The state function returns the new state of the entity and the run function of the FSM will place the append the entity to nextStateEntities[newState]. After all states were executed, stateEntities and nextStateEntities are swapped.

Code for this function is shown below; i removed all parallelization code for improved clarity.

void xe_ai_fsm_run(xeAIFSM* fsm)
{
    uint32_t nextEntityCounts[XE_AI_FSM_MAX_STATES];
    XE_ZERO_ARRAY(nextEntityCounts, XE_AI_FSM_MAX_STATES);

    for (uint32_t state = 0; state < XE_AI_FSM_MAX_STATES; ++state) {
        xeAIFSMStateFunc* stateFunc = fsm->stateFuncs[state];
        assert(stateFunc);
        uint32_t entityCount = fsm->entityCounts[state];
        xeAIFSMStateEntity* entities = fsm->stateEntities[state];
        for (uint32_t entityIdx = 0; entityIdx < entityCount; ++entityIdx) {
            uint32_t nextState = stateFunc(entities[entityIdx].entity, entities[entityIdx].previousState);
            entities[entityIdx].previousState = state;
            if (nextState >= fsm->stateCount) {
                xe_log_error("FSM %x entity (%u;%u) tried to move to invalid state %u",
                             (uintptr_t)fsm,
                             state, entityIdx,
                             nextState);
                nextState = state;
                assert(!"Tried to move an entity to an invalid state");
            }
            /* Move to (potentially) new state */
            fsm->nextStateEntities[nextState][nextEntityCounts[nextState]] = entities[entityIdx];
            ++nextEntityCounts[nextState];
        }
    }

    /* Swap pointers and update counts */
    for (uint32_t state = 0; state < XE_AI_FSM_MAX_STATES; ++state) {
        xeAIFSMStateEntity* tmp = fsm->stateEntities[state];
        fsm->stateEntities[state] = fsm->nextStateEntities[state];
        fsm->nextStateEntities[state] = tmp;
        fsm->entityCounts[state] = nextEntityCounts[state];
    }
}

“Where there is one, there are many.”, as Mike action would say. ↩