IDA Script: Fixing overlay jumps

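The DOS Gold Box games use overlays to manage the ‘more code than memory’ problem of the DOS environment: only part of the code is resident at any one time, and other parts are loaded from disk over it as they are needed.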

So when the code here (seg000:00F6) calls sub_21979, it goes via a stub function, sub_10180:

01 - before call

The stub jumps to the actual function once it has been loaded into RAM (after swapping some other code out, and other magic!):

02 - before jump func
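For reference, a jump function like this is a single 5-byte x86 far jump: opcode EA followed by the target offset word and then the target segment word, both little-endian. A far CALL (opcode 9A) has exactly the same layout. A minimal Python sketch of decoding one entry, using made-up bytes chosen so that the real-mode address (segment × 16 + offset) comes out as 0x21979, the address in the name sub_21979; the byte values and the helper names `decode_far_ptr` and `linear` are illustrative assumptions, not taken from the game:

```python
import struct
from typing import Optional, Tuple

FAR_JMP = 0xEA   # jmp far ptr16:16
FAR_CALL = 0x9A  # call far ptr16:16


def decode_far_ptr(insn: bytes) -> Optional[Tuple[int, int]]:
    """Return (segment, offset) for a 5-byte far JMP/CALL, else None."""
    if len(insn) < 5 or insn[0] not in (FAR_JMP, FAR_CALL):
        return None
    # Operand layout: offset word first, then segment word (little-endian)
    off, seg = struct.unpack_from("<HH", insn, 1)
    return seg, off


def linear(seg: int, off: int) -> int:
    """Real-mode linear address: segment * 16 + offset."""
    return (seg << 4) + off


# Hypothetical stub entry: jmp far 2197:0009
entry = bytes([0xEA, 0x09, 0x00, 0x97, 0x21])
seg, off = decode_far_ptr(entry)
print(f"jmp far {seg:04X}:{off:04X} -> linear {linear(seg, off):05X}")
# prints: jmp far 2197:0009 -> linear 21979
```

Because the caller's far CALL and the stub's far JMP share this operand layout, redirecting a caller only means copying the stub's four operand bytes over the caller's.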

Here is the actual called function:

03 - before func

IDA Pro links all of this together auto-magically, so life is good.

But really we want to take the jump functions out of the loop, as in IDA we can have the whole program in memory at once. The main advantage of cleaning up is this: sub_21979 shows only one place that refers to it (the green code in the top right of the picture), namely the jump function, while the jump function itself may have many callers that we don’t see from here. Exploring the code then means jumping in and out of the jump function, which gets annoying.
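The fan-in problem is easier to see on a toy call graph. A minimal Python sketch, assuming a simple caller → callees dictionary as a stand-in for IDA's cross-references: inverting it gives the ‘who refers to this’ view, and redirecting calls through a stub-to-target map shows what the cleanup changes. Only sub_10180, sub_21979 and the code at seg000:00F6 come from the screenshots; `caller_b`, `caller_c` and the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def referrers(calls: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert a caller -> callees map into callee -> callers (the 'xrefs to' view)."""
    refs = defaultdict(list)
    for caller, callees in calls.items():
        for callee in callees:
            refs[callee].append(caller)
    return dict(refs)


def bypass_stubs(calls: Dict[str, List[str]], stubs: Dict[str, str]) -> Dict[str, List[str]]:
    """Redirect every call to a stub straight to the stub's real target."""
    return {caller: [stubs.get(callee, callee) for callee in callees]
            for caller, callees in calls.items()}


# sub_10180 / sub_21979 are from the screenshots; caller_b and caller_c are made up
stubs = {"sub_10180": "sub_21979"}
calls = {
    "seg000_00F6": ["sub_10180"],
    "caller_b": ["sub_10180"],
    "caller_c": ["sub_10180"],
    "sub_10180": ["sub_21979"],  # the stub's own far jump
}

print(referrers(calls)["sub_21979"])
# ['sub_10180']
print(referrers(bypass_stubs(calls, stubs))["sub_21979"])
# ['seg000_00F6', 'caller_b', 'caller_c', 'sub_10180']
```

Before the redirect, sub_21979 sees only the stub; afterwards it sees every real caller, and nothing calls the stub any more.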

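Here is an .idc script to fix this up. It finds all the overlay jump functions, then loops over the locations that reference each one and rewrites them to call the actual jump target directly. Overlay stub segments are recognized by the INT 3Fh instruction (bytes CD 3F) at their start; after skipping the first 0x20 bytes, every 5-byte slot holding a far JMP (opcode EA) is treated as a jump function.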

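#include <idc.idc>

static main()
{
  auto seg, loc;
  auto off, base;
  auto xref, next;

  seg = FirstSeg();

  while( seg != BADADDR )
  {
    loc = SegStart(seg);

    // Overlay stub segments start with INT 3Fh (CD 3F)
    if( Byte(loc) == 0xCD && Byte(loc+1) == 0x3F )
    {
      Message("Fixing segment %s jumps\n", SegName(seg));

      // Skip the first 0x20 bytes; the entries follow in 5-byte slots
      loc = loc + 0x20;

      while( loc < SegEnd(seg) )
      {
        // JMP FAR ptr16:16 = EA <offset word> <segment word>
        if( Byte(loc) == 0xEA )
        {
          off = Word(loc+1);
          base = Word(loc+3);

          xref = RfirstB(loc);
          while( xref != BADADDR )
          {
            // Fetch the next caller before the current xref is deleted
            next = RnextB(loc, xref);

            // Only rewrite 5-byte far CALL (9A) / far JMP (EA) instructions,
            // whose operand is also <offset word> <segment word>
            if( Byte(xref) == 0x9A || Byte(xref) == 0xEA )
            {
              Message("Loc %x ref from %x\n", loc, xref);

              PatchWord(xref+1, off);
              PatchWord(xref+3, base);

              DelCodeXref(xref, loc, 0);
            }

            xref = next;
          }
        }

        loc = loc + 5;
      }
    }

    seg = NextSeg(seg);
  }
}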
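To make the patching logic concrete outside IDA, here is a minimal Python sketch of the same loop over a flat bytearray. It is an illustration under stated assumptions, not the script itself: the segment list and the `xrefs` dictionary are hypothetical stand-ins for IDA's segment enumeration and code cross-references (FirstSeg/NextSeg, RfirstB/RnextB), and `fix_overlay_calls` is a made-up name. It only rewrites callers that are 5-byte far CALL (9A) or far JMP (EA) instructions, because the patch overwrites the four bytes that follow the caller's opcode.

```python
from typing import Dict, List, Tuple

STUB_MAGIC = b"\xCD\x3F"     # INT 3Fh at the start of an overlay stub segment
STUB_HEADER = 0x20           # bytes skipped before the first entry, as in the script
ENTRY_SIZE = 5               # each slot holds one 5-byte far JMP
FAR_JMP, FAR_CALL = 0xEA, 0x9A


def fix_overlay_calls(image: bytearray,
                      segments: List[Tuple[int, int]],
                      xrefs: Dict[int, List[int]]) -> List[int]:
    """Point callers of overlay jump functions straight at the real target.

    image    -- flat memory image indexed by linear address (stand-in for the database)
    segments -- (start, end) pairs, end exclusive (stand-in for FirstSeg/NextSeg)
    xrefs    -- stub entry address -> caller addresses (stand-in for RfirstB/RnextB);
                patched callers are removed from it, like DelCodeXref
    Returns the caller addresses that were patched.
    """
    patched = []
    for start, end in segments:
        if bytes(image[start:start + 2]) != STUB_MAGIC:
            continue
        loc = start + STUB_HEADER
        while loc + ENTRY_SIZE <= end:  # stay inside the segment
            if image[loc] == FAR_JMP:
                operand = bytes(image[loc + 1:loc + 5])  # offset word, segment word
                # Iterate over a copy so removing handled callers is safe
                for caller in list(xrefs.get(loc, [])):
                    if image[caller] in (FAR_JMP, FAR_CALL):
                        image[caller + 1:caller + 5] = operand
                        xrefs[loc].remove(caller)
                        patched.append(caller)
            loc += ENTRY_SIZE
    return patched


# Hypothetical layout: caller segment at 0x000, stub segment at 0x100
image = bytearray(0x200)
image[0x100:0x102] = STUB_MAGIC
image[0x120:0x125] = bytes([0xEA, 0x09, 0x00, 0x97, 0x21])  # jmp far 2197:0009
image[0x010:0x015] = bytes([0x9A, 0x20, 0x00, 0x10, 0x00])  # call far 0010:0020 (the stub)
xrefs = {0x120: [0x010]}

print([hex(a) for a in fix_overlay_calls(image, [(0x000, 0x100), (0x100, 0x200)], xrefs)])
# ['0x10']
print(image[0x010:0x015].hex())
# 9a09009721, i.e. call far 2197:0009
```

Copying the stub's operand bytes verbatim mirrors the script's two PatchWord calls: the caller keeps its own opcode and simply gets the real target's offset and segment.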

And now our original calling function calls the real function directly:

04 - after call

The jump function now has no callers, but we leave it in place in case some code decoded later does call it…

05 - after jump func

And our called function now correctly lists the code that calls it:

06 - after func