IDA Script: Fixing 16bit pushed data segment references

A good friend has started reversing an old 16-bit Borland C++ (3.1?) program, and found that many data segment offsets pushed onto the stack were not being cross-referenced correctly.

After I told him the shortcuts for fixing the issue by hand (press O for the data segment, or Alt-R for any segment offset), he wrote an IDC script to do it en masse.

Thus (made-up example code):


push ds
mov ax, 0x1234
push ax

should look like:


push ds
mov ax, ds:dword_1234
push ax

Here’s his script:

#include <idc.idc>

static main()
{
    auto seg, loc;
    auto movloc, movtarget;
    auto xref;
    auto dsegbase;

    dsegbase = SegByName("dseg") * 16;
    Message("dsegbase=%x\n", dsegbase);

    Message("========================================\n");
    seg = FirstSeg();

    while(seg != BADADDR )
    {
        Message("----------------------------------------\n");

        loc = SegStart(seg);

        if( Byte(loc) != 0xCD || Byte(loc+1) != 0x3F)
        {
            Message("Fixing indirect push [ds:xx] refs from %s\n", SegName(seg));

            while(loc != BADADDR && loc < SegEnd(seg))
            {
                if (GetMnem(loc) != "push" || GetOpnd(loc, 0) != "ds")
                {
                    loc = NextHead(loc, BADADDR);
                    continue;
                }
                loc = NextHead(loc, BADADDR);

                if (GetMnem(loc) != "mov" || GetOpType(loc, 1) != o_imm)
                {
                    // Not the pattern. Re-examine this instruction from the top
                    // of the loop so that a second "push ds" here is not skipped.
                    continue;
                }
                movloc = loc;
                movtarget = GetOpnd(movloc, 0);
                loc = NextHead(loc, BADADDR);

                if (GetMnem(loc) != "push" || GetOpnd(loc, 0) != movtarget)
                {
                    continue;
                }

                // At this point, we know we're pushing a [ds:x] combo.
                //Message("%x: mov %s, %s\n", movloc, movtarget, GetOpnd(movloc, 1));

                // Skip this instruction if it already has a data xref
                xref = Dfirst(movloc);
                if (xref != BADADDR)
                {
                    continue;
                }

                Message("  Updating %s:%04x\n", SegName(seg), (movloc - seg) & 0xffff);
                OpOff(movloc, 1, dsegbase);
            }
        }

        seg = NextSeg(seg);
    }
}
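
To see the matching logic outside IDA, here is a rough Python sketch of the same three-instruction scan. It is not the IDC script and uses no IDA API: the tuple-based listing, the function name find_pushed_ds_offsets and the ds_paragraph value are all made up for illustration. It only shows the pattern test and the real-mode arithmetic (paragraph * 16 + offset) that the dsegbase offset base relies on.

```python
# Hypothetical stand-in for a disassembly listing: (mnemonic, operand0, operand1).
# Immediates are ints, registers are strings. None of this is IDA API.

def find_pushed_ds_offsets(instructions, ds_paragraph):
    """Return (index_of_mov, linear_address) for each
    push ds / mov reg, imm / push reg run in the listing."""
    ds_base = ds_paragraph * 16  # real-mode paragraph -> linear base address
    hits = []
    i = 0
    while i + 2 < len(instructions):
        first, second, third = instructions[i:i + 3]
        is_match = (
            first[0] == "push" and first[1] == "ds"
            and second[0] == "mov" and isinstance(second[2], int)
            and third[0] == "push" and third[1] == second[1]
        )
        if is_match:
            hits.append((i + 1, ds_base + second[2]))
            i += 3
        else:
            i += 1
    return hits


# Made-up listing containing one matching run
listing = [
    ("push", "bp", None),
    ("push", "ds", None),
    ("mov", "ax", 0x1234),
    ("push", "ax", None),
    ("mov", "bx", 0x10),
]
for index, target in find_pushed_ds_offsets(listing, ds_paragraph=0x2000):
    print("mov at index %d refers to linear address %x" % (index, target))
```

In the real script IDA does the second half for you: OpOff only needs the segment base, and IDA works out the target and creates the cross reference.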

IDA Pro and Pascal: Sets & Propagating Types

Pascal has a Set type: you can set any of up to 256 bits (n < 256) and later check whether bit n is set. It is a bit like a bool array.
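
If it helps to picture it, here is a small Python sketch of a 256-bit set and a membership test. The function names are mine, not the Pascal runtime's, and the bit layout (member n lives in byte n div 8, bit n mod 8) is an assumption about how Turbo Pascal stores sets, so treat it as an illustration rather than a description of Set::MemberOf itself.

```python
SET_SIZE_BYTES = 0x20  # 32 bytes * 8 bits = 256 possible members


def set_include(set_bytes, n):
    """Mark member n (0..255) as present."""
    set_bytes[n // 8] |= 1 << (n % 8)


def set_member_of(set_bytes, n):
    """Return True if member n (0..255) is present."""
    return bool(set_bytes[n // 8] & (1 << (n % 8)))


# Build a set of characters, the way a Pascal "set of char" might be used
vowels = bytearray(SET_SIZE_BYTES)
for ch in b"aeiou":
    set_include(vowels, ch)

print(set_member_of(vowels, ord("e")))
print(set_member_of(vowels, ord("z")))
```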

When you disassemble a DOS Pascal program, the IDA Pro FLIRT signatures will find the Set functions. In this example we will focus on Set::MemberOf.

arg_0 is the Set object and arg_4 is the byte we are testing for membership. A call to this function looks like this:

and the byte_152FE location is an unknown mess like so:

As we know this data is a Set object, it would be nice if it was represented as such. We could declare it as a structure variable (Alt-Q) by hand and then rename it. This works for a few small cases, but the Gold Box games use Sets to manage lots of things, so there are too many of them. The best trick here is to get IDA Pro to do the work for us.

First, I assume you have created a Set structure (needed for the manual process above) that is 0x20 bytes long.

Now go back to Set::MemberOf, associate a prototype with the function (Y), and change the prototype from:

int __stdcall far Set__MemberOf(__int32 _set);

to:

int __stdcall far Set__MemberOf(Set* set, char);

and ta-da, the code calling Set::MemberOf is tidy:

and all the Set data blocks are typed for us as well:

Magic!

IDA Script: Fixing overlay jumps

The DOS Gold Box games use overlays to manage the ‘more code than memory’ problem of the DOS environment.

So when this code (seg000:00F6) calls sub_21979, it goes via an intermediate jump function, sub_10180,

which jumps to the actual function once it has been loaded into RAM (after swapping some other code out, and other magic!).

Here is the actual called function:

And IDA Pro links this all together automagically, so life is good.

But really we want to take the jump functions out of the loop, since we can have the whole project in memory. The main advantage of cleaning up is that sub_21979 currently shows only one place that refers to it (the green code in the top right of the picture), when the jump function may have many callers that we don’t see. Exploring the code also requires jumping in and out of the jump function, which gets annoying.

Here is an .idc script to fix this up. It finds all the overlay jump functions, then loops over the locations that reference them and rewrites each one to call the actual jump target.

#include <idc.idc>

static main()
{
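	auto seg, loc;
	auto off, base;
	auto xref, nextref;

	seg = FirstSeg();

	while(seg != BADADDR )
	{
		loc = SegStart(seg);

		// Overlay jump segments start with INT 3Fh (CD 3F)
		if( Byte(loc) == 0xCD && Byte(loc+1) == 0x3F)
		{
			Message("Fixing segment %s jumps\n", SegName(seg));

			// Skip the first 0x20 bytes; the 5-byte jump entries follow
			loc = loc + 0x20;

			while(loc < SegEnd(seg))
			{
				// 0xEA = jmp far ptr seg:off
				if( Byte(loc) == 0xEA )
				{
					off = Word(loc+1);
					base = Word(loc+3);

					xref = RfirstB(loc);
					while( xref != BADADDR )
					{
						// Fetch the next xref before deleting the current one
						nextref = RnextB(loc, xref);

						// Only patch 5-byte far calls (9A) and far jumps (EA)
						if( Byte(xref) == 0x9A || Byte(xref) == 0xEA )
						{
							Message("Loc %x ref from %x\n", loc, xref);

							PatchWord(xref+1, off);
							PatchWord(xref+3, base);

							DelCodeXref(xref, loc, 0 );
						}

						xref = nextref;
					}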
				}

				loc = loc + 5;
			}
		}

		seg = NextSeg(seg);
	}
}
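
For anyone who wants to poke at the byte patching without IDA, here is a rough Python sketch of the same idea on a plain byte buffer. Everything in it is made up for illustration: the image, the addresses and the function name retarget_far_calls. IDA's xref list is replaced by a hand-supplied list of call sites. It assumes callers are 5-byte far calls (opcode 9A) and jump entries are 5-byte far jumps (opcode EA), each followed by a 16-bit offset and a 16-bit segment.

```python
import struct

FAR_JMP = 0xEA   # jmp far ptr seg:off
FAR_CALL = 0x9A  # call far ptr seg:off


def linear(seg, off):
    """Real-mode segment:offset to linear address."""
    return seg * 16 + off


def retarget_far_calls(image, call_sites, stub_addr):
    """Rewrite every far call that targets the jump entry at stub_addr
    so it calls the entry's own target directly. Returns the patch count."""
    if image[stub_addr] != FAR_JMP:
        return 0
    off, seg = struct.unpack_from("<HH", image, stub_addr + 1)
    patched = 0
    for site in call_sites:
        if image[site] != FAR_CALL:
            continue  # not a 5-byte far call, leave it alone
        call_off, call_seg = struct.unpack_from("<HH", image, site + 1)
        if linear(call_seg, call_off) == stub_addr:
            struct.pack_into("<HH", image, site + 1, off, seg)
            patched += 1
    return patched


# Made-up image: a jump entry at 0010:0020 (linear 0x120) that jumps to
# 0018:0009, and one caller at linear 0x40 doing "call far 0010:0020".
image = bytearray(0x200)
struct.pack_into("<BHH", image, 0x120, FAR_JMP, 0x0009, 0x0018)
struct.pack_into("<BHH", image, 0x40, FAR_CALL, 0x0020, 0x0010)

count = retarget_far_calls(image, [0x40], 0x120)
print("patched", count, "call site(s):", image[0x40:0x45].hex())
```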

And now our original calling function calls the real function

And the jump function has nobody calling it, but we leave it there in case some later-decoded code does call it…

And our called function correctly refers to the code that calls it

IDA Script: Remove empty auto labels

When working in IDA to reverse games, you can end up with lots of dummy/empty labels that are auto-generated when doing offset work. Here’s my script to remove them.

First, how it happens.

You find a value you are interested in turning into an offset.

And then you right click and go down the offset menu, and review the choices.

This just created a dummy label in every segment at offset 32h so it could display the choices to you.

Now you can remove these manually by selecting the line, pressing N, emptying the name, and pressing OK.

But that’s painful if you have hundreds of dummy labels. Roll on the power of IDC files, and let’s get rid of those.

#include <idc.idc>

static main()
{
	auto seg, loc, flags;
	auto count;

	count = 0;

	seg = FirstSeg();

	while(seg != BADADDR )
	{
		loc = SegStart(seg);
		while( loc < SegEnd(seg) )
		{
			flags = GetFlags(loc);

			// Has a dummy label, no references, and is not the start of a function: remove the name
			if( ((flags & ( FF_LABL | FF_REF)) == FF_LABL) && ((flags & FF_FUNC) == 0))
			{
				MakeNameEx(loc, "", 0);
				count ++;
			}

			loc = loc + ItemSize(loc);
		}

		seg = NextSeg(seg);
	}

	Message("Removed %d empty labels\n", count);
}

Now you can safely remove the unreferenced auto labels. The script leaves function names alone, along with any names that don’t follow the auto label format loc_xxxxxx.
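
The flag test is the only subtle part, so here it is pulled out as a small Python sketch you can run without IDA. The function name is mine, and the constant values are what I believe idc.idc defines for FF_REF, FF_LABL and FF_FUNC; treat them as assumptions and check your own copy of the header.

```python
# Assumed values, believed to match idc.idc; verify against your IDA version.
FF_REF = 0x00001000   # item has references
FF_LABL = 0x00008000  # item has a dummy (auto-generated) name
FF_FUNC = 0x10000000  # item is the start of a function


def is_removable_label(flags):
    """True for a dummy label with no references that is not a function start."""
    unreferenced_dummy = (flags & (FF_LABL | FF_REF)) == FF_LABL
    function_start = (flags & FF_FUNC) != 0
    return unreferenced_dummy and not function_start


print(is_removable_label(FF_LABL))            # dummy label, nothing refers to it
print(is_removable_label(FF_LABL | FF_REF))   # dummy label that is referenced
print(is_removable_label(FF_LABL | FF_FUNC))  # start of a function
```

Masking with both FF_LABL and FF_REF and comparing against FF_LABL alone checks "label present" and "no references" in one step.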