IDA Script: Fixing 16bit pushed data segment references

A good friend has started reversing an old 16-bit Borland C++ (3.1?) program, and found that many data segment offsets pushed onto the stack were not being cross-referenced correctly.

After I told him the shortcuts for fixing the issue by hand (press O for the data segment, or Alt-R for any segment offset), he wrote an IDC script to do it en masse.

Thus (made-up example code):

push ds
mov ax, 0x1234
push ax

should look like:

push ds
mov ax, ds:dword_1234
push ax

Here’s his script:

#include <idc.idc>

static main()
{
    auto seg, loc;
    auto movloc, movtarget;
    auto xref;
    auto dsegbase;

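    // SegByName returns the segment's base paragraph; * 16 gives the linear base.
    dsegbase = SegByName("dseg");
    if (dsegbase == BADADDR)
    {
        Message("No segment named dseg found\n");
        return;
    }
    dsegbase = dsegbase * 16;
    Message("dsegbase=%x\n", dsegbase);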

    Message("========================================\n");
    seg = FirstSeg();

    while(seg != BADADDR )
    {
        Message("----------------------------------------\n");

        loc = SegStart(seg);

        // Skip overlay stub segments, which start with int 3Fh (CD 3F)
        if( Byte(loc) != 0xCD || Byte(loc+1) != 0x3F)
        {
            Message("Fixing indirect push [ds:xx] refs from %s\n", SegName(seg));

            while(loc != BADADDR && loc < SegEnd(seg))
            {
                if (GetMnem(loc) != "push" || GetOpnd(loc, 0) != "ds")
                {
                    loc = NextHead(loc, BADADDR);
                    continue;
                }
                loc = NextHead(loc, BADADDR);

                if (GetMnem(loc) != "mov" || GetOpType(loc, 1) != o_imm)
                {
                    // Don't advance here: this instruction may itself be a
                    // "push ds", so let the top of the loop re-examine it.
                    continue;
                }
                movloc = loc;
                movtarget = GetOpnd(movloc, 0);
                loc = NextHead(loc, BADADDR);

                if (GetMnem(loc) != "push" || GetOpnd(loc, 0) != movtarget)
                {
                    continue;
                }

                // At this point, we know we're pushing a [ds:x] combo.
                //Message("%x: mov %s, %s\n", movloc, movtarget, GetOpnd(movloc, 1));

                // Abort if there already exists a Dxref
                xref = Dfirst(movloc);
                if (xref != BADADDR)
                {
                    continue;
                }

                Message("  Updating %s:%04x\n", SegName(seg), (movloc - seg) & 0xffff);
                OpOff(movloc, 1, dsegbase);
            }
        }

        seg = NextSeg(seg);
    }
}
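
For readers unfamiliar with real-mode addressing, here is a minimal C sketch of the arithmetic behind `dsegbase`: a segment value is a paragraph number, so multiplying it by 16 and adding the 16-bit immediate gives the linear address the offset should point at. The function name and the sample values in the usage note are made up for illustration and do not come from the script.

```c
#include <stdint.h>

/* Real-mode linear address: segment (paragraph) * 16 + offset. */
static uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

With a hypothetical dseg paragraph of 0x1F00, the immediate 0x1234 from the example resolves to `linear_address(0x1F00, 0x1234)`, which is 0x20234.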

IDA-Pro and Pascal: Sets & Propagating Types

In Pascal there is the Set type: you set any of up to 256 bits (n < 256) and can later check whether bit n is set or not. It is sort of like a bool array.

When you disassemble a DOS Pascal program, the IDA-Pro FLIRT signatures will find the Set functions. In this example we will focus on Set::MemberOf.

arg_0 is the Set object and arg_4 is the byte we are checking for membership. When this code is called it looks like this:
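
As a rough illustration, 256 one-bit members fit in 32 (0x20) bytes. The C sketch below assumes the usual Turbo Pascal layout, where element n lives in bit (n mod 8) of byte (n div 8). The type and function names are hypothetical and are not the names the FLIRT signatures produce.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical C view of a Pascal set: 256 bits, one per possible element. */
typedef struct {
    uint8_t bits[32];
} PascalSet;

/* Add element n to the set. */
static void set_include(PascalSet *s, uint8_t n)
{
    s->bits[n >> 3] |= (uint8_t)(1u << (n & 7));
}

/* Test whether element n is in the set (Pascal's "n in s"). */
static bool set_member_of(const PascalSet *s, uint8_t n)
{
    return ((s->bits[n >> 3] >> (n & 7)) & 1u) != 0;
}
```

Under that assumed layout, including element 65 sets bit 1 of byte 8 and leaves every other byte untouched.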

and the byte_152FE location is an unknown mess like so:

As we know this data is a Set object, it would be nice if it were represented as such. We could declare it a structure variable (Alt-Q) by hand and then rename it. This works for a few small cases, but in the Gold Box games Sets are used to manage lots of things, so there are too many of them. The best trick here is to get IDA-Pro to do the work for us.

Firstly, I assume you have created a Set structure (needed for the manual process above) that is 0x20 bytes long.

Now go back to Set::MemberOf, associate a prototype with the function (Y), and change the prototype from:

int __stdcall far Set__MemberOf(__int32 _set);

to:

int __stdcall far Set__MemberOf(Set* set, char);

and ta-da, the code calling Set::MemberOf is tidy:

and all the Set data blocks are typed for us as well:

Magic!

IDA Pro and Pascal: base one arrays

Today I finally solved how to handle Pascal's base-one arrays in IDA Pro.

A fixed-size array block will normally be packed after some other data.

You can see from its use that stru_1DA79 is a fixed-size array.

But when the base-1 array is indexed into, the results are messy and confusing.

Yes, it looks like the dword is being accessed, not the actual array. For a long time I worked around this with mega-ugly repeatable comments like:

dword and [+2] = unk_1DA79[i-1].byte_0 and [+3] = unk_1DA79[i-1].byte_1

Today I read enough of the help to finally work out how to do it correctly.

The first steps are to note from the above that the structure is 3 bytes wide, and to create a structure for it (already done in the snaps above, hence struct_6). Then, at the incorrect usage shown above (ovr032:0B51), select dword_1DA74 and choose Offset (User Defined).

Then set the Target delta to -3 (-1 * the size of the structure, which is 3).

And like magic, it shows you correctly accessing the array.
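
To see why the delta is one structure size, here is a small C sketch of the address arithmetic, assuming the compiler folds the "-1" of the base-1 index into the constant base address. `Rec3` is a hypothetical stand-in for struct_6, and both function names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical 3-byte record, standing in for struct_6. */
typedef struct {
    uint8_t byte_0;
    uint8_t byte_1;
    uint8_t byte_2;
} Rec3;

/* Address of element i (1-based), written the way the source code reads. */
static uint32_t addr_as_written(uint32_t array_start, uint32_t i)
{
    return array_start + (i - 1) * (uint32_t)sizeof(Rec3);
}

/* Same address with the -1 folded into a biased base, as the disassembly shows it. */
static uint32_t addr_as_compiled(uint32_t array_start, uint32_t i)
{
    uint32_t biased_base = array_start - (uint32_t)sizeof(Rec3);
    return biased_base + i * (uint32_t)sizeof(Rec3);
}
```

For an array at 0x1DA79 the biased base is 0x1DA76, which is dword_1DA74 + 2. That is why the instruction appears to reference the dword, and why a delta of one structure size brings the operand back to the array.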

This ‘issue’ has only been the bane of my reverse engineering for like the last ten years.

IDA Script: Fixing overlay jumps

The DOS Gold Box games use overlays to manage the ‘more code than memory’ problem of the DOS environment.

So when this code here (seg000:00F6) calls sub_21979, it goes via a stub function, sub_10180,

which jumps to the actual function once it has been loaded into RAM (after swapping some other code out and other magic!).

Here is the actual called function:

And IDA Pro links this all together auto-magically so life is good.

But really we want to take the jump functions out of the loop, as we can have the whole project in memory. The main advantage of cleaning up is in the cross-references: sub_21979 shows only one place that refers to it (the green code in the top right of the picture), while the jump function may have many callers that we don’t see. Exploring the code then requires jumping in and out of the jump function, which gets annoying.

Here is an .idc script to fix this up. It finds all the overlay jump functions, then loops across the referencing locations and rewrites those to call the actual jump target.

#include <idc.idc>

static main()
{
	auto seg, loc;
	auto off, base;
	auto xref;

	seg = FirstSeg();

	while(seg != BADADDR )
	{
		loc = SegStart(seg);

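		// Overlay stub segments start with int 3Fh (CD 3F)
		if( Byte(loc) == 0xCD && Byte(loc+1) == 0x3F)
		{
			Message("Fixing segment %s jumps\n", SegName(seg));

			// Skip the first 0x20 bytes; the 5-byte jump entries follow
			loc = loc + 0x20;

			while(loc < SegEnd(seg))
			{
				// EA = jmp far ptr: opcode, 16-bit offset, 16-bit segment
				if( Byte(loc) == 0xEA )
				{
					off = Word(loc+1);
					base = Word(loc+3);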

					xref = RfirstB(loc);
					while( xref != BADADDR )
					{
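						// Only rewrite 5-byte far calls (opcode 9A); leave any other reference alone.
						if( Byte(xref) == 0x9A )
						{
							Message("Loc %x ref from %x\n", loc, xref);

							// Point the call at the real target instead of the stub.
							PatchWord(xref+1, off);
							PatchWord(xref+3, base);

							DelCodeXref(xref, loc, 0 );
						}

						xref = RnextB(loc, xref);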
					}
				}

				loc = loc + 5;
			}
		}

		seg = NextSeg(seg);
	}
}
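
The byte-level effect of the two PatchWord calls can be sketched in plain C. This is only an illustration on raw byte buffers, not IDA code: it assumes the stub entry is a 5-byte far jump (EA, offset word, segment word) and the caller is a 5-byte far call (9A, offset word, segment word). The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define OP_JMP_FAR  0xEA  /* jmp far ptr seg:off  */
#define OP_CALL_FAR 0x9A  /* call far ptr seg:off */

/* Read a little-endian 16-bit word from a byte buffer. */
static uint16_t read_word(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Write a little-endian 16-bit word into a byte buffer. */
static void write_word(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)(v >> 8);
}

/*
 * Copy the far jump's target into the far call.
 * Returns true when the call was rewritten, false when either
 * opcode was not the one expected (nothing is modified then).
 */
static bool retarget_far_call(uint8_t *call, const uint8_t *stub)
{
    if (stub[0] != OP_JMP_FAR || call[0] != OP_CALL_FAR)
        return false;
    write_word(call + 1, read_word(stub + 1)); /* offset  */
    write_word(call + 3, read_word(stub + 3)); /* segment */
    return true;
}
```

The opcode check is a defensive choice in this sketch: it refuses to touch anything that is not a far call, since overwriting four bytes of some other instruction would corrupt it.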

And now our original calling function calls the real function

And the jump function has nobody calling it, but we leave it there in case some later-decoded code does call it…

And our called function correctly refers to the code that calls it

IDA Script: Remove empty auto labels

When working in IDA to reverse games, you can end up with lots of dummy/empty labels that are auto-generated when doing offset work. Here’s my script to remove them.

First, how it happens.

You find a value you are interested in setting to an offset.

And then you right-click, go down the offset menu, and review the choices.

This just created a dummy label on every segment at offset 32h so it could display it to you.

Now you can remove these manually by selecting the line, pressing n, emptying the name, and pressing OK.

But that’s painful if you have hundreds of dummy labels. Roll on the power of IDC files, and let’s get rid of those.

#include <idc.idc>

static main()
{
	auto seg, loc, flags;
	auto count;

	count = 0;

	seg = FirstSeg();

	while(seg != BADADDR )
	{
		loc = SegStart(seg);
		while( loc < SegEnd(seg) )
		{
			flags = GetFlags(loc);

			// Has a dummy label and no references, and is not the start of a function: remove the name
			if( ((flags & ( FF_LABL | FF_REF)) == FF_LABL) && ((flags & FF_FUNC) == 0))
			{
				MakeNameEx(loc, "", 0);
				count ++;
			}

			loc = loc + ItemSize(loc);
		}

		seg = NextSeg(seg);
	}

	Message("Removed %d empty labels\n", count);
}

Now you can safely remove the unreferenced auto labels. The script leaves function names alone, along with any names that don’t follow the auto label format loc_xxxxxx.
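
The flag test in the script is compact, so here is the same predicate spelled out as a C sketch. The bit values below are what I believe idc.idc defines, but treat them as stand-ins; the real script should keep using the named constants. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flag bits (assumed values; the real ones come from idc.idc). */
#define FF_REF   0x00001000u  /* item has references   */
#define FF_LABL  0x00008000u  /* item has a dummy name */
#define FF_FUNC  0x10000000u  /* item starts a function */

/* True when the item has a dummy label, no references, and is not a function start. */
static bool is_removable_label(uint32_t flags)
{
    bool dummy_unreferenced = (flags & (FF_LABL | FF_REF)) == FF_LABL;
    bool not_function_start = (flags & FF_FUNC) == 0;
    return dummy_unreferenced && not_function_start;
}
```

Masking with both bits and comparing against FF_LABL alone checks "label set and reference clear" in a single comparison.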