Archive for the ‘Programming’ Category

Evils of typedefs and legacy code

Monday, September 1st, 2008

Our legacy C++ codebase once was a C codebase.  Way back then, there was no standard boolean type.  So in the wisdom of the day a type was declared

typedef int tBool;

The major problem with tBools is that, over time, enums, #defines or magic constants get assigned to them, or they get compared to said enums, #defines or magic constants, as they are all ints.

Which works, but years later *somebody* came along and replaced tBool with bool, and we started getting issues.

Now, as in a lot of large codebases, there tends to be a large number of ignored warnings, and not enough daylight hours to remove them all.

So given that, and that you’re using Visual Studio 2005, here are the ones you should find and remove.

#define NON_BOOL_DEFINE 42
bool boolValue;

boolValue = NON_BOOL_DEFINE;

Warning C4305: ‘=’ : truncation from ‘int’ to ‘bool’

if( boolValue == NON_BOOL_DEFINE )

Warning C4806: ‘==’ : unsafe operation: no value of type ‘bool’ promoted to type ‘int’ can equal the given constant

Another hint that your boolean types are being used wrongly is this warning:

Warning C4800: ‘const int’ : forcing value to bool ‘true’ or ‘false’ (performance warning)
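To make the failure mode concrete, here is a minimal sketch of what changes when the flag type goes from int to bool. It is an illustration only, not code from our codebase: survives_round_trip and demo_flag_types are hypothetical names, and NON_BOOL_DEFINE is the same made-up constant as above.

```cpp
#include <cassert>

typedef int tBool;           // the legacy typedef
#define NON_BOOL_DEFINE 42   // illustrative constant, as in the post

// Hypothetical helper: store a value in a flag of type Flag, then
// compare the flag against the value that was stored.
template <typename Flag>
bool survives_round_trip(int value) {
    Flag flag = value;       // for bool, any non-zero value collapses to true
    return flag == value;    // a bool flag is promoted back to int 0 or 1 here
}

// Hypothetical demo of the behaviour change.
void demo_flag_types() {
    assert(survives_round_trip<tBool>(NON_BOOL_DEFINE));   // int keeps the 42
    assert(!survives_round_trip<bool>(NON_BOOL_DEFINE));   // bool only kept "true"
    assert(survives_round_trip<bool>(1));                  // 0 and 1 still behave
}
```

So with tBool the comparison against the constant succeeds, and with bool it quietly never does, which is exactly what C4806 is complaining about.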

Slow sub-select? Maybe LAG or LEAD might help

Thursday, August 28th, 2008

One of the queries in our application is used to edit event entries.  Event entries have a start time and an end time, but when they are processed from the data files they can be entered with just a start time (i.e. when the event started).  When there are problems (crashes), these entries never get a matching entry with an end time.

So in our editor, we want to know the upper limit of when the event could have ended, which is the next start time.

ROWID  TIME_START  TIME_END  OTHER_FIELDS
1      2008-1-1    null      data
2      2008-1-2    2008-1-3  more data

So for row 1 we want 2008-1-2 as the suggested end time.

Originally we had a query that looked like this:

SELECT other_fields,
  TIME_START,
  TIME_END,
  NVL((SELECT MIN(TIME_START)          
         FROM some_view
         WHERE MACHINE_ID = T1.MACHINE_ID
           AND TIME_START > T1.TIME_START),
       TIME_START + 1/24) "NEXT_TIME_START"
  FROM some_view T1
  WHERE MACHINE_ID = machine_id
    AND NVL(TIME_END,
            NVL((SELECT MIN(TIME_START)
                   FROM some_view WHERE MACHINE_ID = T1.MACHINE_ID
                   AND TIME_START > T1.TIME_START),
                TIME_START + 1/24)) > start_time_range
    AND TIME_START < end_time_range
  ORDER BY TIME_START

So this is a little trickier than described above as it deals with the edge case of when there is no next entry.  In that case the current start time plus 1 hour is used. I coloured all the optional/variables yellow.

This performs fine with small data sets, but when we had over a million rows in that view, the selection of one day’s data (~2700 rows) took 551 seconds (>9 minutes), and this was one of many large views, so the overall effect was a >30 minute wait.

I rewrote the query with the use of LEAD, and the performance went to 4 seconds (with data retrieval) or 0.6 seconds for a count(*) operation.  Here’s that query now using LEAD.

SELECT *
    FROM(SELECT other_fields,
           TIME_START,
           TIME_END,
           NVL( LEAD(TIME_START) OVER (PARTITION BY machine_id ORDER BY time_start),
                TIME_START + 1/24) AS "NEXT_TIME_START"
           FROM some_view T1
           WHERE MACHINE_ID = machine_id )
    WHERE NEXT_TIME_START > start_time_range
      AND TIME_START < end_time_range
    ORDER BY TIME_START

Now that I review the code I notice I also rearranged the code to not repeat the next_time_start calculation, so I am now not sure all the performance improvements can be attributed to LEAD, but I’ll take that ~138 times improvement either way.
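As a rough mental model of why the two shapes differ so much, here is a small C++ sketch. It is not how Oracle executes either query, just the two ideas side by side: the sub-select style rescans every row for each row, while the LEAD style sorts once and looks at the neighbour. The names (Minutes, kOneHour, next_start_by_rescan, next_start_by_lead) are made up for the illustration, times are plain minutes, and it assumes one machine with distinct start times. With duplicate start times the two would differ, since the sub-select asks for a strictly later start and LEAD just takes the next row.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

typedef long Minutes;            // hypothetical stand-in for TIME_START
const Minutes kOneHour = 60;     // mirrors TIME_START + 1/24

// Sub-select style: for every row, rescan all rows for the smallest
// later start time. Work grows with rows * rows.
std::vector<Minutes> next_start_by_rescan(const std::vector<Minutes>& starts) {
    std::vector<Minutes> next;
    next.reserve(starts.size());
    for (std::size_t i = 0; i < starts.size(); ++i) {
        bool found = false;
        Minutes best = 0;
        for (std::size_t j = 0; j < starts.size(); ++j) {
            if (starts[j] > starts[i] && (!found || starts[j] < best)) {
                best = starts[j];
                found = true;
            }
        }
        // No later row: fall back to start + 1 hour, like the NVL default.
        next.push_back(found ? best : starts[i] + kOneHour);
    }
    return next;
}

// LEAD style: sort once, then each row's answer is simply its neighbour.
// Results come back in start-time order.
std::vector<Minutes> next_start_by_lead(std::vector<Minutes> starts) {
    std::sort(starts.begin(), starts.end());
    std::vector<Minutes> next(starts.size());
    for (std::size_t i = 0; i < starts.size(); ++i) {
        next[i] = (i + 1 < starts.size()) ? starts[i + 1]
                                          : starts[i] + kOneHour;
    }
    return next;
}

// Hypothetical check: three daily events, already in start-time order.
void demo_next_start() {
    std::vector<Minutes> starts;
    starts.push_back(0);
    starts.push_back(1440);
    starts.push_back(2880);
    assert(next_start_by_rescan(starts) == next_start_by_lead(starts));
}
```

On a couple of thousand rows out of a million-row view, the difference between "rescan for every row" and "sort once" is the kind of thing that could plausibly account for most of the gap, though I have not measured the two effects separately.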

Oracle’s Lag in MS SQL Server 2005

Monday, August 11th, 2008

I am currently porting our new database from Oracle 10g to MS SQL Server 2005, and I have it all done except the views that use the Oracle LAG and LEAD functions (non-ANSI).

What these functions provide (for the MS SQL camp) is the ability to get the next or previous rows when sorted. In my case I have a value that is the ‘volume change since the start’ at time intervals, and I want the relative change between each interval.

So the Oracle SQL of the view is:

SELECT t.*,
    t.volume - LAG(volume) OVER (PARTITION BY group_number 
                                 ORDER BY timestamp) AS volume_change
FROM volume_table t;

The partition clause splits the data into different buckets, then each bucket is sorted, with all results returned.
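For anyone who has not met these functions, the following C++ sketch shows roughly what the statement above computes. It is only a model of the result, not of how Oracle does it, and the names (VolumeRow, has_change, fill_volume_change) are invented for the example. The first row of each bucket has no previous row, so its change is left unset, which plays the role of NULL.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row type mirroring volume_table plus the computed column.
struct VolumeRow {
    int group_number;
    long timestamp;
    double volume;
    bool has_change;       // false plays the role of NULL
    double volume_change;  // only meaningful when has_change is true
};

// Order rows by bucket first, then by time within the bucket.
bool by_group_then_time(const VolumeRow& a, const VolumeRow& b) {
    if (a.group_number != b.group_number)
        return a.group_number < b.group_number;
    return a.timestamp < b.timestamp;
}

// After sorting, each partition is one contiguous, ordered run, so the
// LAG row is simply the previous element if it is in the same group.
void fill_volume_change(std::vector<VolumeRow>& rows) {
    std::sort(rows.begin(), rows.end(), by_group_then_time);
    for (std::size_t i = 0; i < rows.size(); ++i) {
        bool has_prev = i > 0 &&
                        rows[i - 1].group_number == rows[i].group_number;
        rows[i].has_change = has_prev;
        rows[i].volume_change =
            has_prev ? rows[i].volume - rows[i - 1].volume : 0.0;
    }
}

// Hypothetical check with two buckets.
void demo_volume_change() {
    VolumeRow data[] = {
        {2, 10, 7.0, false, 0.0},
        {1, 20, 12.5, false, 0.0},
        {1, 10, 5.0, false, 0.0},
    };
    std::vector<VolumeRow> rows(data, data + 3);
    fill_volume_change(rows);
    assert(!rows[0].has_change);               // first row of group 1
    assert(rows[1].volume_change == 7.5);      // 12.5 - 5.0
    assert(!rows[2].has_change);               // first row of group 2
}
```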

Asking on the NZ .Net User Group mailing list I got a pointer to this MS feedback page, but the solution presented there gives me an error “Incorrect syntax near ‘ROWS’.” when I run this query against SQL 2K5

SELECT MIN(volume) OVER(PARTITION BY group_number 
                        ORDER BY timestamp
                        ROWS BETWEEN 1 PRECEDING 
                        AND 1 PRECEDING) change
FROM volume_table;
GO

I had a side point showing why I wanted to avoid a sub-select: the performance of a different query improved by orders of magnitude when it was changed to use a LAG function.  Yet that same sub-select query runs just as fast in MS SQL Server as the “improved” Oracle statement, so I’ll just stick to the main topic, and post about that another day…

Chris recently showed how to use Common Table Expressions (CTE) (sort of auto-magic temp table) to find the first entries for a day, which is very close to what I want, but the filtering is hard coded.  I could not see how to make it dynamic, so I used the idea, and started massaging the concept, till I finally got what I wanted.

Conceptually the Oracle solution could be done using cursors under the hood to provide the rolling previous (LAG) rows, whereas here I’m doing many look-ups, but the table is not getting re-created as in the nested select method.

So my code is as follows:

WITH Rows( vol_diff, time, rn, gn ) AS
(
    SELECT v.volume,
        v.timestamp,
        Row_Number() OVER (PARTITION BY group_number
                           ORDER BY timestamp),
        group_number
    FROM volume_table v
),
PrevRows( timestamp, prev_vol, group_number) AS
(
    SELECT a.time, b.vol_diff, a.gn
    FROM Rows a
    LEFT JOIN Rows b 
        ON a.rn = b.rn + 1 
        AND a.gn = b.gn
)
SELECT v.*, v.volume - p.prev_vol as volume_change
FROM volume_table v
LEFT JOIN PrevRows p
    ON v.timestamp = p.timestamp
    AND v.group_number = p.group_number;
GO

So I use two CTE tables, one to partition and sort the data, the second to do a lag-based join, then I can select the lag-based data by matching the time and group to the current entry. It works a treat, and I will do some performance testing tomorrow once my production data has finished loading into my db.

After the results of the query I have not discussed, I expect that the sub-select will be just as performant.

Teaching the kids Logo

Wednesday, August 6th, 2008

After yesterday’s post about when I learnt to program, I realised my children are older than I was, and have yet to learn the art!  While sharing this with Michaela, I decided Logo was the trick.

It’s visual, so you see your creations, yet accessible/simple.  It exposes concepts like iteration, sequential steps and variables.

After some searching for a native Mac Logo I found XLogo.

I spent a few hours playing, relearning the syntax and making squiggly line shapes.

Really looking forward to introducing the children to it in the weekend.

How I got into programming (meme)

Tuesday, August 5th, 2008

It’s been funny watching this meme slowly traverse my blog roll, and now I’ve been tagged. Cheers Chris

How old were you when you first started programming?

Most likely 8 or 9.

How did you get started in programming?

My father bought an Amstrad CPC464 (green screen) to do his thesis on.  Between ‘Roland goes digging’ and other games, making my own stuff became a fascination.

What was your first language?

Basic and Logo, lots of exploration with trivial stuff, also lots of entering game listings from Amstrad Computer User, in basic and/or hex.  I can’t believe the hours we spent reading in hex, and entering line by line, double/triple checking each one.  We never did get Splat! working….

What was your first real program you wrote?

Real? They really were all real, just not really useful.

Um, outside the demo-scene type graphics stuff (mode-x, ray tracing, phong shading), I’m not sure when I wrote anything of outside value.  During high-school I hacked lots of games, to run, or fiddled the save games to give me lots of money, or better stats.  I wrote a telnet proxy at Uni to get free mudding.

The first paid ‘programming’ I did was hacking the POS system at the pizza shop I worked at to change the inventory list, to avoid paying the developers’ consulting rates…

What languages have you used since you started programming?

To make new stuff:

Loco Basic, Logo, MS QuickBasic, Turbo C, Assembly (x86, 68k, PowerPC, custom), C, Quake-C, Shell/Scripting (Bash, Expect), C++, C#, Erlang

To alter/edit/fix:

COBOL, Fortran 77, VB6, VB.Net, Delphi

What was your first professional programming gig?

Software tester at Teltrend NZ (became Allied Telesyn Research, which became Allied Telesis Labs), writing test tools and automated testing in C, Bash, and Expect. Lots of breaking other people’s stuff.

If you knew then what you know now, would you have started programming?

Heck yes, I love making things work, and I still have yet to create my own self adapting machine (aka Skynet/Terminator)

If there is one thing you learned along the way that you would tell new developers, what would it be?

Learn to break stuff as well as build happy day software. Learn how things can/will go wrong, and at least say ‘we’re not covering that case’, rather than just be ignorant.

What’s the most fun you’ve ever had… programming?

Hmm,

Mud Mapping Client @ Uni: I spent a few too many hours playing on TFE and wrote an ncurses-based mapping client and telnet proxy to help playing.  I loved those large Sparc 5 screens.

3D data stuff @ Motion-Art: Also another fantastic time here, writing scripts for 3D Studio Max 2.0. We had a golf course exported from AutoCAD, and it took over an hour to load the dxf file. We could not even mesh it with 128MB of RAM, 2×1GB swap disks and 48 hours’ time. So I wrote scripts to reduce the dataset to manageable volumes. Lots of fun, just making stuff work.

Curse of the Azure Bonds port @ now: This is a labour of love, and I’ve been working on it (in one form or another) for over 9 years.  I call it my knitting project, because I just pick it up, do a little and put it down for later.  It is so satisfying un-weaving how the original game was built and worked.

I tag Matthew Owens-Smith, Shannon Smith, and Conor Boyd