2008-07-29

Memoizing with ETS

In my 3n + 1 post rickhg12hs suggested I use the erlang ets module to memoize my solution.

To time the code before and after changes I found this little gem from David King

-module(timer).
-export([timer/3]).

timer(Mod,Fun,Args) ->
    Start=erlang:now(),
    Result = apply(Mod,Fun,Args),
    Stop=erlang:now(),
    io:format("~p~n", [time_diff(Start,Stop)]),
    Result.

time_diff({A1,A2,A3}, {B1,B2,B3}) ->
    (B1 - A1) * 1000000 + (B2 - A2) + (B3 - A3) / 1000000.0 .

Using this I found that my old code for 1 -> 1,000,000 took 13.1 seconds..

I upgraded the code to use an ets ordered_set like this:

-module(ets3np1).
-export([tnpo_range/2]).tnpo_range(Min, Max) ->
    Tib = ets:new(x, [ordered_set,private]),
    Result = lists:max([tnpo(Tib, X) || X <- lists:seq(Min, Max) ]),
    ets:delete(Tib),
    Result.

tnpo(_, 1) -> 1;
tnpo(Tib, Num) ->
    tnpo( ets:lookup(Tib, Num), Tib, Num ).
tnpo([], Tib, Num) ->
    case( Num rem 2) of
        1 -> NewNum = (Num * 3) + 1;
        0 -> NewNum = Num div 2
    end,
    Val = tnpo(Tib, NewNum) + 1,
    ets:insert(Tib, { Num, Val }),
    Val;

tnpo([{_,Val}], _, _) -> Val.

which ran at the improved time off 8.8 seconds.

Initially I had forgotten to delete the ets tables so my lappy started grinding after a few iterations as it thrashed the swap. Ah the joys of learning a feature via the API reference…

One nagging question I had was the speed of using the lists:seq and the list comprehension. So I swapped to an explicit loop via parameter matching, which ran slightly faster. Luckily I realised the code smell: the original could be mapped across many processors with little effort, but the latter was hard coded looping.

So here’s that bad code:

tnpo_range_b(Min, Max) ->
    Tib = ets:new(x, [ordered_set, private]),
    Result = tnpo_range_b(Tib, Min, Max, 0),
    ets:delete(Tib),
    Result.
tnpo_range_b(Tib, Num, Max, Best ) when Num =< Max ->
    Result = tnpo(Tib, Num),
    tnpo_range_b(Tib, Num + 1, Max, lists:max([Result,Best]));
tnpo_range_b(_, _, _, Best ) -> Best.

After that I then wondered what the difference between set (hash-table) and ordered_set (b-tree), so I altered my code again.

So the final results are (averaged over three runs):

Input 1 -> 1,000,000
Original:  13.1 seconds
Ordered_Set:  8.8 seconds
Bad-Code: 8.7 seconds
Set:  8.9 seconds

Input 1 -> 2,000,000
Original:  27.8 seconds
Ordered_Set:  18.6 seconds
Bad-Code: 18.2 seconds
Set:  18.8 seconds

Of funny coincidence, the day after doing the coding, I read the ets section in Programming Erlang, which turns out to cover the topic nicely.