The fast, the right and the smart

by Colin Ross on January 8, 2009

Working my way through the puzzles in Project Euler is pretty enlightening at times.  I made it to level 1 yesterday to much acclaim and adulation.  Taking this as an opportunity to look back at the sort of  patterns I have been coding threw up a couple of interesting practises which I have been performing.

Manipulating strings to and from numbers

A large number of the problems involving the rearrangement of numbers, for example, finding which pair of two-digit primes when rearranged in ascending order and concatenated give the cube of the difference of the original primes (I just made that up – there may well not be an answer).  In this example, if our two primes were represented as ab and cd then we would need to perform the rearrangement to get ba and dc as well. The tactic I have been adopting for this is judicious use of show and read.  For the example I just gave, the call would something like

reverseNum = read.reverse.show

In order to find all the two-digit numbers my current ploy is to make use of the permutations function in Data.List and apply it to a string of digits.  After a bit of processing, the correct answers pop out. For example, here is the code to return all nine-digit numbers without zeros:

nineDigitsWithStrings = map read $ permutations "123456789"

Its short and clear, and not a lot can go wrong with it. But the use of read just looks a bit ugly. I had a nagging feeling that the right way to do this was to actually do the conversion “properly” by stepping though the list multiplying by ten at each step, to get the same, proper answer. Something like this in fact:

nineDigitsWithNums = map makeNum $ permutations [1..9]
   where
   makeNum ds = makeNum' (reverse ds)
   makeNum' [] = 0
   makeNum' (n:ns) = n + 10 * makeNum' ns

A couple of things to note – firstly, the reverse is required so that  the proper digits get more multiplications.  Second – it appears to be a lot more complicated than the string version, but let’s be honest, there is a lot of complexity hiding within a call to read of which there is none in this version. I had no idea which might be faster so I created a quick test to see if one was clearly more efficient than the other. The results were somewhat surprising – the quick and dirty string-based method was around 20% faster and used about 20% less memory.  So much for doing things the “right” way!  Why this is the case I do not know (yet), but the reverse calls might be the culprit.  Unfortunately, they are somewhat important in getting the right answer.

Dealing with sorted lists

Another pattern I noticed using occurred when dealing with sorted infinite lists.  These keep cropping up – for example the list of primes is a prime (ho-ho!) example.

In this case, I would often want to grab the members of the list smaller than a given number.  Naively, I would write

filter (<n) primes

but this would never complete, due to the infiniteness of the list.  Knowing that the list is sorted, the correct effect can be obtained by a slight tweak:

takeWhile (<n) primes

which completes quite happily. This sort of thinking is also useful when interrogating a list for the existence of a particular element.  Again, a naive approach would be to use

elem n primes

which again doesn’t complete due to the infiniteness of the list.  Adopting the same sort of approach as before, we can re-write this as

(head $ dropWhile (<n) primes) == n

which (once you get your head around what it is actually doing) is simple, and more to the point works.

Having these solutions at your fingertips allows you to avoid the problem of arbitrarily truncating the infinite list, which would be the other option.  It isn’t a good option though – truncate too early and you miss the elements you need.  But truncate too late and you spend far more time than you need to in dealing with the list.  Haskell’s lazy lists are a real boon in this situation.

Bookmark and Share

No related posts.

{ 7 comments… read them below or add one }

Tirpen January 12, 2009 at 8:03 pm

Did you compile the ‘makeNum’-version with -O2 when comparing the speed?
And a way to avoid the reverse would be (unprofiled)

makeNum xs = foldl f 0 xs
where
f a b = 10*a + b

which might (possibly) be faster. And is at least prettier IMHO. :)

And your nineDigitsWithStrings is wrong I think. It doesn’t return all nine digit numbers without zero. It returns all nine digit numbers with no duplicated numbers, for example 112233445 will never occur.
Try with “sequence $ replicate 9 [1..9]”

Other than that, nice post.

Reply

Felipe January 12, 2009 at 9:44 pm

Doesn’t work for [], but that’s easy to change if needed (not in your case):

makeNum = makeNum' 0
makeNum' acc [x] = acc * 10 + x
makeNum' acc (x:xs) = makeNum' (acc * 10 + x) xs

Plus it’s a tail call =).

Reply

Thomas Danecker January 12, 2009 at 9:47 pm

Try this one for makeNum:
makeNum = foldl (\n d -> n*10 + d) 0

Reply

Colin Ross January 12, 2009 at 10:04 pm

@Tirpen

I didn’t do any compilation of this stuff – it is all gleaned by using ghci. I really should compile things more often, as I think the GHC optimisation is pretty good at speeding things up.

folds are a class of function I am still getting to grips with. I am using them more and more, but recognising when they are applicable is still a bit hit and miss for me. I just need to remember that whenever I am repeatedly updating something, a fold could well be the way.

nineDigitsWithStrings is a bit embarassing! As you point out – it is rather wrong. Still, it was just intended to produce a large number of input to chew on.

Reply

Colin Ross January 12, 2009 at 10:06 pm

@Felipe

Another thing I am trying to do semi-automatically – getting tail-calls into my brain so that I use accumulators more often.

One day, one day!

Reply

Colin Ross January 12, 2009 at 10:08 pm

@Thomas

Good old folds. Always the answer…

Reply

Tirpen January 13, 2009 at 9:37 am

Yeah, folds are one of the trickiest tools to understand, but well worth it once you get it. I still have quite a bit of trouble figuring out when to use foldr, foldr’, foldl, foldr1 and so on myself. :)

And comparing compiled code with interpreted code doesn’t really say anything at all, since compiled code can easily be 10x – 20x times faster in many cases when compiled with optimizations on.

Reply

Leave a Comment

Previous post:

Next post: