PWC #172

PWC #172

Challenge 1 (Prime Partition)

The challenge calls for finding a set of n prime numbers that add up to a given number m

CAVEAT: I have not quite followed the instructions, which require the set of primes to be distinct ("No duplicates"). My solutions also find repeated primes. Thus, the challenge gives the expected prime partition where m=19 and n=3 as: (3,5,11) where 3+5+11=19. My programs also turn up: (3,3,13) and (5,7,7). My discussion below is therefore for the somewhat different problem that allows duplicates. My bad for not reading the instructions carefully enough, but hopefully the somewhat different problem and solution is still interesting.

Though usually with prior challenges, I started in Perl 5, and my Raku programs have mostly been adapted from the Perl 5 version, this time, I started in Raku. 

Raku offers the combination operator, which can be used to efficiently(?) search through all combinations of items, such as an array of prime numbers. I did not use this though in my prime-partition subroutine, since I was also considering solutions with repeated primes. Combinations are sets of distinct items. (I later added a one-liner distinct-prime-partition subroutine using combinations, to formally meet the challenge specification).

My approach is recursive. Consider the problem where n=2. We can loop through each prime number i up to m/2, and check if m-i is prime. If it is, we have found a prime partition in terms of the pair (i, m-i). In Raku, I toss the pair into a BagHash (mutable Set), and I accumulate the prime partitions in an array of such objects. We don't waste our time checking primes that are greater than m/2, since we would have already found their prime partitions via a smaller member of the pair. We can further improve this by noting that for odd m, we can stop checking primes after 2. All later primes i are odd, and therefore, m-i will be even and non-prime for sure.

Now for n=3, I loop through each prime number i up to m/3. For each prime number i, I can retrieve the two-element prime partitions for m-i as an array of BagHashes (which could be empty, in which case we skip it). Since these are sets of two primes that add up to m-i, therefore, if i is dropped into each such set, I have a set of three primes that add up to m, giving a 3-element prime partition. I therefore  use the "add" method to insert i into the relevant BagHash to get the desired 3-element partition.

Similarly, for n=4,5,6... etc. For n greater than 2, go through each prime number i up to m/n, get the prime partitions for m-i with n-1 elements (recursively), drop i into each such prime partition to get the n-element prime-partitions we want, and we are (almost) through.

My strategy does not completely avoid duplicates, so before returning my array of BagHashes, I remove any duplicate BagHashes using Raku's "unique" operator.

The program runs very slowly for n > 3 when m is large, say (m=1000, n=6) for example. (n=2 or 3 is fast, even with large m. For example, (m=110,000, n=3) runs in around 9 seconds). To speed it up a little, I generate a large number of primes at the start of the program rather than calling the prime number generator each time the subroutine runs. This does not make much difference, because the Math::Primesieve library function is fast! Barely any difference in time between reading the prime numbers from an array versus generating them again each time. I also tried caching and retrieving intermediate results using a hash, to reduce time spent on wasteful repeated calulations. But this actually made things slower. The time taken to cache and retrieve is more than the time lost due to repeated calculations. So no cache in my final solution. Here is the link:

Challenge 1 solution in Raku

Before attempting a Perl 5 solution, I wrote a Julia script that tracked my Raku logic. Julia does offer objects like Raku's BagHash, but I prefer to just use its Matlab-style matrix operations. So instead of a bag, my partitions are rows in a matrix, and to remove non-unique rows, we first sort within rows to get the elements in the same order in different rows with the same elements (e.g., two rows with [3,2,5] and [5,2,3] both become [2,3,5]), then sort the rows so that duplicate rows are stacked together (so sorted rows with [2,3,5] in non-adjacent locations say row #1 and row #3, move to be adjacent, say rows #1 and #2), then identify and remove the duplicate rows via a small user-defined function. Unlike Raku, where I used multiple dispatch to write different prime-partition versions for n=2 and n > 2, here I used the less sexy approach of a single function with an if statement. (Julia very much encourages multiple dispatch, but the single function seems easier in this case).

 Challenge 1 solution in Julia 

Julia is much faster than Raku (3 seconds for m=110,000 n=3), and does not choke on the larger m and n (though m=1000, n=6 is still painfully slow).  This is although it is doing more work than the Raku script, since the prime number generator is called within the recursive function rather than storing primes earlier.

Finally, my Perl 5 solution. I used PDL together with the fast Math::Prime::Util::GMP library. With PDL matrices, the solution logic closely followed what I did with Julia.  Perl 5+PDL+GMP is faster than the Raku script (4 seconds for m=110,000 n=3), though not as fast as Julia. Like Raku, it chokes on m=1000, n=6.

Challenge 1 solution in Perl 5+PDL

Perl 5 is quite efficient in dealing with this compute-intensive problem. My Raku solution was slow, but I did not really exploit powerful features such as combinations. I have a lot of looping, which is not the best way to do things in Raku. (PDL also doesn't like loops: the mantra is to use "threading" instead. Julia on the other hand encourages explicit looping and handles loops well).

Perl 5 actually has a ready-made function to find prime partitions (including repeats) via the "forpart" routine in Math::Prime::Util, which iterates through the partitions (of course, this works in Raku too). To specify prime partitions, with m=1000 and n=6, we would do something like:

forpart {print "@_\n" } 1000,{n=>6, prime=>1};

This is of course much faster than any scripted alternative. It still takes around 2 minutes 45 seconds to handle the troublesome m=1000, n=6. This has 2,046,712 different partitions, so it is not surprising that the scripts choke (and that is not particularly astronomical as far as these numbers go, to put it mildly. There is literally no limit, and they blow up fast: combinatorial explosion. One can fairly rapidly reach numbers where it will take millenia to just count the partitions, even using all the world's computing power combined. I don't know: m=1,000,000,000 n=100,000 maybe?)

I thought of some improvements to my programs, but would resist the temptation to try implementing them. Not enough time and I have work commitments.

  1. Tail call optimization might help.
  2. There is probably a more efficient way to do caching than the approach I tried in Raku, which was via a hash.   

Update:  

There is an easy more efficient way to cache in both Perl 5 and Raku via their respective Memoize packages (Julia has a Memoize package too). 

I tried using Memoize for my Perl 5 script (Memoized version not uploaded to github) and saw dramatic improvements for n > 3. The subroutine prime_partition (m=1,000, n=6) ran in around 45 seconds with Memoize compared to over half an hour without (I did not wait for the un-Memoized version to finish). 

The Julia script for (m=1000, n=6) ran in  around 1 minute with Memoize compared to 9 minutes without Memoize. It took longer than Perl 5 because it prints out the entire 2 million plus rows, while PDL just prints one line saying that the matrix is too long to print. 

I could not get Memoize to work with my Raku script.

 Challenge 2

Challenge 2 asks for a five-number statistical summary of an array (minimum, 1st quartile, median, 3rd quartile, maximum).

This is a standard data analytics task and easy to do using ready-made functions and libraries. It is also easy to just index the array, and then mark off the indices corresponding to the desired quantiles. This is the approach I followed in Perl 5 using PDL with a large array of around 900,000 random numbers.

Challenge 2 solution in Perl 5+PDL

With Raku, the Statistics package has a built-in five-number-summary function "fivenum", so there is nothing to do. I just illustrate the calculation, using a large array of random numbers.

Challenge 2 solution in Raku

Finally, for benchmarking, I also write a script in Julia that uses the built-in quantile function.

Challenge 2 solution in Julia

Speed-wise, my Perl 5+PDL script is faster than my Julia script for this task because Julia is slower to start up. My Raku script is slow with creating the array of random numbers.

Other people's solutions

With a Perl challenge, a lot of the fun and challenge comes from reading other people's solutions (not with a view to cheat of course). One sees many different creative problem-solving approaches, and reading Perl is a challenging exercise in itself. Again, a very nice regex-based approach to Challenge 1 from Philippe Bricout alias brxfork here. He has provided an explanation. Amazing what one can do with regexes. 

Challenge 1 solutions in Perl 5 from E Choroba and Dave Jacoby have a recursive logic similar to mine, i.e., starting with n=2 (of course, they find partitions of distinct primes). Both have implemented tail recursion in somewhat different ways, and Choroba has also used Memoize for caching. I tried Choroba's script with several large number pairs and it was always fast, completing in less than 1/10s each time. It just gives the first match found though, so is not comparable to programs that give the whole set of partitions. For example, Choroba's prime partition for (m=110,000, n=15) runs in 0.03 seconds, and gives this answer: 2 3 5 7 11 13 17 19 23 29 31 37 41 89 109673. Jacoby's script which gives all combinations is of course slower, even with Memoize added. It took around 17 seconds with Memoize added to run (m=1000, n=4) (I did not complete running the troublesome m=1000, n=6 as it was taking too long). Jacoby's slower speed compared to mine despite his better tail recursive design is because he uses a home-made is_prime function, and does not rely on the fast GMP-based package or use piddles. 

Also catching my eye is this Raku challenge-1 solution from Polgar Marton alias 2colours. Tail recursion Raku-style.


Comments

Popular posts from this blog

PWC 215

PWC 227

PWC 234