Bucket sort in Haskell

Question

As a first step in learning Haskell I am solving this problem, which involves finding the R_i^th-ranked numbers, given some input list and many R_i. In an imperative language I would make a zeroed array c of length 200, increment c[h] for each height h, compute cumulative sums of c and then binary search c to determine the height corresponding to each given index. Because max_height is fixed, this has runtime linear in the size of input and bounded memory(*) excluding the input.

Here's my Haskell code:

max_height = 200
count_eq e = foldl (\c f -> if e == f then c + 1 else c) 0
counts heights = map (flip count_eq heights) [0..max_height]
first_gt e l = f l 0 where f (x:xs) i = if x > e then i else f xs (i+1)
solve heights indices = let accum = scanl1 (+) (counts heights) in
  map (flip first_gt accum) (map (subtract 1) indices)

It is correct but slow. I would like to know how to (A) reason about and (B) improve the performance. Also (C) can I achieve the same asymptotic performance as the imperative code?

(*) assuming each c[i] fits in a machine int. I believe the runtime statement holds regardless.

\$\begingroup\$ Welcome to Code Review! Good job on your first question. \$\endgroup\$
– SirPython
Commented Apr 7, 2016 at 23:52 — SirPython, Commented Apr 7, 2016 at 23:52

max taldykin · Accepted Answer · 2016-04-09 10:27:36Z

Maps as arrays

In counts heights you are taversing heights 200 times. It is possible to emulate array with Data.IntMap and do this in one pass:

import Data.List (foldl')
import qualified Data.IntMap.Strict as Map

count = foldl' (\m h -> Map.insertWith (+) h 1 m)
  (Map.fromList [(i,0) | i <- [1..200]])

Note that I'm using strict version of left fold (see here about foldl vs foldl') and strict Map. This means that map of height counts constructed in single pass without thunks.

solve heights = map (\ix -> 1 + length (takeWhile (<ix) counts))
  where
    counts = scanl1 (+) $ Map.elems $ count heights

I'm using 1 + length (takeWhile (<ix) counts) instead of bare recursion in first_gt.

A bit more code is required for binary search:

solve heights = map (maybe 0 snd . (`Map.lookupLT` countsMap))
  where
    counts = scanl1 (+) $ Map.elems $ count heights
    countsMap = Map.fromList $ zip counts [2..]

Vectors as arrays

If log(n) overhead of trees is too much for you, it is possible to use Data.Vector which is for real arrays in Haskell.

import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Unboxed.Mutable as M
import qualified Data.Vector.Algorithms.Search as S

max_height = 200

solve :: [Int] -> [Int] -> IO [Int]
solve heights indices = do
  v <- M.replicate (max_height+1) 0
  mapM_ (M.modify v (+1)) heights
  counts <- V.unsafeFreeze v >>= V.unsafeThaw . V.scanl1' (+)
  mapM (S.binarySearchL counts) indices

binarySearchL is from vector-algorithms package.

Some utility code to parse input:

main = getContents >>= mapM_ print . map (uncurry solve) . parse

parse :: String -> [([Int], [Int])]
parse
  = pairs
  . map
    ( takeWhile (/= 0) -- drop trailing zeroes
    . map read . words -- convert string to ints
    )
  . tail . lines       -- skip first line


pairs :: [a] -> [(a, a)]
pairs (x:y:xs) = (x, y) : pairs xs
pairs _        = []

Gurkenglas · Accepted Answer · 2016-04-09 15:26:06Z

Lists are not for random access. As far as I know, Vectors are the modern way to have fixed-length random-access listlikes. They will allow you to do the traversing updates you want in one pass.

import qualified Data.Vector as V
counts = V.toList . V.accum (+) (V.replicate 200 0) . map (,1)

first_gt doesn't need a manual implementation:

import Safe
first_gt e = findIndexJust (> e)

And just for fun, this should only traverse the counts result once. (Requires that indices is sorted, though.)

import Control.Monad.Trans.State
solve heights = scanl1 (+) . map length . splitsOnFoo . map (subtract 1) where
  splitsOnFoo = evalState $ traverse (state . span . (<)) $ scanl1 (+) $ counts heights

Stack Exchange Network

Bucket sort in Haskell

2 Answers 2

Maps as arrays

Vectors as arrays

Not the answer you're looking for? Browse other questions tagged
performance
haskell
sorting
or ask your own question.

Hot Network Questions

Bucket sort in Haskell

2 Answers 2

Maps as arrays

Vectors as arrays

Not the answer you're looking for? Browse other questions tagged performancehaskellsorting or ask your own question.

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
performance
haskell
sorting
or ask your own question.