
I think that Haskell is a beautiful language, and judging by the benchmarks, its implementations can generate fast code.

However, I wonder whether it is appropriate for long-running applications. Or would chasing down all the potential laziness-induced leaks that one might ignore in a short-lived application prove frustrating?

This Reddit comment echoes my concerns:

"As soon as you have more than one function calling itself recursively, the heap profile ceases to give you any help pinpointing where the leak is occurring."

(That whole discussion seems insightful and frank.)

I am personally interested in high-performance computing, but I guess servers and HPC have this requirement in common.

If Haskell is appropriate for such applications, are there any examples proving this point, that is, applications that

  1. need to run for days or weeks, and therefore require the elimination of all relevant leaks (the time the program spends sleeping or waiting for some underlying C library to return obviously doesn't count), and
  2. are non-trivial (if the application is simple, the developer could just guess the source of the leak and attempt various fixes, but I don't believe this approach scales well; how helpful the heap profile is in identifying the source of the leak(s) when there are multiple [mutually] recursive functions seems to be of particular concern, as per the Reddit discussion above)?

If Haskell is not appropriate for such applications, then why?

Update: The Yesod web server framework for Haskell, which was put forward as an example, may have issues with memory. I wonder whether anyone has tested its memory usage after it has served requests continuously for days.

  • This looks a bit like the question of whether any system with a garbage collector is appropriate: because of the GC, people normally don't destroy objects that are no longer needed; they count on the GC to find them eventually. But this can result in a large number of heap objects that stay alive only because a reference was not set to null, even though all of these objects are effectively garbage. Commented May 26, 2015 at 22:44
  • Laziness does not mean space leaks, just as strictness doesn't. There are different techniques for managing both kinds of memory models. How you write your application determines if your application will be able to run for long periods of time. I know Facebook is using Haskell as a middle layer between multiple data stores and some of their frontend services, but I don't know whether those are short lived processes. My guess is that they would need to be long running, so if that's the case you would have a pretty solid example right there.
    – bheklilr
    Commented May 26, 2015 at 22:47
  • @bheklilr: I don't think MaxB is referring to space leaks: Haskell manages memory correctly (or should, from a theoretical point of view), but it can take ages before dead objects are recycled. Commented May 26, 2015 at 22:49
  • @MaxB, you can't really "delete all garbage" in gc languages. We're talking about forgetting to set certain references to null, which is quite similar to not evaluating certain expressions because of what they refer to. However, it can indeed be quite difficult to reason about memory in Haskell programs compared to their imperative counterparts. You can design your persistent data structures in a way to guarantee they hold no unevaluated thunks -- if I were writing a largish system I would probably do that. It does limit your expressivity, but also provides a checkpoint for memory usage.
    – luqui
    Commented May 26, 2015 at 22:50
  • Read this: engineering.imvu.com/2014/03/24/what-its-like-to-use-haskell . It seems that Haskell works pretty well for long-running services, but space leaks can be harder to find (though tooling is improving, so I don't know how hard it is now).
    – Jedai
    Commented May 27, 2015 at 12:14

6 Answers

17

"Space leaks" are semantically identical to any other kind of resource-use problem in any language. In strict languages, programs tend to allocate and retain too much data (as structures are strict).

No matter the language, you should do some "burn-in" testing to watch resource usage over time, and Haskell is no different.

See e.g. xmonad, which runs for months or years at a time. It's a Haskell app with tiny heap usage, and I tested it by running it for weeks or months with profiling on to analyze heap patterns. This gives me confidence that resource use is stable.

Ultimately though, laziness is a red herring here. Use the resource monitoring tools and testing to measure and validate your resource expectations.
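As a rough illustration of measuring rather than guessing, a GHC program can log its own live heap size over time through `GHC.Stats` (in base since GHC 8.2). This is a minimal sketch: `workload` and `reportLiveBytes` are hypothetical names, and statistics are only collected when the program runs with `+RTS -T` (for example by linking with `-with-rtsopts=-T`); otherwise the report is silently skipped.

```haskell
import Control.Monad (forM_, when)
import Data.List (foldl')
import GHC.Stats (GCDetails (..), RTSStats (..), getRTSStats, getRTSStatsEnabled)
import System.Mem (performGC)

-- Hypothetical stand-in for one unit of the application's real work.
workload :: Int -> Int
workload n = foldl' (+) 0 [1 .. n]

-- Force a major GC, then print how much data is still live.
-- Does nothing unless RTS statistics are enabled (+RTS -T).
reportLiveBytes :: String -> IO ()
reportLiveBytes label = do
  enabled <- getRTSStatsEnabled
  when enabled $ do
    performGC
    stats <- getRTSStats
    putStrLn $
      label
        ++ ": live bytes = " ++ show (gcdetails_live_bytes (gc stats))
        ++ ", max live bytes = " ++ show (max_live_bytes stats)

main :: IO ()
main = forM_ [1 .. 5 :: Int] $ \i -> do
  print (workload 1000000)
  reportLiveBytes ("round " ++ show i)
```

During a burn-in run, a live-bytes figure that keeps climbing from round to round is the signal to reach for the heap profiler; a figure that levels off is the kind of evidence described above.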

  • xmonad has very low complexity (< 1KLOC). It's unclear how chasing leaks by looking at the profiler would scale, and doesn't xmonad sleep 99.9% of the time? (What does your top say?) Is xmonad really the best example of the use of Haskell in this type of application?
    – MWB
    Commented Jun 2, 2015 at 9:54
  • The core of xmonad is < 1K, and is very similar to the core of a webserver. If you have different requirements for "long running" please specify what you mean. Commented Jun 5, 2015 at 11:08
  • I just clarified the requirements in the question.
    – MWB
    Commented Jun 5, 2015 at 12:18
8

The warp web server proves that Haskell is appropriate for long-running applications.

When Haskell applications have space leaks, it can be difficult to track down the cause, but once the cause is known it's usually trivial to resolve (the hardest fix I've ever had to use was to apply zip [1..] to a list and get the length from the last element instead of using the length function). But space leaks are actually very rare in Haskell programs. It's generally harder to deliberately create a space leak than it is to fix an accidental one.
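The zip [1..] trick amounts to getting a length in the same pass that already walks the list, so the list does not have to stay in memory for a second traversal. Below is a minimal sketch of that idea applied to an average; it is not the answer author's actual code, and `meanTwoPass` and `meanOnePass` are hypothetical names.

```haskell
{-# LANGUAGE BangPatterns #-}

import Data.List (foldl')

-- Leaky version: `xs` is shared by two traversals, so the whole list
-- stays in memory until `length` has finished with it.
meanTwoPass :: [Double] -> Double
meanTwoPass xs = sum xs / fromIntegral (length xs)

-- One pass: number the elements with zip [1 ..], keep a strict running
-- sum, and read the count off the index of the last element seen.
meanOnePass :: [Double] -> Double
meanOnePass xs = total / fromIntegral count
  where
    (count, total) = foldl' step (0 :: Int, 0) (zip [1 ..] xs)
    -- The bangs force the previous index and sum, so no thunks pile up.
    step (!_, !s) (i, x) = (i, s + x)

main :: IO ()
main = print (meanOnePass [1 .. 1000000])
```

For an empty list both versions return NaN (0 / 0), so a real caller would handle that case separately.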

  • The warp web server proves... Are there any busy web sites that use it?
    – MWB
    Commented May 27, 2015 at 7:10
  • github.com/yesodweb/yesod/wiki/Powered-by-Yesod has an incomplete list of websites that use the Yesod framework (which would be difficult to use with another web server). None seem to be all that busy, but it's hard to tell sometimes. On the other hand, in benchmarks warp handles more requests per second than nginx except on single-core servers; on 10-core servers, warp is 5 times faster than nginx. Commented May 27, 2015 at 10:00
8

It is. There are 2 kinds of possible space leaks:

  1. Data on the heap. Here the situation is no different from other languages that use GC. (And for languages that don't, the situation is usually worse: if there is an error, instead of memory usage increasing, the process might touch freed memory, or vice versa, and just crash badly.)
  2. Unevaluated thunks. Admittedly, one can shoot oneself in the foot here, and one must of course avoid well-known patterns that produce large thunks, like foldl (+) 0. But it's not difficult to prevent that, and for other leaks I'd say it's actually easier than in other languages, once you get used to it.
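To make the foldl (+) 0 case concrete: a lazy left fold builds one unevaluated (+) thunk per element before anything is summed, while `foldl'` from Data.List forces the accumulator at each step. A minimal sketch with hypothetical function names; note that with optimizations on, GHC's strictness analysis often rescues the lazy version for simple numeric types, so the difference tends to show most clearly in unoptimized builds.

```haskell
import Data.List (foldl')

-- Lazy left fold: builds (((0 + 1) + 2) + 3) + ... as a chain of thunks
-- and only adds them up when the final result is demanded.
sumLazy :: [Integer] -> Integer
sumLazy = foldl (+) 0

-- Strict left fold: evaluates the running total at every step,
-- so the accumulator stays a single number.
sumStrict :: [Integer] -> Integer
sumStrict = foldl' (+) 0

main :: IO ()
main = print (sumStrict [1 .. 10000000])
```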

Either you have a long-running, heavy computation, or a service that responds to requests. If you have a long-running computation, you usually need results immediately as you compute them, which forces their evaluation.

And if you have a service, its state is usually well contained, so it's easy to make sure it's always evaluated at the end of a request. In fact, Haskell makes this easier than other languages do: in Haskell, components of your program can't keep their own internal state. The application's global state is either threaded as arguments through some kind of main loop or stored using IO. And since a good design of a Haskell application limits and localizes IO as much as possible, this again makes the state easy to control.
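A minimal sketch of that style, assuming a hypothetical `ServerState` and `Request` type: the state is threaded through the loop as an argument, its fields are strict, and each new state is forced before the next request is handled, so no chain of thunks can build up between requests.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Hypothetical service state. The bangs make both fields strict,
-- so evaluating a ServerState to WHNF also evaluates its fields.
data ServerState = ServerState
  { requestsServed :: !Int
  , bytesServed    :: !Int
  }
  deriving (Show)

-- Hypothetical request: only its payload size matters here.
newtype Request = Request {payloadSize :: Int}

-- Pure request handler: computes the next state from the current one.
handleRequest :: ServerState -> Request -> ServerState
handleRequest st req =
  st
    { requestsServed = requestsServed st + 1
    , bytesServed = bytesServed st + payloadSize req
    }

-- Main loop: the bang pattern forces each new state before recursing.
serve :: ServerState -> [Request] -> ServerState
serve st [] = st
serve st (r : rs) =
  let !st' = handleRequest st r
   in serve st' rs

main :: IO ()
main = print (serve (ServerState 0 0) (map Request [1 .. 100000]))
```

In a real server the requests would arrive through IO rather than as a list, but the shape is the same: the handler stays pure and the loop decides when the state is evaluated.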


As another example, the Ganeti project (of which I'm a developer) uses several long-running Haskell daemons.

In our experience, memory leaks have been very rare; when we did have problems, they were usually with other resources (like file descriptors). The only somewhat recent case I recall was the monitoring daemon leaking memory as thunks in the rare case where it collected data but nobody looked at it (which would have forced its evaluation). The fix was rather simple.
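That monitoring-daemon case is a familiar pattern: a value that is updated often but read rarely. With a mutable reference, `modifyIORef` stores each update as an unevaluated thunk until someone reads the value, whereas `modifyIORef'` (both in `Data.IORef`) evaluates the new value on every update. A minimal sketch with a hypothetical `collect` function; this is an illustration of the pattern, not Ganeti's code.

```haskell
import Control.Monad (forM_)
import Data.IORef

-- Hypothetical collector: records every sample, but the total is only
-- read once at the very end (in a daemon, perhaps never).
collect :: [Int] -> IO Int
collect samples = do
  ref <- newIORef 0
  forM_ samples $ \s ->
    -- modifyIORef ref (+ s) would leave a growing chain of (+ s) thunks
    -- in the ref; modifyIORef' evaluates the new total immediately.
    modifyIORef' ref (+ s)
  readIORef ref

main :: IO ()
main = collect [1 .. 1000000] >>= print
```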

6

Most long-running apps are request-driven. For example, HTTP servers associate all transient data with an HTTP request, and after the request ends the data is thrown away. So, at least for those kinds of long-running apps, you will not get space leaks in any language. Leak all you want in the context of a single request: as long as you do not create global references to per-request data, you will not leak.

All bets are off if you mutate global state. That is to be avoided for many reasons, and it is uncommon in such apps.

  • Not a web programmer, but I think most HTTP servers need to retain some information after serving a request: logging, new content (like on this site), items left in stock, etc.
    – MWB
    Commented May 28, 2015 at 2:15
  • @Carsten, so maybe we're implementing a database or some such thing.
    – luqui
    Commented May 28, 2015 at 23:23
  • @luqui ??? well seems I'm the strange kind of kid you don't wanna play with here - so ok I'll shut up already
    – Random Dev
    Commented May 29, 2015 at 4:09
  • I think you're hearing my frustration with defining away the problem in order to self-gratify, which seems to happen when people bring up problems with the language we love...
    – luqui
    Commented May 29, 2015 at 4:13
4

I have a service written in Haskell that has run for months without any Haskell-specific issue. Once it ran for 6 months without any attention, until I restarted it to apply updates. It has a stateless HTTP API, but it also has a stateful websockets interface, so it maintains long-lived state. Its sources are closed, so I can't provide a link, but in my experience Haskell works fine for long-running applications.

Laziness is not an issue for me, but that is because I know how to deal with it. It is not hard, but requires some experience.

Also, libraries on Hackage vary in quality, and keeping dependencies under control is important. I try to avoid dependencies unless they are really necessary, and I inspect most of their code (except for a number of widely used packages, most of which are either core libraries or part of the Haskell Platform, though I inspect their code too, just to learn new things).

There are corner cases, though, where GHC (the most widely used implementation) doesn't work well enough. I had issues with GC time when an application maintained a huge (mostly read-only) state in memory (there is a ticket). A large number of stable pointers can also be problematic (there is a ticket for that too, though I have never experienced it myself). Most of the time such corner cases are easy to avoid by careful design.

In fact, application design is the most important thing for long-running applications; the implementation language plays a less important role. That is probably the biggest lesson I have learned in the last few years: software design is very important, and it is not very different between languages.

0

The real answer is that yes, Haskell is suited for long-running processes. As with other languages it is up to the developer to find and fix any bugs, including memory leaks.

To do that, if your program doesn't depend on laziness, you can use the StrictData and Strict language extensions, which GHC has supported since 8.0.1 (2016) and which can help tease out space leaks. (Lazy vs. strict is a contentious topic, though, so consult others before you put it in the cabal file :))
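As a small illustration of what StrictData changes: with the extension enabled in a module, every constructor field declared in that module is strict unless it is marked lazy with `~`, so fields are evaluated whenever their constructor is. The `Stats`, `Cached`, `record`, and `summarize` names below are hypothetical.

```haskell
{-# LANGUAGE StrictData #-}

import Data.List (foldl')

-- With StrictData, both fields are strict without writing any bangs,
-- exactly as if they were declared `!Int` and `!Double`.
data Stats = Stats
  { hits  :: Int
  , total :: Double
  }
  deriving (Show)

-- A field can still be made lazy explicitly with ~ when laziness is wanted.
data Cached = Cached
  { key   :: Int
  , value :: ~[Int]
  }

-- Building a Stats evaluates both fields, so no (h + 1) or (t + x)
-- thunk is ever stored inside the record.
record :: Stats -> Double -> Stats
record (Stats h t) x = Stats (h + 1) (t + x)

summarize :: [Double] -> Stats
summarize = foldl' record (Stats 0 0)

main :: IO ()
main = do
  print (summarize [1.5, 2.5, 3.0])
  -- The lazy field is never forced here, so the undefined list is harmless.
  print (key (Cached 7 undefined))
```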

cardano-node is an example of a long-running program with a large memory footprint.

At work I have two web apps that run for months at a time and are only restarted for unrelated reasons, e.g. Let's Encrypt updates or operating-system updates. (Both are built on top of Warp, so the credit probably belongs there.)
