Real programs fail, and they fail in unpredictable ways. Akka, and the Scala community in general, favors planning explicitly for failure rather than trying to write infallible applications. A fault tolerant system is a system that can continue to operate when one or more of its components fails. The failure of an individual subsystem does not necessarily mean the failure of the application. How does this apply to Akka?
The actor model provides a natural unit to encapsulate failure: the actor. When an actor throws an exception while processing a message, the default behavior is for the actor to restart, but the exception does not leak out and affect the rest of the system. For instance, let's introduce an arbitrary failure in the response interpreter. We will modify the receive
method to throw an exception when it is asked to interpret the response for misto
, one of Martin Odersky's followers:
// ResponseInterpreter.scala def receive = { case InterpretResponse("misto", r) => throw new IllegalStateException("custom error") case InterpretResponse(login, r) => interpret(login, r) }
If you rerun the code through SBT, you will notice that an error gets logged. The program does not crash, however. It just continues as normal:
[ERROR] [11/07/2015 12:05:58.938] [GithubFetcher-akka.actor.default-dispatcher-2] [akka://GithubFetcher/user/$a/$b] custom error java.lang.IllegalStateException: custom error at ResponseInterpreter$ ... [INFO] [11/07/2015 12:05:59.117] [GithubFetcher-akka.actor.default-dispatcher-2] [akka://GithubFetcher/user/$a] Pushing samfoo onto queue
None of the followers of misto
will get added to the queue: he never made it past the ResponseInterpreter
stage. Let's step through what happens when the exception gets thrown:
InterpretResponse("misto", ...)
message. This causes it to throw an exception and it dies. None of the other actors are affected by the exception.ActorRef
as the deceased actor. This means that, as far as the rest of the system is concerned, nothing has changed.ActorRef
rather than the actor, so the new response interpreter will have the same mailbox as its predecessor, without the offending message.Thus, if, for whatever reason, our crawler crashes when fetching or parsing the response for a user, the application will be minimally affected—we will just not fetch this user's followers.
Any internal state that an actor carries is lost when it restarts. Thus, if, for instance, the fetcher manager died, we would lose the current value of the queue and visited users. The risks associated with losing the internal state can be mitigated by the following: