What could possibly go wrong?

Failure modes in software development

Tadej Štajner, TVBeat / @tadejtadej / tadej@tvbeat.com


FRI, Ljubljana, 30.10.2014

About me

  • Developer at TVBeat
  • PhD student at Jožef Stefan International Postgraduate School.
  • Solving hard problems with data

Why this topic?

What happens to things that you are working on?

Failure to learn

Rapid feedback



Own your core

the part of your system that you spend the most time on


We're not here to just learn.

Summing up - learning

  • Quick iterations
  • Prototypes
  • Own critical parts
  • Measure

Failures in operation

Screencap of video by Ed Sealing

Systems behaviour

how does your system respond to load?

Source: Wikipedia

Systems behaviour

is there a cliff?

  • Keep the system in the happy zone
  • Make the system stable by design
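One way to stay in the happy zone is to refuse work before the backlog reaches the cliff. The following is a minimal sketch, not a definitive implementation: the class name BoundedWorkQueue, its methods, and the backlog limit are all assumptions made for illustration.

```python
import queue


class BoundedWorkQueue:
    """Hypothetical helper: sheds new work once the backlog hits a fixed limit."""

    def __init__(self, max_backlog):
        # queue.Queue with a maxsize refuses non-blocking puts when full.
        self._q = queue.Queue(maxsize=max_backlog)
        self.rejected = 0

    def submit(self, item):
        """Return True if the item was accepted, False if it was shed."""
        try:
            self._q.put_nowait(item)
            return True
        except queue.Full:
            # Rejecting early keeps latency bounded for the work already queued.
            self.rejected += 1
            return False

    def take(self):
        """Remove and return the oldest queued item (raises queue.Empty if none)."""
        return self._q.get_nowait()

    def backlog(self):
        """Current number of queued items."""
        return self._q.qsize()
```

With max_backlog=3, submitting five items in a row accepts the first three and rejects the last two. The caller sees the rejection immediately and can retry later or degrade gracefully, instead of every request slowing down together.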

Things will happen

hardware failure is a part of normal operation

  • distributed computational systems
  • distributed databases


Summing up - systems

  • Distributed systems
  • The babysitting trap

Failure to communicate


if you use this interface, an implementation can solve your problem

One magic button

Leaky abstractions

Leaky abstraction

Lying abstraction

the feature nominally exists, but with a ridiculous performance envelope
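A small sketch of what a lying abstraction can look like. The class RemoteRows is hypothetical, and the in-memory list only stands in for a remote store; the point is that the membership test works, but each check costs one simulated round trip per row scanned.

```python
class RemoteRows:
    """Hypothetical wrapper: looks like a list, but fetches one row per call."""

    def __init__(self, rows):
        self._rows = list(rows)  # stands in for a remote store (assumption)
        self.round_trips = 0

    def _fetch(self, i):
        # Each fetch models one network round trip.
        self.round_trips += 1
        return self._rows[i]

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, i):
        return self._fetch(i)

    def __contains__(self, value):
        # The feature "exists", but scans row by row until it finds a match.
        return any(self._fetch(i) == value for i in range(len(self)))


rows = RemoteRows(range(1000))
found = 999 in rows  # True, after 1000 simulated round trips
```

Nothing in the interface hints at the cost: `999 in rows` reads exactly like a cheap list lookup. Counting round trips, as this sketch does, is one way to make the real performance envelope visible.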

Summing up - communication & abstraction

  • Fake simplicity
  • Leaky abstractions
  • Lying abstractions

Failure to observe

along which axis is your problem hard?






Summing up

  • Failures to learn
  • Failures to operate
  • Failures to abstract
  • Failures to observe



http://tdj.si/failuremodes/ (press S for notes)