What could possibly go wrong?

Failure modes in software development

Tadej Štajner, TVBeat / @tadejtadej / tadej@tvbeat.com

#vblatu

FRI, Ljubljana, 30.10.2014

About me

  • Developer at TVBeat
  • PhD student at Jožef Stefan International Postgraduate School
  • Solving hard problems with data

Why this topic?

What happens to things that you are working on?

Failure to learn

Rapid feedback

http://www.azarask.in/blog/post/the-wrong-problem/

Prototypes

Own your core

the part of your system that you spend the most time on

Measurement

We're not here to just learn.

Summing up - learning

  • Quick iterations
  • Prototypes
  • Own critical parts
  • Measure

Failures in operation

Screencap of video by Ed Sealing

Systems behaviour

how does your system respond to load?

Source: Wikipedia

Systems behaviour

is there a cliff?

  • Keep the system in the happy zone
  • Make the system stable by design
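One way to read "keep the system in the happy zone" is load shedding: refuse work beyond a fixed capacity instead of letting the backlog grow until latency falls off the cliff. The following is a minimal Python sketch under that assumption. The `BoundedWorker` class, its method names, and the capacity of 3 are hypothetical and chosen only for illustration; they are not from the talk.

```python
import queue


class BoundedWorker:
    """Hypothetical worker that sheds load once its queue is full."""

    def __init__(self, capacity):
        # A bounded queue caps the backlog, so waiting time stays bounded too.
        self.tasks = queue.Queue(maxsize=capacity)
        self.rejected = 0

    def submit(self, task):
        """Return True if the task was accepted, False if it was shed."""
        try:
            self.tasks.put_nowait(task)
            return True
        except queue.Full:
            # Fail fast instead of queueing without limit.
            self.rejected += 1
            return False

    def drain(self):
        """Run every queued task and return the results in order."""
        results = []
        while True:
            try:
                task = self.tasks.get_nowait()
            except queue.Empty:
                return results
            results.append(task())


def demo():
    # Offer five tasks to a worker that only has room for three.
    worker = BoundedWorker(capacity=3)
    accepted = [worker.submit(lambda i=i: i * i) for i in range(5)]
    return accepted, worker.rejected, worker.drain()
```

Rejecting the fourth and fifth task is a deliberate design choice: the caller gets an immediate "no" and can back off or retry elsewhere, while the accepted tasks still finish in predictable time.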

Things will happen

hardware failure is a part of normal operation

  • distributed computational systems
  • distributed databases
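If hardware failure is part of normal operation, client code can treat a dead node as an expected case rather than an emergency. The sketch below shows one possible shape of that idea: a read that fails over to the next replica. Everything in it (`ReplicaDown`, `read_with_failover`, `make_replica`) is a hypothetical stand-in, not the API of any particular distributed database.

```python
class ReplicaDown(Exception):
    """Raised by a replica that is currently unavailable (hypothetical)."""


def read_with_failover(replicas, key):
    """Try each replica in turn and return the first successful read."""
    last_error = None
    for replica in replicas:
        try:
            return replica(key)
        except ReplicaDown as exc:
            # A failed node is expected; remember the error and move on.
            last_error = exc
    raise ReplicaDown(f"all {len(replicas)} replicas failed for {key!r}") from last_error


def make_replica(data, healthy=True):
    """Build a fake replica: a function that reads a key or raises ReplicaDown."""
    def read(key):
        if not healthy:
            raise ReplicaDown("node offline")
        return data[key]
    return read
```

With one offline and one healthy replica, `read_with_failover` returns the value from the healthy one; only when every replica is down does the caller see an error.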

Babysitting

Summing up - systems

  • Distributed systems
  • The babysitting trap

Failure to communicate

Abstractions

if you use this interface, an implementation can solve your problem

One magic button

Leaky abstractions

Leaky abstraction

Lying abstraction

the feature nominally exists, but with a ridiculous performance envelope
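A small illustration of this, which is an example chosen here and not one taken from the talk: a Python `list` nominally supports removing from the front via `pop(0)`, but each call shifts every remaining element, so draining a long list this way costs quadratic time. `collections.deque` offers the same operation in constant time. The helper names below are hypothetical.

```python
from collections import deque


def drain_front(container, pop_front):
    """Remove items from the front until the container is empty."""
    out = []
    while container:
        out.append(pop_front(container))
    return out


def drain_list(items):
    # list.pop(0) exists, but shifts all remaining elements: O(n) per call.
    return drain_front(list(items), lambda c: c.pop(0))


def drain_deque(items):
    # deque.popleft() is built for front removal: O(1) per call.
    return drain_front(deque(items), lambda c: c.popleft())
```

Both functions return the same result, which is exactly the problem: nothing in the interface signals that one of them stops being usable once the input grows.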

Summing up - communication & abstraction

  • Fake simplicity
  • Leaky abstractions
  • Lying abstractions

Failure to observe

along which axis is your problem hard?

Performance

Scale

Complexity

Reliability

Politics

Summing up

  • Failures to learn
  • Failures to operate
  • Failures to abstract
  • Failures to observe

Thanks

Tadej Štajner, TVBeat / @tadejtadej / tadej@tvbeat.com

http://tdj.si/failuremodes/ (press S for notes)