The Testing Gap

Despite the massive load-testing effort, the system still crashed when it confronted the real world. Two things were missing in our testing.

First, we tested the application the way it was meant to be used. Test scripts would request one URL, wait for the response, and then request another URL that was present on the response page. None of the load-testing scripts tried hitting the same URL, without using cookies, 100 times per second. If they had, we probably would have called the test “unrealistic” and ignored that the servers crashed. Since the site used only cookies for session tracking, not URL rewriting, all of our load test scripts used cookies.

In short, all the test scripts obeyed the rules. It would be like an application tester who only ever clicked buttons in the right order. Testers are really more like that joke that goes around on Twitter every once in a while, “Tester walks into a bar. Orders a beer. Orders 0 beers. Orders 99999 beers. Orders a lizard. Orders -1 beers. Orders a sfdeljknesv.” If you tell testers the “happy path” through the application, that’s the last thing they’ll do. It should be the same with load testing. Add noise, create chaos. Noise and chaos might only bleed away some amount of your capacity, but it might also bring your system down.

Second, the application developers did not build in the kind of safety devices that would cut off bad situations. When something went wrong, the application would keep sending threads into the danger zone. Like a car crash on a foggy freeway, the new request threads would just pile up into the ones that were already broken or hung. We saw this from our very first day of load testing, but we didn’t understand the significance. We thought it was a problem with the test methodology rather than a serious deficiency in the system’s ability to recover from insults.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset