All posts by Julian Szulc

Post-mortem report on the 23-24 October 2014 failure

On 23 and 24 October, the Allegro platform suffered a failure of the subsystem responsible for asynchronous distributed task processing. The subsystem consists of several daemon processes, a Gearman job server, and Redis and Oracle databases. The failure affected many areas: for example, purchasing multiple offers via the cart and bulk offer editing (including price list editing) did not work at all. Moreover, the daily newsletter with new offers was only partially sent. Some parts of the internal administration panel were also affected.

Julian Szulc


Site Reliability Engineer responsible for the Allegro Platform. At Allegro Group since 2011. Software Engineer with over 10 years of experience in building and maintaining complex IT systems using C++, Python, PHP and Ruby. OpenWrt and Raspberry Pi enthusiast and amateur photographer.