As an online grocery company, Picnic saw an enormous increase in demand at the start of the Corona crisis. Picnic’s systems suddenly experienced traffic peaks of 10 to 20 times the pre-Corona peak. Even though Picnic builds its systems for scalability, surges like these exposed previously unknown challenges. On the Picnic blog, Picnic’s Sander Mak shares what they learned while scaling their systems during Covid-19 times.
It’s mid-March, right after the ‘intelligent lockdown’ in the Netherlands has taken effect. The Picnic experience begins with many existing and new customers turning to Picnic for their essentials in these uncertain times.
Customers use the Picnic app to order groceries. Picnic then fulfills each order by picking the products in fulfillment centers and delivering them to the customer. Picnic doesn’t have infinite stock, vehicles, and picking capacity. That’s why a customer can choose a delivery slot for a certain day and time only as long as Picnic has capacity available for it.
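The slot mechanism described above can be illustrated with a minimal sketch. This is not Picnic’s implementation: the `DeliverySlot` class, its fields, and the `available_slots` helper are hypothetical names chosen for illustration, and the sketch assumes that capacity can be expressed as a simple maximum number of orders per slot.

```python
from dataclasses import dataclass


@dataclass
class DeliverySlot:
    """Hypothetical delivery slot with a fixed order capacity."""
    slot_id: str
    capacity: int    # assumed: maximum number of orders the slot can take
    booked: int = 0  # orders already placed in this slot

    def is_available(self) -> bool:
        # A slot can be chosen only while it has capacity left.
        return self.booked < self.capacity

    def book(self) -> bool:
        """Reserve one place in the slot; return False if it is full."""
        if not self.is_available():
            return False
        self.booked += 1
        return True


def available_slots(slots):
    """Return the slots a customer could still choose."""
    return [slot for slot in slots if slot.is_available()]
```

In a real system under heavy concurrent traffic, the check and the increment in `book` would presumably need to happen atomically, for example inside a database transaction, so that two customers cannot both claim the last place in a slot.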
Slots are filling up quickly, and customers are eagerly waiting for new slots to become available. New slots are made available at a fixed time every day, and Picnic communicates this to give customers a fair chance to place an order. Of course, this does lead to a big influx of customers around these slot opening times. These moments brought about the toughest scaling challenges.
Optimizing and re-deploying services during a crisis is stressful. Where do you start, and how do you know your changes are actually helping? For Picnic, observability and metrics were key in finding the right spots to tune service implementations.
Picnic’s Sander Mak: “To solve scaling challenges, you need a holistic approach spanning infrastructure and software development. Every improvement described in this post (and many more) was found and implemented over the course of several weeks. During these first weeks, we had daily check-ins where both developers and infrastructure specialists analyzed the current state and discussed upcoming changes. Improvements on both code and infrastructure were closely monitored with custom metrics in Grafana and generic service metrics in New Relic. Every day, our services performed a little bit better, and we could soon prevent downtime like we had in the first days.” Eventually, the traffic peaks smoothed out and subsided somewhat as society adapted to the new reality. The improvements, however, will be with Picnic for a long time.