Access to real-time data is increasingly important for many organizations. This is particularly true for Lyft, which needs to respond immediately to changes in supply and demand in its marketplace, weather and traffic updates, fraud attempts, and dangerous driving situations. This requires processing tens of millions of events per second produced by our microservices, mobile apps, and IoT devices. Lyft runs dozens of Apache Flink and Apache Beam pipelines. Flink provides a powerful framework that makes it easy for non-experts to write correct, high-scale streaming jobs, while Beam extends that power to Lyft's large base of Python programmers. Lyft also built a real-time SQL engine called Dryft, primarily used by data scientists to power real-time machine learning models, and a near-real-time ad hoc querying system with Presto. Historically, Lyft ran its Flink clusters on bare, custom-managed EC2 instances. To achieve greater elasticity and reliability, we rebuilt the platform on top of Kubernetes. This talk will cover how we designed and built an open source Kubernetes operator for Flink and Beam, some of the unique challenges of running a complex, stateful application on Kubernetes, and the lessons learned along the way.