Some companies that offer services expect you to do things their way or take the highway. Google, however, expects people to adapt its suggestions and best practices to their own context. The message is: this is how things are done at Google, but it may not work in your environment.
Today, we’re talking to Liz Fong-Jones, a Senior Staff Site Reliability Engineer (SRE) at Google. Liz works on the Google Cloud Customer Reliability Engineering (CRE) team and enjoys helping people adapt reliability practices in a way that makes sense for their companies.
Some of the highlights of the show include:
Liz figures out an appropriate level of reliability for a service and how a service is engineered to meet that target
Staff SRE involves implementation, and then identifying and solving problems
Google’s CRE team makes sure Google Cloud customers can build seamless services on the Google Cloud Platform (GCP)
Service Level Objectives (SLOs) include error budgets, service level indicators, and key metrics to resolve issues when technology fails
Learn from failures through incident reports and shared post-mortems; be transparent with customers and yourself
GCP: Is it part of Google or not? It’s not a division between old and new.
Perceptions and misunderstandings of how Google does things and how it’s a different environment
Google’s efforts toward customer service and responsiveness to needs
Migrating between different Cloud providers vs. higher level services
How to use Cloud machine learning-based products
GCP needs to focus on usability to maintain its pace of growth
Offer sensible APIs; turn up, turn down, and update in a programmatic fashion
Promotion vs. Different Job: When you’ve learned as much as you can, look for another team that can teach you something new
What is Cloud and what isn’t? Cloud deployments require SRE to be successful, but SREs can work on systems that do not necessarily run in the Cloud.
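The highlight on SLOs and error budgets can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the episode: it assumes a request-based service level indicator (the fraction of requests that succeed), and the function names `error_budget` and `budget_remaining` are hypothetical.

```python
def error_budget(slo_target: float, total_events: int) -> float:
    """Number of events that may fail while the SLO is still met.

    slo_target is a fraction strictly between 0 and 1, e.g. 0.999 for 99.9%.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return (1.0 - slo_target) * total_events


def budget_remaining(slo_target: float, total_events: int, failed_events: int) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    budget = error_budget(slo_target, total_events)
    return 1.0 - failed_events / budget


if __name__ == "__main__":
    # Assumed numbers: a 99.9% SLO over one million requests, 250 of which failed.
    print(error_budget(0.999, 1_000_000))
    print(budget_remaining(0.999, 1_000_000, 250))
```

With these assumed numbers the budget is roughly 1,000 failed requests, and about three quarters of it is still unspent. A team could use a figure like this to decide whether to keep shipping changes or to slow down and focus on reliability.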
Google Cloud Platform blog - CRE Life Lessons
Google SRE on YouTube