Site Reliability Engineering


Bobby Marley quoted 7 years ago
Finally, use the project management style that suits the project in its current state.
Bobby Marley quoted 7 years ago
A matrix of all possible combinations of disasters with plans to address each of these disasters permits you to sleep soundly for at least one night; keeping your recovery plans current and exercised permits you to sleep the other 364 nights of the year.
Bobby Marley quoted 7 years ago
The cost of failure is education.
Devin Carraway
Bobby Marley quoted 7 years ago
Your monitoring system should address two questions: what’s broken, and why?
Bobby Marley quoted 7 years ago
In general, an SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their service(s).
Bobby Marley quoted 7 years ago
Monitoring should never require a human to interpret any part of the alerting domain.
Dauren Chapaev quoted 4 years ago
If you remember nothing else from this chapter, keep in mind the sorts of problems that distributed consensus can be used to solve, and the types of problems that can arise when ad hoc methods such as heartbeats are used instead of distributed consensus. Whenever you see leader election, critical shared state, or distributed locking, think about distributed consensus: any lesser approach is a ticking bomb waiting to explode in your systems.
Evgenia Shuyskaya quoted 5 years ago
We set a lower availability target for YouTube than for our enterprise products because rapid feature development was correspondingly more important.
Evgenia Shuyskaya quoted 5 years ago
(Indeed, Google SRE’s unofficial motto is “Hope is not a strategy.”)
Evgenia Shuyskaya quoted 5 years ago
The solution to this Chubby scenario is interesting: SRE makes sure that global Chubby meets, but does not significantly exceed, its service level objective. In any given quarter, if a true failure has not dropped availability below the target, a controlled outage will be synthesized by intentionally taking down the system. In this way, we are able to flush out unreasonable dependencies on Chubby shortly after they are added. Doing so forces service owners to reckon with the reality of distributed systems sooner rather than later.