Table of Contents

Real-World SRE

Why subscribe?

PacktPub.com

Contributors

About the author

About the reviewer

Packt is Searching for Authors Like You

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Conventions used

Get in touch

Reviews

1. Introduction

A brief history

What is SRE?

What is in the book?

SRE as a framework for new projects

Summary

References

2. Monitoring

Why monitoring?

Instrumenting an application

What should we measure?

A short introduction to SLIs, SLOs, and error budgets

Service levels

Error budgets

Collecting and saving monitoring data

Polling applications

Nagios

Prometheus

Cacti

Sensu

Push applications

StatsD

Telegraf

ELK

Displaying monitoring information

Arbitrary queries

Graphs

Dashboards

Chatbots

Managing and maintaining monitoring data

Communicating about monitoring

Do they even know there is monitoring?

References and related reading

Future reading

Summary

3. Incident Response

What is an incident?

What is incident response?

Alerting

When do you alert?

How do you alert?

Alerting services

What is in an alert?

Who do you alert?

Being on call

Communication

Incident Command System (ICS)

Where do you communicate?

Recovering the system

Calling all clear

Summary

4. Postmortems

What is a postmortem?

Why write a postmortem?

When to write a postmortem document

Carrying out incident analysis

How to write a postmortem document

Summary

Impact

Timeline

Root cause

Action items

Postmortems without action items

Appendix

Blameless postmortems

Holding a postmortem meeting

Analyzing past postmortems

MTTR and MTBF

Alert fatigue

Discussing past outages

Summary

References

5. Testing and Releasing

Testing

What do you test?

Testing code

Code reviews

Unit, feature, and integration tests

Unit tests

Feature tests

Integration tests

Testing infrastructure

Testing processes

Releasing

When to release

Releasing to production

Validating your release

Rollbacks

Automation

Continuous everything

Summary

6. Capacity Planning

A quick introduction to business finance

Why plan?

Managing risk and managing expectations

Defining a plan

What is our current capacity?

When are we going to run out of capacity?

How should we change our capacity?

State and concurrency

Is your service limited by another service?

Scaling for events

Unpredictable growth–user-generated content

Preplanned versus autoscaling

Delivering

Execute the plan

Architecture–where performance changes come from

Tech as a profit center and procurement

Summary

7. Building Tools

Finding projects

Defining projects

RDD

Example

Design documents

Planning projects

Example

Retrospectives and standups

Allocation

Building projects

Advice for writing code

Separation of concerns

Long-term work

Example OKRs

Notebooks

Documenting and maintaining projects

Summary

8. User Experience

An introduction to design and UX

Real-world interaction design

User testing

Picking an experience

Designing the test

Finding people to test

Developer experience

Experience of tools

Performance budgets

Security

Authentication

Authorization

Risk profile

Phishing

ACM code of ethics

Summary

References

9. Networking Foundations

The internet

Sending an HTTP request

DNS

dig

Ethernet and TCP/IP

Ethernet

IP

CIDR notation

ICMP

UDP

TCP

HTTP

curl and wget

Tools for watching the network

netstat

nc

tcpdump

Summary

References

10. Linux and Cloud Foundations

Linux fundamentals

Everything is a file

Files, directories, and inodes

Permissions

Sockets

Devices

/proc

Filesystem layout

What is a process?

Zombies

Orphans

What is nice?

syscalls

How to trace

Watching processes

Load averages

Build your own

Cloud fundamentals

VMs

Containers

Load balancing

Autoscaling

Storage

Queues and Pub/Sub

Units of scale

Example architecture interview

Summary

References

Other Books You May Enjoy

Leave a review - let other readers know what you think

Index
