Real-World SRE
by Nat Welch
Table of Contents
Real-World SRE
Why subscribe?
PacktPub.com
Contributors
About the author
About the reviewer
Packt is Searching for Authors Like You
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
1. Introduction
A brief history
What is SRE?
What is in the book?
SRE as a framework for new projects
Summary
References
2. Monitoring
Why monitoring?
Instrumenting an application
What should we measure?
A short introduction to SLIs, SLOs, and error budgets
Service levels
Error budgets
Collecting and saving monitoring data
Polling applications
Nagios
Prometheus
Cacti
Sensu
Push applications
StatsD
Telegraf
ELK
Displaying monitoring information
Arbitrary queries
Graphs
Dashboards
Chatbots
Managing and maintaining monitoring data
Communicating about monitoring
Do they even know there is monitoring?
References and related reading
Future reading
Summary
3. Incident Response
What is an incident?
What is incident response?
Alerting
When do you alert?
How do you alert?
Alerting services
What is in an alert?
Who do you alert?
Being on call
Communication
Incident Command System (ICS)
Where do you communicate?
Recovering the system
Calling all clear
Summary
4. Postmortems
What is a postmortem?
Why write a postmortem?
When to write a postmortem document
Carrying out incident analysis
How to write a postmortem document
Summary
Impact
Timeline
Root cause
Action items
Postmortems without action items
Appendix
Blameless postmortems
Holding a postmortem meeting
Analyzing past postmortems
MTTR and MTBF
Alert fatigue
Discussing past outages
Summary
References
5. Testing and Releasing
Testing
What do you test?
Testing code
Code reviews
Unit, feature, and integration tests
Unit tests
Feature tests
Integration tests
Testing infrastructure
Testing processes
Releasing
When to release
Releasing to production
Validating your release
Rollbacks
Automation
Continuous everything
Summary
6. Capacity Planning
A quick introduction to business finance
Why plan?
Managing risk and managing expectations
Defining a plan
What is our current capacity?
When are we going to run out of capacity?
How should we change our capacity?
State and concurrency
Is your service limited by another service?
Scaling for events
Unpredictable growth–user-generated content
Preplanned versus autoscaling
Delivering
Execute the plan
Architecture–where performance changes come from
Tech as a profit center and procurement
Summary
7. Building Tools
Finding projects
Defining projects
RDD
Example
Design documents
Planning projects
Example
Retrospectives and standups
Allocation
Building projects
Advice for writing code
Separation of concerns
Long-term work
Example OKRs
Notebooks
Documenting and maintaining projects
Summary
8. User Experience
An introduction to design and UX
Real-world interaction design
User testing
Picking an experience
Designing the test
Finding people to test
Developer experience
Experience of tools
Performance budgets
Security
Authentication
Authorization
Risk profile
Phishing
ACM code of ethics
Summary
References
9. Networking Foundations
The internet
Sending an HTTP request
DNS
dig
Ethernet and TCP/IP
Ethernet
IP
CIDR notation
ICMP
UDP
TCP
HTTP
curl and wget
Tools for watching the network
netstat
nc
tcpdump
Summary
References
10. Linux and Cloud Foundations
Linux fundamentals
Everything is a file
Files, directories, and inodes
Permissions
Sockets
Devices
/proc
Filesystem layout
What is a process?
Zombies
Orphans
What is nice?
syscalls
How to trace
Watching processes
Load averages
Build your own
Cloud fundamentals
VMs
Containers
Load balancing
Autoscaling
Storage
Queues and Pub/Sub
Units of scale
Example architecture interview
Summary
References
Other Books You May Enjoy
Leave a review - let other readers know what you think
Index