A postmortem meeting is the follow-up to the document. It often finalizes action items, discusses the root-cause findings, and offers a safe setting for discussion. Usually, the best people to invite to a postmortem meeting are those involved in the incident, the tech leads for the affected services, the product managers for those services, and any interested engineers. Finding the right balance is hard: if the meeting is too large, you will not get much done, but if it is too small, knowledge won't be disseminated well.
It's a good idea to have those involved in the incident present because they know what happened and can fill in any data missing from the document. Tech leads from affected services should be there in case untrue assumptions are made about their services, and to take responsibility for making sure that the action items get implemented. Product managers for affected services are important because they can help describe the business needs related to the service and work with the tech leads to make sure that the fixes get prioritized. Allowing interested people to attend is important for promoting knowledge transfer.
The incident commander should lead the meeting. While the incident commander doesn't always write the entire postmortem, they are often the person most knowledgeable about the event and those involved, and they can direct the conversation. Open the meeting with a statement requesting that those who have not read the document excuse themselves from the meeting or agree not to participate. The thinking here is that those who have not read the document will ask questions or make statements that might already be answered in the document. The meeting should not cater to those who couldn't take the time to be informed about the incident for the sake of wider learning. Following that, try to stick to the following agenda:
This agenda works great for larger organizations. I have worked on teams where we ran a postmortem meeting once a week and would go through every outage from the past week in this format. In smaller organizations, it is often useful to be less formal and go faster. In these cases, I like a quick discussion about a postmortem or incident:
For this more informal rotation, you should try to spend only a minute or two per person on each question. That way, the whole process takes less than half an hour for a team of five.
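The time budget for the rotation is simple multiplication, and a rough sketch makes it easy to check for your own team. The function name and the question count of three are assumptions made for illustration; substitute however many questions your team actually asks.

```python
def rotation_minutes(people: int, questions: int, minutes_per_answer: float) -> float:
    """Total meeting time if every person answers every question."""
    return people * questions * minutes_per_answer


team_size = 5
question_count = 3  # assumption: replace with your team's real number of questions

# One to two minutes per person per question, as suggested above.
low = rotation_minutes(team_size, question_count, 1)
high = rotation_minutes(team_size, question_count, 2)

print(f"{low} to {high} minutes")
```

With these assumed numbers, a full two minutes on every answer uses the entire half hour, so staying under it means keeping most answers closer to one minute.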