Consider the following points when implementing this pattern:
- System downtime: Auto-scaling should be achieved without system downtime. This means that existing instances hosting the service must not be affected by the scaling operation.
- Scale up or scale out: Depending on the execution model of the hosted application, decide whether to scale up/down (resize existing instances) or scale out/in (add or remove instances).
- Start-up / shut-down threshold: Starting a new node or shutting one down takes time, and this delay must be accounted for before a scaling decision is made. A good input for the decision is how consistent the load is: check that the load stays beyond a threshold for a sustained period before scaling the system.
- Max and min count for instances: Set upper and lower bounds for auto-scaling so that erroneous or malicious activity cannot affect the availability or the operating cost of the system. Configure alerts to flag unusual scaling activity.
- Logging and monitoring: Log every scaling activity and monitor system performance after scaling to confirm that the operation was effective.
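
Several of these considerations (a sustained-load threshold, instance bounds, and logging) can be combined in a single scaling decision. The following is a minimal sketch of one way to do that, not a definitive implementation. The names `ScalingPolicy` and `desired_instance_count`, the threshold values, and the one-instance step size are all assumptions made for illustration; a real system would take its load samples from its own monitoring source.

```python
import logging
from dataclasses import dataclass
from typing import Sequence

logger = logging.getLogger("autoscaler")


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical policy; all default values are illustrative assumptions."""
    min_instances: int = 2
    max_instances: int = 10
    scale_out_threshold: float = 0.75  # utilisation in the range 0..1
    scale_in_threshold: float = 0.25
    sustained_samples: int = 3  # consecutive samples that must agree


def desired_instance_count(
    current: int, recent_load: Sequence[float], policy: ScalingPolicy
) -> int:
    """Return the instance count to run, given recent load samples.

    Scaling is proposed only when every sample in the most recent window
    is beyond a threshold, so a short spike does not trigger a start-up
    or shut-down that takes longer than the spike itself.
    """
    window = recent_load[-policy.sustained_samples:]

    if len(window) < policy.sustained_samples:
        target = current  # not enough data to judge consistency
    elif all(sample > policy.scale_out_threshold for sample in window):
        target = current + 1
    elif all(sample < policy.scale_in_threshold for sample in window):
        target = current - 1
    else:
        target = current

    # Enforce the configured bounds regardless of what the load suggests.
    bounded = max(policy.min_instances, min(policy.max_instances, target))

    if bounded != target:
        # A clamped decision is a natural place to hook in an alert.
        logger.warning(
            "Scaling target %d clamped to %d (bounds %d..%d)",
            target, bounded, policy.min_instances, policy.max_instances,
        )
    if bounded != current:
        logger.info("Scaling from %d to %d instances", current, bounded)

    return bounded
```

For example, with the default policy above, `desired_instance_count(4, [0.9, 0.8, 0.95], ScalingPolicy())` proposes five instances because all three samples exceed the scale-out threshold, while `[0.9, 0.3, 0.95]` leaves the count at four because the load was not consistently high. The sketch only computes a target count; how instances are added or removed without affecting the existing ones depends on the hosting platform.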