Graceful VM exits, some dials

Fly apps are typically fast to boot, and it’s relatively easy to boot new VMs. We start them up, do some health checks, and then add them to our load balancer and DNS service discovery. But what comes up must go down. We shut VMs down for any number…

Michael Dwan

Overview

The article discusses Fly.io's new feature that allows for graceful VM exits, enabling users to delay VM shutdowns for up to 24 hours. This is particularly beneficial for applications with long-lived connections, such as live streaming services and databases, to ensure that ongoing processes can complete without abrupt interruptions.

What You'll Learn

1. How to configure VM shutdown signals and timeouts in Fly.io
2. Why graceful shutdowns are crucial for applications with long-lived connections
3. When to use different shutdown signals like SIGINT and SIGTERM

Key Questions Answered

What is the maximum delay for VM shutdowns in Fly.io?
Fly.io lets you delay VM shutdown for up to 24 hours by raising the kill timeout, which gives in-progress work time to finish before the VM is terminated. The 24-hour ceiling applies to dedicated CPU VMs, and the default timeout is 5 seconds. The feature is most useful for applications that hold long-lived connections, such as live streaming services and databases.
How can you specify the shutdown signal for a VM in Fly.io?
Users can specify the shutdown signal in the fly.toml configuration file using the 'kill_signal' option. The default is SIGINT, but other signals like SIGTERM, SIGQUIT, and SIGUSR1 are also accepted, allowing for more control over how applications handle shutdowns.
What happens when a PostgreSQL server receives a SIGINT signal?
When a PostgreSQL server receives SIGINT, it aborts open transactions and closes all connections immediately, so any in-flight, uncommitted work is lost. Sending SIGTERM instead triggers a 'smart shutdown': the server stops accepting new connections and waits for existing sessions to finish before it exits.
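The two settings named in the answers above might look like this in fly.toml. This is a minimal sketch with illustrative values. It assumes that `kill_timeout` is given in seconds (consistent with the 5-second default) and that both keys sit at the top level of the file.

```toml
# fly.toml (fragment): illustrative values, not recommendations

# Signal sent to the app's main process when the VM is asked to stop.
# The default is SIGINT. SIGTERM suits servers such as PostgreSQL that
# treat it as a request for a slower, more orderly shutdown.
kill_signal = "SIGTERM"

# How long to wait after the signal before the VM is forcefully stopped.
# Assumed to be in seconds; the default is 5.
kill_timeout = 120
```

The timeout is only an upper bound: a process that exits on its own before the timeout ends does not need to wait it out.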

Key Statistics & Figures

Maximum VM shutdown delay: 24 hours
This delay allows applications to finish ongoing processes before being terminated.
Default kill timeout for VMs: 5 seconds
Unless configured otherwise, a VM is forcefully terminated this long after the shutdown signal is sent.
Maximum kill timeout for dedicated CPU VMs: 24 hours
Dedicated CPU VMs can be configured with longer shutdown times than shared CPU VMs.

Key Actionable Insights

1. Implementing a longer kill timeout for VMs can significantly improve the user experience of applications with long-lived connections.
By allowing a VM to keep running for up to 24 hours after the shutdown signal, developers let users finish their tasks without abrupt disconnections, which matters most for workloads like live streaming and database transactions.
2. Using the correct shutdown signal can prevent lost work and application errors during VM termination.
For instance, sending PostgreSQL SIGTERM instead of SIGINT makes the server wait for existing sessions to finish rather than aborting their open transactions, so in-flight work is not thrown away.
3. Configuring your fly.toml file correctly is essential for managing VM behavior during shutdowns.
By adjusting parameters like 'kill_timeout' and 'kill_signal', developers can tailor the shutdown process to fit the specific needs of their applications, ensuring smoother transitions during updates or maintenance.
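On the application side, a longer 'kill_timeout' only helps if the process reacts to the signal by finishing its work rather than exiting at once. The following is a minimal Python sketch of that pattern, not Fly.io code: `request_shutdown`, `install_handlers`, and `run` are hypothetical names, and the job list stands in for whatever long-lived work the app performs.

```python
import signal
import threading

# Set once the platform has asked the process to shut down.
shutdown_requested = threading.Event()


def request_shutdown(signum, frame):
    """Signal handler: only record the request; the main loop does the rest."""
    shutdown_requested.set()


def install_handlers(signals=(signal.SIGTERM, signal.SIGINT)):
    """Register the handler. Must be called from the main thread."""
    for sig in signals:
        signal.signal(sig, request_shutdown)


def run(jobs, handle):
    """Handle jobs one at a time until done or until shutdown is requested.

    A job that has already started is always allowed to finish.
    Returns the jobs that were never started.
    """
    pending = list(jobs)
    while pending and not shutdown_requested.is_set():
        handle(pending.pop(0))
    return pending
```

A program would call `install_handlers()` once at startup and then `run(...)` with its own work. The kill timeout needs to be at least as long as the slowest single job, because the loop only checks for shutdown between jobs.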

Common Pitfalls

1. Forgetting to configure the kill signal and timeout can lead to abrupt VM terminations.
If the default settings are not adjusted, applications may be shut down before they have finished their work, resulting in lost data or incomplete transactions.
2. Using the wrong shutdown signal can cause applications to terminate improperly.
For example, sending SIGINT to a PostgreSQL server aborts open transactions immediately and drops client connections, so in-flight work is lost. Sending SIGTERM allows a more graceful shutdown.
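The PostgreSQL example shows that one process can give different signals different meanings. The Python sketch below illustrates that idea with hypothetical names (`ShutdownMode`, `MODE_FOR_SIGNAL`, `sessions_to_abort`). The smart/fast split follows PostgreSQL's documented convention for SIGTERM and SIGINT, but the code is an illustration, not how PostgreSQL is implemented.

```python
import signal
from enum import Enum


class ShutdownMode(Enum):
    SMART = "smart"  # stop accepting new sessions, let existing ones finish
    FAST = "fast"    # abort open sessions and exit promptly


# Which mode each signal requests, mirroring PostgreSQL's convention.
MODE_FOR_SIGNAL = {
    signal.SIGTERM: ShutdownMode.SMART,
    signal.SIGINT: ShutdownMode.FAST,
}


def sessions_to_abort(signum, open_sessions):
    """Return the sessions that would be cut off if `signum` arrived now."""
    mode = MODE_FOR_SIGNAL[signum]  # KeyError for signals this sketch ignores
    return list(open_sessions) if mode is ShutdownMode.FAST else []
```

With the kill signal left at its default of SIGINT, a server following this convention aborts every open session. Switching to SIGTERM aborts none, provided the kill timeout is long enough for those sessions to finish.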

Related Concepts

Graceful Shutdowns
VM Management
Load Balancing
Signal Handling