Domino maintains a pool of machines called executors, organized into hardware tiers for use in Domino Executions. Domino system administrators can monitor and take actions on executors from the Admin application, and this topic describes how to monitor and work with the fleet of executors in your Domino deployment.
Domino system administrators can click Dispatcher from the Admin home to view and manage executors in the deployment. This interface features live updating, and shows all currently configured hardware tiers and executors, with information about usage and current state for each.
Executor state
In the Dispatcher interface, you can see information about your executors. You will see flags for the following states on executors.
-
Available
Available executors are ready for use, and may be assigned Runs, unless they have been manually put in Maintenance Mode.
-
Unusable
Domino will not assign Runs to executors in an Unusable state, but these executors still count against the total number of allowed executors in their hardware tiers. This means that a large number of executors in an Unusable state can fill the capacity of a hardware tier and limit its availability. If this occurs, system administrators should either manually terminate the executors, wait for them to be automatically terminated, or put them into Maintenance Mode to attempt to repair them.
-
Maintenance Mode
You will see a flag indicating executors that have been put into Maintenance Mode, plus an optional comment from the admin who toggled Maintenance Mode on for that executor.
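The capacity rule for these states can be sketched in a few lines of Python. This is only an illustration, not Domino code: the `Executor` class, the function names, and the `max_executors` parameter are hypothetical. It assumes that every executor counts against its hardware tier limit unless it is in Maintenance Mode, as described in the Maintenance Mode section of this topic.

```python
from dataclasses import dataclass


@dataclass
class Executor:
    """Hypothetical stand-in for one row of the Dispatcher interface."""
    name: str
    state: str                      # "Available" or "Unusable"
    maintenance_mode: bool = False  # toggled by an administrator


def tier_slots_used(executors):
    """Count executors that occupy a slot in their hardware tier.

    Assumption: Available and Unusable executors both count,
    while executors in Maintenance Mode do not.
    """
    return sum(1 for e in executors if not e.maintenance_mode)


def tier_has_capacity(executors, max_executors):
    """Return True if the tier could still accept another executor."""
    return tier_slots_used(executors) < max_executors


fleet = [
    Executor("exec-a", "Available"),
    Executor("exec-b", "Unusable"),
    Executor("exec-c", "Unusable", maintenance_mode=True),
]
print(tier_slots_used(fleet))       # exec-c is excluded from the count
print(tier_has_capacity(fleet, 2))  # the Unusable executor fills the tier
```

In this example, moving `exec-b` into Maintenance Mode would free a slot in a tier limited to two executors.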
Take actions on executors
The rightmost column for each executor in the Dispatcher interface is an Actions link that opens an interface where Domino system administrators can take the following actions.
-
Toggle Maintenance Mode on the executor
-
Start or stop the executor
-
Terminate the executor
-
Restart the executor
Domino executors are subject to several periodic health checks. Two of these checks test whether the Dispatcher is able to connect to vital services running on the executor.
There is also a configurable health check for disk space.
If com.cerebro.domino.executor.minUsableSpaceInGB is set to a non-zero value, disk space health checks will run, and an executor will fail the health check if its available disk space is lower than the smaller of the values of the following two configuration options.
-
Namespace:
common
-
Key:
com.cerebro.domino.executor.diskSpaceRunsGarbageCollectorFreeSpaceLimit
-
Value: number of bytes
-
Default: 50000000000 (this is ~50GB in bytes)
-
Namespace:
common
-
Key:
com.cerebro.domino.executor.minUsableSpaceInGB
-
Value: number of gigabytes
-
Default: 0
If this option is set to its default value of 0, disk space health checks will be disabled and will not run or impact your executors.
These options can be configured in the Central Config.
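The disk space rule above can be sketched as follows. This is a minimal illustration, not the Dispatcher's implementation: the function and parameter names are hypothetical, and it assumes that one gigabyte means 1,000,000,000 bytes, which matches the way the 50000000000 default is described but is not confirmed here.

```python
BYTES_PER_GB = 1_000_000_000  # assumption: decimal gigabytes


def disk_space_threshold_bytes(gc_free_space_limit_bytes=50_000_000_000,
                               min_usable_space_gb=0):
    """Return the failure threshold in bytes, or None if the check is disabled.

    The arguments mirror the two configuration keys:
    diskSpaceRunsGarbageCollectorFreeSpaceLimit (bytes) and
    minUsableSpaceInGB (gigabytes).
    """
    if min_usable_space_gb == 0:
        return None  # the default of 0 disables disk space health checks
    # Convert to a common unit, then take the smaller of the two values.
    return min(gc_free_space_limit_bytes, min_usable_space_gb * BYTES_PER_GB)


def passes_disk_check(available_bytes, **config):
    """Return True if an executor with this much free space passes."""
    threshold = disk_space_threshold_bytes(**config)
    return threshold is None or available_bytes >= threshold


# With minUsableSpaceInGB set to 20, the 20 GB value is the smaller one.
print(disk_space_threshold_bytes(min_usable_space_gb=20))
print(passes_disk_check(10 * BYTES_PER_GB, min_usable_space_gb=20))
```

Because the two options use different units, the sketch converts the gigabyte value to bytes before comparing.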
Health check failures
If an executor fails a health check, the following process occurs.
-
On the next Dispatcher tick, the Dispatcher stops scheduling Runs on the executor.
-
The executor goes idle once any existing Runs finish.
-
After 5 minutes (configurable) of being idle, the executor enters an Unusable state, which is visible in its Executor State column on the Dispatcher interface.
-
After 15 minutes (configurable) in Unusable state, the executor is stopped.
-
48 hours (configurable) after stopping, executors in an Unusable state are terminated.
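To make the timing of these steps concrete, the following sketch computes when each transition would happen for an executor that goes idle at a known time. The function name and its parameters are hypothetical. The default delays are the durations quoted in the steps above, and they are exposed as parameters because each one is configurable.

```python
from datetime import datetime, timedelta


def health_check_failure_timeline(
    idle_at,
    idle_to_unusable=timedelta(minutes=5),
    unusable_to_stopped=timedelta(minutes=15),
    stopped_to_terminated=timedelta(hours=48),
):
    """Return the expected time of each state change after the executor goes idle."""
    unusable_at = idle_at + idle_to_unusable
    stopped_at = unusable_at + unusable_to_stopped
    terminated_at = stopped_at + stopped_to_terminated
    return {
        "unusable": unusable_at,
        "stopped": stopped_at,
        "terminated": terminated_at,
    }


# Example: the last Run on a failing executor finishes at noon on 1 January.
timeline = health_check_failure_timeline(datetime(2024, 1, 1, 12, 0))
for state, when in timeline.items():
    print(f"{state:>10}: {when:%Y-%m-%d %H:%M}")
```

An administrator could use a calculation like this to estimate how long an Unusable executor will keep occupying a slot in its hardware tier before it is terminated automatically.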
Unresponsive executors
When previously functional executors become completely unresponsive, Domino executes the following process.
-
After 15 minutes (configurable) of being unresponsive, the executor is placed in an Unusable state and stopped immediately.
-
48 hours (configurable) after stopping, the instance is terminated.
Executors dead on arrival
When Domino attempts to start a new executor that never becomes responsive, the following process occurs.
-
After 15 minutes (configurable) of being unresponsive, the executor is placed in an Unusable state and stopped immediately.
-
48 hours (configurable) after stopping, the instance is terminated.
Maintenance Mode
From the Actions interface for an individual executor, Domino system administrators can enable Maintenance Mode on the executor. This does the following things.
-
Executors in Maintenance Mode will not be assigned new Runs by the Dispatcher.
-
Executors in Maintenance Mode will not be automatically terminated by Unusable state timeouts.
-
Executors in Maintenance Mode do not count against the total executor limits for their hardware tiers.
-
An executor that is responsive and has been in Maintenance Mode for 120 minutes (configurable) will be stopped.
-
An executor that is unresponsive and has been in Maintenance Mode for 15 minutes (configurable) will be stopped.
-
An executor that is passing its health checks while in Maintenance Mode will attempt to rejoin the pool of Available executors in its hardware tier when Maintenance Mode is toggled off.
Domino system administrators should consider putting executors in an Unusable state into Maintenance Mode if they believe the executor can be fixed and restored to healthy operation, or if they want to attempt to recover data from the executor and thus want to exempt it from automatic termination.
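The two stop rules for executors in Maintenance Mode can be expressed as a single check. This is a sketch under stated assumptions rather than Domino's logic: the function and parameter names are hypothetical, and the defaults are the 120-minute and 15-minute values listed above.

```python
from datetime import timedelta


def should_stop_in_maintenance_mode(
    responsive,
    time_in_maintenance,
    responsive_timeout=timedelta(minutes=120),
    unresponsive_timeout=timedelta(minutes=15),
):
    """Return True if an executor in Maintenance Mode is due to be stopped.

    A responsive executor gets the longer timeout, and an unresponsive
    executor gets the shorter one. Both timeouts are configurable.
    """
    limit = responsive_timeout if responsive else unresponsive_timeout
    return time_in_maintenance >= limit


# A responsive executor one hour into Maintenance Mode keeps running.
print(should_stop_in_maintenance_mode(True, timedelta(minutes=60)))
# An unresponsive executor is stopped much sooner.
print(should_stop_in_maintenance_mode(False, timedelta(minutes=20)))
```

The sketch covers stopping only. Maintenance Mode exempts an executor from automatic termination, so no termination rule is modeled.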
Configurable timeout settings
-
Namespace:
common
-
Key:
com.cerebro.domino.executor.maxIdleMaintenanceModeTimeInMinutes
-
Value: number of minutes
-
Default: 120
This is the time a machine can be responsive, running, and idle in Maintenance Mode before it will be automatically stopped.
-
Namespace:
common
-
Key:
com.cerebro.domino.dispatcher.clusterHealthMonitoring.unhealthyExecutorMMTimeout
-
Value: JODA duration
-
Default: 15m
This is the duration before an unresponsive executor in Maintenance Mode will be stopped, and the duration before an unresponsive executor will be set to an Unusable state.
-
Namespace:
common
-
Key:
com.cerebro.domino.executor.healthCheckTimeoutInMinutes
-
Value: number of minutes
-
Default: 5
This is the duration before an Available executor that is failing health checks will be put into an Unusable state.
-
Namespace:
common
-
Key:
com.cerebro.domino.dispatcher.clusterHealthMonitoring.unusable2StoppedExecutorTimeoutMin
-
Value: number of minutes
-
Default: 5
This is the duration a machine in an Unusable state can remain idle before being stopped.
-
Namespace:
common
-
Key:
com.cerebro.domino.dispatcher.clusterHealthMonitoring.unusable2TerminatedExecutorTimeoutMin
-
Value: number of minutes
-
Default: 2880
This is the duration a machine in an Unusable state can remain stopped before being terminated.
These options can be set in the Central Config.