Application Operations
This page discusses operations relevant to application management. Please go over the Application State Machine and Application Instance State Machine to understand the different states an application (and its instances) can be in, and how operations move an application from one state to another.
Note
Please go through Cluster Op Spec to understand the operation parameters being sent.
Note
Only one operation can be active on a particular {appName,version} combination.
Warning
Only the leader controller will accept and process operations. To avoid confusion, use the controller endpoint exposed by Drove Gateway to issue commands.
How to initiate an operation
Tip
Use the Drove CLI to perform all manual operations.
All operations for application lifecycle management need to be issued via a POST HTTP call to the leader controller endpoint on the path /apis/v1/applications/operations. The API returns HTTP OK/200 with the relevant JSON response as payload.
Sample API call:
curl --location 'http://drove.local:7000/apis/v1/applications/operations' \
--header 'Content-Type: application/json' \
--header 'Authorization: Basic YWRtaW46YWRtaW4=' \
--data '{
"type": "START_INSTANCES",
"appId": "TEST_APP-3",
"instances": 1,
"opSpec": {
"timeout": "5m",
"parallelism": 32,
"failureStrategy": "STOP"
}
}'
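For automation that does not go through the Drove CLI, the same call can be made from code. The following is a minimal Python sketch using only the standard library; the endpoint and admin/admin credentials are the example values above, and the helper name build_operation_request is hypothetical. It only builds the request, so nothing is sent until urlopen is called.

```python
import base64
import json
import urllib.request

def build_operation_request(base_url: str, payload: dict,
                            user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the application operations endpoint."""
    # Basic auth header: base64 of "user:password", as in the curl example.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url}/apis/v1/applications/operations",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )

payload = {
    "type": "START_INSTANCES",
    "appId": "TEST_APP-3",
    "instances": 1,
    "opSpec": {"timeout": "5m", "parallelism": 32, "failureStrategy": "STOP"},
}
req = build_operation_request("http://drove.local:7000", payload, "admin", "admin")
# To actually send it against a live leader controller:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status, json.loads(resp.read()))
```

Keeping request construction separate from sending makes the payload easy to inspect or log before it reaches the controller.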
Note
In the above example, http://drove.local:7000 is the endpoint of the leader, TEST_APP-3 is the Application ID, and authorization uses basic auth.
Cluster Operation Specification
When an operation is submitted to the cluster, a cluster op spec needs to be specified. It controls different aspects of the operation, such as its parallelism and its timeout.
The following aspects of an operation can be configured:
Name | Option | Description
---|---|---
Timeout | timeout | The duration after which Drove considers the operation to have timed out.
Parallelism | parallelism | Parallelism of the task. (Range: 1-32)
Failure Strategy | failureStrategy | Set this to STOP.
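The constraints in the table can be checked client-side before a payload is submitted. A minimal Python sketch follows; the function name is hypothetical, and the accepted timeout syntax (an integer followed by s, m or h, as in "5m") is an assumption inferred from the examples on this page, not a documented grammar.

```python
import re

# Assumed timeout syntax: an integer followed by a unit, e.g. "30s", "5m", "1h".
_TIMEOUT_RE = re.compile(r"(\d+)([smh])")
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}

def validate_op_spec(op_spec: dict) -> int:
    """Check an opSpec against the documented constraints; return the timeout in seconds."""
    timeout = str(op_spec.get("timeout", ""))
    match = _TIMEOUT_RE.fullmatch(timeout)
    if match is None:
        raise ValueError(f"unsupported timeout format: {timeout!r}")
    parallelism = op_spec.get("parallelism")
    # Documented range for parallelism: 1-32 inclusive.
    if not isinstance(parallelism, int) or not 1 <= parallelism <= 32:
        raise ValueError(f"parallelism must be in 1-32, got {parallelism!r}")
    if op_spec.get("failureStrategy") != "STOP":
        raise ValueError("failureStrategy should be STOP")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]

print(validate_op_spec({"timeout": "5m", "parallelism": 32, "failureStrategy": "STOP"}))  # 300
```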
Note
For internal recovery operations, Drove generates its own operations and applies the following cluster operation spec to them:
- timeout - 300 seconds
- parallelism - 1
- failureStrategy - STOP
This default operation spec can be configured in the controller configuration file. It is recommended to set the parallelism to something like 8 for faster recovery.
How to cancel an operation
Operations can be requested to be cancelled asynchronously. To achieve this, a POST call needs to be made to the leader controller endpoint on the API path /apis/v1/operations/{applicationId}/cancel, where applicationId is the Application ID of the application.
curl --location --request POST 'http://drove.local:7000/apis/v1/operations/TEST_APP-3/cancel' \
--header 'Authorization: Basic YWRtaW46YWRtaW4=' \
--data ''
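The cancel call can be issued programmatically in the same way. This minimal Python sketch builds the request with the standard library; the helper name is hypothetical, and URL-quoting the application ID is a defensive choice rather than something the Drove API is documented to require.

```python
import base64
import urllib.parse
import urllib.request

def build_cancel_request(base_url: str, app_id: str,
                         user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a cancel request for an application's active operation."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    # Quote the ID so unexpected characters cannot alter the path.
    path = f"/apis/v1/operations/{urllib.parse.quote(app_id, safe='')}/cancel"
    return urllib.request.Request(
        url=base_url + path,
        data=b"",  # empty body, matching --data '' in the curl example
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )

req = build_cancel_request("http://drove.local:7000", "TEST_APP-3", "admin", "admin")
# urllib.request.urlopen(req) would send it; cancellation itself completes asynchronously.
```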
Warning
Operation cancellation is not instantaneous. Cancellation takes effect only after the current execution of the active operation is complete.
Create an application
Before deploying containers on the cluster, an application needs to be created.
Preconditions:
- App should not exist in the cluster
State Transition:
- none → MONITORING
To create an application, an Application Spec needs to be created first.
Once it is ready, issue the following CLI command or send the following payload:
drove -c local apps create sample/test_app.json
Sample Request Payload
{
"type": "CREATE",
"spec": {...}, //(1)!
"opSpec": { //(2)!
"timeout": "5m",
"parallelism": 1,
"failureStrategy": "STOP"
}
}
(1) Spec as mentioned in Application Specification
(2) Operation spec as mentioned in Cluster Op Spec
Sample response
{
"data" : {
"appId" : "TEST_APP-1"
},
"message" : "success",
"status" : "SUCCESS"
}
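The sample responses on this page share the same envelope: status, message, and a data object carrying the appId. A small Python sketch for checking it is shown below; the function name is hypothetical, and treating any status other than SUCCESS as an error is an assumption, since this page only shows successful responses.

```python
import json

def extract_app_id(response_body: str) -> str:
    """Return data.appId from an operation response, or raise if it did not succeed."""
    response = json.loads(response_body)
    if response.get("status") != "SUCCESS":
        # Assumption: non-SUCCESS responses are errors; the message is surfaced as-is.
        raise RuntimeError(f"operation failed: {response.get('message')}")
    return response["data"]["appId"]

sample = '''{
  "data" : { "appId" : "TEST_APP-1" },
  "message" : "success",
  "status" : "SUCCESS"
}'''
print(extract_app_id(sample))  # TEST_APP-1
```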
Starting new instances of an application
New instances can be started by issuing the START_INSTANCES command.
Preconditions:
- Application must be in one of the following states: MONITORING, RUNNING
State Transition:
- {RUNNING, MONITORING} → RUNNING
The following command/payload will start 2 new instances of the application.
drove -c local apps deploy TEST_APP-1 2
Sample Request Payload
{
"type": "START_INSTANCES",
"appId": "TEST_APP-1",//(1)!
"instances": 2,//(2)!
"opSpec": {//(3)!
"timeout": "5m",
"parallelism": 32,
"failureStrategy": "STOP"
}
}
(1) Application ID
(2) Number of instances to be started
(3) Operation spec as mentioned in Cluster Op Spec
Sample response
{
"status": "SUCCESS",
"data": {
"appId": "TEST_APP-1"
},
"message": "success"
}
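The precondition and payload above can be combined into a client-side guard. This is a minimal Python sketch; the function name and the idea of checking the state locally before submitting are assumptions (the controller performs its own validation), and the current state has to come from wherever your tooling tracks it.

```python
# States from which START_INSTANCES is allowed, per the preconditions above.
START_ALLOWED_STATES = {"MONITORING", "RUNNING"}

def start_instances_payload(app_id: str, current_state: str, instances: int,
                            timeout: str = "5m", parallelism: int = 32) -> dict:
    """Build a START_INSTANCES payload after checking the documented preconditions."""
    if current_state not in START_ALLOWED_STATES:
        raise ValueError(f"cannot start instances while app is {current_state}")
    # Assumption: asking for zero or fewer instances is treated as a caller error.
    if instances < 1:
        raise ValueError("instances must be at least 1")
    return {
        "type": "START_INSTANCES",
        "appId": app_id,
        "instances": instances,
        "opSpec": {"timeout": timeout, "parallelism": parallelism,
                   "failureStrategy": "STOP"},
    }

payload = start_instances_payload("TEST_APP-1", "MONITORING", 2)
```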
Suspending an application
All instances of an application can be shut down by issuing the SUSPEND command.
Preconditions:
- Application must be in one of the following states: MONITORING, RUNNING
State Transition:
- {RUNNING, MONITORING} → MONITORING
The following command/payload will suspend all instances of the application.
drove -c local apps suspend TEST_APP-1
Sample Request Payload
{
"type": "SUSPEND",
"appId": "TEST_APP-1",//(1)!
"opSpec": {//(2)!
"timeout": "5m",
"parallelism": 32,
"failureStrategy": "STOP"
}
}
(1) Application ID
(2) Operation spec as mentioned in Cluster Op Spec
Sample response
{
"status": "SUCCESS",
"data": {
"appId": "TEST_APP-1"
},
"message": "success"
}
Scaling the application up or down
Scaling the application to the required number of containers can be achieved using the SCALE command. The application can be scaled either up or down using this command.
Preconditions:
- Application must be in one of the following states: MONITORING, RUNNING
State Transition:
- {RUNNING, MONITORING} → MONITORING if requiredInstances is set to 0
- {RUNNING, MONITORING} → RUNNING if requiredInstances is non-zero
drove -c local apps scale TEST_APP-1 2
Sample Request Payload
{
"type": "SCALE",
"appId": "TEST_APP-1", //(3)!
"requiredInstances": 2, //(1)!
"opSpec": { //(2)!
"timeout": "1m",
"parallelism": 20,
"failureStrategy": "STOP"
}
}
(1) Absolute number of instances to be maintained on the cluster for the application
(2) Operation spec as mentioned in Cluster Op Spec
(3) Application ID
Sample response
{
"status": "SUCCESS",
"data": {
"appId": "TEST_APP-1"
},
"message": "success"
}
Note
During scale down, older instances are stopped first.
Tip
If implementing automation on top of Drove APIs, just use the SCALE command to scale up or down instead of using START_INSTANCES or SUSPEND separately.
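Following the tip, an autoscaler only needs to compute the desired absolute count and send SCALE. This minimal Python sketch builds the payload and also returns the state the application is expected to end in, per the transitions listed above; the function name and the default opSpec values (copied from the sample payload) are illustrative assumptions.

```python
SCALE_ALLOWED_STATES = {"MONITORING", "RUNNING"}

def scale_request(app_id: str, current_state: str, required_instances: int):
    """Return (payload, expected_final_state) for a SCALE operation."""
    if current_state not in SCALE_ALLOWED_STATES:
        raise ValueError(f"SCALE not allowed in state {current_state}")
    if required_instances < 0:
        raise ValueError("requiredInstances cannot be negative")
    payload = {
        "type": "SCALE",
        "appId": app_id,
        "requiredInstances": required_instances,  # absolute target, not a delta
        "opSpec": {"timeout": "1m", "parallelism": 20, "failureStrategy": "STOP"},
    }
    # Scaling to 0 leaves the app in MONITORING; any other count ends in RUNNING.
    expected = "MONITORING" if required_instances == 0 else "RUNNING"
    return payload, expected

payload, expected = scale_request("TEST_APP-1", "RUNNING", 0)
print(expected)  # MONITORING
```

Because requiredInstances is absolute, retrying the same request after a timeout converges on the same target instead of overshooting.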
Restarting an application
An application can be restarted by issuing the REPLACE_INSTANCES operation. In this case, clusterOpSpec.parallelism new containers are spun up first, and then an equivalent number of old containers are spun down. This ensures that enough capacity is maintained in the cluster to handle incoming traffic while the restart is underway.
Warning
If the cluster does not have sufficient capacity to spin up new containers, this operation will get stuck. So adjust your parallelism accordingly.
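To size parallelism against spare capacity, it helps to estimate how many extra containers a restart needs at its peak. Assuming instances are replaced in batches of at most parallelism, each batch started before the matching old containers are stopped as described above, the peak surplus is min(parallelism, instances) and the number of batches is ceil(instances / parallelism). The Python sketch below computes both; the batching model is an assumption, not a description of Drove internals.

```python
import math

def replace_capacity(instances: int, parallelism: int) -> tuple:
    """Return (peak_extra_containers, batches) under the assumed batching model."""
    if instances < 0 or parallelism < 1:
        raise ValueError("instances must be >= 0 and parallelism >= 1")
    # At most one batch of new containers runs alongside the old ones at a time.
    peak_extra = min(parallelism, instances)
    batches = math.ceil(instances / parallelism)
    return peak_extra, batches

# 10 instances replaced with parallelism 4: at most 4 extra containers, 3 batches.
print(replace_capacity(10, 4))  # (4, 3)
```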
Preconditions:
- Application must be in RUNNING state.
State Transition:
- RUNNING → REPLACE_INSTANCES_REQUESTED → RUNNING
drove -c local apps restart TEST_APP-1
Sample Request Payload
{
"type": "REPLACE_INSTANCES",
"appId": "TEST_APP-1", //(1)!
"instanceIds": [], //(2)!
"opSpec": { //(3)!
"timeout": "1m",
"parallelism": 20,
"failureStrategy": "STOP"
}
}
(1) Application ID
(2) Instances that need to be restarted. This is optional. If nothing is passed, all instances will be replaced.
(3) Operation spec as mentioned in Cluster Op Spec
Sample response
{
"status": "SUCCESS",
"data": {
"appId": "TEST_APP-1"
},
"message": "success"
}
Tip
To replace specific instances, pass their application instance IDs (these start with AI-...) in the instanceIds parameter of the JSON payload.
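A minimal Python sketch for building a targeted REPLACE_INSTANCES payload follows. The function name is hypothetical, rejecting IDs that lack the AI- prefix is a client-side sanity check based on the ID format mentioned in the tip rather than a documented server rule, and the default parallelism of 1 is a conservative choice given the capacity warning above.

```python
def replace_instances_payload(app_id: str, instance_ids=None,
                              parallelism: int = 1) -> dict:
    """Build a REPLACE_INSTANCES payload; an empty list means replace all instances."""
    ids = list(instance_ids or [])
    # Sanity check: application instance IDs start with "AI-".
    bad = [i for i in ids if not i.startswith("AI-")]
    if bad:
        raise ValueError(f"not application instance ids: {bad}")
    return {
        "type": "REPLACE_INSTANCES",
        "appId": app_id,
        "instanceIds": ids,
        "opSpec": {"timeout": "5m", "parallelism": parallelism,
                   "failureStrategy": "STOP"},
    }

targeted = replace_instances_payload(
    "TEST_APP-1", ["AI-601d160e-c692-4ddd-8b7f-4c09b30ed02e"])
everything = replace_instances_payload("TEST_APP-1")
```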
Stop or replace specific instances of an application
Application instances can be killed by issuing the STOP_INSTANCES operation. The default behaviour of Drove is to replace killed instances with new ones. Such new instances are always spun up before the specified (old) instances are stopped. If the skipRespawn parameter is set to true, the application instance is killed but no new instance is spun up to replace it.
Warning
If the cluster does not have sufficient capacity to spin up new containers, and skipRespawn is not set or is set to false, this operation will get stuck.
Preconditions:
- Application must be in RUNNING state.
State Transition:
- RUNNING → STOP_INSTANCES_REQUESTED → RUNNING if the final number of instances is non-zero
- RUNNING → STOP_INSTANCES_REQUESTED → MONITORING if the final number of instances is zero
drove -c local apps appinstances kill TEST_APP-1 AI-601d160e-c692-4ddd-8b7f-4c09b30ed02e
Sample Request Payload
{
"type": "STOP_INSTANCES",
"appId" : "TEST_APP-1",//(1)!
"instanceIds" : [ "AI-601d160e-c692-4ddd-8b7f-4c09b30ed02e" ],//(2)!
"skipRespawn" : true,//(3)!
"opSpec": {//(4)!
"timeout": "5m",
"parallelism": 1,
"failureStrategy": "STOP"
}
}
(1) Application ID
(2) Instance IDs to be stopped
(3) Do not spin up new containers to replace the stopped ones. This is set to false by default.
(4) Operation spec as mentioned in Cluster Op Spec
Sample response
{
"status": "SUCCESS",
"data": {
"appId": "TEST_APP-1"
},
"message": "success"
}
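Whether STOP_INSTANCES ends in RUNNING or MONITORING depends on how many instances remain. Under the behaviour described in this section (stopped instances are replaced unless skipRespawn is true), the outcome can be predicted with a minimal Python sketch; the function is hypothetical and ignores instances that fail for unrelated reasons while the operation runs.

```python
def stop_instances_outcome(running: int, to_stop: int,
                           skip_respawn: bool) -> tuple:
    """Return (final_instance_count, final_state) after STOP_INSTANCES."""
    if not 0 <= to_stop <= running:
        raise ValueError("cannot stop more instances than are running")
    # Without skipRespawn, each stopped instance is replaced by a new one.
    final = running - to_stop if skip_respawn else running
    return final, ("RUNNING" if final > 0 else "MONITORING")

print(stop_instances_outcome(1, 1, skip_respawn=True))   # (0, 'MONITORING')
print(stop_instances_outcome(3, 1, skip_respawn=False))  # (3, 'RUNNING')
```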
Destroy an application
To remove an application deployment (appName-version combo), the DESTROY command can be issued.
Preconditions:
- Application must be in MONITORING state.
State Transition:
- MONITORING → DESTROY_REQUESTED → DESTROYED → none
To destroy an application, issue the following CLI command or send the following payload:
drove -c local apps destroy TEST_APP-1
Sample Request Payload
{
"type": "DESTROY",
"appId" : "TEST_APP-1",//(1)!
"opSpec": {//(2)!
"timeout": "5m",
"parallelism": 2,
"failureStrategy": "STOP"
}
}
(1) Application ID
(2) Operation spec as mentioned in Cluster Op Spec
Sample response
{
"status": "SUCCESS",
"data": {
"appId": "TEST_APP-1"
},
"message": "success"
}
Warning
All metadata for an app and its instances is completely obliterated from Drove's storage once the app is destroyed.
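Because a destroyed application cannot be recovered, automation may want to refuse DESTROY unless the application is already in MONITORING (for example after SUSPEND or after scaling to zero). A minimal Python sketch of such a guard follows; the function name and the explicit confirm flag are hypothetical safety measures, not part of the Drove API.

```python
def destroy_payload(app_id: str, current_state: str, confirm: bool = False) -> dict:
    """Build a DESTROY payload only for a MONITORING app with explicit confirmation."""
    if current_state != "MONITORING":
        raise ValueError(f"DESTROY requires MONITORING state, app is {current_state}")
    # Hypothetical safety latch: callers must opt in to an irreversible operation.
    if not confirm:
        raise ValueError("refusing to destroy without confirm=True")
    return {
        "type": "DESTROY",
        "appId": app_id,
        "opSpec": {"timeout": "5m", "parallelism": 2, "failureStrategy": "STOP"},
    }

payload = destroy_payload("TEST_APP-1", "MONITORING", confirm=True)
```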