Application Operations

This page discusses operations relevant to Application management. Please go over the Application State Machine and Application Instance State Machine to understand the different states an application (and its instances) can be in, and how operations move an application from one state to another.

Note

Please go through Cluster Op Spec to understand the operation parameters being sent.

Note

Only one operation can be active on a particular {appName,version} combination.

Warning

Only the leader controller will accept and process operations. To avoid confusion, use the controller endpoint exposed by Drove Gateway to issue commands.

How to initiate an operation

Tip

Use the Drove CLI to perform all manual operations.

All operations for application lifecycle management need to be issued via an HTTP POST call to the leader controller endpoint on the path /apis/v1/applications/operations. The API returns HTTP 200 (OK) with a relevant JSON response as the payload.

Sample api call:

curl --location 'http://drove.local:7000/apis/v1/applications/operations' \
--header 'Content-Type: application/json' \
--header 'Authorization: Basic YWRtaW46YWRtaW4=' \
--data '{
    "type": "START_INSTANCES",
    "appId": "TEST_APP-3",
    "instances": 1,
    "opSpec": {
        "timeout": "5m",
        "parallelism": 32,
        "failureStrategy": "STOP"
    }
}'

Note

In the above example, http://drove.local:7000 is the endpoint of the leader. TEST_APP-3 is the Application ID. Authorization is basic auth.
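
The call above can also be sketched in Python using only the standard library. The helper name build_operation_request is hypothetical, and the admin/admin credentials are simply what the sample Authorization header decodes to. The request is built but not sent.

```python
import base64
import json
import urllib.request

OPERATIONS_PATH = "/apis/v1/applications/operations"


def build_operation_request(endpoint, operation, username, password):
    """Build (but do not send) a POST request carrying an operation payload."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        url=endpoint.rstrip("/") + OPERATIONS_PATH,
        data=json.dumps(operation).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


operation = {
    "type": "START_INSTANCES",
    "appId": "TEST_APP-3",
    "instances": 1,
    "opSpec": {"timeout": "5m", "parallelism": 32, "failureStrategy": "STOP"},
}
request = build_operation_request(
    "http://drove.local:7000", operation, "admin", "admin"
)
# To actually submit the operation: urllib.request.urlopen(request)
```

Keeping request construction separate from submission makes it possible to inspect or log the payload before it reaches the controller.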

Cluster Operation Specification

When an operation is submitted to the cluster, a cluster op spec needs to be specified. It controls different aspects of the operation, such as its parallelism and its timeout.

The following aspects of an operation can be configured:

Name              Option           Description
Timeout           timeout          The duration after which Drove considers the operation to have timed out.
Parallelism       parallelism      Parallelism of the task. (Range: 1-32)
Failure Strategy  failureStrategy  Set this to STOP.
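
A client might validate these options before submitting an operation. The following is a minimal sketch; the helper name make_op_spec is hypothetical, and whether the controller itself rejects out-of-range values is not stated on this page.

```python
def make_op_spec(timeout="5m", parallelism=1, failure_strategy="STOP"):
    """Return an opSpec dict, rejecting values outside the documented limits."""
    if not 1 <= parallelism <= 32:
        raise ValueError("parallelism must be between 1 and 32")
    if failure_strategy != "STOP":
        raise ValueError("failureStrategy must be STOP")
    return {
        "timeout": timeout,
        "parallelism": parallelism,
        "failureStrategy": failure_strategy,
    }


op_spec = make_op_spec(timeout="5m", parallelism=8)
```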

Note

For internal recovery operations, Drove generates its own operations. For these, Drove applies the following cluster operation spec:

  • timeout - 300 seconds
  • parallelism - 1
  • failureStrategy - STOP

The default operation spec can be configured in the controller configuration file. It is recommended to set the default parallelism to something like 8 for faster recovery.

How to cancel an operation

Operations can be cancelled asynchronously. To request cancellation, make a POST call to the leader controller endpoint on the path /apis/v1/operations/{applicationId}/cancel (1).

  1. applicationId is the Application ID for the application

curl --location --request POST 'http://drove.local:7000/apis/v1/operations/TEST_APP-3/cancel' \
--header 'Authorization: Basic YWRtaW46YWRtaW4=' \
--data ''

Warning

Operation cancellation is not instantaneous. Cancellation takes effect only after the current execution of the active operation is complete.
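
The cancel call can be sketched in Python as well. The helper name build_cancel_request is hypothetical; the request is constructed with an empty body, mirroring the curl sample, and is not sent.

```python
import urllib.parse
import urllib.request


def build_cancel_request(endpoint, application_id, authorization):
    """Build (but do not send) the POST request that asks for cancellation."""
    # Percent-encode the ID so it is always safe to place in the URL path.
    encoded_id = urllib.parse.quote(application_id, safe="")
    return urllib.request.Request(
        url=f"{endpoint.rstrip('/')}/apis/v1/operations/{encoded_id}/cancel",
        data=b"",
        headers={"Authorization": authorization},
        method="POST",
    )


cancel_request = build_cancel_request(
    "http://drove.local:7000", "TEST_APP-3", "Basic YWRtaW46YWRtaW4="
)
# To actually submit it: urllib.request.urlopen(cancel_request)
```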

Create an application

Before deploying containers on the cluster, an application needs to be created.

Preconditions:

  • App should not exist in the cluster

State Transition:

  • none → MONITORING

To create an application, an Application Spec needs to be created first.

Once the spec is ready, issue the following CLI command or send the following payload:

drove -c local apps create sample/test_app.json

Sample Request Payload

{
    "type": "CREATE",
    "spec": {...}, //(1)!
    "opSpec": { //(2)!
        "timeout": "5m",
        "parallelism": 1,
        "failureStrategy": "STOP"
    }
}

  1. Spec as mentioned in Application Specification
  2. Operation spec as mentioned in Cluster Op Spec

Sample response

{
    "data" : {
        "appId" : "TEST_APP-1"
    },
    "message" : "success",
    "status" : "SUCCESS"
}
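
A caller will usually want to check the status field and pull out the generated Application ID. The sketch below parses the sample response; the helper name extract_app_id is hypothetical, and treating any status other than SUCCESS as an error is an assumption, since only the success shape is shown here.

```python
import json


def extract_app_id(response_body):
    """Return data.appId from a response, or raise if status is not SUCCESS."""
    response = json.loads(response_body)
    if response.get("status") != "SUCCESS":
        # Assumption: non-SUCCESS responses still carry a "message" field.
        raise RuntimeError(f"operation not accepted: {response.get('message')}")
    return response["data"]["appId"]


sample_response = """
{
    "data" : { "appId" : "TEST_APP-1" },
    "message" : "success",
    "status" : "SUCCESS"
}
"""
app_id = extract_app_id(sample_response)
```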

Starting new instances of an application

New instances can be started by issuing the START_INSTANCES command.

Preconditions - Application must be in one of the following states: MONITORING, RUNNING

State Transition:

  • {RUNNING, MONITORING} → RUNNING

The following command/payload will start 2 new instances of the application.

drove -c local apps deploy TEST_APP-1 2

Sample Request Payload

{
    "type": "START_INSTANCES",
    "appId": "TEST_APP-1",//(1)!
    "instances": 2,//(2)!
    "opSpec": {//(3)!
        "timeout": "5m",
        "parallelism": 32,
        "failureStrategy": "STOP"
    }
}

  1. Application ID
  2. Number of instances to be started
  3. Operation spec as mentioned in Cluster Op Spec

Sample response

{
    "status": "SUCCESS",
    "data": {
        "appId": "TEST_APP-1"
    },
    "message": "success"
}

Suspending an application

All instances of an application can be shut down by issuing the SUSPEND command.

Preconditions - Application must be in one of the following states: MONITORING, RUNNING

State Transition:

  • {RUNNING, MONITORING} → MONITORING

The following command/payload will suspend all instances of the application.

drove -c local apps suspend TEST_APP-1

Sample Request Payload

{
    "type": "SUSPEND",
    "appId": "TEST_APP-1",//(1)!
    "opSpec": {//(2)!
        "timeout": "5m",
        "parallelism": 32,
        "failureStrategy": "STOP"
    }
}

  1. Application ID
  2. Operation spec as mentioned in Cluster Op Spec

Sample response

{
    "status": "SUCCESS",
    "data": {
        "appId": "TEST_APP-1"
    },
    "message": "success"
}

Scaling the application up or down

An application can be scaled to the required number of containers using the SCALE command. The same command scales the application either up or down.

Preconditions - Application must be in one of the following states: MONITORING, RUNNING

State Transition:

  • {RUNNING, MONITORING} → MONITORING if requiredInstances is set to 0
  • {RUNNING, MONITORING} → RUNNING if requiredInstances is non-zero

drove -c local apps scale TEST_APP-1 2

Sample Request Payload

{
    "type": "SCALE",
    "appId": "TEST_APP-1", //(3)!
    "requiredInstances": 2, //(1)!
    "opSpec": { //(2)!
        "timeout": "1m",
        "parallelism": 20,
        "failureStrategy": "STOP"
    }
}

  1. Absolute number of instances to be maintained on the cluster for the application
  2. Operation spec as mentioned in Cluster Op Spec
  3. Application ID

Sample response

{
    "status": "SUCCESS",
    "data": {
        "appId": "TEST_APP-1"
    },
    "message": "success"
}

Note

During scale down, older instances are stopped first

Tip

If implementing automation on top of Drove APIs, just use the SCALE command to scale up or down instead of using START_INSTANCES or SUSPEND separately.
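
Because requiredInstances is an absolute target, automation that thinks in relative terms ("add two instances") has to convert to an absolute count first. The following is a minimal sketch of that conversion; all function names are hypothetical, and the current instance count is assumed to have been obtained elsewhere.

```python
def scale_operation(app_id, required_instances, timeout="1m", parallelism=20):
    """Build a SCALE payload for an absolute instance count."""
    if required_instances < 0:
        raise ValueError("requiredInstances cannot be negative")
    return {
        "type": "SCALE",
        "appId": app_id,
        "requiredInstances": required_instances,
        "opSpec": {
            "timeout": timeout,
            "parallelism": parallelism,
            "failureStrategy": "STOP",
        },
    }


def scale_by(app_id, current_instances, delta):
    """Express a relative change as an absolute SCALE target, clamped at zero."""
    return scale_operation(app_id, max(0, current_instances + delta))


def expected_state_after_scale(required_instances):
    """State the application should settle in, per the transitions listed above."""
    return "MONITORING" if required_instances == 0 else "RUNNING"


scale_up = scale_by("TEST_APP-1", current_instances=2, delta=2)
scale_to_zero = scale_by("TEST_APP-1", current_instances=2, delta=-5)
```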

Restarting an application

An application can be restarted by issuing the REPLACE_INSTANCES operation. In this case, clusterOpSpec.parallelism containers are spun up first, and then an equivalent number of old containers are spun down. This ensures that the cluster maintains enough capacity to handle incoming traffic while the restart is underway.

Warning

If the cluster does not have sufficient capacity to spin up new containers, this operation will get stuck. So adjust your parallelism accordingly.
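
To pick a parallelism value that fits the spare capacity of a cluster, it can help to estimate the peak number of containers during a restart. The sketch below is a simplified model of the start-then-stop behaviour described above, not Drove's actual scheduler; the function name simulate_replace is hypothetical.

```python
def simulate_replace(instance_count, parallelism):
    """Return (rounds, peak_containers) for a simplified rolling replacement."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    remaining = instance_count  # old instances still to be replaced
    running = instance_count    # containers alive at this moment
    peak = running
    rounds = 0
    while remaining > 0:
        batch = min(parallelism, remaining)
        running += batch         # new containers are spun up first
        peak = max(peak, running)
        running -= batch         # then the same number of old ones go down
        remaining -= batch
        rounds += 1
    return rounds, peak


rounds, peak = simulate_replace(instance_count=10, parallelism=4)
```

Under this model, replacing 10 instances with a parallelism of 4 takes 3 rounds and needs room for up to 14 containers at once, while a parallelism of 20 finishes in a single round but needs room for 20.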

Preconditions - Application must be in RUNNING state.

State Transition:

  • RUNNING → REPLACE_INSTANCES_REQUESTED → RUNNING

drove -c local apps restart TEST_APP-1

Sample Request Payload

{
    "type": "REPLACE_INSTANCES",
    "appId": "TEST_APP-1", //(1)!
    "instanceIds": [], //(2)!
    "opSpec": { //(3)!
        "timeout": "1m",
        "parallelism": 20,
        "failureStrategy": "STOP"
    }
}

  1. Application ID
  2. Instances that need to be restarted. This is optional. If nothing is passed, all instances will be replaced.
  3. Operation spec as mentioned in Cluster Op Spec

Sample response

{
    "status": "SUCCESS",
    "data": {
        "appId": "TEST_APP-1"
    },
    "message": "success"
}

Tip

To replace specific instances, pass their application instance ids (starts with AI-...) in the instanceIds parameter in the JSON payload.

Stop or replace specific instances of an application

Application instances can be killed by issuing the STOP_INSTANCES operation. The default behaviour of Drove is to replace killed instances with new instances. Such new instances are always spun up before the specified (old) instances are stopped. If the skipRespawn parameter is set to true, the specified instances are killed but no new instances are spun up to replace them.

Warning

If the cluster does not have sufficient capacity to spin up new containers, and skipRespawn is not set or set to false, this operation will get stuck.

Preconditions - Application must be in RUNNING state.

State Transition:

  • RUNNING → STOP_INSTANCES_REQUESTED → RUNNING if final number of instances is non-zero
  • RUNNING → STOP_INSTANCES_REQUESTED → MONITORING if final number of instances is zero

drove -c local apps appinstances kill TEST_APP-1 AI-601d160e-c692-4ddd-8b7f-4c09b30ed02e

Sample Request Payload

{
    "type": "STOP_INSTANCES",
    "appId" : "TEST_APP-1",//(1)!
    "instanceIds" : [ "AI-601d160e-c692-4ddd-8b7f-4c09b30ed02e" ],//(2)!
    "skipRespawn" : true,//(3)!
    "opSpec": {//(4)!
        "timeout": "5m",
        "parallelism": 1,
        "failureStrategy": "STOP"
    }
}

  1. Application ID
  2. Instance ids to be stopped
  3. Do not spin up new containers to replace the stopped ones. This is set to false by default.
  4. Operation spec as mentioned in Cluster Op Spec

Sample response

{
    "status": "SUCCESS",
    "data": {
        "appId": "TEST_APP-1"
    },
    "message": "success"
}
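
The effect of skipRespawn on the final instance count, and therefore on the resulting application state, can be sketched as follows. The function names are hypothetical, and the model assumes every ID passed in belongs to a currently running instance of the application.

```python
def stop_instances_operation(app_id, instance_ids, skip_respawn=False,
                             timeout="5m", parallelism=1):
    """Build a STOP_INSTANCES payload."""
    return {
        "type": "STOP_INSTANCES",
        "appId": app_id,
        "instanceIds": list(instance_ids),
        "skipRespawn": skip_respawn,
        "opSpec": {
            "timeout": timeout,
            "parallelism": parallelism,
            "failureStrategy": "STOP",
        },
    }


def expected_outcome(current_instances, instance_ids, skip_respawn):
    """Return (final_instance_count, expected_state) for a STOP_INSTANCES call."""
    stopped = len(set(instance_ids))
    # Without skipRespawn, every stopped instance is replaced by a new one.
    final = max(0, current_instances - stopped) if skip_respawn else current_instances
    return final, ("RUNNING" if final > 0 else "MONITORING")


ids = ["AI-601d160e-c692-4ddd-8b7f-4c09b30ed02e"]
payload = stop_instances_operation("TEST_APP-1", ids, skip_respawn=True)
outcome = expected_outcome(current_instances=1, instance_ids=ids, skip_respawn=True)
```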

Destroy an application

To remove an application deployment (appName-version combo) the DESTROY command can be issued.

Preconditions:

  • Application must be in MONITORING state

State Transition:

  • MONITORING → DESTROY_REQUESTED → DESTROYED → none

To destroy the application, issue the following CLI command or send the following payload:

drove -c local apps destroy TEST_APP-1

Sample Request Payload

{
    "type": "DESTROY",
    "appId" : "TEST_APP-1",//(1)!
    "opSpec": {//(2)!
        "timeout": "5m",
        "parallelism": 2,
        "failureStrategy": "STOP"
    }
}

  1. Application ID
  2. Operation spec as mentioned in Cluster Op Spec

Sample response

{
    "status": "SUCCESS",
    "data": {
        "appId": "TEST_APP-1"
    },
    "message": "success"
}

Warning

All metadata for an app and its instances is completely removed from Drove's storage once the app is destroyed.
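
The preconditions listed on this page can be collected into a small lookup that automation might consult before issuing an operation. This is a sketch with hypothetical names; the DESTROY entry is inferred from its state transition, which starts at MONITORING, and the controller remains the authority on what is actually accepted.

```python
# Application states from which each operation may be issued.
ALLOWED_STATES = {
    "START_INSTANCES": {"MONITORING", "RUNNING"},
    "SUSPEND": {"MONITORING", "RUNNING"},
    "SCALE": {"MONITORING", "RUNNING"},
    "REPLACE_INSTANCES": {"RUNNING"},
    "STOP_INSTANCES": {"RUNNING"},
    "DESTROY": {"MONITORING"},  # inferred from the DESTROY state transition
}


def can_issue(operation_type, current_state):
    """Check a precondition; current_state is None if the app does not exist."""
    if operation_type == "CREATE":
        return current_state is None
    return current_state in ALLOWED_STATES.get(operation_type, set())


checks = {
    "create_new": can_issue("CREATE", None),
    "restart_idle": can_issue("REPLACE_INSTANCES", "MONITORING"),
    "destroy_running": can_issue("DESTROY", "RUNNING"),
    "scale_idle": can_issue("SCALE", "MONITORING"),
}
```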