Task Operations

This page discusses operations relevant to Task management. Please go over the Task State Machine to understand the different states a task can be in and how operations applied (and external changes) move a task from one state to another.

Note

Please go through Cluster Op Spec to understand the operation parameters being sent.

For tasks only the timeout parameter is relevant.

Note

Only one operation can be active on a particular task identified by a {sourceAppName,taskId} at a time.
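
A client that issues operations from several threads can keep its own record of in-flight operations so that it never submits a second one for the same task. The following is a minimal client-side sketch. `TaskOperationGuard` is a hypothetical helper, not part of Drove, and it does not describe how the controller itself reacts to a second submission.

```python
import threading


class TaskOperationGuard:
    """Client-side bookkeeping: at most one in-flight operation per task key."""

    def __init__(self):
        # Holds (sourceAppName, taskId) pairs that currently have an operation in flight.
        self._active = set()
        self._lock = threading.Lock()

    def try_acquire(self, source_app_name, task_id):
        """Return True and mark the task busy if no operation is tracked for it."""
        key = (source_app_name, task_id)
        with self._lock:
            if key in self._active:
                return False
            self._active.add(key)
            return True

    def release(self, source_app_name, task_id):
        """Mark the operation on this task as finished."""
        with self._lock:
            self._active.discard((source_app_name, task_id))
```

Call `try_acquire` before submitting an operation and `release` once the operation has completed or timed out.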

Warning

Only the leader controller will accept and process operations. To avoid confusion, use the controller endpoint exposed by Drove Gateway to issue commands.

Cluster Operation Specification

When an operation is submitted to the cluster, a cluster op spec must be provided. It controls different aspects of the operation, such as its parallelism and its timeout.

The following aspects of an operation can be configured:

Name              Option           Description
Timeout           timeout          The duration after which Drove considers the operation to have timed out.
Parallelism       parallelism      Parallelism of the task. (Range: 1-32)
Failure Strategy  failureStrategy  Set this to STOP.
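
The options above can be checked on the client before a request is sent. The sketch below is one way to do that. The `OpSpec` class and its defaults are illustrative assumptions, taken from the sample payloads on this page, and are not a Drove client API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OpSpec:
    """Hypothetical client-side model of a cluster op spec."""

    timeout: str = "5m"            # duration string, as in the sample payloads
    parallelism: int = 1           # documented range: 1-32
    failure_strategy: str = "STOP"

    def __post_init__(self):
        if not 1 <= self.parallelism <= 32:
            raise ValueError("parallelism must be in the range 1-32")
        if self.failure_strategy != "STOP":
            raise ValueError("failureStrategy should be set to STOP")

    def to_payload(self):
        """Return the dict to place under the "opSpec" key of an operation."""
        return {
            "timeout": self.timeout,
            "parallelism": self.parallelism,
            "failureStrategy": self.failure_strategy,
        }
```

For example, `OpSpec(timeout="10m").to_payload()` produces an op spec with a longer timeout, while `OpSpec(parallelism=64)` raises a `ValueError` before anything is sent.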

Note

For internal recovery operations, Drove generates its own operations and applies the following cluster operation spec to them:

  • timeout - 300 seconds
  • parallelism - 1
  • failureStrategy - STOP

The default operation spec can be configured in the controller configuration file. It is recommended to set the parallelism to something like 8 for faster recovery.

How to initiate an operation

Tip

Use the Drove CLI to perform all manual operations.

All operations for task lifecycle management need to be issued via an HTTP POST call to the leader controller endpoint on the path /apis/v1/tasks/operations. The API returns HTTP 200 (OK) with a relevant JSON response as payload.

Sample api call:

curl --location 'http://drove.local:7000/apis/v1/tasks/operations' \
--header 'Content-Type: application/json' \
--header 'Authorization: Basic YWRtaW46YWRtaW4=' \
--data '{
    "type": "KILL",
    "sourceAppName" : "TEST_APP",
    "taskId" : "T0012",
    "opSpec": {
        "timeout": "5m",
        "parallelism": 1,
        "failureStrategy": "STOP"
    }
}'

Note

In the above example, http://drove.local:7000 is the endpoint of the leader. TEST_APP is the name of the application that started this task and taskId is a unique client-generated ID. Authorization uses HTTP basic auth.
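
The same call can be made from a script. The sketch below builds the equivalent request with the Python standard library only. `build_operation_request` is a hypothetical helper name, and the `admin`/`admin` credentials are simply what the sample Authorization header decodes to. The request is built but not sent, so the block runs without a cluster.

```python
import base64
import json
import urllib.request


def build_operation_request(base_url, payload, username, password):
    """Build (but do not send) the POST request for a task operation."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/apis/v1/tasks/operations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


kill_request = build_operation_request(
    "http://drove.local:7000",  # leader endpoint from the sample call
    {
        "type": "KILL",
        "sourceAppName": "TEST_APP",
        "taskId": "T0012",
        "opSpec": {"timeout": "5m", "parallelism": 1, "failureStrategy": "STOP"},
    },
    username="admin",  # assumption: the sample header decodes to admin:admin
    password="admin",
)

# To submit the operation against a reachable controller:
# with urllib.request.urlopen(kill_request, timeout=10) as response:
#     print(json.load(response))
```

Keeping request construction separate from sending makes it easy to inspect the URL, headers and body before anything reaches the cluster.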

Warning

Task operations are not cancellable.

Create a task

A task can be created by issuing the following command.

Preconditions:

  • A task with the same {sourceAppName,taskId} should not exist on the cluster.

State Transition:

  • none → PENDING → PROVISIONING → STARTING → RUNNING → RUN_COMPLETED → DEPROVISIONING → STOPPED
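
A client that polls a task after creation needs to know which state comes next and when to stop polling. The sketch below encodes only the sequence listed above. Other states and transitions described in the Task State Machine are not modelled, and the names `CREATE_PATH`, `next_state` and `is_terminal` are illustrative.

```python
# States a newly created task passes through, in order, as listed on this page.
CREATE_PATH = (
    "PENDING",
    "PROVISIONING",
    "STARTING",
    "RUNNING",
    "RUN_COMPLETED",
    "DEPROVISIONING",
    "STOPPED",
)


def next_state(current):
    """Return the state that follows `current` on the listed path, or None at the end."""
    index = CREATE_PATH.index(current)  # raises ValueError for states not on this path
    if index + 1 < len(CREATE_PATH):
        return CREATE_PATH[index + 1]
    return None


def is_terminal(state):
    """True once the task has reached the last state of the listed path."""
    return state == CREATE_PATH[-1]
```

A polling loop could, for instance, continue until `is_terminal` returns true for the observed state.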

To create a task, a Task Spec needs to be created first.

Once it is ready, issue the following CLI command or send the payload shown below:

drove -c local tasks create sample/test_task.json

Sample Request Payload

{
    "type": "CREATE",
    "spec": {...}, //(1)!
    "opSpec": { //(2)!
        "timeout": "5m",
        "parallelism": 1,
        "failureStrategy": "STOP"
    }
}

  1. Spec as mentioned in Task Specification
  2. Operation spec as mentioned in Cluster Op Spec
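
When the payload is assembled in a script rather than through the CLI, the task spec can be read from a file and wrapped in the CREATE envelope. The sketch below shows one way to do so. `build_create_operation` and `load_task_spec` are hypothetical helpers, and the content of the task spec is passed through unchanged because its fields are defined in Task Specification, not here.

```python
import json


def load_task_spec(path):
    """Read a task spec from a JSON file, e.g. the one passed to the CLI."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def build_create_operation(task_spec, timeout="5m"):
    """Wrap a task spec in a CREATE operation payload."""
    return {
        "type": "CREATE",
        "spec": task_spec,  # passed through as-is; see Task Specification
        "opSpec": {
            "timeout": timeout,
            "parallelism": 1,
            "failureStrategy": "STOP",
        },
    }


# Example, assuming the spec file used in the CLI command exists locally:
# payload = build_create_operation(load_task_spec("sample/test_task.json"))
# body = json.dumps(payload)
```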

Sample response

{
    "status": "SUCCESS",
    "data": {
        "taskId": "TEST_APP-T0012"
    },
    "message": "success"
}
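
A caller usually wants to confirm that the operation was accepted and pick up the returned task ID. The following is a minimal sketch of that check. `extract_task_id` is a hypothetical helper, and treating every status other than SUCCESS as a failure is an assumption, since failure responses are not shown on this page.

```python
import json


def extract_task_id(response_body):
    """Return data.taskId from an operation response, or raise if it was not accepted."""
    body = json.loads(response_body)
    if body.get("status") != "SUCCESS":
        # Assumption: anything other than SUCCESS means the operation was rejected.
        raise RuntimeError(f"operation not accepted: {body.get('message')}")
    return body["data"]["taskId"]


sample_response = """
{
    "status": "SUCCESS",
    "data": {"taskId": "TEST_APP-T0012"},
    "message": "success"
}
"""
task_id = extract_task_id(sample_response)
```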

Warning

There are no separate create/run steps in a task. Creation will start execution automatically and immediately.

Kill a task

A task can be killed by issuing the following command.

Preconditions:

  • A task with the same {sourceAppName,taskId} needs to exist on the cluster.

State Transition:

  • RUNNING → RUN_COMPLETED → DEPROVISIONING → STOPPED

Issue the following CLI command or send the payload shown below:

drove -c local tasks kill TEST_APP T0012

Sample Request Payload

{
    "type": "KILL",
    "sourceAppName" : "TEST_APP",//(1)!
    "taskId" : "T0012",//(2)!
    "opSpec": {//(3)!
        "timeout": "5m",
        "parallelism": 1,
        "failureStrategy": "STOP"
    }
}

  1. Source app name as mentioned in spec during task creation
  2. Task ID as mentioned in the spec
  3. Operation spec as mentioned in Cluster Op Spec

Sample response

{
    "status": "SUCCESS",
    "data": {
        "taskId": "T0012"
    },
    "message": "success"
}

Note

Task metadata will remain on the cluster for some time. Metadata cleanup for tasks is automatic and can be configured in the controller configuration.