Jobs API Reference¶
Jobs¶
- class apolo_sdk.Jobs¶
Jobs subsystem, available as
Client.jobs
. Users can start new jobs, terminate them, get their status, list running jobs, etc.
- async attach(id: str, *, tty: bool = False, stdin: bool = False, stdout: bool = False, stderr: bool = False, cluster_name: str | None = None) AsyncContextManager[StdStream] [source]¶
Get access to standard input, output, and error streams of a running job.
- Parameters:
tty (bool) – True if tty mode is requested, default is False.
stdin (bool) – True to attach stdin, default is False.
stdout (bool) – True to attach stdout, default is False.
stderr (bool) – True to attach stderr, default is False.
cluster_name (str) – cluster on which the job is running. None means the current cluster (default).
- Returns:
Asynchronous context manager which can be used to access stdin/stdout/stderr, see StdStream for details.
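As a sketch, attaching to a running job's output could look like the following. Here `client` is assumed to be an already-initialized apolo_sdk.Client, the job id is a placeholder, and the stream messages are assumed to carry their payload in a `data` attribute alongside `fileno` (see the Message dataclass below):

```python
from typing import Any

async def stream_job_output(client: Any, job_id: str) -> str:
    # Attach to the job's stdout/stderr and collect decoded chunks
    # until the stream is exhausted (read_out() returns None).
    collected = []
    async with client.jobs.attach(job_id, stdout=True, stderr=True) as stream:
        while True:
            msg = await stream.read_out()
            if msg is None:  # stream closed, job output finished
                break
            collected.append(msg.data.decode("utf-8", errors="replace"))
    return "".join(collected)
```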
- async exec(id: str, cmd: str, *, tty: bool = False, stdin: bool = False, stdout: bool = False, stderr: bool = False, cluster_name: str | None = None) AsyncContextManager[StdStream] [source]¶
Start an exec session, get access to session’s standard input, output, and error streams.
- Parameters:
cmd (str) – the command to execute.
tty (bool) – True if tty mode is requested, default is False.
stdin (bool) – True to attach stdin, default is False.
stdout (bool) – True to attach stdout, default is False.
stderr (bool) – True to attach stderr, default is False.
cluster_name (str) – cluster on which the job is running. None means the current cluster (default).
- Returns:
Asynchronous context manager which can be used to access stdin/stdout/stderr, see StdStream for details.
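A sketch of running a one-off command inside a job and capturing its stdout; `client` is an already-initialized apolo_sdk.Client, and messages are assumed to carry `fileno` and `data` attributes as described for the Message dataclass below:

```python
from typing import Any

async def run_in_job(client: Any, job_id: str, cmd: str) -> bytes:
    # Start an exec session in the job and gather everything the
    # command writes to stdout (fileno 1), ignoring stderr.
    chunks = []
    async with client.jobs.exec(job_id, cmd, stdout=True, stderr=True) as stream:
        while True:
            msg = await stream.read_out()
            if msg is None:  # exec session finished
                break
            if msg.fileno == 1:  # keep stdout only
                chunks.append(msg.data)
    return b"".join(chunks)
```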
- async get_capacity(*, cluster_name: str | None = None) Mapping[str, int] [source]¶
Get the number of jobs that can be started on the specified cluster, for each available preset.
The returned numbers reflect the remaining cluster capacity: in other words, how many concurrent jobs for each preset can be started at the moment of the method call.
The returned capacity is an approximation; the real value can differ if already running jobs finish or other users start their own jobs at the same time.
- Parameters:
cluster_name (str) – cluster for which the request is performed. None means the current cluster (default).
- Returns:
A mapping of preset_name to count, where count is the number of concurrent jobs that can be executed using preset_name.
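Since the result is a plain mapping, a caller can use it to pick a preset before starting a job. A minimal sketch (the helper `pick_roomiest_preset` is hypothetical, not part of the SDK; `client` is an already-initialized apolo_sdk.Client):

```python
from typing import Any, Mapping, Optional

def pick_roomiest_preset(capacity: Mapping[str, int]) -> Optional[str]:
    # Choose the preset that can currently fit the most concurrent jobs;
    # return None when no preset has free capacity.
    available = {name: n for name, n in capacity.items() if n > 0}
    if not available:
        return None
    return max(available, key=available.__getitem__)

async def choose_preset(client: Any) -> Optional[str]:
    # Query the current cluster's remaining capacity and pick a preset.
    capacity = await client.jobs.get_capacity()
    return pick_roomiest_preset(capacity)
```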
- async list(*, statuses: Iterable[JobStatus] = (), name: str | None = None, tags: Sequence[str] = (), owners: Iterable[str] = (), since: datetime | None = None, until: datetime | None = None, reverse: bool = False, limit: int | None = None, cluster_name: str | None = None) AsyncContextManager[AsyncIterator[JobDescription]] [source]¶
List user jobs; by default all scheduled, running, and finished jobs are returned.
- Parameters:
statuses (Iterable[JobStatus]) – filter jobs by their statuses. The parameter can be a set or list of requested statuses, e.g. {JobStatus.PENDING, JobStatus.RUNNING} can be used for requesting only scheduled and running jobs, skipping finished and failed ones. An empty sequence means that jobs with all statuses are returned (default behavior). The list can be pretty huge though.
name (str) – filter jobs by name (exact match). An empty string or None means that no filter is applied (default).
tags (Sequence[str]) – filter jobs by tags. Retrieves only jobs submitted with all tags from the specified list. An empty list means that no filter is applied (default).
owners (Iterable[str]) – filter jobs by their owners. The parameter can be a set or list of owner usernames (see JobDescription.owner for details). No owners filter is applied if the iterable is empty.
since (datetime) – filter jobs by their creation date. Retrieves only jobs submitted after the specified date (inclusive) if it is not None. If the parameter is a naive datetime object, it represents local time. None means that no filter is applied (default).
until (datetime) – filter jobs by their creation date. Retrieves only jobs submitted before the specified date (inclusive) if it is not None. If the parameter is a naive datetime object, it represents local time. None means that no filter is applied (default).
reverse (bool) – iterate jobs in the reverse order. If reverse is false (default), jobs are iterated in the order of their creation date, from earlier to later. If reverse is true, they are iterated in the reverse order, from later to earlier.
limit (int) – limit the number of jobs. None means no limit (default).
cluster_name (str) – list jobs on the specified cluster. None means the current cluster (default).
- Returns:
Asynchronous iterator which emits JobDescription objects.
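A sketch of consuming the iterator, e.g. collecting the ids of jobs created in the last day, newest first. The helper is hypothetical; `client` is an already-initialized apolo_sdk.Client, and `statuses` would typically be a set like {JobStatus.RUNNING}:

```python
from datetime import datetime, timedelta
from typing import Any, Iterable, List

async def recent_job_ids(client: Any, statuses: Iterable = (), days: int = 1) -> List[str]:
    # List jobs created within the last `days` days that match the given
    # statuses, iterating newest-first via reverse=True.
    since = datetime.now() - timedelta(days=days)
    ids = []
    async with client.jobs.list(statuses=statuses, since=since, reverse=True) as it:
        async for job in it:
            ids.append(job.id)
    return ids
```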
- monitor(id: str, *, cluster_name: str | None = None, since: datetime | None = None, timestamps: bool = False, separator: str | None = None) AsyncContextManager[AsyncIterator[bytes]] [source]¶
Get job logs as a sequence of data chunks, e.g.:
async with client.jobs.monitor(job_id) as it:
    async for chunk in it:
        print(chunk.decode('utf-8', errors='replace'))
- Parameters:
cluster_name (str) – cluster on which the job is running. None means the current cluster (default).
since (datetime) – retrieves only logs after the specified date (inclusive) if it is not None. If the parameter is a naive datetime object, it represents local time. None means that no filter is applied (default).
timestamps (bool) – if true, include timestamps on each line in the log output.
separator (str) – string which will separate archived and live logs (if both parts are present). By default a string containing random characters is used. An empty separator suppresses the separator output.
- Returns:
AsyncIterator over bytes log chunks.
- async port_forward(id: str, local_port: int, job_port: int, *, no_key_check: bool = False, cluster_name: str | None = None) None [source]¶
Forward a local port to the job, e.g.:
async with client.jobs.port_forward(job_id, 8080, 80):
    # port forwarding is available inside the with-block
    ...
- async run(container: Container, *, name: str | None = None, tags: Sequence[str] = (), description: str | None = None, scheduler_enabled: bool = False, pass_config: bool = False, wait_for_jobs_quota: bool = False, schedule_timeout: float | None = None, life_span: float | None = None, priority: JobPriority | None = None) JobDescription [source]¶
Start a new job.
Deprecated since version 20.11.25: Please use start() instead.
- Parameters:
container (Container) – container description to start.
name (str) – optional container name.
tags (Sequence[str]) – optional job tags.
description (str) – optional container description.
scheduler_enabled (bool) – a flag that specifies whether the job should participate in round-robin scheduling.
pass_config (bool) – a flag that specifies that the platform should pass config data to the job. This allows using the API and CLI from inside the job. See Factory.login_with_passed_config() for details.
wait_for_jobs_quota (bool) – when this flag is set, the job will wait for another job to stop instead of failing immediately because of the total running jobs quota.
schedule_timeout (float) – minimal timeout to wait before reporting that the job cannot be scheduled because of the lack of computation cluster resources (memory, CPU/GPU etc). This option is not allowed when is_preemptible is set to True.
life_span (float) – job run-time limit in seconds. Pass None to disable.
priority (JobPriority) – priority used to specify job’s start order. Jobs with higher priority will start before ones with lower priority. Priority should be supported by the cluster.
- Returns:
JobDescription instance with information about the started job.
- async start(*, image: RemoteImage, preset_name: str, cluster_name: str | None = None, org_name: str | None = None, entrypoint: str | None = None, command: str | None = None, working_dir: str | None = None, http: HTTPPort | None = None, env: Mapping[str, str] | None = None, volumes: Sequence[Volume] = (), secret_env: Mapping[str, URL] | None = None, secret_files: Sequence[SecretFile] = (), disk_volumes: Sequence[DiskVolume] = (), tty: bool = False, shm: bool = False, name: str | None = None, tags: Sequence[str] = (), description: str | None = None, pass_config: bool = False, wait_for_jobs_quota: bool = False, schedule_timeout: float | None = None, restart_policy: JobRestartPolicy = JobRestartPolicy.NEVER, life_span: float | None = None, privileged: bool = False, priority: JobPriority | None = None) JobDescription [source]¶
Start a new job.
- Parameters:
image (RemoteImage) – image used for starting a container.
preset_name (str) – name of the preset of resources given to a container on a node.
cluster_name (str) – cluster to start a job on. Default is the current cluster.
org_name (str) – org to start a job on behalf of. Default is the current org.
entrypoint (str) – optional Docker ENTRYPOINT used for overriding the image entry-point (str); the default None picks the entry-point from the image’s Dockerfile.
command (str) – optional command line to execute inside a container (str), None for picking the command line from the image’s Dockerfile.
working_dir (str) – optional working directory inside a container (str), None for picking the working directory from the image’s Dockerfile.
http (HTTPPort) – optional parameters of the HTTP server exposed by the container, None if the container doesn’t provide HTTP access.
env (Mapping[str,str]) – optional custom environment variables for pushing into the container’s task. A Mapping where keys are environment variable names and values are variable values, both str. None by default.
volumes (Sequence[Volume]) – optional Docker volumes to mount into the container, a Sequence of Volume objects. Empty tuple by default.
secret_env (Mapping[str,yarl.URL]) – optional secrets pushed as custom environment variables into the container’s task. A Mapping where keys are environment variable names (str) and values are secret URIs (yarl.URL). None by default.
secret_files (Sequence[SecretFile]) – optional secrets mounted as files in a container, a Sequence of SecretFile objects. Empty tuple by default.
disk_volumes (Sequence[DiskVolume]) – optional disk volumes to mount into the container, a Sequence of DiskVolume objects. Empty tuple by default.
tty (bool) – allocate a TTY or not. False by default.
shm (bool) – use Linux shared memory or not. False by default.
name (str) – optional job name.
tags (Sequence[str]) – optional job tags.
description (str) – optional container description.
pass_config (bool) – a flag that specifies that the platform should pass config data to the job. This allows using the API and CLI from inside the job. See Factory.login_with_passed_config() for details.
wait_for_jobs_quota (bool) – when this flag is set, the job will wait for another job to stop instead of failing immediately because of the total running jobs quota.
schedule_timeout (float) – minimal timeout to wait before reporting that the job cannot be scheduled because of the lack of computation cluster resources (memory, CPU/GPU etc).
life_span (float) – job run-time limit in seconds. Pass None to disable.
restart_policy (JobRestartPolicy) – job restart behavior. JobRestartPolicy.NEVER by default.
privileged (bool) – run the job in privileged mode. This mode should be supported by the cluster.
priority (JobPriority) – priority used to specify job’s start order. Jobs with higher priority will start before ones with lower priority. Priority should be supported by the cluster. None by default.
- Returns:
JobDescription instance with information about the started job.
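A minimal sketch of starting a job and returning its id. The wrapper is hypothetical: `client` is an already-initialized apolo_sdk.Client, `image` is an already-parsed RemoteImage, and the name and life span values are placeholders:

```python
from typing import Any

async def start_and_report(client: Any, image: Any, preset_name: str) -> str:
    # Start a job from the given image on the chosen preset and
    # return the new job's id.
    job = await client.jobs.start(
        image=image,
        preset_name=preset_name,
        name="example-job",  # optional human-readable name (placeholder)
        life_span=3600.0,    # stop the job after one hour
    )
    return job.id
```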
- async send_signal(id: str, *, cluster_name: str | None = None) None [source]¶
Send SIGKILL signal to a job.
- async status(id: str) JobDescription [source]¶
Get information about a job.
- Parameters:
id (str) – id of the job to query.
- Returns:
JobDescription instance with job status details.
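A sketch of polling a job until it reaches a final state, using the JobStatus.finished_items() shortcut described later in this reference. The helper is hypothetical; `client` is an already-initialized apolo_sdk.Client, and finished_items() is assumed to be callable on the status enum class:

```python
import asyncio
from typing import Any

async def wait_for_job(client: Any, job_id: str, poll_interval: float = 1.0) -> Any:
    # Repeatedly fetch the job description until its status is final
    # (SUCCEEDED, CANCELLED or FAILED), then return the description.
    while True:
        descr = await client.jobs.status(job_id)
        if descr.status in type(descr.status).finished_items():
            return descr
        await asyncio.sleep(poll_interval)
```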
Container¶
- class apolo_sdk.Container¶
Read-only dataclass for describing the Docker image and environment to run a job.
- image¶
RemoteImage used for starting a container.
- entrypoint¶
Docker ENTRYPOINT used for overriding the image entry-point (str); the default None picks the entry-point from the image’s Dockerfile.
- command¶
Command line to execute inside a container (str), None for picking the command line from the image’s Dockerfile.
- http¶
HTTPPort describing parameters of the HTTP server exposed by the container, None if the container doesn’t provide HTTP access.
- env¶
Custom environment variables for pushing into the container’s task. A Mapping where keys are environment variable names and values are variable values, both str. Empty dict by default.
- volumes¶
Docker volumes to mount into the container, a Sequence of Volume objects. Empty list by default.
- secret_env¶
Secrets pushed as custom environment variables into the container’s task. A Mapping where keys are environment variable names (str) and values are secret URIs (yarl.URL). Empty dict by default.
- secret_files¶
Secrets mounted as files in a container, a Sequence of SecretFile objects. Empty list by default.
- disk_volumes¶
Disk volumes to mount into the container, a Sequence of DiskVolume objects. Empty list by default.
HTTPPort¶
- class apolo_sdk.HTTPPort¶
Read-only dataclass for exposing an HTTP server started in a job. To access this server from a remote machine please use
JobDescription.http_url
.
- requires_auth¶
If True, authentication in the Apolo Platform is required to access the exposed HTTP server; otherwise the port is open publicly.
JobDescription¶
- class apolo_sdk.JobDescription¶
Read-only dataclass for describing a job.
- history¶
Additional information about the job, e.g. creation time and process exit code. A JobStatusHistory instance.
- scheduler_enabled¶
Whether the job participates in round-robin scheduling.
- preemptible_node¶
Whether the job is bound to preemptible nodes. If set to True, the job only allows execution on preemptible nodes. If set to False, the job only allows execution on non-preemptible nodes.
- pass_config¶
Whether config data is passed by the platform, see Factory.login_with_passed_config() for details.
- privileged¶
Whether the job is running in privileged mode; refer to the Docker documentation for details.
- tags¶
List of job tags provided by the user at creation time, Sequence[str] or () if tags are omitted.
- description¶
Job description text provided by the user at creation time, str or None if the description is omitted.
- ssh_server¶
yarl.URL to access the running job by SSH. Internal field, don’t access it from custom code. Use Jobs.exec() and Jobs.port_forward() as the official API for accessing a running job.
- internal_hostname¶
DNS name to access the running job from other jobs.
- internal_hostname_named¶
DNS name to access the running job from other jobs, based on the job’s name instead of its id. Produces the same value for jobs with the same name and owner in the same cluster.
- schedule_timeout¶
Minimal timeout in seconds the job will wait before reporting that it cannot be scheduled because of the lack of computation cluster resources (memory, CPU/GPU etc), float.
- priority¶
Priority used to specify job’s start order. Jobs with higher priority will start before ones with lower priority, JobPriority.
- _internal¶
Some internal info about job used by platform. Should not be used.
JobRestartPolicy¶
JobPriority¶
JobStatus¶
- class apolo_sdk.JobStatus¶
Enumeration that describes job state.
Can be one of the following statuses:
- PENDING¶
Job is scheduled for execution but not started yet.
- RUNNING¶
Job is running now.
- SUSPENDED¶
Scheduled job is paused to allow other jobs to run.
- SUCCEEDED¶
Job is finished successfully.
- CANCELLED¶
Job was canceled while it was running.
- FAILED¶
Job execution failed.
- UNKNOWN¶
Invalid (or unknown) status code, should never be returned from the server.
Also some shortcuts are available:
- active_items() Set[JobStatus] [source]¶
Returns all statuses that are not final: PENDING, SUSPENDED and RUNNING.
- finished_items() Set[JobStatus] [source]¶
Returns all statuses that are final: SUCCEEDED, CANCELLED and FAILED.
Each enum value also has the following bool fields:
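The two shortcut helpers partition the statuses into "may still change" and "final". A minimal illustration using a stand-in enum (not the real apolo_sdk.JobStatus, which requires the SDK):

```python
import enum
from typing import Set

class JobStatus(str, enum.Enum):
    # Stand-in mirroring apolo_sdk.JobStatus for illustration only.
    PENDING = "pending"
    RUNNING = "running"
    SUSPENDED = "suspended"
    SUCCEEDED = "succeeded"
    CANCELLED = "cancelled"
    FAILED = "failed"
    UNKNOWN = "unknown"

    @classmethod
    def active_items(cls) -> "Set[JobStatus]":
        # Statuses of jobs that may still change state.
        return {cls.PENDING, cls.SUSPENDED, cls.RUNNING}

    @classmethod
    def finished_items(cls) -> "Set[JobStatus]":
        # Final statuses; a job never leaves these.
        return {cls.SUCCEEDED, cls.CANCELLED, cls.FAILED}
```

For example, JobStatus.active_items() is a convenient value for the statuses filter of Jobs.list() when enumerating only jobs that are still alive.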
JobStatusItem¶
- class apolo_sdk.JobStatusItem¶
Read-only dataclass for describing job status transition details.
- reason¶
Additional information for the job status, str.
Examples of reason values:
'ContainerCreating' for a JobStatus.PENDING job that initiates a pod for the container.
'ErrImagePull' for a JobStatus.FAILED job that cannot pull the specified image.
JobStatusHistory¶
- class apolo_sdk.JobStatusHistory¶
Read-only dataclass for describing job status details, e.g. creation and finishing time, exit code etc.
- status¶
Current status of the job, a JobStatus enumeration. The same as JobDescription.status.
- reason¶
Additional information for the current status, str.
Examples of reason values:
'ContainerCreating' for a JobStatus.PENDING job that initiates a pod for the container.
'ErrImagePull' for a JobStatus.FAILED job that cannot pull the specified image.
- description¶
Extended description for the short abbreviation described by reason; an empty str if no additional information is provided.
- exit_code¶
Exit code of the container’s process (int), or None if the job was not started or is still running.
- transitions¶
List of job status transitions, a Sequence of JobStatusItem.
JobTelemetry¶
- class apolo_sdk.JobTelemetry¶
Read-only dataclass for job telemetry (statistics), e.g. consumed CPU load, memory footprint etc.
- timestamp¶
Date and time of the telemetry report (float), in seconds since the epoch, like the value returned from time.time(). See the time and datetime modules for more information on how to handle the timestamp.
Message¶
- class apolo_sdk.Message¶
Read-only dataclass representing job stdout/stderr stream chunks, returned from StdStream.read_out().
- fileno¶
Stream number, 1 for stdout and 2 for stderr.
Resources¶
- class apolo_sdk.Resources¶
Read-only dataclass for describing resources (memory, CPU/GPU etc.) available to the container, see also the Container.resources attribute.
- cpu¶
Requested number of CPUs, float. Please note, Docker supports fractions here, e.g. 0.5 CPU means half a CPU on the target node.
- shm¶
Use Linux shared memory or not, bool. Provide True if you don’t know what the /dev/shm device means.
- tpu_type¶
Requested TPU type, see also https://en.wikipedia.org/wiki/Tensor_processing_unit
- tpu_software_version¶
Requested TPU software version.
StdStream¶
- class apolo_sdk.StdStream¶
A class for communicating with an attached job (Jobs.attach()) or an exec session (Jobs.exec()). Use read_out() for reading from stdout/stderr and write_in() for writing into stdin.