Jobs API Reference¶
Jobs¶
- class apolo_sdk.Jobs¶
Jobs subsystem, available as
Client.jobs
. Users can start new jobs, terminate them, get their status, list running jobs, etc.
- async attach(id: str, *, tty: bool = False, stdin: bool = False, stdout: bool = False, stderr: bool = False, cluster_name: str | None = None) AsyncContextManager[StdStream] [source]¶
Get access to standard input, output, and error streams of a running job.
- Parameters:
tty (bool) – True if tty mode is requested, default is False.
stdin (bool) – True to attach stdin, default is False.
stdout (bool) – True to attach stdout, default is False.
stderr (bool) – True to attach stderr, default is False.
cluster_name (str) – cluster on which the job is running. None means the current cluster (default).
- Returns:
Asynchronous context manager which can be used to access stdin/stdout/stderr, see StdStream for details.
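As a sketch, attaching to a running job's output could look like the following. Here `client` is assumed to be an already-initialized apolo_sdk.Client, the job id is a placeholder, and the stream messages are assumed to carry their payload in a `data` attribute alongside `fileno` (see the Message dataclass below):

```python
from typing import Any

async def stream_job_output(client: Any, job_id: str) -> str:
    # Attach to the job's stdout/stderr and collect decoded chunks
    # until the stream is exhausted (read_out() returns None).
    collected = []
    async with client.jobs.attach(job_id, stdout=True, stderr=True) as stream:
        while True:
            msg = await stream.read_out()
            if msg is None:  # stream closed, job output finished
                break
            collected.append(msg.data.decode("utf-8", errors="replace"))
    return "".join(collected)
```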
- async exec(id: str, cmd: str, *, tty: bool = False, stdin: bool = False, stdout: bool = False, stderr: bool = False, cluster_name: str | None = None) AsyncContextManager[StdStream] [source]¶
Start an exec session, get access to session’s standard input, output, and error streams.
- Parameters:
cmd (str) – the command to execute.
tty (bool) – True if tty mode is requested, default is False.
stdin (bool) – True to attach stdin, default is False.
stdout (bool) – True to attach stdout, default is False.
stderr (bool) – True to attach stderr, default is False.
cluster_name (str) – cluster on which the job is running. None means the current cluster (default).
- Returns:
Asynchronous context manager which can be used to access stdin/stdout/stderr, see StdStream for details.
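A sketch of running a one-off command inside a job and capturing its stdout; `client` is an already-initialized apolo_sdk.Client, and messages are assumed to carry `fileno` and `data` attributes as described for the Message dataclass below:

```python
from typing import Any

async def run_in_job(client: Any, job_id: str, cmd: str) -> bytes:
    # Start an exec session in the job and gather everything the
    # command writes to stdout (fileno 1), ignoring stderr.
    chunks = []
    async with client.jobs.exec(job_id, cmd, stdout=True, stderr=True) as stream:
        while True:
            msg = await stream.read_out()
            if msg is None:  # exec session finished
                break
            if msg.fileno == 1:  # keep stdout only
                chunks.append(msg.data)
    return b"".join(chunks)
```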
- async get_capacity(*, cluster_name: str | None = None) Mapping[str, int] [source]¶
Get the number of jobs that can be started on the specified cluster, for each available preset.
The returned numbers reflect the remaining cluster capacity: in other words, how many concurrent jobs for each preset can be started at the moment of the method call.
The returned capacity is an approximation; the real value can differ if already running jobs finish or other users start their own jobs at the same time.
- Parameters:
cluster_name (str) – cluster for which the request is performed. None means the current cluster (default).
- Returns:
A mapping of preset_name to count, where count is the number of concurrent jobs that can be executed using preset_name.
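Since the result is a plain mapping, a caller can use it to pick a preset before starting a job. A minimal sketch (the helper `pick_roomiest_preset` is hypothetical, not part of the SDK; `client` is an already-initialized apolo_sdk.Client):

```python
from typing import Any, Mapping, Optional

def pick_roomiest_preset(capacity: Mapping[str, int]) -> Optional[str]:
    # Choose the preset that can currently fit the most concurrent jobs;
    # return None when no preset has free capacity.
    available = {name: n for name, n in capacity.items() if n > 0}
    if not available:
        return None
    return max(available, key=available.__getitem__)

async def choose_preset(client: Any) -> Optional[str]:
    # Query the current cluster's remaining capacity and pick a preset.
    capacity = await client.jobs.get_capacity()
    return pick_roomiest_preset(capacity)
```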
- async list(*, statuses: Iterable[JobStatus] = (), name: str | None = None, tags: Sequence[str] = (), owners: Iterable[str] = (), since: datetime | None = None, until: datetime | None = None, reverse: bool = False, limit: int | None = None, cluster_name: str | None = None) AsyncContextManager[AsyncIterator[JobDescription]] [source]¶
List user jobs; by default all scheduled, running, and finished jobs are returned.
- Parameters:
statuses (Iterable[JobStatus]) – filter jobs by their statuses. The parameter can be a set or list of requested statuses, e.g. {JobStatus.PENDING, JobStatus.RUNNING} can be used for requesting only scheduled and running jobs, skipping finished and failed ones. An empty sequence means that jobs with all statuses are returned (default behavior). The list can be pretty huge though.
name (str) – filter jobs by name (exact match). An empty string or None means that no filter is applied (default).
tags (Sequence[str]) – filter jobs by tags. Retrieves only jobs submitted with all tags from the specified list. An empty list means that no filter is applied (default).
owners (Iterable[str]) – filter jobs by their owners. The parameter can be a set or list of owner usernames (see JobDescription.owner for details). No owners filter is applied if the iterable is empty.
since (datetime) – filter jobs by their creation date. Retrieves only jobs submitted after the specified date (inclusive) if it is not None. If the parameter is a naive datetime object, it represents local time. None means that no filter is applied (default).
until (datetime) – filter jobs by their creation date. Retrieves only jobs submitted before the specified date (inclusive) if it is not None. If the parameter is a naive datetime object, it represents local time. None means that no filter is applied (default).
reverse (bool) – iterate jobs in the reverse order. If reverse is false (default), jobs are iterated in the order of their creation date, from earlier to later. If reverse is true, they are iterated in the reverse order, from later to earlier.
limit (int) – limit the number of jobs. None means no limit (default).
cluster_name (str) – list jobs on the specified cluster. None means the current cluster (default).
- Returns:
Asynchronous iterator which emits JobDescription objects.
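A sketch of consuming the iterator, e.g. collecting the ids of jobs created in the last day, newest first. The helper is hypothetical; `client` is an already-initialized apolo_sdk.Client, and `statuses` would typically be a set like {JobStatus.RUNNING}:

```python
from datetime import datetime, timedelta
from typing import Any, Iterable, List

async def recent_job_ids(client: Any, statuses: Iterable = (), days: int = 1) -> List[str]:
    # List jobs created within the last `days` days that match the given
    # statuses, iterating newest-first via reverse=True.
    since = datetime.now() - timedelta(days=days)
    ids = []
    async with client.jobs.list(statuses=statuses, since=since, reverse=True) as it:
        async for job in it:
            ids.append(job.id)
    return ids
```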
- monitor(id: str, *, cluster_name: str | None = None, since: datetime | None = None, timestamps: bool = False, separator: str | None = None) AsyncContextManager[AsyncIterator[bytes]] [source]¶
Get job logs as a sequence of data chunks, e.g.:
async with client.jobs.monitor(job_id) as it:
    async for chunk in it:
        print(chunk.decode('utf-8', errors='replace'))
- Parameters:
cluster_name (str) – cluster on which the job is running. None means the current cluster (default).
since (datetime) – retrieves only logs after the specified date (inclusive) if it is not None. If the parameter is a naive datetime object, it represents local time. None means that no filter is applied (default).
timestamps (bool) – if true, include timestamps on each line in the log output.
separator (str) – string which will separate archived and live logs (if both parts are present). By default a string containing random characters is used. An empty separator suppresses the separator output.
- Returns:
AsyncIterator over bytes log chunks.
- async port_forward(id: str, local_port: int, job_port: int, *, no_key_check: bool = False, cluster_name: str | None = None) None [source]¶
Forward a local port to the job, e.g.:
async with client.jobs.port_forward(job_id, 8080, 80):
    # port forwarding is available inside the with-block
    ...
- async run(container: Container, *, name: str | None = None, tags: Sequence[str] = (), description: str | None = None, scheduler_enabled: bool = False, pass_config: bool = False, wait_for_jobs_quota: bool = False, schedule_timeout: float | None = None, life_span: float | None = None, priority: JobPriority | None = None) JobDescription [source]¶
Start a new job.
Deprecated since version 20.11.25: Please use start() instead.
- Parameters:
container (Container) – container description to start.
name (str) – optional container name.
tags (Sequence[str]) – optional job tags.
description (str) – optional container description.
scheduler_enabled (bool) – a flag that specifies whether the job should participate in round-robin scheduling.
pass_config (bool) – a flag that specifies that the platform should pass config data to the job. This allows using the API and CLI from inside the job. See Factory.login_with_passed_config() for details.
wait_for_jobs_quota (bool) – when this flag is set, the job will wait for another job to stop instead of failing immediately because of the total running jobs quota.
schedule_timeout (float) – minimal timeout to wait before reporting that the job cannot be scheduled because of the lack of computation cluster resources (memory, CPU/GPU etc). This option is not allowed when is_preemptible is set to True.
life_span (float) – job run-time limit in seconds. Pass None to disable.
priority (JobPriority) – priority used to specify job’s start order. Jobs with higher priority will start before ones with lower priority. Priority should be supported by the cluster.
- Returns:
JobDescription instance with information about the started job.
- async start(*, image: RemoteImage, preset_name: str, cluster_name: str | None = None, org_name: str | None = None, entrypoint: str | None = None, command: str | None = None, working_dir: str | None = None, http: HTTPPort | None = None, env: Mapping[str, str] | None = None, volumes: Sequence[Volume] = (), secret_env: Mapping[str, URL] | None = None, secret_files: Sequence[SecretFile] = (), disk_volumes: Sequence[DiskVolume] = (), tty: bool = False, shm: bool = False, name: str | None = None, tags: Sequence[str] = (), description: str | None = None, pass_config: bool = False, wait_for_jobs_quota: bool = False, schedule_timeout: float | None = None, restart_policy: JobRestartPolicy = JobRestartPolicy.NEVER, life_span: float | None = None, privileged: bool = False, priority: JobPriority | None = None) JobDescription [source]¶
Start a new job.
- Parameters:
image (RemoteImage) – image used for starting a container.
preset_name (str) – name of the preset of resources given to a container on a node.
cluster_name (str) – cluster to start a job on. Default is the current cluster.
org_name (str) – org to start a job on behalf of. Default is the current org.
entrypoint (str) – optional Docker ENTRYPOINT used for overriding the image entry-point (str); the default None picks the entry-point from the image’s Dockerfile.
command (str) – optional command line to execute inside a container (str), None for picking the command line from the image’s Dockerfile.
working_dir (str) – optional working directory inside a container (str), None for picking the working directory from the image’s Dockerfile.
http (HTTPPort) – optional parameters of the HTTP server exposed by the container, None if the container doesn’t provide HTTP access.
env (Mapping[str,str]) – optional custom environment variables for pushing into the container’s task. A Mapping where keys are environment variable names and values are variable values, both str. None by default.
volumes (Sequence[Volume]) – optional Docker volumes to mount into the container, a Sequence of Volume objects. Empty tuple by default.
secret_env (Mapping[str,yarl.URL]) – optional secrets pushed as custom environment variables into the container’s task. A Mapping where keys are environment variable names (str) and values are secret URIs (yarl.URL). None by default.
secret_files (Sequence[SecretFile]) – optional secrets mounted as files in a container, a Sequence of SecretFile objects. Empty tuple by default.
disk_volumes (Sequence[DiskVolume]) – optional disk volumes to mount into the container, a Sequence of DiskVolume objects. Empty tuple by default.
tty (bool) – allocate a TTY or not. False by default.
shm (bool) – use Linux shared memory or not. False by default.
name (str) – optional job name.
tags (Sequence[str]) – optional job tags.
description (str) – optional container description.
pass_config (bool) – a flag that specifies that the platform should pass config data to the job. This allows using the API and CLI from inside the job. See Factory.login_with_passed_config() for details.
wait_for_jobs_quota (bool) – when this flag is set, the job will wait for another job to stop instead of failing immediately because of the total running jobs quota.
schedule_timeout (float) – minimal timeout to wait before reporting that the job cannot be scheduled because of the lack of computation cluster resources (memory, CPU/GPU etc).
life_span (float) – job run-time limit in seconds. Pass None to disable.
restart_policy (JobRestartPolicy) – job restart behavior. JobRestartPolicy.NEVER by default.
privileged (bool) – run the job in privileged mode. This mode should be supported by the cluster.
priority (JobPriority) – priority used to specify job’s start order. Jobs with higher priority will start before ones with lower priority. Priority should be supported by the cluster. None by default.
- Returns:
JobDescription instance with information about the started job.
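A minimal sketch of starting a job and returning its id. The wrapper is hypothetical: `client` is an already-initialized apolo_sdk.Client, `image` is an already-parsed RemoteImage, and the name and life span values are placeholders:

```python
from typing import Any

async def start_and_report(client: Any, image: Any, preset_name: str) -> str:
    # Start a job from the given image on the chosen preset and
    # return the new job's id.
    job = await client.jobs.start(
        image=image,
        preset_name=preset_name,
        name="example-job",  # optional human-readable name (placeholder)
        life_span=3600.0,    # stop the job after one hour
    )
    return job.id
```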
- async send_signal(id: str, *, cluster_name: str | None = None) None [source]¶
Send SIGKILL signal to a job.
- async status(id: str) JobDescription [source]¶
Get information about a job.
- Parameters:
id (str) – id of the job to query.
- Returns:
JobDescription instance with job status details.
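A sketch of polling a job until it reaches a final state, using the JobStatus.finished_items() shortcut described later in this reference. The helper is hypothetical; `client` is an already-initialized apolo_sdk.Client, and finished_items() is assumed to be callable on the status enum class:

```python
import asyncio
from typing import Any

async def wait_for_job(client: Any, job_id: str, poll_interval: float = 1.0) -> Any:
    # Repeatedly fetch the job description until its status is final
    # (SUCCEEDED, CANCELLED or FAILED), then return the description.
    while True:
        descr = await client.jobs.status(job_id)
        if descr.status in type(descr.status).finished_items():
            return descr
        await asyncio.sleep(poll_interval)
```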
Container¶
- class apolo_sdk.Container¶
Read-only dataclass for describing the Docker image and environment to run a job.
- image¶
RemoteImage used for starting a container.
- entrypoint¶
Docker ENTRYPOINT used for overriding the image entry-point (str); the default None picks the entry-point from the image’s Dockerfile.
- command¶
Command line to execute inside a container (str), None for picking the command line from the image’s Dockerfile.
- http¶
HTTPPort describing parameters of the HTTP server exposed by the container, None if the container doesn’t provide HTTP access.
- env¶
Custom environment variables for pushing into the container’s task. A Mapping where keys are environment variable names and values are variable values, both str. Empty dict by default.
- volumes¶
Docker volumes to mount into the container, a Sequence of Volume objects. Empty list by default.
- secret_env¶
Secrets pushed as custom environment variables into the container’s task. A Mapping where keys are environment variable names (str) and values are secret URIs (yarl.URL). Empty dict by default.
- secret_files¶
Secrets mounted as files in a container, a Sequence of SecretFile objects. Empty list by default.
- disk_volumes¶
Disk volumes to mount into the container, a Sequence of DiskVolume objects. Empty list by default.
HTTPPort¶
- class apolo_sdk.HTTPPort¶
Read-only dataclass for exposing an HTTP server started in a job. To access this server from a remote machine please use
JobDescription.http_url
.
- requires_auth¶
If True, authentication in the Apolo Platform is required to access the exposed HTTP server; otherwise the port is open publicly.
JobDescription¶
- class apolo_sdk.JobDescription¶
Read-only dataclass for describing a job.
- history¶
Additional information about the job, e.g. creation time and process exit code. A JobStatusHistory instance.
- scheduler_enabled¶
Whether the job participates in round-robin scheduling.
- preemptible_node¶
Whether the job is bound to preemptible nodes. If set to True, the job only allows execution on preemptible nodes. If set to False, the job only allows execution on non-preemptible nodes.
- pass_config¶
Whether config data is passed by the platform, see Factory.login_with_passed_config() for details.
- privileged¶
Whether the job is running in privileged mode; refer to the Docker documentation for details.
- tags¶
List of job tags provided by the user at creation time, Sequence[str] or () if tags are omitted.
- description¶
Job description text provided by the user at creation time, str or None if the description is omitted.
- ssh_server¶
yarl.URL to access the running job by SSH. Internal field, don’t access it from custom code. Use Jobs.exec() and Jobs.port_forward() as the official API for accessing a running job.
- internal_hostname¶
DNS name to access the running job from other jobs.
- internal_hostname_named¶
DNS name to access the running job from other jobs, based on the job’s name instead of its id. Produces the same value for jobs with the same name and owner in the same cluster.
- schedule_timeout¶
Minimal timeout in seconds the job will wait before reporting that it cannot be scheduled because of the lack of computation cluster resources (memory, CPU/GPU etc), float.
- priority¶
Priority used to specify job’s start order. Jobs with higher priority will start before ones with lower priority, JobPriority.
- _internal¶
Some internal info about job used by platform. Should not be used.
JobRestartPolicy¶
JobPriority¶
JobStatus¶
- class apolo_sdk.JobStatus¶
Enumeration that describes job state.
Can be one of the following statuses:
- PENDING¶
Job is scheduled for execution but not started yet.
- RUNNING¶
Job is running now.
- SUSPENDED¶
Scheduled job is paused to allow other jobs to run.
- SUCCEEDED¶
Job is finished successfully.
- CANCELLED¶
Job was canceled while it was running.
- FAILED¶
Job execution failed.
- UNKNOWN¶
Invalid (or unknown) status code, should never be returned from the server.
Also some shortcuts are available:
- active_items() Set[JobStatus] [source]¶
Returns all statuses that are not final: PENDING, SUSPENDED and RUNNING.
- finished_items() Set[JobStatus] [source]¶
Returns all statuses that are final: SUCCEEDED, CANCELLED and FAILED.
Each enum value also has the following bool fields:
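The two shortcut helpers partition the statuses into "may still change" and "final". A minimal illustration using a stand-in enum (not the real apolo_sdk.JobStatus, which requires the SDK):

```python
import enum
from typing import Set

class JobStatus(str, enum.Enum):
    # Stand-in mirroring apolo_sdk.JobStatus for illustration only.
    PENDING = "pending"
    RUNNING = "running"
    SUSPENDED = "suspended"
    SUCCEEDED = "succeeded"
    CANCELLED = "cancelled"
    FAILED = "failed"
    UNKNOWN = "unknown"

    @classmethod
    def active_items(cls) -> "Set[JobStatus]":
        # Statuses of jobs that may still change state.
        return {cls.PENDING, cls.SUSPENDED, cls.RUNNING}

    @classmethod
    def finished_items(cls) -> "Set[JobStatus]":
        # Final statuses; a job never leaves these.
        return {cls.SUCCEEDED, cls.CANCELLED, cls.FAILED}
```

For example, JobStatus.active_items() is a convenient value for the statuses filter of Jobs.list() when enumerating only jobs that are still alive.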
JobStatusItem¶
- class apolo_sdk.JobStatusItem¶
Read-only dataclass for describing job status transition details.
- reason¶
Additional information for the job status, str.
Examples of reason values:
'ContainerCreating' for a JobStatus.PENDING job that initiates a pod for the container.
'ErrImagePull' for a JobStatus.FAILED job that cannot pull the specified image.
JobStatusHistory¶
- class apolo_sdk.JobStatusHistory¶
Read-only dataclass for describing job status details, e.g. creation and finishing time, exit code etc.
- status¶
Current status of the job, a JobStatus enumeration. The same as JobDescription.status.
- reason¶
Additional information for the current status, str.
Examples of reason values:
'ContainerCreating' for a JobStatus.PENDING job that initiates a pod for the container.
'ErrImagePull' for a JobStatus.FAILED job that cannot pull the specified image.
- description¶
Extended description for the short abbreviation described by reason; an empty str if no additional information is provided.
- exit_code¶
Exit code of the container’s process (int), or None if the job was not started or is still running.
- transitions¶
List of job status transitions, a Sequence of JobStatusItem.
JobTelemetry¶
- class apolo_sdk.JobTelemetry¶
Read-only dataclass for job telemetry (statistics), e.g. consumed CPU load, memory footprint etc.
- timestamp¶
Date and time of the telemetry report (float), in seconds since the epoch, like the value returned from time.time(). See the time and datetime modules for more information on how to handle the timestamp.
Message¶
- class apolo_sdk.Message¶
Read-only dataclass representing job stdout/stderr stream chunks, returned from StdStream.read_out().
- fileno¶
Stream number, 1 for stdout and 2 for stderr.
Resources¶
- class apolo_sdk.Resources¶
Read-only dataclass for describing resources (memory, CPU/GPU etc.) available to the container, see also the Container.resources attribute.
- cpu¶
Requested number of CPUs, float. Please note, Docker supports fractions here, e.g. 0.5 CPU means half a CPU on the target node.
- shm¶
Use Linux shared memory or not, bool. Provide True if you don’t know what the /dev/shm device means.
- tpu_type¶
Requested TPU type, see also https://en.wikipedia.org/wiki/Tensor_processing_unit
- tpu_software_version¶
Requested TPU software version.
StdStream¶
- class apolo_sdk.StdStream¶
A class for communicating with an attached job (Jobs.attach()) or an exec session (Jobs.exec()). Use read_out() for reading from stdout/stderr and write_in() for writing into stdin.