# DDC compute interface design documentation
This is a work in progress
## General architecture
The Zen server compute interfaces implement a basic model for distributing compute processes.
Clients implement [functions](#functions) in [worker executables](#workers) and dispatch
[actions](#actions) to them via a message-based interface.
The API requires users to describe actions and workers explicitly and fully up front, and
each unit of work is described and submitted to the compute service as a single object. The model
somewhat resembles Lambda and other stateless compute services, but it is more tightly constrained
to allow for optimizations and to integrate tightly with the storage components in Zen server.
This contrasts with Unreal Build Accelerator, where the worker (remote process)
and the inputs are discovered on the fly as the worker progresses, and inputs and results
are communicated via relatively high-frequency RPCs.
### Actions
An action is described by an action descriptor: a compact binary object containing
a self-contained description of the inputs and of the function which should be applied
to them to generate an output.
#### Sample Action Descriptor
```
work item 4857714dee2383b50b2e7d72afd79848ab5d13f8 (2 attachments):
  Function: CompileShaderJobs
  FunctionVersion: '83027356-2cf7-41ca-aba5-c81ab0ff2129'
  BuildSystemVersion: '17fe280d-ccd8-4be8-a9d1-89c944a70969'
  Inputs:
    Input:
      RawHash: 0c01d9f19033256ca974fced523d1e15b27c1b0a
      RawSize: 4482
    Virtual0:
      RawHash: dd9bbcb8763badd2f015f94f8f6e360362e2bce0
      RawSize: 3334
```
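As a rough illustration of the shape of this data, the sample above could be modelled as below. This is only a sketch: the class names `ActionDescriptor` and `InputRef` and the `match_key` helper are hypothetical, and the real descriptor is a compact binary object rather than a Python structure.

```python
from dataclasses import dataclass, field


@dataclass
class InputRef:
    """One input attachment (hypothetical name), mirroring RawHash/RawSize."""
    raw_hash: str  # RawHash field from the descriptor, hex-encoded
    raw_size: int  # RawSize field from the descriptor, in bytes


@dataclass
class ActionDescriptor:
    """Illustrative stand-in for the compact binary action descriptor."""
    function: str
    function_version: str
    build_system_version: str
    inputs: dict = field(default_factory=dict)  # input name -> InputRef

    def match_key(self):
        # Tuple used to look up a worker implementing the function.
        return (self.function, self.function_version, self.build_system_version)


action = ActionDescriptor(
    function="CompileShaderJobs",
    function_version="83027356-2cf7-41ca-aba5-c81ab0ff2129",
    build_system_version="17fe280d-ccd8-4be8-a9d1-89c944a70969",
    inputs={
        "Input": InputRef("0c01d9f19033256ca974fced523d1e15b27c1b0a", 4482),
        "Virtual0": InputRef("dd9bbcb8763badd2f015f94f8f6e360362e2bce0", 3334),
    },
)

# Total uncompressed size of all referenced inputs.
total_input_size = sum(ref.raw_size for ref in action.inputs.values())
```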
### Functions
Functions are identified by a name and a version specification. For
matching purposes there is also a build system version specification.
When workers are registered with the compute service, they are entered
into a table. As actions stream in, the compute subsystem tries to
find a worker which implements the required function using the
`[Function,FunctionVersion,BuildSystemVersion]` tuple. In practice there
may be more than one matching worker, and it is up to the compute service
to pick one.
```
=== Known functions ===========================
function           version                               build system                          worker id
CompileShaderJobs  83027356-2cf7-41ca-aba5-c81ab0ff2129  17fe280d-ccd8-4be8-a9d1-89c944a70969  69cb9bb50e9600b5bd5e5ca4ba0f9187b118069a
```
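The lookup described above can be sketched as a table keyed by the `[Function,FunctionVersion,BuildSystemVersion]` tuple. The `FunctionTable` class is hypothetical, and picking the first registered worker is an assumption made for illustration; the actual selection policy is up to the compute service.

```python
from collections import defaultdict


class FunctionTable:
    """Hypothetical in-memory table mapping function tuples to worker ids."""

    def __init__(self):
        # (function, function_version, build_system_version) -> [worker ids]
        self._workers = defaultdict(list)

    def register(self, worker_id, build_system_version, functions):
        """Enter every function implemented by a worker into the table."""
        for name, version in functions:
            key = (name, version, build_system_version)
            self._workers[key].append(worker_id)

    def find_worker(self, function, function_version, build_system_version):
        """Return a matching worker id, or None if no worker matches."""
        key = (function, function_version, build_system_version)
        candidates = self._workers.get(key, [])
        # Assumption: first registered wins. A real service may pick differently.
        return candidates[0] if candidates else None


table = FunctionTable()
table.register(
    worker_id="69cb9bb50e9600b5bd5e5ca4ba0f9187b118069a",
    build_system_version="17fe280d-ccd8-4be8-a9d1-89c944a70969",
    functions=[("CompileShaderJobs", "83027356-2cf7-41ca-aba5-c81ab0ff2129")],
)
```

An action whose tuple does not match exactly (for example, a different `FunctionVersion`) finds no worker in this sketch.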
### Workers
A worker is an executable which accepts command line options that pass the
information required to execute an action. There are two modes: a legacy file-based mode
and a streaming mode.
In the file-based mode the option is simply `-Build=<action file>`, which points to an action
descriptor in compact binary format (see above). By convention, the referenced inputs are stored
in a folder named `Inputs`, as `CompressedBuffer`-format files named after the `IoHash` of their
uncompressed contents.
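A minimal sketch of the file-based convention follows. The helper names `build_command` and `input_path` are hypothetical. It also assumes that the `Inputs` folder sits in the same directory as the action file and that each input file name is the bare hex hash with no extension; neither detail is specified above.

```python
from pathlib import PurePosixPath


def build_command(worker_exe, action_file):
    """Command line for a file-based worker invocation."""
    return [worker_exe, f"-Build={action_file}"]


def input_path(action_dir, raw_hash):
    """Location of one input blob (assumed layout: <action dir>/Inputs/<hash>)."""
    return PurePosixPath(action_dir) / "Inputs" / raw_hash


cmd = build_command("ShaderBuildWorker.exe", "/sandbox/action/build.action")
blob = input_path("/sandbox/action", "0c01d9f19033256ca974fced523d1e15b27c1b0a")
```

The paths and the `build.action` file name are placeholders chosen for the example.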
In the streaming mode, the data is provided through a streaming socket interface instead
of the file system. This eliminates process spawning overhead and enables intra-process
pipelining for greater efficiency. The streaming mode is not yet fully implemented.
### Worker Descriptors
Workers are declared by passing a worker descriptor to the compute service. The descriptor
contains information about which executable files are required to execute the worker and how
they need to be laid out. You can optionally also provide additional non-executable files to
go along with the executables.
The descriptor also lists the functions implemented by the worker. Each function defines
a version which is used when matching actions (the function version is passed in as the
`FunctionVersion` in the action descriptor).
Each worker links in a small set of common support code which is used to handle the
communication with the invoking program (the 'build system'). To be able to evolve this
interface, each worker also indicates the version of the build system using the
`BuildSystemVersion` attribute.
#### Sample Worker Descriptor
```
worker 69cb9bb50e9600b5bd5e5ca4ba0f9187b118069a:
  name: ShaderBuildWorker
  path: Engine/Binaries/Win64/ShaderBuildWorker.exe
  host: Win64
  buildsystem_version: '17fe280d-ccd8-4be8-a9d1-89c944a70969'
  timeout: 300
  cores: 1
  environment: []
  executables:
    - name: 'Engine/Binaries/Win64/ShaderBuildWorker-DerivedDataBuildWorker.dll'
      hash: f4dbec80e549bae2916288f1b9428c2878d9ae7a
      size: 166912
    - name: 'Engine/Binaries/Win64/ShaderBuildWorker-DerivedDataCache.dll'
      hash: 8025d561ede05db19b235fc2ef290e2b029c1b8c
      size: 4339200
    - name: Engine/Binaries/Win64/ShaderBuildWorker.exe
      hash: b85862fca2ce04990470f27bae9ead7f31d9b27e
      size: 60928
    - name: Engine/Binaries/Win64/ShaderBuildWorker.modules
      hash: 7b05741a69a2ea607c5578668a8de50b04259668
      size: 3739
    - name: Engine/Binaries/Win64/ShaderBuildWorker.version
      hash: 8fdfd9f825febf2191b555393e69b32a1d78c24f
      size: 259
  files: []
  dirs:
    - Engine/Binaries/Win64
  functions:
    - name: CompileShaderJobs
      version: '83027356-2cf7-41ca-aba5-c81ab0ff2129'
```
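To illustrate how a descriptor like this might be consumed, the sketch below derives the directories to create, the files to place (by hash), the total payload size, and the function tuples to register. The function `plan_worker_layout` and the dictionary representation are hypothetical, and the sample is abbreviated to two of the executables listed above.

```python
def plan_worker_layout(descriptor):
    """Summarise what must be materialised before the worker can run."""
    # Executables and additional non-executable files are laid out together.
    payload = descriptor.get("executables", []) + descriptor.get("files", [])
    return {
        "dirs": list(descriptor.get("dirs", [])),
        "files": {entry["name"]: entry["hash"] for entry in payload},
        "total_size": sum(entry["size"] for entry in payload),
        # One matching tuple per function implemented by the worker.
        "match_keys": [
            (fn["name"], fn["version"], descriptor["buildsystem_version"])
            for fn in descriptor.get("functions", [])
        ],
    }


sample = {
    "name": "ShaderBuildWorker",
    "path": "Engine/Binaries/Win64/ShaderBuildWorker.exe",
    "buildsystem_version": "17fe280d-ccd8-4be8-a9d1-89c944a70969",
    "executables": [
        {"name": "Engine/Binaries/Win64/ShaderBuildWorker-DerivedDataBuildWorker.dll",
         "hash": "f4dbec80e549bae2916288f1b9428c2878d9ae7a", "size": 166912},
        {"name": "Engine/Binaries/Win64/ShaderBuildWorker.exe",
         "hash": "b85862fca2ce04990470f27bae9ead7f31d9b27e", "size": 60928},
    ],
    "files": [],
    "dirs": ["Engine/Binaries/Win64"],
    "functions": [
        {"name": "CompileShaderJobs",
         "version": "83027356-2cf7-41ca-aba5-c81ab0ff2129"},
    ],
}

plan = plan_worker_layout(sample)
```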
## API (WIP, not final)
The compute interfaces are currently exposed on the `/apply` endpoint, but this
is subject to change as we adapt the interfaces during development. The LSN
APIs below are intended to replace the action-ID-oriented APIs.
The POST APIs typically involve a two-step dance where a descriptor is POSTed and
the service responds with a list of `needs` chunks (identified via `IoHash`) which
it does not have yet. The client can then follow up with a POST of a Compact Binary
Package containing the descriptor along with the needed chunks.
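The two-step exchange can be sketched with an in-memory stand-in for the service. Everything here is hypothetical: `FakeComputeService`, its methods, the `attachments` key and the short placeholder hashes are invented for illustration and do not reflect the real HTTP payloads. The sketch also assumes that a descriptor with no missing chunks is accepted on the first POST.

```python
class FakeComputeService:
    """In-memory stand-in for the compute service (not the real API)."""

    def __init__(self):
        self.chunks = {}    # hash -> bytes already held by the service
        self.accepted = []  # descriptors whose chunks are all present

    def post_descriptor(self, descriptor):
        """Step 1: return the list of chunk hashes the service still needs."""
        needs = [h for h in descriptor["attachments"] if h not in self.chunks]
        if not needs:
            self.accepted.append(descriptor)
        return needs

    def post_package(self, descriptor, chunks):
        """Step 2: descriptor plus the needed chunks, sent as one package."""
        self.chunks.update(chunks)
        return self.post_descriptor(descriptor)


def submit(service, descriptor, local_chunks):
    """Submit a descriptor; returns the number of POSTs that were required."""
    needs = service.post_descriptor(descriptor)
    if not needs:
        return 1
    package = {h: local_chunks[h] for h in needs}
    remaining = service.post_package(descriptor, package)
    if remaining:
        raise RuntimeError(f"service still missing chunks: {remaining}")
    return 2


service = FakeComputeService()
service.chunks["aa"] = b"already uploaded"
descriptor = {"attachments": ["aa", "bb"]}
local = {"aa": b"already uploaded", "bb": b"new chunk"}

first = submit(service, descriptor, local)   # service lacks "bb"
second = submit(service, descriptor, local)  # service now has everything
```

Only the chunks named in the `needs` response are uploaded, so chunks the service already holds are not sent again.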
`/apply/ready` - health check endpoint; returns HTTP 200 OK or HTTP 503
`/apply/sysinfo` - system information endpoint
`/apply/record/start`, `/apply/record/stop` - start/stop action recording
`/apply/workers/{worker}` - GET/POST worker descriptors and payloads
`/apply/jobs/completed` - GET list of completed actions
`/apply/jobs/{lsn}` - GET completed action results by LSN, POST action cancellation or priority changes by LSN
`/apply/jobs/{worker}/{action}` - GET completed action (job) results by action ID
`/apply/jobs/{worker}` - GET pending/running jobs for worker, POST requests to schedule action as a job
`/apply/jobs` - POST request to schedule action as a job