Diffstat (limited to 'docs')
| -rw-r--r-- | docs/compute.md | 152 |
1 files changed, 152 insertions, 0 deletions
diff --git a/docs/compute.md b/docs/compute.md
new file mode 100644
index 000000000..417622f94
--- /dev/null
+++ b/docs/compute.md
@@ -0,0 +1,152 @@
+# DDC compute interface design documentation
+
+This is a work in progress.
+
+## General architecture
+
+The Zen server compute interfaces implement a basic model for distributing compute processes.
+Clients can implement [Functions](#functions) in [worker executables](#workers) and dispatch
+[actions](#actions) to them via a message-based interface.
+
+The API requires users to describe the actions and the workers explicitly and fully up front, and the
+work is described and submitted as singular objects to the compute service. The model somewhat
+resembles Lambda and other stateless compute services but is more tightly constrained to allow
+for optimizations and to integrate tightly with the storage components in Zen server.
+
+This is in contrast with Unreal Build Accelerator, where the worker (remote process)
+and the inputs are discovered on-the-fly as the worker progresses and inputs and results
+are communicated via relatively high-frequency RPCs.
+
+### Actions
+
+An action is described by an action descriptor, which is a compact binary object which
+contains a self-contained description of the inputs and the function which should be applied
+to generate an output.
+
+#### Sample Action Descriptor
+
+```
+work item 4857714dee2383b50b2e7d72afd79848ab5d13f8 (2 attachments):
+Function: CompileShaderJobs
+FunctionVersion: '83027356-2cf7-41ca-aba5-c81ab0ff2129'
+BuildSystemVersion: '17fe280d-ccd8-4be8-a9d1-89c944a70969'
+Inputs:
+  Input:
+    RawHash: 0c01d9f19033256ca974fced523d1e15b27c1b0a
+    RawSize: 4482
+  Virtual0:
+    RawHash: dd9bbcb8763badd2f015f94f8f6e360362e2bce0
+    RawSize: 3334
+```
+
+### Functions
+
+Functions are identified by a name and a version specification. For
+matching purposes there's also a build system version specification.
+When workers are registered with the compute service, they are entered
+into a table. As actions stream in, the compute subsystem tries to
+find a worker which implements the required function using the
+`[Function,FunctionVersion,BuildSystemVersion]` tuple. In practice there
+may be more than one matching worker and it's up to the compute service
+to pick one.
+
+```
+=== Known functions ===========================
+function           version                               build system                          worker id
+CompileShaderJobs  83027356-2cf7-41ca-aba5-c81ab0ff2129  17fe280d-ccd8-4be8-a9d1-89c944a70969  69cb9bb50e9600b5bd5e5ca4ba0f9187b118069a
+```
+
+### Workers
+
+A worker is an executable which accepts some command line options which are used to pass the
+information required to execute an action. There are two modes: a legacy file-based
+mode and a streaming mode.
+
+In the file-based mode the option is simply `-Build=<action file>` which points to an action
+descriptor in compact binary format (see above). By convention, the referenced inputs are in a folder
+named `Inputs` where any input blobs are stored as `CompressedBuffer`-format files named
+after the `IoHash` of the uncompressed contents.
+
+In the streaming mode, the data is provided through a streaming socket interface instead
+of using the file system. This eliminates process spawning overheads and enables intra-process
+pipelining for greater efficiency. The streaming mode is not yet fully implemented.
+
+### Worker Descriptors
+
+Workers are declared by passing a worker descriptor to the compute service. The descriptor
+contains information about which executable files are required to execute the worker and how
+they need to be laid out. You can optionally also provide additional non-executable files to
+go along with the executables.
+
+The descriptor also lists the functions implemented by the worker. Each function defines
+a version which is used when matching actions (the function version is passed in as the
+`FunctionVersion` in the action descriptor).
+
+Each worker links in a small set of common support code which is used to handle the
+communication with the invoking program (the 'build system'). To be able to evolve this
+interface, each worker also indicates the version of the build system using the
+`BuildSystemVersion` attribute.
+
+#### Sample Worker Descriptor
+
+```
+worker 69cb9bb50e9600b5bd5e5ca4ba0f9187b118069a:
+name: ShaderBuildWorker
+path: Engine/Binaries/Win64/ShaderBuildWorker.exe
+host: Win64
+buildsystem_version: '17fe280d-ccd8-4be8-a9d1-89c944a70969'
+timeout: 300
+cores: 1
+environment: []
+executables:
+  - name: 'Engine/Binaries/Win64/ShaderBuildWorker-DerivedDataBuildWorker.dll'
+    hash: f4dbec80e549bae2916288f1b9428c2878d9ae7a
+    size: 166912
+  - name: 'Engine/Binaries/Win64/ShaderBuildWorker-DerivedDataCache.dll'
+    hash: 8025d561ede05db19b235fc2ef290e2b029c1b8c
+    size: 4339200
+  - name: Engine/Binaries/Win64/ShaderBuildWorker.exe
+    hash: b85862fca2ce04990470f27bae9ead7f31d9b27e
+    size: 60928
+  - name: Engine/Binaries/Win64/ShaderBuildWorker.modules
+    hash: 7b05741a69a2ea607c5578668a8de50b04259668
+    size: 3739
+  - name: Engine/Binaries/Win64/ShaderBuildWorker.version
+    hash: 8fdfd9f825febf2191b555393e69b32a1d78c24f
+    size: 259
+files: []
+dirs:
+  - Engine/Binaries/Win64
+functions:
+  - name: CompileShaderJobs
+    version: '83027356-2cf7-41ca-aba5-c81ab0ff2129'
+```
+
+## API (WIP not final)
+
+The compute interfaces are currently exposed on the `/apply` endpoint but this
+will be subject to change as we adapt the interfaces during development. The LSN
+APIs below are intended to replace the action-ID-oriented APIs.
+
+The POST APIs typically involve a two-step dance where a descriptor is POSTed and
+the service responds with a list of `needs` chunks (identified via `IoHash`) which
+it does not have yet. The client can then follow up with a POST of a Compact Binary
+Package containing the descriptor along with the needed chunks.
+
+`/apply/ready` - health check endpoint; returns HTTP 200 OK or HTTP 503
+
+`/apply/sysinfo` - system information endpoint
+
+`/apply/record/start`, `/apply/record/stop` - start/stop action recording
+
+`/apply/workers/{worker}` - GET/POST worker descriptors and payloads
+
+`/apply/jobs/completed` - GET list of completed actions
+
+`/apply/jobs/{lsn}` - GET completed action results by LSN, POST action cancellation by LSN, priority changes by LSN
+
+`/apply/jobs/{worker}/{action}` - GET completed action (job) results by action ID
+
+`/apply/jobs/{worker}` - GET pending/running jobs for worker, POST requests to schedule action as a job
+
+`/apply/jobs` - POST request to schedule action as a job
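
The two-step POST exchange described in the API section can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the Zen server implementation: `FakeComputeService`, `post_descriptor`, `post_package`, `submit` and the dictionary-shaped descriptor are hypothetical names invented here, SHA-1 is used purely as a stand-in for `IoHash`, and the choice to accept a descriptor as soon as nothing is missing is a guess at the behavior.

```python
import hashlib


def chunk_id(data: bytes) -> str:
    # Stand-in content hash for illustration; NOT the real IoHash algorithm.
    return hashlib.sha1(data).hexdigest()


class FakeComputeService:
    """Hypothetical in-memory stand-in for the service side of the exchange."""

    def __init__(self):
        self.chunks = {}    # chunk id -> bytes the service already holds
        self.accepted = []  # descriptors whose inputs are all present

    def post_descriptor(self, descriptor: dict) -> list:
        """Step 1: report the referenced chunks the service does not have."""
        needs = [h for h in descriptor["inputs"] if h not in self.chunks]
        if not needs:
            # Assumption: with nothing missing, the action can be scheduled.
            self.accepted.append(descriptor)
        return needs

    def post_package(self, descriptor: dict, attachments: dict) -> list:
        """Step 2: store the attached chunks, then re-check the descriptor."""
        for data in attachments.values():
            self.chunks[chunk_id(data)] = data
        return self.post_descriptor(descriptor)


def submit(service: FakeComputeService, descriptor: dict, local_blobs: dict) -> list:
    """Client side: send the descriptor, then only the chunks listed in needs."""
    needs = service.post_descriptor(descriptor)
    if needs:
        package = {h: local_blobs[h] for h in needs}
        needs = service.post_package(descriptor, package)
    return needs  # an empty list means every input is now on the service
```

The point of the exchange in this model is that the client uploads only the chunks named in `needs`, so inputs the service already stores are never sent again.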