Diffstat (limited to 'docs')
-rw-r--r-- docs/compute.md 152
1 files changed, 152 insertions, 0 deletions
diff --git a/docs/compute.md b/docs/compute.md
new file mode 100644
index 000000000..417622f94
--- /dev/null
+++ b/docs/compute.md
@@ -0,0 +1,152 @@
+# DDC compute interface design documentation
+
+This is a work in progress.
+
+## General architecture
+
+The Zen server compute interfaces implement a basic model for distributing compute processes.
+Clients can implement [functions](#functions) in [worker executables](#workers) and dispatch
+[actions](#actions) to them via a message-based interface.
+
+The API requires users to describe actions and workers fully and explicitly up front, and
+each unit of work is described and submitted to the compute service as a single object. The model
+somewhat resembles Lambda and other stateless compute services but is more tightly constrained
+to allow for optimizations and to integrate tightly with the storage components in Zen server.
+
+This contrasts with Unreal Build Accelerator, where the worker (remote process)
+and the inputs are discovered on the fly as the worker progresses, and inputs and results
+are communicated via relatively high-frequency RPCs.
+
+### Actions
+
+An action is described by an action descriptor: a compact binary object containing a
+self-contained description of the inputs and of the function which should be applied
+to them to generate an output.
+
+#### Sample Action Descriptor
+
+```
+work item 4857714dee2383b50b2e7d72afd79848ab5d13f8 (2 attachments):
+Function: CompileShaderJobs
+FunctionVersion: '83027356-2cf7-41ca-aba5-c81ab0ff2129'
+BuildSystemVersion: '17fe280d-ccd8-4be8-a9d1-89c944a70969'
+Inputs:
+  Input:
+    RawHash: 0c01d9f19033256ca974fced523d1e15b27c1b0a
+    RawSize: 4482
+  Virtual0:
+    RawHash: dd9bbcb8763badd2f015f94f8f6e360362e2bce0
+    RawSize: 3334
+```
+
+### Functions
+
+Functions are identified by a name and a version specification. For
+matching purposes there's also a build system version specification.
+When workers are registered with the compute service, they are entered
+into a table, and as actions stream in, the compute subsystem tries to
+find a worker which implements the required function using the
+`[Function,FunctionVersion,BuildSystemVersion]` tuple. In practice there
+may be more than one matching worker, in which case it's up to the
+compute service to pick one.
+
+```
+=== Known functions ===========================
+function          version                              build system                         worker id
+CompileShaderJobs 83027356-2cf7-41ca-aba5-c81ab0ff2129 17fe280d-ccd8-4be8-a9d1-89c944a70969 69cb9bb50e9600b5bd5e5ca4ba0f9187b118069a
+```
+
+### Workers
+
+A worker is an executable which accepts command line options that pass it the
+information required to execute an action. There are two modes: a legacy file-based
+mode and a streaming mode.
+
+In the file-based mode the option is simply `-Build=<action file>`, which points to an action
+descriptor in compact binary format (see above). By convention, the referenced inputs are in a folder
+named `Inputs` where any input blobs are stored as `CompressedBuffer`-format files named
+after the `IoHash` of the uncompressed contents.
+
+In the streaming mode, the data is provided through a streaming socket interface instead
+of the file system. This eliminates process-spawning overhead and enables intra-process
+pipelining for greater efficiency. The streaming mode is not yet fully implemented.
+
+### Worker Descriptors
+
+Workers are declared by passing a worker descriptor to the compute service. The descriptor
+contains information about which executable files are required to execute the worker and how
+they need to be laid out. You can optionally also provide additional non-executable files to
+go along with the executables.
+
+The descriptor also lists the functions implemented by the worker. Each function defines
+a version which is used when matching actions (the function version is passed in as the
+`FunctionVersion` in the action descriptor).
+
+Each worker links in a small set of common support code which is used to handle the
+communication with the invoking program (the 'build system'). To be able to evolve this
+interface, each worker also indicates the version of the build system using the
+`BuildSystemVersion` attribute.
+
+#### Sample Worker Descriptor
+
+```
+worker 69cb9bb50e9600b5bd5e5ca4ba0f9187b118069a:
+name: ShaderBuildWorker
+path: Engine/Binaries/Win64/ShaderBuildWorker.exe
+host: Win64
+buildsystem_version: '17fe280d-ccd8-4be8-a9d1-89c944a70969'
+timeout: 300
+cores: 1
+environment: []
+executables:
+  - name: 'Engine/Binaries/Win64/ShaderBuildWorker-DerivedDataBuildWorker.dll'
+    hash: f4dbec80e549bae2916288f1b9428c2878d9ae7a
+    size: 166912
+  - name: 'Engine/Binaries/Win64/ShaderBuildWorker-DerivedDataCache.dll'
+    hash: 8025d561ede05db19b235fc2ef290e2b029c1b8c
+    size: 4339200
+  - name: Engine/Binaries/Win64/ShaderBuildWorker.exe
+    hash: b85862fca2ce04990470f27bae9ead7f31d9b27e
+    size: 60928
+  - name: Engine/Binaries/Win64/ShaderBuildWorker.modules
+    hash: 7b05741a69a2ea607c5578668a8de50b04259668
+    size: 3739
+  - name: Engine/Binaries/Win64/ShaderBuildWorker.version
+    hash: 8fdfd9f825febf2191b555393e69b32a1d78c24f
+    size: 259
+files: []
+dirs:
+  - Engine/Binaries/Win64
+functions:
+  - name: CompileShaderJobs
+    version: '83027356-2cf7-41ca-aba5-c81ab0ff2129'
+```
+
+## API (WIP, not final)
+
+The compute interfaces are currently exposed under the `/apply` endpoint, but this
+is subject to change as we adapt the interfaces during development. The LSN
+APIs below are intended to replace the action-ID-oriented APIs.
+
+The POST APIs typically involve a two-step dance where a descriptor is POSTed and
+the service responds with a list of `needs` chunks (identified via `IoHash`) which
+it does not have yet. The client can then follow up with a POST of a Compact Binary
+Package containing the descriptor along with the needed chunks.
+
+`/apply/ready` - health check endpoint; returns HTTP 200 OK or HTTP 503
+
+`/apply/sysinfo` - system information endpoint
+
+`/apply/record/start`, `/apply/record/stop` - start/stop action recording
+
+`/apply/workers/{worker}` - GET/POST worker descriptors and payloads
+
+`/apply/jobs/completed` - GET list of completed actions
+
+`/apply/jobs/{lsn}` - GET completed action results by LSN; POST action cancellation by LSN, priority changes by LSN
+
+`/apply/jobs/{worker}/{action}` - GET completed action (job) results by action ID
+
+`/apply/jobs/{worker}` - GET pending/running jobs for worker; POST requests to schedule action as a job
+
+`/apply/jobs` - POST request to schedule action as a job
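
The two-step POST flow described in the API section (descriptor first, then a package carrying only the chunks listed in `needs`) can be sketched in memory as below. This is an illustration under stated assumptions, not the Zen server implementation: `FakeComputeService`, `post_descriptor`, `post_package`, `submit` and `chunk_id` are hypothetical names, plain dicts stand in for Compact Binary descriptors and packages, a 20-byte BLAKE2b digest stands in for `IoHash`, and the sketch assumes the service accepts the job straight away when nothing is missing, which the document does not state.

```python
import hashlib


def chunk_id(data: bytes) -> str:
    """Stand-in for IoHash (assumption): 20-byte BLAKE2b digest, hex encoded."""
    return hashlib.blake2b(data, digest_size=20).hexdigest()


class FakeComputeService:
    """Hypothetical in-memory stand-in for the service side of the handshake."""

    def __init__(self):
        self.chunks = {}  # chunk id -> bytes already held by the service
        self.jobs = []    # descriptors accepted for scheduling

    def post_descriptor(self, descriptor):
        """Step 1: report which referenced chunks the service does not hold."""
        needs = [h for h in descriptor["inputs"] if h not in self.chunks]
        if not needs:
            # Assumption: with nothing missing, the job is accepted immediately.
            self.jobs.append(descriptor)
        return needs

    def post_package(self, descriptor, attachments):
        """Step 2: store the attached chunks, then accept the descriptor."""
        for data in attachments:
            self.chunks[chunk_id(data)] = data
        missing = [h for h in descriptor["inputs"] if h not in self.chunks]
        if missing:
            raise ValueError(f"still missing chunks: {missing}")
        self.jobs.append(descriptor)


def submit(service, descriptor, local_chunks):
    """Client side: send the descriptor, then upload only the needed chunks."""
    needs = service.post_descriptor(descriptor)
    if needs:
        service.post_package(descriptor, [local_chunks[h] for h in needs])
    return needs
```

The design point this illustrates is that input data already present on the service is never sent again: a client whose inputs are all known to the service completes the submission with the descriptor alone.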