# DDC compute interface design documentation

This is a work in progress

## General architecture

The Zen server compute interfaces implement a basic model for distributing compute processes. Clients can implement [Functions](#functions) in [worker executables](#workers) and dispatch [actions](#actions) to them via a message-based interface.

The API requires users to describe the actions and the workers explicitly and fully up front, and the work is described and submitted as singular objects to the compute service. The model somewhat resembles Lambda and other stateless compute services but is more tightly constrained to allow for optimizations and to integrate tightly with the storage components in Zen server.

This is in contrast with Unreal Build Accelerator, where the worker (remote process) and the inputs are discovered on-the-fly as the worker progresses, and inputs and results are communicated via relatively high-frequency RPCs.

### Actions

An action is described by an action descriptor, a compact binary object which contains a self-contained description of the inputs and the function which should be applied to generate an output.

#### Sample Action Descriptor

```
work item 4857714dee2383b50b2e7d72afd79848ab5d13f8 (2 attachments):
Function: CompileShaderJobs
FunctionVersion: '83027356-2cf7-41ca-aba5-c81ab0ff2129'
BuildSystemVersion: '17fe280d-ccd8-4be8-a9d1-89c944a70969'
Inputs:
  Input:
    RawHash: 0c01d9f19033256ca974fced523d1e15b27c1b0a
    RawSize: 4482
  Virtual0:
    RawHash: dd9bbcb8763badd2f015f94f8f6e360362e2bce0
    RawSize: 3334
```

### Functions

Functions are identified by a name and a version specification. For matching purposes there's also a build system version specification. When workers are registered with the compute service, they are entered into a table, and as actions stream in, the compute subsystem tries to find a worker which implements the required function using the `[Function,FunctionVersion,BuildSystemVersion]` tuple.
In practice there may be more than one matching worker, and it's up to the compute service to pick one.

```
=== Known functions ===========================
function           version                               build system                          worker id
CompileShaderJobs  83027356-2cf7-41ca-aba5-c81ab0ff2129  17fe280d-ccd8-4be8-a9d1-89c944a70969  69cb9bb50e9600b5bd5e5ca4ba0f9187b118069a
```

### Workers

A worker is an executable which accepts command line options that pass the information required to execute an action. There are two modes: a legacy file-based mode and a streaming mode.

In the file-based mode the option is simply `-Build=`, which points to an action descriptor in compact binary format (see above). By convention, the referenced inputs are in a folder named `Inputs`, where input blobs are stored as `CompressedBuffer`-format files named after the `IoHash` of the uncompressed contents.

In the streaming mode, the data is provided through a streaming socket interface instead of the file system. This eliminates process spawning overheads and enables intra-process pipelining for greater efficiency. The streaming mode is not yet fully implemented.

### Worker Descriptors

Workers are declared by passing a worker descriptor to the compute service. The descriptor contains information about which executable files are required to execute the worker and how they need to be laid out. You can optionally provide additional non-executable files to go along with the executables.

The descriptor also lists the functions implemented by the worker. Each function defines a version which is used when matching actions (the function version is passed in as the `FunctionVersion` in the action descriptor).

Each worker links in a small set of common support code which handles the communication with the invoking program (the 'build system'). To be able to evolve this interface, each worker also indicates the version of the build system using the `BuildSystemVersion` attribute.
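The matching rule described under Functions and Worker Descriptors can be illustrated with a small Python sketch. This is not the Zen server implementation: the dictionary layout and the helper names `build_function_table` and `find_workers` are hypothetical, and only the field names are taken from the sample descriptors in this document.

```python
from collections import defaultdict


def build_function_table(workers):
    """Index registered workers by (function, function version, build system version).

    `workers` maps a worker id to a dict shaped like the sample worker
    descriptor (hypothetical in-memory form, not the compact binary format).
    """
    table = defaultdict(list)
    for worker_id, descriptor in workers.items():
        for function in descriptor["functions"]:
            key = (
                function["name"],
                function["version"],
                descriptor["buildsystem_version"],
            )
            table[key].append(worker_id)
    return table


def find_workers(table, action):
    """Return every worker id that implements the function an action asks for."""
    key = (
        action["Function"],
        action["FunctionVersion"],
        action["BuildSystemVersion"],
    )
    # More than one worker may match; choosing among them is left to the caller.
    return table.get(key, [])


workers = {
    "69cb9bb50e9600b5bd5e5ca4ba0f9187b118069a": {
        "name": "ShaderBuildWorker",
        "buildsystem_version": "17fe280d-ccd8-4be8-a9d1-89c944a70969",
        "functions": [
            {
                "name": "CompileShaderJobs",
                "version": "83027356-2cf7-41ca-aba5-c81ab0ff2129",
            },
        ],
    },
}

action = {
    "Function": "CompileShaderJobs",
    "FunctionVersion": "83027356-2cf7-41ca-aba5-c81ab0ff2129",
    "BuildSystemVersion": "17fe280d-ccd8-4be8-a9d1-89c944a70969",
}

table = build_function_table(workers)
candidates = find_workers(table, action)
```

An action whose `FunctionVersion` or `BuildSystemVersion` differs from every registered worker yields an empty candidate list in this sketch; how the real service reports that case is not specified here.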
#### Sample Worker Descriptor

```
worker 69cb9bb50e9600b5bd5e5ca4ba0f9187b118069a:
name: ShaderBuildWorker
path: Engine/Binaries/Win64/ShaderBuildWorker.exe
host: Win64
buildsystem_version: '17fe280d-ccd8-4be8-a9d1-89c944a70969'
timeout: 300
cores: 1
environment: []
executables:
  - name: 'Engine/Binaries/Win64/ShaderBuildWorker-DerivedDataBuildWorker.dll'
    hash: f4dbec80e549bae2916288f1b9428c2878d9ae7a
    size: 166912
  - name: 'Engine/Binaries/Win64/ShaderBuildWorker-DerivedDataCache.dll'
    hash: 8025d561ede05db19b235fc2ef290e2b029c1b8c
    size: 4339200
  - name: Engine/Binaries/Win64/ShaderBuildWorker.exe
    hash: b85862fca2ce04990470f27bae9ead7f31d9b27e
    size: 60928
  - name: Engine/Binaries/Win64/ShaderBuildWorker.modules
    hash: 7b05741a69a2ea607c5578668a8de50b04259668
    size: 3739
  - name: Engine/Binaries/Win64/ShaderBuildWorker.version
    hash: 8fdfd9f825febf2191b555393e69b32a1d78c24f
    size: 259
files: []
dirs:
  - Engine/Binaries/Win64
functions:
  - name: CompileShaderJobs
    version: '83027356-2cf7-41ca-aba5-c81ab0ff2129'
```

## API (WIP not final)

The compute interfaces are currently exposed on the `/apply` endpoint but this will be subject to change as we adapt the interfaces during development. The LSN APIs below are intended to replace the action ID oriented APIs.

The POST APIs typically involve a two-step dance where a descriptor is POSTed and the service responds with a list of `needs` chunks (identified via `IoHash`) which it does not have yet. The client can then follow up with a POST of a Compact Binary Package containing the descriptor along with the needed chunks.
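The two-step exchange can be modelled in memory to show the control flow. The sketch below makes several assumptions: `FakeComputeService`, `submit`, and the `attachments` field are hypothetical names, SHA-1 stands in for `IoHash` only because it ships with Python and produces 20-byte digests like the hashes in the samples, and an empty `needs` list is taken to mean the descriptor was accepted. No real HTTP or Compact Binary encoding is involved.

```python
import hashlib


def chunk_hash(data: bytes) -> str:
    # Stand-in for IoHash (assumption): SHA-1 is used purely for illustration.
    return hashlib.sha1(data).hexdigest()


class FakeComputeService:
    """Hypothetical in-memory stand-in for the service side of the exchange."""

    def __init__(self):
        self.store = {}     # chunk hash -> chunk bytes already held by the service
        self.accepted = []  # descriptors whose attachments are all present

    def post_descriptor(self, descriptor):
        # Step 1: report which referenced chunks are missing.
        needs = [h for h in descriptor["attachments"] if h not in self.store]
        if not needs:
            self.accepted.append(descriptor)
        return {"needs": needs}

    def post_package(self, descriptor, chunks):
        # Step 2: descriptor plus the missing chunks arrive together.
        for expected_hash, data in chunks.items():
            if chunk_hash(data) != expected_hash:
                raise ValueError("chunk does not match its hash: " + expected_hash)
            self.store[expected_hash] = data
        return self.post_descriptor(descriptor)


def submit(service, descriptor, local_chunks):
    """Client side: post the descriptor, then upload only what the service lacks."""
    reply = service.post_descriptor(descriptor)
    if reply["needs"]:
        package = {h: local_chunks[h] for h in reply["needs"]}
        reply = service.post_package(descriptor, package)
    return reply


blobs = [b"shader source", b"compiler environment"]
local_chunks = {chunk_hash(b): b for b in blobs}
descriptor = {
    "Function": "CompileShaderJobs",
    "attachments": [chunk_hash(b) for b in blobs],
}

service = FakeComputeService()
service.store[chunk_hash(blobs[0])] = blobs[0]  # the service already holds one chunk

first_reply = service.post_descriptor(descriptor)  # only the second chunk is needed
final_reply = submit(service, descriptor, local_chunks)
```

The point of the exchange is that chunks the service already holds are never re-sent; in this sketch only the second blob is uploaded.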
`/apply/ready` - health check endpoint returns HTTP 200 OK or HTTP 503

`/apply/sysinfo` - system information endpoint

`/apply/record/start`, `/apply/record/stop` - start/stop action recording

`/apply/workers/{worker}` - GET/POST worker descriptors and payloads

`/apply/jobs/completed` - GET list of completed actions

`/apply/jobs/{lsn}` - GET completed action results from LSN, POST action cancellation by LSN, priority changes by LSN

`/apply/jobs/{worker}/{action}` - GET completed action (job) results by action ID

`/apply/jobs/{worker}` - GET pending/running jobs for worker, POST requests to schedule action as a job

`/apply/jobs` - POST request to schedule action as a job
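Several of the `/apply/jobs` routes share the same shape, so a client or test harness has to tell them apart by the form of the path segment. The following sketch shows one way to do that under assumptions this document does not confirm: that an LSN is a decimal integer and that worker and action IDs are 40-character lowercase hex strings like the hashes in the samples. The function name `classify_jobs_path` and the route labels are hypothetical.

```python
import re

# Assumption: worker and action IDs look like the 40-hex-digit hashes in the samples.
HEX_ID = re.compile(r"[0-9a-f]{40}")
# Assumption: an LSN is a plain decimal integer.
LSN = re.compile(r"[0-9]+")

PREFIX = "/apply/jobs"


def classify_jobs_path(path):
    """Return (route label, parameters) for a path under /apply/jobs, else None."""
    if path != PREFIX and not path.startswith(PREFIX + "/"):
        return None
    segments = [s for s in path[len(PREFIX):].split("/") if s]
    if not segments:
        return ("schedule", {})
    if len(segments) == 1:
        segment = segments[0]
        if segment == "completed":
            return ("completed", {})
        # LSNs are checked before IDs; an all-digit 40-character ID would be
        # ambiguous under these assumptions and is treated as an LSN here.
        if LSN.fullmatch(segment):
            return ("by_lsn", {"lsn": int(segment)})
        if HEX_ID.fullmatch(segment):
            return ("by_worker", {"worker": segment})
        return None
    if len(segments) == 2 and all(HEX_ID.fullmatch(s) for s in segments):
        return ("by_action", {"worker": segments[0], "action": segments[1]})
    return None


worker_id = "69cb9bb50e9600b5bd5e5ca4ba0f9187b118069a"
action_id = "4857714dee2383b50b2e7d72afd79848ab5d13f8"

routes = [
    classify_jobs_path("/apply/jobs"),
    classify_jobs_path("/apply/jobs/completed"),
    classify_jobs_path("/apply/jobs/42"),
    classify_jobs_path("/apply/jobs/" + worker_id),
    classify_jobs_path("/apply/jobs/" + worker_id + "/" + action_id),
]
```

If the service distinguishes these routes differently (for example by HTTP method or a different ID format), the patterns above would need to change accordingly.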