| author | Stefan Boberg <[email protected]> | 2025-11-07 14:49:13 +0100 |
|---|---|---|
| committer | GitHub Enterprise <[email protected]> | 2025-11-07 14:49:13 +0100 |
| commit | 24e43a913f29ac3b314354e8ce5175f135bcc64f | |
| tree | ca442937ceeb63461012b33a4576e9835099f106 /thirdparty/blake3/b3sum | |
| parent | get oplog attachments (#622) | |
| download | zen-24e43a913f29ac3b314354e8ce5175f135bcc64f.tar.xz zen-24e43a913f29ac3b314354e8ce5175f135bcc64f.zip | |
switch to xmake for package management (#611)
This change removes our dependency on vcpkg for package management in favour of bringing some code in-tree under the `thirdparty` folder and using xmake's built-in package management. For the latter, all package definitions are maintained in the zen repo itself, in the `repo` folder.
It should now also be easier to build the project, since it no longer depends on having the right version of vcpkg installed, which has been a common problem for people new to the codebase. You should now only need xmake to build.
* Bumps the xmake requirement on GitHub runners to 2.9.9 to resolve an issue where xmake on Windows invokes CMake with the nonexistent `v144` toolchain
* BLAKE3 is now in-tree at `thirdparty/blake3`
* cpr is now in-tree at `thirdparty/cpr`
* cxxopts is now in-tree at `thirdparty/cxxopts`
* fmt is now in-tree at `thirdparty/fmt`
* robin-map is now in-tree at `thirdparty/robin-map`
* ryml is now in-tree at `thirdparty/ryml`
* sol2 is now in-tree at `thirdparty/sol2`
* spdlog is now in-tree at `thirdparty/spdlog`
* utfcpp is now in-tree at `thirdparty/utfcpp`
* xmake package repo definitions are in `repo`
* implemented support for sanitizers. ASAN is supported on Windows; TSAN, UBSAN, MSAN etc. are supported on Linux/macOS, though I have not yet tested them extensively on macOS
* the zencore encryption implementation now also supports mbedTLS, which is used on macOS; for now we still use OpenSSL on Linux
* crashpad
* bumps libcurl to 8.11.0 (from 8.8.0), which should address a rare build upload bug
Diffstat (limited to 'thirdparty/blake3/b3sum')
| mode | file | lines added |
|---|---|---|
| -rw-r--r-- | thirdparty/blake3/b3sum/.gitignore | 1 |
| -rw-r--r-- | thirdparty/blake3/b3sum/Cargo.lock | 513 |
| -rw-r--r-- | thirdparty/blake3/b3sum/Cargo.toml | 26 |
| l--------- | thirdparty/blake3/b3sum/LICENSE_A2 | 1 |
| l--------- | thirdparty/blake3/b3sum/LICENSE_A2LLVM | 1 |
| l--------- | thirdparty/blake3/b3sum/LICENSE_CC0 | 1 |
| -rw-r--r-- | thirdparty/blake3/b3sum/README.md | 72 |
| -rw-r--r-- | thirdparty/blake3/b3sum/src/main.rs | 564 |
| -rw-r--r-- | thirdparty/blake3/b3sum/src/unit_tests.rs | 235 |
| -rw-r--r-- | thirdparty/blake3/b3sum/tests/cli_tests.rs | 680 |
| -rw-r--r-- | thirdparty/blake3/b3sum/what_does_check_do.md | 176 |
11 files changed, 2270 insertions, 0 deletions
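The vendored `b3sum/src/main.rs` below escapes backslashes, newlines and carriage returns in file names (marking such output lines with a leading `\`), rejects null and Unicode replacement characters when checking, and splits `--tag` lines of the form `BLAKE3 (<file>) = <hash>` from the right because a file name may itself contain `) = `. The following std-only Rust sketch restates that path handling for readers skimming the diff. It is not the crate's code: the function names `escape_path`, `unescape_path`, `split_tagged` and `check_path_chars` are hypothetical, and the sketch omits the Windows-only backslash-to-slash normalization and backslash rejection.

```rust
// Illustrative sketch of the checkfile path handling in the vendored
// thirdparty/blake3/b3sum/src/main.rs. Names are hypothetical; the Windows
// backslash normalization done by b3sum is deliberately left out.

/// Escape `\`, newline and carriage return. Returns the escaped string and
/// whether any escaping happened (b3sum then prefixes the line with `\`).
fn escape_path(path: &str) -> (String, bool) {
    if !path.contains(['\\', '\n', '\r']) {
        return (path.to_string(), false);
    }
    let escaped = path
        .replace('\\', "\\\\") // must run first so the escapes added below are not doubled
        .replace('\n', "\\n")
        .replace('\r', "\\r");
    (escaped, true)
}

/// Reverse `escape_path`. Any other backslash sequence is an error.
fn unescape_path(mut path: &str) -> Result<String, String> {
    let mut out = String::with_capacity(path.len());
    while let Some(i) = path.find('\\') {
        out.push_str(&path[..i]);
        match path[i + 1..].chars().next() {
            Some('n') => out.push('\n'),
            Some('r') => out.push('\r'),
            Some('\\') => out.push('\\'),
            _ => return Err("invalid backslash escape".to_string()),
        }
        // Every accepted escape is exactly two ASCII bytes long.
        path = &path[i + 2..];
    }
    out.push_str(path);
    Ok(out)
}

/// Split a tagged line `BLAKE3 (<file>) = <hash>` into (file, hash),
/// searching from the right because the file name may contain `) = `.
fn split_tagged(line: &str) -> Option<(&str, &str)> {
    line.strip_prefix("BLAKE3 (")?.rsplit_once(") = ")
}

/// Reject characters that could make two different paths look identical.
fn check_path_chars(path: &str) -> Result<(), String> {
    if path.contains('\0') {
        return Err("null character in path".to_string());
    }
    if path.contains('\u{FFFD}') {
        return Err("Unicode replacement character in path".to_string());
    }
    Ok(())
}

fn main() {
    // Round trip a name containing a backslash and a newline.
    let original = "dir/odd\\name\nwith newline.txt";
    let (escaped, is_escaped) = escape_path(original);
    assert!(is_escaped);
    assert_eq!(escaped, "dir/odd\\\\name\\nwith newline.txt");
    assert_eq!(unescape_path(&escaped).unwrap(), original);

    // A trailing lone backslash or an unknown escape is rejected.
    assert!(unescape_path("bad\\").is_err());
    assert!(unescape_path("bad\\t").is_err());

    // The right-most ") = " separates the file name from the hash.
    assert_eq!(split_tagged("BLAKE3 (a) = b.txt) = abcd"), Some(("a) = b.txt", "abcd")));

    assert!(check_path_chars("ok.txt").is_ok());
    assert!(check_path_chars("bad\u{FFFD}.txt").is_err());
    println!("all checks passed");
}
```

Splitting from the right trades one ambiguity for another: a file name ending in `) = <something>` would still be misread, but the hash-length check that `parse_check_line` performs afterwards turns such cases into errors rather than false matches, in line with the "fail rather than falsely succeed" rule stated in the source comments.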
diff --git a/thirdparty/blake3/b3sum/.gitignore b/thirdparty/blake3/b3sum/.gitignore new file mode 100644 index 000000000..9da4a887b --- /dev/null +++ b/thirdparty/blake3/b3sum/.gitignore @@ -0,0 +1 @@ +!Cargo.lock diff --git a/thirdparty/blake3/b3sum/Cargo.lock b/thirdparty/blake3/b3sum/Cargo.lock new file mode 100644 index 000000000..e09a52daf --- /dev/null +++ b/thirdparty/blake3/b3sum/Cargo.lock @@ -0,0 +1,513 @@ +# This file is automatically @generated by Cargo. +# It is not intended for manual editing. +version = 4 + +[[package]] +name = "anstream" +version = "0.6.18" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8acc5369981196006228e28809f761875c0327210a891e941f4c683b3a99529b" +dependencies = [ + "anstyle", + "anstyle-parse", + "anstyle-query", + "anstyle-wincon", + "colorchoice", + "is_terminal_polyfill", + "utf8parse", +] + +[[package]] +name = "anstyle" +version = "1.0.10" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "55cc3b69f167a1ef2e161439aa98aed94e6028e5f9a59be9a6ffb47aef1651f9" + +[[package]] +name = "anstyle-parse" +version = "0.2.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "3b2d16507662817a6a20a9ea92df6652ee4f94f914589377d69f3b21bc5798a9" +dependencies = [ + "utf8parse", +] + +[[package]] +name = "anstyle-query" +version = "1.1.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "79947af37f4177cfead1110013d678905c37501914fba0efea834c3fe9a8d60c" +dependencies = [ + "windows-sys", +] + +[[package]] +name = "anstyle-wincon" +version = "3.0.7" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ca3534e77181a9cc07539ad51f2141fe32f6c3ffd4df76db8ad92346b003ae4e" +dependencies = [ + "anstyle", + "once_cell", + "windows-sys", +] + +[[package]] +name = "anyhow" +version = "1.0.98" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"e16d2d3311acee920a9eb8d33b8cbc1787ce4a264e85f964c2404b969bdcd487" + +[[package]] +name = "arrayref" +version = "0.3.9" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "76a2e8124351fda1ef8aaaa3bbd7ebbcb486bbcd4225aca0aa0d84bb2db8fecb" + +[[package]] +name = "arrayvec" +version = "0.7.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7c02d123df017efcdfbd739ef81735b36c5ba83ec3c59c80a9d7ecc718f92e50" + +[[package]] +name = "b3sum" +version = "1.8.2" +dependencies = [ + "anyhow", + "blake3", + "clap", + "duct", + "hex", + "rayon-core", + "tempfile", + "wild", +] + +[[package]] +name = "bitflags" +version = "2.9.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "5c8214115b7bf84099f1309324e63141d4c5d7cc26862f97a0a857dbefe165bd" + +[[package]] +name = "blake3" +version = "1.8.2" +dependencies = [ + "arrayref", + "arrayvec", + "cc", + "cfg-if", + "constant_time_eq", + "memmap2", + "rayon-core", +] + +[[package]] +name = "cc" +version = "1.2.19" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8e3a13707ac958681c13b39b458c073d0d9bc8a22cb1b2f4c8e55eb72c13f362" +dependencies = [ + "shlex", +] + +[[package]] +name = "cfg-if" +version = "1.0.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "baf1de4339761588bc0619e3cbc0120ee582ebb74b53b4efbf79117bd2da40fd" + +[[package]] +name = "clap" +version = "4.5.37" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "eccb054f56cbd38340b380d4a8e69ef1f02f1af43db2f0cc817a4774d80ae071" +dependencies = [ + "clap_builder", + "clap_derive", +] + +[[package]] +name = "clap_builder" +version = "4.5.37" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "efd9466fac8543255d3b1fcad4762c5e116ffe808c8a3043d4263cd4fd4862a2" +dependencies = [ + "anstream", + "anstyle", + "clap_lex", + "strsim", + "terminal_size", +] + +[[package]] +name = 
"clap_derive" +version = "4.5.32" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "09176aae279615badda0765c0c0b3f6ed53f4709118af73cf4655d85d1530cd7" +dependencies = [ + "heck", + "proc-macro2", + "quote", + "syn", +] + +[[package]] +name = "clap_lex" +version = "0.7.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f46ad14479a25103f283c0f10005961cf086d8dc42205bb44c46ac563475dca6" + +[[package]] +name = "colorchoice" +version = "1.0.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "5b63caa9aa9397e2d9480a9b13673856c78d8ac123288526c37d7839f2a86990" + +[[package]] +name = "constant_time_eq" +version = "0.3.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7c74b8349d32d297c9134b8c88677813a227df8f779daa29bfc29c183fe3dca6" + +[[package]] +name = "crossbeam-deque" +version = "0.8.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9dd111b7b7f7d55b72c0a6ae361660ee5853c9af73f70c3c2ef6858b950e2e51" +dependencies = [ + "crossbeam-epoch", + "crossbeam-utils", +] + +[[package]] +name = "crossbeam-epoch" +version = "0.9.18" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "5b82ac4a3c2ca9c3460964f020e1402edd5753411d7737aa39c3714ad1b5420e" +dependencies = [ + "crossbeam-utils", +] + +[[package]] +name = "crossbeam-utils" +version = "0.8.21" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d0a5c400df2834b80a4c3327b3aad3a4c4cd4de0629063962b03235697506a28" + +[[package]] +name = "duct" +version = "0.13.7" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e4ab5718d1224b63252cd0c6f74f6480f9ffeb117438a2e0f5cf6d9a4798929c" +dependencies = [ + "libc", + "once_cell", + "os_pipe", + "shared_child", +] + +[[package]] +name = "errno" +version = "0.3.11" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"976dd42dc7e85965fe702eb8164f21f450704bdde31faefd6471dba214cb594e" +dependencies = [ + "libc", + "windows-sys", +] + +[[package]] +name = "fastrand" +version = "2.3.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "37909eebbb50d72f9059c3b6d82c0463f2ff062c9e95845c43a6c9c0355411be" + +[[package]] +name = "getrandom" +version = "0.3.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "73fea8450eea4bac3940448fb7ae50d91f034f941199fcd9d909a5a07aa455f0" +dependencies = [ + "cfg-if", + "libc", + "r-efi", + "wasi", +] + +[[package]] +name = "glob" +version = "0.3.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a8d1add55171497b4705a648c6b583acafb01d58050a51727785f0b2c8e0a2b2" + +[[package]] +name = "heck" +version = "0.5.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "2304e00983f87ffb38b55b444b5e3b60a884b5d30c0fca7d82fe33449bbe55ea" + +[[package]] +name = "hex" +version = "0.4.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7f24254aa9a54b5c858eaee2f5bccdb46aaf0e486a595ed5fd8f86ba55232a70" + +[[package]] +name = "is_terminal_polyfill" +version = "1.70.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7943c866cc5cd64cbc25b2e01621d07fa8eb2a1a23160ee81ce38704e97b8ecf" + +[[package]] +name = "libc" +version = "0.2.172" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d750af042f7ef4f724306de029d18836c26c1765a54a6a3f094cbd23a7267ffa" + +[[package]] +name = "linux-raw-sys" +version = "0.9.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "cd945864f07fe9f5371a27ad7b52a172b4b499999f1d97574c9fa68373937e12" + +[[package]] +name = "memmap2" +version = "0.9.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "fd3f7eed9d3848f8b98834af67102b720745c4ec028fcd0aa0239277e7de374f" +dependencies = [ + "libc", +] + 
+[[package]] +name = "once_cell" +version = "1.21.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "42f5e15c9953c5e4ccceeb2e7382a716482c34515315f7b03532b8b4e8393d2d" + +[[package]] +name = "os_pipe" +version = "1.2.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "5ffd2b0a5634335b135d5728d84c5e0fd726954b87111f7506a61c502280d982" +dependencies = [ + "libc", + "windows-sys", +] + +[[package]] +name = "proc-macro2" +version = "1.0.95" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "02b3e5e68a3a1a02aad3ec490a98007cbc13c37cbe84a3cd7b8e406d76e7f778" +dependencies = [ + "unicode-ident", +] + +[[package]] +name = "quote" +version = "1.0.40" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1885c039570dc00dcb4ff087a89e185fd56bae234ddc7f056a945bf36467248d" +dependencies = [ + "proc-macro2", +] + +[[package]] +name = "r-efi" +version = "5.2.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "74765f6d916ee2faa39bc8e68e4f3ed8949b48cccdac59983d287a7cb71ce9c5" + +[[package]] +name = "rayon-core" +version = "1.12.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1465873a3dfdaa8ae7cb14b4383657caab0b3e8a0aa9ae8e04b044854c8dfce2" +dependencies = [ + "crossbeam-deque", + "crossbeam-utils", +] + +[[package]] +name = "rustix" +version = "1.0.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d97817398dd4bb2e6da002002db259209759911da105da92bec29ccb12cf58bf" +dependencies = [ + "bitflags", + "errno", + "libc", + "linux-raw-sys", + "windows-sys", +] + +[[package]] +name = "shared_child" +version = "1.0.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "09fa9338aed9a1df411814a5b2252f7cd206c55ae9bf2fa763f8de84603aa60c" +dependencies = [ + "libc", + "windows-sys", +] + +[[package]] +name = "shlex" +version = "1.3.0" +source = 
"registry+https://github.com/rust-lang/crates.io-index" +checksum = "0fda2ff0d084019ba4d7c6f371c95d8fd75ce3524c3cb8fb653a3023f6323e64" + +[[package]] +name = "strsim" +version = "0.11.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7da8b5736845d9f2fcb837ea5d9e2628564b3b043a70948a3f0b778838c5fb4f" + +[[package]] +name = "syn" +version = "2.0.100" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b09a44accad81e1ba1cd74a32461ba89dee89095ba17b32f5d03683b1b1fc2a0" +dependencies = [ + "proc-macro2", + "quote", + "unicode-ident", +] + +[[package]] +name = "tempfile" +version = "3.19.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7437ac7763b9b123ccf33c338a5cc1bac6f69b45a136c19bdd8a65e3916435bf" +dependencies = [ + "fastrand", + "getrandom", + "once_cell", + "rustix", + "windows-sys", +] + +[[package]] +name = "terminal_size" +version = "0.4.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "45c6481c4829e4cc63825e62c49186a34538b7b2750b73b266581ffb612fb5ed" +dependencies = [ + "rustix", + "windows-sys", +] + +[[package]] +name = "unicode-ident" +version = "1.0.18" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "5a5f39404a5da50712a4c1eecf25e90dd62b613502b7e925fd4e4d19b5c96512" + +[[package]] +name = "utf8parse" +version = "0.2.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "06abde3611657adf66d383f00b093d7faecc7fa57071cce2578660c9f1010821" + +[[package]] +name = "wasi" +version = "0.14.2+wasi-0.2.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9683f9a5a998d873c0d21fcbe3c083009670149a8fab228644b8bd36b2c48cb3" +dependencies = [ + "wit-bindgen-rt", +] + +[[package]] +name = "wild" +version = "2.2.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a3131afc8c575281e1e80f36ed6a092aa502c08b18ed7524e86fbbb12bb410e1" +dependencies 
= [ + "glob", +] + +[[package]] +name = "windows-sys" +version = "0.59.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1e38bc4d79ed67fd075bcc251a1c39b32a1776bbe92e5bef1f0bf1f8c531853b" +dependencies = [ + "windows-targets", +] + +[[package]] +name = "windows-targets" +version = "0.52.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9b724f72796e036ab90c1021d4780d4d3d648aca59e491e6b98e725b84e99973" +dependencies = [ + "windows_aarch64_gnullvm", + "windows_aarch64_msvc", + "windows_i686_gnu", + "windows_i686_gnullvm", + "windows_i686_msvc", + "windows_x86_64_gnu", + "windows_x86_64_gnullvm", + "windows_x86_64_msvc", +] + +[[package]] +name = "windows_aarch64_gnullvm" +version = "0.52.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "32a4622180e7a0ec044bb555404c800bc9fd9ec262ec147edd5989ccd0c02cd3" + +[[package]] +name = "windows_aarch64_msvc" +version = "0.52.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "09ec2a7bb152e2252b53fa7803150007879548bc709c039df7627cabbd05d469" + +[[package]] +name = "windows_i686_gnu" +version = "0.52.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8e9b5ad5ab802e97eb8e295ac6720e509ee4c243f69d781394014ebfe8bbfa0b" + +[[package]] +name = "windows_i686_gnullvm" +version = "0.52.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0eee52d38c090b3caa76c563b86c3a4bd71ef1a819287c19d586d7334ae8ed66" + +[[package]] +name = "windows_i686_msvc" +version = "0.52.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "240948bc05c5e7c6dabba28bf89d89ffce3e303022809e73deaefe4f6ec56c66" + +[[package]] +name = "windows_x86_64_gnu" +version = "0.52.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "147a5c80aabfbf0c7d901cb5895d1de30ef2907eb21fbbab29ca94c5b08b1a78" + +[[package]] +name = "windows_x86_64_gnullvm" 
+version = "0.52.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "24d5b23dc417412679681396f2b49f3de8c1473deb516bd34410872eff51ed0d" + +[[package]] +name = "windows_x86_64_msvc" +version = "0.52.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "589f6da84c646204747d1270a2a5661ea66ed1cced2631d546fdfb155959f9ec" + +[[package]] +name = "wit-bindgen-rt" +version = "0.39.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "6f42320e61fe2cfd34354ecb597f86f413484a798ba44a8ca1165c58d42da6c1" +dependencies = [ + "bitflags", +] diff --git a/thirdparty/blake3/b3sum/Cargo.toml b/thirdparty/blake3/b3sum/Cargo.toml new file mode 100644 index 000000000..8ea930291 --- /dev/null +++ b/thirdparty/blake3/b3sum/Cargo.toml @@ -0,0 +1,26 @@ +[package] +name = "b3sum" +version = "1.8.2" +authors = ["Jack O'Connor <[email protected]>"] +description = "a command line implementation of the BLAKE3 hash function" +repository = "https://github.com/BLAKE3-team/BLAKE3" +license = "CC0-1.0 OR Apache-2.0 OR Apache-2.0 WITH LLVM-exception" +readme = "README.md" +edition = "2021" + +[features] +neon = ["blake3/neon"] +prefer_intrinsics = ["blake3/prefer_intrinsics"] +pure = ["blake3/pure"] + +[dependencies] +anyhow = "1.0.25" +blake3 = { version = "1.8", path = "..", features = ["mmap", "rayon"] } +clap = { version = "4.0.8", features = ["derive", "wrap_help"] } +hex = "0.4.0" +rayon-core = "1.12.1" +wild = "2.0.3" + +[dev-dependencies] +duct = "0.13.3" +tempfile = "3.1.0" diff --git a/thirdparty/blake3/b3sum/LICENSE_A2 b/thirdparty/blake3/b3sum/LICENSE_A2 new file mode 120000 index 000000000..c7b0be82d --- /dev/null +++ b/thirdparty/blake3/b3sum/LICENSE_A2 @@ -0,0 +1 @@ +../LICENSE_A2
\ No newline at end of file diff --git a/thirdparty/blake3/b3sum/LICENSE_A2LLVM b/thirdparty/blake3/b3sum/LICENSE_A2LLVM new file mode 120000 index 000000000..c9f3d16a6 --- /dev/null +++ b/thirdparty/blake3/b3sum/LICENSE_A2LLVM @@ -0,0 +1 @@ +../LICENSE_A2LLVM
\ No newline at end of file diff --git a/thirdparty/blake3/b3sum/LICENSE_CC0 b/thirdparty/blake3/b3sum/LICENSE_CC0 new file mode 120000 index 000000000..856562a79 --- /dev/null +++ b/thirdparty/blake3/b3sum/LICENSE_CC0 @@ -0,0 +1 @@ +../LICENSE_CC0
\ No newline at end of file diff --git a/thirdparty/blake3/b3sum/README.md b/thirdparty/blake3/b3sum/README.md new file mode 100644 index 000000000..595ce9e95 --- /dev/null +++ b/thirdparty/blake3/b3sum/README.md @@ -0,0 +1,72 @@ +# b3sum + +A command line utility for calculating +[BLAKE3](https://github.com/BLAKE3-team/BLAKE3) hashes, similar to +Coreutils tools like `b2sum` or `md5sum`. + +``` +Usage: b3sum [OPTIONS] [FILE]... + +Arguments: + [FILE]... Files to hash, or checkfiles to check + +Options: + --keyed Use the keyed mode, reading the 32-byte key from stdin + --derive-key <CONTEXT> Use the key derivation mode, with the given context string + -l, --length <LEN> The number of output bytes, before hex encoding [default: 32] + --seek <SEEK> The starting output byte offset, before hex encoding [default: 0] + --num-threads <NUM> The maximum number of threads to use + --no-mmap Disable memory mapping + --no-names Omit filenames in the output + --raw Write raw output bytes to stdout, rather than hex + --tag Output BSD-style checksums: BLAKE3 ([FILE]) = [HASH] + -c, --check Read BLAKE3 sums from the [FILE]s and check them + --quiet Skip printing OK for each checked file + -h, --help Print help (see more with '--help') + -V, --version Print version +``` + +See also [this document about how the `--check` flag +works](https://github.com/BLAKE3-team/BLAKE3/blob/master/b3sum/what_does_check_do.md). + +# Example + +Hash the file `foo.txt`: + +```bash +b3sum foo.txt +``` + +Time hashing a gigabyte of data, to see how fast it is: + +```bash +# Create a 1 GB file. +head -c 1000000000 /dev/zero > /tmp/bigfile +# Hash it with SHA-256. +time openssl sha256 /tmp/bigfile +# Hash it with BLAKE3. 
+time b3sum /tmp/bigfile +``` + + +# Installation + +Prebuilt binaries are available for Linux, Windows, and macOS (requiring +the [unidentified developer +workaround](https://support.apple.com/guide/mac-help/open-a-mac-app-from-an-unidentified-developer-mh40616/mac)) +on the [releases page](https://github.com/BLAKE3-team/BLAKE3/releases). +If you've [installed Rust and +Cargo](https://doc.rust-lang.org/cargo/getting-started/installation.html), +you can also build `b3sum` yourself with: + +``` +cargo install b3sum +``` + +On Linux for example, Cargo will put the compiled binary in +`~/.cargo/bin`. You might want to add that directory to your `$PATH`, or +`rustup` might have done it for you when you installed Cargo. + +If you want to install directly from this directory, you can run `cargo +install --path .`. Or you can just build with `cargo build --release`, +which puts the binary at `./target/release/b3sum`. diff --git a/thirdparty/blake3/b3sum/src/main.rs b/thirdparty/blake3/b3sum/src/main.rs new file mode 100644 index 000000000..69a10c837 --- /dev/null +++ b/thirdparty/blake3/b3sum/src/main.rs @@ -0,0 +1,564 @@ +use anyhow::{bail, ensure}; +use clap::Parser; +use std::cmp; +use std::fs::File; +use std::io; +use std::io::prelude::*; +use std::path::{Path, PathBuf}; + +#[cfg(test)] +mod unit_tests; + +const NAME: &str = "b3sum"; + +const DERIVE_KEY_ARG: &str = "derive_key"; +const KEYED_ARG: &str = "keyed"; +const LENGTH_ARG: &str = "length"; +const NO_NAMES_ARG: &str = "no_names"; +const RAW_ARG: &str = "raw"; +const TAG_ARG: &str = "tag"; +const CHECK_ARG: &str = "check"; + +#[derive(Parser)] +#[command(version, max_term_width(100))] +struct Inner { + /// Files to hash, or checkfiles to check + /// + /// When no file is given, or when - is given, read standard input. 
+ file: Vec<PathBuf>, + + /// Use the keyed mode, reading the 32-byte key from stdin + #[arg(long, requires("file"))] + keyed: bool, + + /// Use the key derivation mode, with the given context string + /// + /// Cannot be used with --keyed. + #[arg(long, value_name("CONTEXT"), conflicts_with(KEYED_ARG))] + derive_key: Option<String>, + + /// The number of output bytes, before hex encoding + #[arg( + short, + long, + default_value_t = blake3::OUT_LEN as u64, + value_name("LEN") + )] + length: u64, + + /// The starting output byte offset, before hex encoding + #[arg(long, default_value_t = 0, value_name("SEEK"))] + seek: u64, + + /// The maximum number of threads to use + /// + /// By default, this is the number of logical cores. If this flag is + /// omitted, or if its value is 0, RAYON_NUM_THREADS is also respected. + #[arg(long, value_name("NUM"))] + num_threads: Option<usize>, + + /// Disable memory mapping + /// + /// Currently this also disables multithreading. + #[arg(long)] + no_mmap: bool, + + /// Omit filenames in the output + #[arg(long)] + no_names: bool, + + /// Write raw output bytes to stdout, rather than hex + /// + /// --no-names is implied. In this case, only a single input is allowed. + #[arg(long)] + raw: bool, + + /// Output BSD-style checksums: BLAKE3 ([FILE]) = [HASH] + #[arg(long)] + tag: bool, + + /// Read BLAKE3 sums from the [FILE]s and check them + #[arg( + short, + long, + conflicts_with(DERIVE_KEY_ARG), + conflicts_with(KEYED_ARG), + conflicts_with(LENGTH_ARG), + conflicts_with(RAW_ARG), + conflicts_with(TAG_ARG), + conflicts_with(NO_NAMES_ARG) + )] + check: bool, + + /// Skip printing OK for each checked file + /// + /// Must be used with --check. 
+ #[arg(long, requires(CHECK_ARG))] + quiet: bool, +} + +struct Args { + inner: Inner, + file_args: Vec<PathBuf>, + base_hasher: blake3::Hasher, +} + +impl Args { + fn parse() -> anyhow::Result<Self> { + // wild::args_os() is equivalent to std::env::args_os() on Unix, + // but on Windows it adds support for globbing. + let inner = Inner::parse_from(wild::args_os()); + let file_args = if !inner.file.is_empty() { + inner.file.clone() + } else { + vec!["-".into()] + }; + if inner.raw && file_args.len() > 1 { + bail!("Only one filename can be provided when using --raw"); + } + let base_hasher = if inner.keyed { + // In keyed mode, since stdin is used for the key, we can't handle + // `-` arguments. Input::open handles that case below. + blake3::Hasher::new_keyed(&read_key_from_stdin()?) + } else if let Some(ref context) = inner.derive_key { + blake3::Hasher::new_derive_key(context) + } else { + blake3::Hasher::new() + }; + Ok(Self { + inner, + file_args, + base_hasher, + }) + } + + fn num_threads(&self) -> Option<usize> { + self.inner.num_threads + } + + fn check(&self) -> bool { + self.inner.check + } + + fn raw(&self) -> bool { + self.inner.raw + } + + fn tag(&self) -> bool { + self.inner.tag + } + + fn no_mmap(&self) -> bool { + self.inner.no_mmap + } + + fn no_names(&self) -> bool { + self.inner.no_names + } + + fn len(&self) -> u64 { + self.inner.length + } + + fn seek(&self) -> u64 { + self.inner.seek + } + + fn keyed(&self) -> bool { + self.inner.keyed + } + + fn quiet(&self) -> bool { + self.inner.quiet + } +} + +fn hash_path(args: &Args, path: &Path) -> anyhow::Result<blake3::OutputReader> { + let mut hasher = args.base_hasher.clone(); + if path == Path::new("-") { + if args.keyed() { + bail!("Cannot open `-` in keyed mode"); + } + hasher.update_reader(io::stdin().lock())?; + } else if args.no_mmap() { + hasher.update_reader(File::open(path)?)?; + } else { + // The fast path: Try to mmap the file and hash it with multiple threads. 
+ hasher.update_mmap_rayon(path)?; + } + let mut output_reader = hasher.finalize_xof(); + output_reader.set_position(args.seek()); + Ok(output_reader) +} + +fn write_hex_output(mut output: blake3::OutputReader, args: &Args) -> anyhow::Result<()> { + // Encoding multiples of the 64 bytes is most efficient. + // TODO: This computes each output block twice when the --seek argument isn't a multiple of 64. + // We'll refactor all of this soon anyway, once SIMD optimizations are available for the XOF. + let mut len = args.len(); + let mut block = [0; blake3::BLOCK_LEN]; + while len > 0 { + output.fill(&mut block); + let hex_str = hex::encode(&block[..]); + let take_bytes = cmp::min(len, block.len() as u64); + print!("{}", &hex_str[..2 * take_bytes as usize]); + len -= take_bytes; + } + Ok(()) +} + +fn write_raw_output(output: blake3::OutputReader, args: &Args) -> anyhow::Result<()> { + let mut output = output.take(args.len()); + let stdout = std::io::stdout(); + let mut handler = stdout.lock(); + std::io::copy(&mut output, &mut handler)?; + + Ok(()) +} + +fn read_key_from_stdin() -> anyhow::Result<[u8; blake3::KEY_LEN]> { + let mut bytes = Vec::with_capacity(blake3::KEY_LEN + 1); + let n = std::io::stdin() + .lock() + .take(blake3::KEY_LEN as u64 + 1) + .read_to_end(&mut bytes)?; + if n < blake3::KEY_LEN { + bail!( + "expected {} key bytes from stdin, found {}", + blake3::KEY_LEN, + n, + ) + } else if n > blake3::KEY_LEN { + bail!("read more than {} key bytes from stdin", blake3::KEY_LEN) + } else { + Ok(bytes[..blake3::KEY_LEN].try_into().unwrap()) + } +} + +struct FilepathString { + filepath_string: String, + is_escaped: bool, +} + +// returns (string, did_escape) +fn filepath_to_string(filepath: &Path) -> FilepathString { + let unicode_cow = filepath.to_string_lossy(); + let mut filepath_string = unicode_cow.to_string(); + // If we're on Windows, normalize backslashes to forward slashes. 
This + // avoids a lot of ugly escaping in the common case, and it makes + // checkfiles created on Windows more likely to be portable to Unix. It + // also allows us to set a blanket "no backslashes allowed in checkfiles on + // Windows" rule, rather than allowing a Unix backslash to potentially get + // interpreted as a directory separator on Windows. + if cfg!(windows) { + filepath_string = filepath_string.replace('\\', "/"); + } + let mut is_escaped = false; + if filepath_string.contains(['\\', '\n', '\r']) { + filepath_string = filepath_string + .replace('\\', "\\\\") + .replace('\n', "\\n") + .replace('\r', "\\r"); + is_escaped = true; + } + FilepathString { + filepath_string, + is_escaped, + } +} + +fn hex_half_byte(c: char) -> anyhow::Result<u8> { + // The hex characters in the hash must be lowercase for now, though we + // could support uppercase too if we wanted to. + if '0' <= c && c <= '9' { + return Ok(c as u8 - '0' as u8); + } + if 'a' <= c && c <= 'f' { + return Ok(c as u8 - 'a' as u8 + 10); + } + bail!("Invalid hex"); +} + +// The `check` command is a security tool. That means it's much better for a +// check to fail more often than it should (a false negative), than for a check +// to ever succeed when it shouldn't (a false positive). By forbidding certain +// characters in checked filepaths, we avoid a class of false positives where +// two different filepaths can get confused with each other. +fn check_for_invalid_characters(utf8_path: &str) -> anyhow::Result<()> { + // Null characters in paths should never happen, but they can result in a + // path getting silently truncated on Unix. + if utf8_path.contains('\0') { + bail!("Null character in path"); + } + // Because we convert invalid UTF-8 sequences in paths to the Unicode + // replacement character, multiple different invalid paths can map to the + // same UTF-8 string. 
+ if utf8_path.contains('�') { + bail!("Unicode replacement character in path"); + } + // We normalize all Windows backslashes to forward slashes in our output, + // so the only natural way to get a backslash in a checkfile on Windows is + // to construct it on Unix and copy it over. (Or of course you could just + // doctor it by hand.) To avoid confusing this with a directory separator, + // we forbid backslashes entirely on Windows. Note that this check comes + // after unescaping has been done. + if cfg!(windows) && utf8_path.contains('\\') { + bail!("Backslash in path"); + } + Ok(()) +} + +fn unescape(mut path: &str) -> anyhow::Result<String> { + let mut unescaped = String::with_capacity(2 * path.len()); + while let Some(i) = path.find('\\') { + ensure!(i < path.len() - 1, "Invalid backslash escape"); + unescaped.push_str(&path[..i]); + match path[i + 1..].chars().next().unwrap() { + // Anything other than a recognized escape sequence is an error. + 'n' => unescaped.push_str("\n"), + 'r' => unescaped.push_str("\r"), + '\\' => unescaped.push_str("\\"), + _ => bail!("Invalid backslash escape"), + } + path = &path[i + 2..]; + } + unescaped.push_str(path); + Ok(unescaped) +} + +#[derive(Debug)] +struct ParsedCheckLine { + file_string: String, + is_escaped: bool, + file_path: PathBuf, + expected_hash: blake3::Hash, +} + +fn split_untagged_check_line(line_after_slash: &str) -> Option<(&str, &str)> { + // Of the form "<hash> <file>". The file might contain " ", so we need to split from the + // left. + line_after_slash.split_once(" ") +} + +fn split_tagged_check_line(line_after_slash: &str) -> Option<(&str, &str)> { + // Of the form "BLAKE3 (<file>) = <hash>". The file might contain ") = ", so we need to split + // from the *right*. 
+ let prefix = "BLAKE3 ("; + if !line_after_slash.starts_with(prefix) { + return None; + } + line_after_slash[prefix.len()..].rsplit_once(") = ") +} + +fn parse_check_line(mut line: &str) -> anyhow::Result<ParsedCheckLine> { + // Trim off the trailing newlines, if any. + line = line.trim_end_matches(['\r', '\n']); + // If there's a backslash at the front of the line, that means we need to + // unescape the path below. This matches the behavior of e.g. md5sum. + let Some(first) = line.chars().next() else { + bail!("Empty line"); + }; + let line_after_slash; + let is_escaped; + if first == '\\' { + is_escaped = true; + line_after_slash = &line[1..]; + } else { + is_escaped = false; + line_after_slash = line; + } + + // Split the line. It might be "<hash> <file>" or "BLAKE3 (<file>) = <hash>". The latter comes + // from the --tag flag. + let hash_hex; + let file_str; + if let Some((left, right)) = split_untagged_check_line(line_after_slash) { + hash_hex = left; + file_str = right; + } else if let Some((left, right)) = split_tagged_check_line(line_after_slash) { + file_str = left; + hash_hex = right; + } else { + bail!("Invalid check line format"); + } + + // Decode the hex hash. + ensure!(hash_hex.len() == 2 * blake3::OUT_LEN, "Invalid hash length"); + let mut hex_chars = hash_hex.chars(); + let mut hash_bytes = [0; blake3::OUT_LEN]; + for byte in &mut hash_bytes { + let high_char = hex_chars.next().unwrap(); + let low_char = hex_chars.next().unwrap(); + *byte = 16 * hex_half_byte(high_char)? + hex_half_byte(low_char)?; + } + let expected_hash: blake3::Hash = hash_bytes.into(); + + // Unescape and validate the filepath. + let file_path_string = if is_escaped { + unescape(file_str)? 
+ } else { + file_str.to_string() + }; + ensure!(!file_path_string.is_empty(), "empty file path"); + check_for_invalid_characters(&file_path_string)?; + + Ok(ParsedCheckLine { + file_string: file_str.to_string(), + is_escaped, + file_path: file_path_string.into(), + expected_hash, + }) +} + +fn hash_one_input(path: &Path, args: &Args) -> anyhow::Result<()> { + let output = hash_path(args, path)?; + if args.raw() { + write_raw_output(output, args)?; + return Ok(()); + } + if args.no_names() { + write_hex_output(output, args)?; + println!(); + return Ok(()); + } + let FilepathString { + filepath_string, + is_escaped, + } = filepath_to_string(path); + if is_escaped { + print!("\\"); + } + if args.tag() { + print!("BLAKE3 ({}) = ", filepath_string); + write_hex_output(output, args)?; + println!(); + return Ok(()); + } + write_hex_output(output, args)?; + println!(" {}", filepath_string); + Ok(()) +} + +// Returns true for success. Having a boolean return value here, instead of +// passing down the files_failed reference, makes it less likely that we might +// forget to set it in some error condition. +fn check_one_line(line: &str, args: &Args) -> bool { + let parse_result = parse_check_line(&line); + let ParsedCheckLine { + file_string, + is_escaped, + file_path, + expected_hash, + } = match parse_result { + Ok(parsed) => parsed, + Err(e) => { + eprintln!("{}: {}", NAME, e); + return false; + } + }; + let file_string = if is_escaped { + "\\".to_string() + &file_string + } else { + file_string + }; + let found_hash: blake3::Hash; + match hash_path(args, &file_path) { + Ok(mut output) => { + let mut found_hash_bytes = [0; blake3::OUT_LEN]; + output.fill(&mut found_hash_bytes); + found_hash = found_hash_bytes.into(); + } + Err(e) => { + println!("{}: FAILED ({})", file_string, e); + return false; + } + }; + // This is a constant-time comparison. 
+ if expected_hash == found_hash { + if !args.quiet() { + println!("{}: OK", file_string); + } + true + } else { + println!("{}: FAILED", file_string); + false + } +} + +fn check_one_checkfile(path: &Path, args: &Args, files_failed: &mut u64) -> anyhow::Result<()> { + let mut file; + let stdin; + let mut stdin_lock; + let mut bufreader: io::BufReader<&mut dyn Read>; + if path == Path::new("-") { + stdin = io::stdin(); + stdin_lock = stdin.lock(); + bufreader = io::BufReader::new(&mut stdin_lock); + } else { + file = File::open(path)?; + bufreader = io::BufReader::new(&mut file); + } + let mut line = String::new(); + loop { + line.clear(); + let n = bufreader.read_line(&mut line)?; + if n == 0 { + return Ok(()); + } + // check_one_line() prints errors and turns them into a success=false + // return, so it doesn't return a Result. + let success = check_one_line(&line, args); + if !success { + // We use `files_failed > 0` to indicate a mismatch, so it's important for correctness + // that it's impossible for this counter to overflow. + *files_failed = files_failed.saturating_add(1); + } + } +} + +fn main() -> anyhow::Result<()> { + let args = Args::parse()?; + let mut thread_pool_builder = rayon_core::ThreadPoolBuilder::new(); + if let Some(num_threads) = args.num_threads() { + thread_pool_builder = thread_pool_builder.num_threads(num_threads); + } + let thread_pool = thread_pool_builder.build()?; + thread_pool.install(|| { + let mut files_failed = 0u64; + // Note that file_args automatically includes `-` if nothing is given. + for path in &args.file_args { + if args.check() { + check_one_checkfile(path, &args, &mut files_failed)?; + } else { + // Errors encountered in hashing are tolerated and printed to + // stderr. This allows e.g. `b3sum *` to print errors for + // non-files and keep going. However, if we encounter any + // errors we'll still return non-zero at the end. 
+ let result = hash_one_input(path, &args); + if let Err(e) = result { + files_failed = files_failed.saturating_add(1); + eprintln!("{}: {}: {}", NAME, path.to_string_lossy(), e); + } + } + } + if args.check() && files_failed > 0 { + eprintln!( + "{}: WARNING: {} computed checksum{} did NOT match", + NAME, + files_failed, + if files_failed == 1 { "" } else { "s" }, + ); + } + std::process::exit(if files_failed > 0 { 1 } else { 0 }); + }) +} + +#[cfg(test)] +mod test { + use clap::CommandFactory; + + #[test] + fn test_args() { + crate::Inner::command().debug_assert(); + } +} diff --git a/thirdparty/blake3/b3sum/src/unit_tests.rs b/thirdparty/blake3/b3sum/src/unit_tests.rs new file mode 100644 index 000000000..75f672b4c --- /dev/null +++ b/thirdparty/blake3/b3sum/src/unit_tests.rs @@ -0,0 +1,235 @@ +use std::path::Path; + +#[test] +fn test_parse_check_line() { + // ========================= + // ===== Success Cases ===== + // ========================= + + // the basic case + let crate::ParsedCheckLine { + file_string, + is_escaped, + file_path, + expected_hash, + } = crate::parse_check_line( + "0909090909090909090909090909090909090909090909090909090909090909 foo", + ) + .unwrap(); + assert_eq!(expected_hash, blake3::Hash::from([0x09; 32])); + assert!(!is_escaped); + assert_eq!(file_string, "foo"); + assert_eq!(file_path, Path::new("foo")); + + // regular whitespace + let crate::ParsedCheckLine { + file_string, + is_escaped, + file_path, + expected_hash, + } = crate::parse_check_line( + "fafafafafafafafafafafafafafafafafafafafafafafafafafafafafafafafa \t\r\n\n\r \t\r\n\n\r", + ) + .unwrap(); + assert_eq!(expected_hash, blake3::Hash::from([0xfa; 32])); + assert!(!is_escaped); + assert_eq!(file_string, " \t\r\n\n\r \t"); + assert_eq!(file_path, Path::new(" \t\r\n\n\r \t")); + + // path is one space + let crate::ParsedCheckLine { + file_string, + is_escaped, + file_path, + expected_hash, + } = crate::parse_check_line( + 
"4242424242424242424242424242424242424242424242424242424242424242 ", + ) + .unwrap(); + assert_eq!(expected_hash, blake3::Hash::from([0x42; 32])); + assert!(!is_escaped); + assert_eq!(file_string, " "); + assert_eq!(file_path, Path::new(" ")); + + // *Unescaped* backslashes. Note that this line does *not* start with a + // backslash, so something like "\" + "n" is interpreted as *two* + // characters. We forbid all backslashes on Windows, so this test is + // Unix-only. + if cfg!(not(windows)) { + let crate::ParsedCheckLine { + file_string, + is_escaped, + file_path, + expected_hash, + } = crate::parse_check_line( + "4343434343434343434343434343434343434343434343434343434343434343 fo\\a\\no", + ) + .unwrap(); + assert_eq!(expected_hash, blake3::Hash::from([0x43; 32])); + assert!(!is_escaped); + assert_eq!(file_string, "fo\\a\\no"); + assert_eq!(file_path, Path::new("fo\\a\\no")); + } + + // escaped newlines + let crate::ParsedCheckLine { + file_string, + is_escaped, + file_path, + expected_hash, + } = crate::parse_check_line( + "\\4444444444444444444444444444444444444444444444444444444444444444 fo\\r\\n\\n\\ro", + ) + .unwrap(); + assert_eq!(expected_hash, blake3::Hash::from([0x44; 32])); + assert!(is_escaped); + assert_eq!(file_string, "fo\\r\\n\\n\\ro"); + assert_eq!(file_path, Path::new("fo\r\n\n\ro")); + + // Escaped newline and backslash. Again because backslash is not allowed on + // Windows, this test is Unix-only. 
+ if cfg!(not(windows)) { + let crate::ParsedCheckLine { + file_string, + is_escaped, + file_path, + expected_hash, + } = crate::parse_check_line( + "\\4545454545454545454545454545454545454545454545454545454545454545 fo\\n\\\\o", + ) + .unwrap(); + assert_eq!(expected_hash, blake3::Hash::from([0x45; 32])); + assert!(is_escaped); + assert_eq!(file_string, "fo\\n\\\\o"); + assert_eq!(file_path, Path::new("fo\n\\o")); + } + + // non-ASCII path + let crate::ParsedCheckLine { + file_string, + is_escaped, + file_path, + expected_hash, + } = crate::parse_check_line( + "4646464646464646464646464646464646464646464646464646464646464646 否认", + ) + .unwrap(); + assert_eq!(expected_hash, blake3::Hash::from([0x46; 32])); + assert!(!is_escaped); + assert_eq!(file_string, "否认"); + assert_eq!(file_path, Path::new("否认")); + + // untagged separator " " in the file name + let crate::ParsedCheckLine { + file_string, + is_escaped, + file_path, + expected_hash, + } = crate::parse_check_line( + "4747474747474747474747474747474747474747474747474747474747474747 foo bar", + ) + .unwrap(); + assert_eq!(expected_hash, blake3::Hash::from([0x47; 32])); + assert!(!is_escaped); + assert_eq!(file_string, "foo bar"); + assert_eq!(file_path, Path::new("foo bar")); + + // tagged separator ") = " in the file name + let crate::ParsedCheckLine { + file_string, + is_escaped, + file_path, + expected_hash, + } = crate::parse_check_line( + "BLAKE3 (foo) = bar) = 4848484848484848484848484848484848484848484848484848484848484848", + ) + .unwrap(); + assert_eq!(expected_hash, blake3::Hash::from([0x48; 32])); + assert!(!is_escaped); + assert_eq!(file_string, "foo) = bar"); + assert_eq!(file_path, Path::new("foo) = bar")); + + // ========================= + // ===== Failure Cases ===== + // ========================= + + // too short + crate::parse_check_line("").unwrap_err(); + crate::parse_check_line("0").unwrap_err(); + crate::parse_check_line("00").unwrap_err(); + 
crate::parse_check_line("0000000000000000000000000000000000000000000000000000000000000000") + .unwrap_err(); + crate::parse_check_line("0000000000000000000000000000000000000000000000000000000000000000 ") + .unwrap_err(); + + // not enough spaces + crate::parse_check_line("0000000000000000000000000000000000000000000000000000000000000000 foo") + .unwrap_err(); + + // capital letter hex + crate::parse_check_line( + "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA foo", + ) + .unwrap_err(); + + // non-hex hex + crate::parse_check_line( + "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx foo", + ) + .unwrap_err(); + + // non-ASCII hex + crate::parse_check_line("你好, 我叫杰克. 认识你很高兴. 要不要吃个香蕉? foo").unwrap_err(); + + // invalid escape sequence + crate::parse_check_line( + "\\0000000000000000000000000000000000000000000000000000000000000000 fo\\o", + ) + .unwrap_err(); + + // truncated escape sequence + crate::parse_check_line( + "\\0000000000000000000000000000000000000000000000000000000000000000 foo\\", + ) + .unwrap_err(); + + // null char + crate::parse_check_line( + "0000000000000000000000000000000000000000000000000000000000000000 fo\0o", + ) + .unwrap_err(); + + // Unicode replacement char + crate::parse_check_line( + "0000000000000000000000000000000000000000000000000000000000000000 fo�o", + ) + .unwrap_err(); + + // On Windows only, backslashes are not allowed, escaped or otherwise. 
+ if cfg!(windows) { + crate::parse_check_line( + "0000000000000000000000000000000000000000000000000000000000000000 fo\\o", + ) + .unwrap_err(); + crate::parse_check_line( + "\\0000000000000000000000000000000000000000000000000000000000000000 fo\\\\o", + ) + .unwrap_err(); + } +} + +#[test] +fn test_filepath_to_string() { + let output = crate::filepath_to_string(Path::new("foo")); + assert_eq!(output.filepath_string, "foo"); + assert!(!output.is_escaped); + + let output = crate::filepath_to_string(Path::new("f\\ \t\r\noo")); + if cfg!(windows) { + // We normalize backslashes to forward slashes on Windows. + assert_eq!(output.filepath_string, "f/ \t\\r\\noo"); + } else { + assert_eq!(output.filepath_string, "f\\\\ \t\\r\\noo"); + } + assert!(output.is_escaped); +} diff --git a/thirdparty/blake3/b3sum/tests/cli_tests.rs b/thirdparty/blake3/b3sum/tests/cli_tests.rs new file mode 100644 index 000000000..3fe173d6b --- /dev/null +++ b/thirdparty/blake3/b3sum/tests/cli_tests.rs @@ -0,0 +1,680 @@ +use duct::cmd; +use std::ffi::OsString; +use std::fs; +use std::io::prelude::*; +use std::path::PathBuf; + +pub fn b3sum_exe() -> PathBuf { + env!("CARGO_BIN_EXE_b3sum").into() +} + +#[test] +fn test_hash_one() { + let expected = format!("{} -", blake3::hash(b"foo").to_hex()); + let output = cmd!(b3sum_exe()).stdin_bytes("foo").read().unwrap(); + assert_eq!(&*expected, output); +} + +#[test] +fn test_hash_one_tag() { + let expected = format!("BLAKE3 (-) = {}", blake3::hash(b"foo").to_hex()); + let output = cmd!(b3sum_exe(), "--tag") + .stdin_bytes("foo") + .read() + .unwrap(); + assert_eq!(&*expected, output); +} + +#[test] +fn test_hash_one_raw() { + let expected = blake3::hash(b"foo").as_bytes().to_owned(); + let output = cmd!(b3sum_exe(), "--raw") + .stdin_bytes("foo") + .stdout_capture() + .run() + .unwrap() + .stdout; + assert_eq!(expected, output.as_slice()); +} + +#[test] +fn test_hash_many() { + let dir = tempfile::tempdir().unwrap(); + let file1 = 
dir.path().join("file1"); + fs::write(&file1, b"foo").unwrap(); + let file2 = dir.path().join("file2"); + fs::write(&file2, b"bar").unwrap(); + + let output = cmd!(b3sum_exe(), &file1, &file2).read().unwrap(); + let foo_hash = blake3::hash(b"foo"); + let bar_hash = blake3::hash(b"bar"); + let expected = format!( + "{} {}\n{} {}", + foo_hash.to_hex(), + // account for slash normalization on Windows + file1.to_string_lossy().replace("\\", "/"), + bar_hash.to_hex(), + file2.to_string_lossy().replace("\\", "/"), + ); + assert_eq!(expected, output); + + let output_no_names = cmd!(b3sum_exe(), "--no-names", &file1, &file2) + .read() + .unwrap(); + let expected_no_names = format!("{}\n{}", foo_hash.to_hex(), bar_hash.to_hex(),); + assert_eq!(expected_no_names, output_no_names); +} + +#[test] +fn test_hash_many_tag() { + let dir = tempfile::tempdir().unwrap(); + let file1 = dir.path().join("file1"); + fs::write(&file1, b"foo").unwrap(); + let file2 = dir.path().join("file2"); + fs::write(&file2, b"bar").unwrap(); + + let output = cmd!(b3sum_exe(), "--tag", &file1, &file2).read().unwrap(); + let foo_hash = blake3::hash(b"foo"); + let bar_hash = blake3::hash(b"bar"); + let expected = format!( + "BLAKE3 ({}) = {}\nBLAKE3 ({}) = {}", + // account for slash normalization on Windows + file1.to_string_lossy().replace("\\", "/"), + foo_hash.to_hex(), + file2.to_string_lossy().replace("\\", "/"), + bar_hash.to_hex(), + ); + assert_eq!(expected, output); +} + +#[test] +fn test_missing_files() { + let dir = tempfile::tempdir().unwrap(); + let file1 = dir.path().join("file1"); + fs::write(&file1, b"foo").unwrap(); + let file2 = dir.path().join("file2"); + fs::write(&file2, b"bar").unwrap(); + + let output = cmd!(b3sum_exe(), "file1", "missing_file", "file2") + .dir(dir.path()) + .stdout_capture() + .stderr_capture() + .unchecked() + .run() + .unwrap(); + assert!(!output.status.success()); + + let foo_hash = blake3::hash(b"foo"); + let bar_hash = blake3::hash(b"bar"); + let 
expected_stdout = format!( + "{} file1\n{} file2\n", + foo_hash.to_hex(), + bar_hash.to_hex(), + ); + assert_eq!(expected_stdout.as_bytes(), &output.stdout[..]); + + let bing_error = fs::File::open(dir.path().join("missing_file")).unwrap_err(); + let expected_stderr = format!("b3sum: missing_file: {}\n", bing_error.to_string()); + assert_eq!(expected_stderr.as_bytes(), &output.stderr[..]); +} + +#[test] +fn test_hash_length_and_seek() { + let mut expected = [0; 100]; + blake3::Hasher::new() + .update(b"foo") + .finalize_xof() + .fill(&mut expected); + let output = cmd!(b3sum_exe(), "--raw", "--length=100") + .stdin_bytes("foo") + .stdout_capture() + .run() + .unwrap() + .stdout; + assert_eq!(expected[..], output); + + let short_output = cmd!(b3sum_exe(), "--raw", "--length=99") + .stdin_bytes("foo") + .stdout_capture() + .run() + .unwrap() + .stdout; + assert_eq!(expected[..99], short_output); + + let seek1_output = cmd!(b3sum_exe(), "--raw", "--length=99", "--seek=1") + .stdin_bytes("foo") + .stdout_capture() + .run() + .unwrap() + .stdout; + assert_eq!(expected[1..], seek1_output); + + let seek99_output = cmd!(b3sum_exe(), "--raw", "--length=1", "--seek=99") + .stdin_bytes("foo") + .stdout_capture() + .run() + .unwrap() + .stdout; + assert_eq!(expected[99..], seek99_output); +} + +#[test] +fn test_keyed() { + let key = [42; blake3::KEY_LEN]; + let f = tempfile::NamedTempFile::new().unwrap(); + f.as_file().write_all(b"foo").unwrap(); + f.as_file().flush().unwrap(); + let expected = blake3::keyed_hash(&key, b"foo").to_hex(); + let output = cmd!(b3sum_exe(), "--keyed", "--no-names", f.path()) + .stdin_bytes(&key[..]) + .read() + .unwrap(); + assert_eq!(&*expected, &*output); + + // Make sure that keys of the wrong length lead to errors. 
+ for bad_length in [0, 1, blake3::KEY_LEN - 1, blake3::KEY_LEN + 1] { + dbg!(bad_length); + let output = cmd!(b3sum_exe(), "--keyed", f.path()) + .stdin_bytes(vec![0; bad_length]) + .stdout_capture() + .stderr_capture() + .unchecked() + .run() + .unwrap(); + assert!(!output.status.success()); + assert!(output.stdout.is_empty()); + // Make sure the error message is relevant. + let stderr = std::str::from_utf8(&output.stderr).unwrap(); + assert!(stderr.contains("key bytes")); + } +} + +#[test] +fn test_derive_key() { + let context = "BLAKE3 2019-12-28 10:28:41 example context"; + let f = tempfile::NamedTempFile::new().unwrap(); + f.as_file().write_all(b"key material").unwrap(); + f.as_file().flush().unwrap(); + let expected = hex::encode(blake3::derive_key(context, b"key material")); + let output = cmd!(b3sum_exe(), "--derive-key", context, "--no-names", f.path()) + .read() + .unwrap(); + assert_eq!(&*expected, &*output); +} + +#[test] +fn test_no_mmap() { + let f = tempfile::NamedTempFile::new().unwrap(); + f.as_file().write_all(b"foo").unwrap(); + f.as_file().flush().unwrap(); + + let expected = blake3::hash(b"foo").to_hex(); + let output = cmd!(b3sum_exe(), "--no-mmap", "--no-names", f.path()) + .read() + .unwrap(); + assert_eq!(&*expected, &*output); +} + +#[test] +fn test_length_without_value_is_an_error() { + let result = cmd!(b3sum_exe(), "--length") + .stdin_bytes("foo") + .stderr_capture() + .run(); + assert!(result.is_err()); +} + +#[test] +fn test_raw_with_multi_files_is_an_error() { + let f1 = tempfile::NamedTempFile::new().unwrap(); + let f2 = tempfile::NamedTempFile::new().unwrap(); + + // Make sure it doesn't error with just one file + let result = cmd!(b3sum_exe(), "--raw", f1.path()).stdout_capture().run(); + assert!(result.is_ok()); + + // Make sure it errors when both file are passed + let result = cmd!(b3sum_exe(), "--raw", f1.path(), f2.path()) + .stderr_capture() + .run(); + assert!(result.is_err()); +} + +#[test] +#[cfg(unix)] +fn 
test_newline_and_backslash_escaping_on_unix() { + let empty_hash = blake3::hash(b"").to_hex(); + let dir = tempfile::tempdir().unwrap(); + fs::create_dir(dir.path().join("subdir")).unwrap(); + let names = [ + "abcdef", + "abc\ndef", + "abc\\def", + "abc\rdef", + "abc\r\ndef", + "subdir/foo", + ]; + let mut paths = Vec::new(); + for name in &names { + let path = dir.path().join(name); + println!("creating file at {:?}", path); + fs::write(&path, b"").unwrap(); + paths.push(path); + } + let output = cmd(b3sum_exe(), &names).dir(dir.path()).read().unwrap(); + let expected = format!( + "\ +{0} abcdef +\\{0} abc\\ndef +\\{0} abc\\\\def +\\{0} abc\\rdef +\\{0} abc\\r\\ndef +{0} subdir/foo", + empty_hash, + ); + println!("output"); + println!("======"); + println!("{}", output); + println!(); + println!("expected"); + println!("========"); + println!("{}", expected); + println!(); + assert_eq!(expected, output); +} + +#[test] +#[cfg(windows)] +fn test_slash_normalization_on_windows() { + let empty_hash = blake3::hash(b"").to_hex(); + let dir = tempfile::tempdir().unwrap(); + fs::create_dir(dir.path().join("subdir")).unwrap(); + // Note that filenames can't contain newlines or backslashes on Windows, so + // we don't test escaping here. We only test forward slash and backslash as + // directory separators. 
+ let names = ["abcdef", "subdir/foo", "subdir\\bar"]; + let mut paths = Vec::new(); + for name in &names { + let path = dir.path().join(name); + println!("creating file at {:?}", path); + fs::write(&path, b"").unwrap(); + paths.push(path); + } + let output = cmd(b3sum_exe(), &names).dir(dir.path()).read().unwrap(); + let expected = format!( + "\ +{0} abcdef +{0} subdir/foo +{0} subdir/bar", + empty_hash, + ); + println!("output"); + println!("======"); + println!("{}", output); + println!(); + println!("expected"); + println!("========"); + println!("{}", expected); + println!(); + assert_eq!(expected, output); +} + +#[test] +#[cfg(unix)] +fn test_invalid_unicode_on_unix() { + use std::os::unix::ffi::OsStringExt; + + let empty_hash = blake3::hash(b"").to_hex(); + let dir = tempfile::tempdir().unwrap(); + let names = ["abcdef".into(), OsString::from_vec(b"abc\xffdef".to_vec())]; + let mut paths = Vec::new(); + for name in &names { + let path = dir.path().join(name); + println!("creating file at {:?}", path); + // Note: Some operating systems, macOS in particular, simply don't + // allow invalid Unicode in filenames. On those systems, this write + // will fail. That's fine, we'll just short-circuit this test in that + // case. But assert that at least Linux allows this. 
+ let write_result = fs::write(&path, b""); + if cfg!(target_os = "linux") { + write_result.expect("Linux should allow invalid Unicode"); + } else if write_result.is_err() { + return; + } + paths.push(path); + } + let output = cmd(b3sum_exe(), &names).dir(dir.path()).read().unwrap(); + let expected = format!( + "\ +{0} abcdef +{0} abc�def", + empty_hash, + ); + println!("output"); + println!("======"); + println!("{}", output); + println!(); + println!("expected"); + println!("========"); + println!("{}", expected); + println!(); + assert_eq!(expected, output); +} + +#[test] +#[cfg(windows)] +fn test_invalid_unicode_on_windows() { + use std::os::windows::ffi::OsStringExt; + + let empty_hash = blake3::hash(b"").to_hex(); + let dir = tempfile::tempdir().unwrap(); + let surrogate_char = 0xDC00; + let bad_unicode_wchars = [ + 'a' as u16, + 'b' as u16, + 'c' as u16, + surrogate_char, + 'd' as u16, + 'e' as u16, + 'f' as u16, + ]; + let bad_osstring = OsString::from_wide(&bad_unicode_wchars); + let names = ["abcdef".into(), bad_osstring]; + let mut paths = Vec::new(); + for name in &names { + let path = dir.path().join(name); + println!("creating file at {:?}", path); + fs::write(&path, b"").unwrap(); + paths.push(path); + } + let output = cmd(b3sum_exe(), &names).dir(dir.path()).read().unwrap(); + let expected = format!( + "\ +{0} abcdef +{0} abc�def", + empty_hash, + ); + println!("output"); + println!("======"); + println!("{}", output); + println!(); + println!("expected"); + println!("========"); + println!("{}", expected); + println!(); + assert_eq!(expected, output); +} + +#[test] +fn test_check() { + // Make a directory full of files, and make sure the b3sum output in that + // directory is what we expect. 
+ let a_hash = blake3::hash(b"a").to_hex(); + let b_hash = blake3::hash(b"b").to_hex(); + let cd_hash = blake3::hash(b"cd").to_hex(); + for tagged in [false, true] { + let dir = tempfile::tempdir().unwrap(); + fs::write(dir.path().join("a"), b"a").unwrap(); + fs::write(dir.path().join("b"), b"b").unwrap(); + fs::create_dir(dir.path().join("c")).unwrap(); + fs::write(dir.path().join("c/d"), b"cd").unwrap(); + dbg!(tagged); + let mut args = vec!["a", "b", "c/d"]; + if tagged { + args.push("--tag"); + } + let output = cmd(b3sum_exe(), args) + .dir(dir.path()) + .stdout_capture() + .stderr_capture() + .run() + .unwrap(); + let stdout = std::str::from_utf8(&output.stdout).unwrap(); + let stderr = std::str::from_utf8(&output.stderr).unwrap(); + let expected_checkfile = if tagged { + format!( + "BLAKE3 (a) = {}\n\ + BLAKE3 (b) = {}\n\ + BLAKE3 (c/d) = {}\n", + a_hash, b_hash, cd_hash, + ) + } else { + format!( + "{} a\n\ + {} b\n\ + {} c/d\n", + a_hash, b_hash, cd_hash, + ) + }; + dbg!(&expected_checkfile); + assert_eq!(expected_checkfile, stdout); + assert_eq!("", stderr); + + // Now use the output we just validated as a checkfile, passed to stdin. + let output = cmd!(b3sum_exe(), "--check") + .stdin_bytes(expected_checkfile.as_bytes()) + .dir(dir.path()) + .stdout_capture() + .stderr_capture() + .run() + .unwrap(); + let stdout = std::str::from_utf8(&output.stdout).unwrap(); + let stderr = std::str::from_utf8(&output.stderr).unwrap(); + let expected_check_output = "\ + a: OK\n\ + b: OK\n\ + c/d: OK\n"; + assert_eq!(expected_check_output, stdout); + assert_eq!("", stderr); + + // Check the same file, but with Windows-style newlines. 
+ let windows_style = expected_checkfile.replace("\n", "\r\n"); + let output = cmd!(b3sum_exe(), "--check") + .stdin_bytes(windows_style.as_bytes()) + .dir(dir.path()) + .stdout_capture() + .stderr_capture() + .run() + .unwrap(); + let stdout = std::str::from_utf8(&output.stdout).unwrap(); + let stderr = std::str::from_utf8(&output.stderr).unwrap(); + let expected_check_output = "\ + a: OK\n\ + b: OK\n\ + c/d: OK\n"; + assert_eq!(expected_check_output, stdout); + assert_eq!("", stderr); + + // Now pass the same checkfile twice on the command line just for fun. + let checkfile_path = dir.path().join("checkfile"); + fs::write(&checkfile_path, &expected_checkfile).unwrap(); + let output = cmd!(b3sum_exe(), "--check", &checkfile_path, &checkfile_path) + .dir(dir.path()) + .stdout_capture() + .stderr_capture() + .run() + .unwrap(); + let stdout = std::str::from_utf8(&output.stdout).unwrap(); + let stderr = std::str::from_utf8(&output.stderr).unwrap(); + let mut double_check_output = String::new(); + double_check_output.push_str(&expected_check_output); + double_check_output.push_str(&expected_check_output); + assert_eq!(double_check_output, stdout); + assert_eq!("", stderr); + + // Corrupt one of the files and check again. + fs::write(dir.path().join("b"), b"CORRUPTION").unwrap(); + let output = cmd!(b3sum_exe(), "--check", &checkfile_path) + .dir(dir.path()) + .stdout_capture() + .stderr_capture() + .unchecked() + .run() + .unwrap(); + let stdout = std::str::from_utf8(&output.stdout).unwrap(); + let stderr = std::str::from_utf8(&output.stderr).unwrap(); + let expected_check_failure = "\ + a: OK\n\ + b: FAILED\n\ + c/d: OK\n"; + assert!(!output.status.success()); + assert_eq!(expected_check_failure, stdout); + assert_eq!( + "b3sum: WARNING: 1 computed checksum did NOT match\n", + stderr, + ); + + // Delete one of the files and check again. 
+ fs::remove_file(dir.path().join("b")).unwrap(); + let open_file_error = fs::File::open(dir.path().join("b")).unwrap_err(); + let output = cmd!(b3sum_exe(), "--check", &checkfile_path) + .dir(dir.path()) + .stdout_capture() + .stderr_capture() + .unchecked() + .run() + .unwrap(); + let stdout = std::str::from_utf8(&output.stdout).unwrap(); + let stderr = std::str::from_utf8(&output.stderr).unwrap(); + let expected_check_failure = format!( + "a: OK\n\ + b: FAILED ({})\n\ + c/d: OK\n", + open_file_error, + ); + assert!(!output.status.success()); + assert_eq!(expected_check_failure, stdout); + assert_eq!( + "b3sum: WARNING: 1 computed checksum did NOT match\n", + stderr, + ); + + // Confirm that --quiet suppresses the OKs but not the FAILEDs. + let output = cmd!(b3sum_exe(), "--check", "--quiet", &checkfile_path) + .dir(dir.path()) + .stdout_capture() + .stderr_capture() + .unchecked() + .run() + .unwrap(); + let stdout = std::str::from_utf8(&output.stdout).unwrap(); + let stderr = std::str::from_utf8(&output.stderr).unwrap(); + let expected_check_failure = format!("b: FAILED ({})\n", open_file_error); + assert!(!output.status.success()); + assert_eq!(expected_check_failure, stdout); + assert_eq!( + "b3sum: WARNING: 1 computed checksum did NOT match\n", + stderr, + ); + } +} + +#[test] +fn test_check_invalid_characters() { + // Check that a null character in the path fails. 
+ let output = cmd!(b3sum_exe(), "--check") + .stdin_bytes("0000000000000000000000000000000000000000000000000000000000000000 \0") + .stdout_capture() + .stderr_capture() + .unchecked() + .run() + .unwrap(); + let stdout = std::str::from_utf8(&output.stdout).unwrap(); + let stderr = std::str::from_utf8(&output.stderr).unwrap(); + let expected_stderr = "\ + b3sum: Null character in path\n\ + b3sum: WARNING: 1 computed checksum did NOT match\n"; + assert!(!output.status.success()); + assert_eq!("", stdout); + assert_eq!(expected_stderr, stderr); + + // Check that a Unicode replacement character in the path fails. + let output = cmd!(b3sum_exe(), "--check") + .stdin_bytes("0000000000000000000000000000000000000000000000000000000000000000 �") + .stdout_capture() + .stderr_capture() + .unchecked() + .run() + .unwrap(); + let stdout = std::str::from_utf8(&output.stdout).unwrap(); + let stderr = std::str::from_utf8(&output.stderr).unwrap(); + let expected_stderr = "\ + b3sum: Unicode replacement character in path\n\ + b3sum: WARNING: 1 computed checksum did NOT match\n"; + assert!(!output.status.success()); + assert_eq!("", stdout); + assert_eq!(expected_stderr, stderr); + + // Check that an invalid escape sequence in the path fails. + let output = cmd!(b3sum_exe(), "--check") + .stdin_bytes("\\0000000000000000000000000000000000000000000000000000000000000000 \\a") + .stdout_capture() + .stderr_capture() + .unchecked() + .run() + .unwrap(); + let stdout = std::str::from_utf8(&output.stdout).unwrap(); + let stderr = std::str::from_utf8(&output.stderr).unwrap(); + let expected_stderr = "\ + b3sum: Invalid backslash escape\n\ + b3sum: WARNING: 1 computed checksum did NOT match\n"; + assert!(!output.status.success()); + assert_eq!("", stdout); + assert_eq!(expected_stderr, stderr); + + // Windows also forbids literal backslashes. Check for that if and only if + // we're on Windows. 
+ if cfg!(windows) { + let output = cmd!(b3sum_exe(), "--check") + .stdin_bytes("0000000000000000000000000000000000000000000000000000000000000000 \\") + .stdout_capture() + .stderr_capture() + .unchecked() + .run() + .unwrap(); + let stdout = std::str::from_utf8(&output.stdout).unwrap(); + let stderr = std::str::from_utf8(&output.stderr).unwrap(); + let expected_stderr = "\ + b3sum: Backslash in path\n\ + b3sum: WARNING: 1 computed checksum did NOT match\n"; + assert!(!output.status.success()); + assert_eq!("", stdout); + assert_eq!(expected_stderr, stderr); + } +} + +#[test] +fn test_globbing() { + // On Unix, globbing is provided by the shell. On Windows, globbing is + // provided by us, using the `wild` crate. + let dir = tempfile::tempdir().unwrap(); + let file1 = dir.path().join("file1"); + fs::write(&file1, b"foo").unwrap(); + let file2 = dir.path().join("file2"); + fs::write(&file2, b"bar").unwrap(); + + let foo_hash = blake3::hash(b"foo"); + let bar_hash = blake3::hash(b"bar"); + // NOTE: This assumes that the glob will be expanded in alphabetical order, + // to "file1 file2" rather than "file2 file1". So far, this seems to + // be true (guaranteed?) of Unix shell behavior, and true in practice + // with the `wild` crate on Windows. It's possible that this could + // start failing in the future, though, or on some unknown platform. + // If that ever happens, we'll need to relax this test somehow, + // probably by just testing for both possible outputs. I'm not + // handling that case in advance, though, because I'd prefer to hear + // about it if it comes up. 
+    let expected = format!("{}  file1\n{}  file2", foo_hash.to_hex(), bar_hash.to_hex());
+
+    let star_command = format!("{} *", b3sum_exe().to_str().unwrap());
+    let (exe, c_flag) = if cfg!(windows) {
+        ("cmd.exe", "/C")
+    } else {
+        ("/bin/sh", "-c")
+    };
+    let output = cmd!(exe, c_flag, star_command)
+        .dir(dir.path())
+        .read()
+        .unwrap();
+    assert_eq!(expected, output);
+}
diff --git a/thirdparty/blake3/b3sum/what_does_check_do.md b/thirdparty/blake3/b3sum/what_does_check_do.md
new file mode 100644
index 000000000..387c490bc
--- /dev/null
+++ b/thirdparty/blake3/b3sum/what_does_check_do.md
@@ -0,0 +1,176 @@
+# How does `b3sum --check` behave exactly?<br>or: Are filepaths...text?
+
+Most of the time, `b3sum --check` is a drop-in replacement for `md5sum --check`
+and other Coreutils hashing tools. It consumes a checkfile (the output of a
+regular `b3sum` command), re-hashes all the files listed there, and returns
+success if all of those hashes are still correct. What makes this more
+complicated than it might seem, is that representing filepaths as text means we
+need to consider many possible edge cases of unrepresentable filepaths. This
+document describes all of these edge cases in detail.
+
+## The simple case
+
+Here's the result of running `b3sum a b c/d` in a directory that contains
+those three files:
+
+```bash
+$ echo hi > a
+$ echo lo > b
+$ mkdir c
+$ echo stuff > c/d
+$ b3sum a b c/d
+0b8b60248fad7ac6dfac221b7e01a8b91c772421a15b387dd1fb2d6a94aee438  a
+6ae4a57bbba24f79c461d30bcb4db973b9427d9207877e34d2d74528daa84115  b
+2d477356c962e54784f1c5dc5297718d92087006f6ee96b08aeaf7f3cd252377  c/d
+```
+
+If we pipe that output into `b3sum --check`, it will exit with status zero
+(success) and print:
+
+```bash
+$ b3sum a b c/d | b3sum --check
+a: OK
+b: OK
+c/d: OK
+```
+
+If we delete `b` and change the contents of `c/d`, and then use the same
+checkfile as above, `b3sum --check` will exit with a non-zero status (failure)
+and print:
+
+```bash
+$ b3sum a b c/d > checkfile
+$ rm b
+$ echo more stuff >> c/d
+$ b3sum --check checkfile
+a: OK
+b: FAILED (No such file or directory (os error 2))
+c/d: FAILED
+```
+
+In these typical cases, `b3sum` and `md5sum` have identical output for success
+and very similar output for failure.
+
+## Escaping newlines and backslashes
+
+Since the checkfile format (the regular output format of `b3sum`) is
+newline-separated text, we need to worry about what happens when a filepath
+contains a newline, or worse. Suppose we create a file named `x[newline]x`
+(3 characters). One way to create such a file is with a Python one-liner like
+this:
+
+```python
+>>> open("x\nx", "w")
+```
+
+Here's what happens when we hash that file with `b3sum`:
+
+```bash
+$ b3sum x*
+\af1349b9f5f9a1a6a0404dea36dcc9499bcb25c9adc112b7cc9a93cae41f3262  x\nx
+```
+
+Notice two things. First, `b3sum` puts a single `\` character at the front of
+the line. This indicates that the filepath contains escape sequences that
+`b3sum --check` will need to unescape. Then, `b3sum` replaces the newline
+character in the filepath with the two-character escape sequence `\n`.
+Similarly, if the filepath contained carriage returns or backslashes, `b3sum`
+would escape those as `\r` and `\\` in the output. So far, all of this behavior
+is still identical to `md5sum`. (Note: Coreutils [introduced `\r`
+escaping](https://github.com/coreutils/coreutils/commit/ed1c58427d574fb4ff0cb8f915eb0d554000ceeb)
+in v9.0, September 2021.)
+
+## Invalid Unicode
+
+This is where `b3sum` and `md5sum` diverge. Apart from the newline and
+backslash escapes described above, `md5sum` copies all other filepath bytes
+verbatim to its output. That means its output encoding is "ASCII plus whatever
+bytes we got from the command line". This creates two problems:
+
+1. Printing something that isn't UTF-8 is kind of gross.
+2. Windows support.
+
+What's the deal with Windows? To start with, there's a fundamental difference
+in how Unix and Windows represent filepaths. Unix filepaths are "usually UTF-8"
+and Windows filepaths are "usually UTF-16". That means that a file named `abc`
+is typically represented as the bytes `[97, 98, 99]` on Unix and as the bytes
+`[97, 0, 98, 0, 99, 0]` on Windows. The `md5sum` approach won't work if we plan
+on creating a checkfile on Unix and checking it on Windows, or vice versa.
+
+A more portable approach is to convert platform-specific bytes into some
+consistent Unicode encoding. (In practice this is going to be UTF-8, but in
+theory it could be anything.) Then when `--check` needs to open a file, we
+convert the Unicode representation back into platform-specific bytes. This
+makes important common cases like `abc`, and in fact even `abc[newline]def`,
+work as expected. Great!
+
+But...what did we mean above when we said *usually* UTF-8 and *usually* UTF-16?
+It turns out that not every possible sequence of bytes is valid UTF-8, and not
+every possible sequence of 16-bit wide chars is valid UTF-16. For example, the
+byte 0xFF (255) can never appear in any UTF-8 string. If we ask Python to
+decode it, it yells at us:
+
+```python
+>>> b"\xFF".decode("UTF-8")
+UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
+```
+
+However, tragically, we *can* create a file with that byte in its name (on
+Linux at least, though not usually on macOS):
+
+```python
+>>> open(b"y\xFFy", "w")
+```
+
+So some filepaths aren't representable in Unicode at all. Our plan to "convert
+platform-specific bytes into some consistent Unicode encoding" isn't going to
+work for everything. What does `b3sum` do with the file above?
+
+```bash
+$ b3sum y*
+af1349b9f5f9a1a6a0404dea36dcc9499bcb25c9adc112b7cc9a93cae41f3262  y�y
+```
+
+That � in there is a "Unicode replacement character". When we run into
+filepaths that we can't represent in Unicode, we replace the unrepresentable
+parts with these characters. On the checking side, to avoid any possible
+confusion between two different invalid filepaths, we automatically fail if we
+see a replacement character. Together with a few more details covered in the
+next section, this gives us an important set of properties:
+
+1. Any file can be hashed locally.
+2. Any file with a valid Unicode name not containing the � character can be
+   checked.
+3. Checking ambiguous or unrepresentable filepaths always fails.
+4. Checkfiles are always valid UTF-8.
+5. Checkfiles are portable between Unix and Windows.
+
+## Formal Rules
+
+1. When hashing, filepaths are represented in a platform-specific encoding,
+   which can accommodate any filepath on the current platform. In Rust, this is
+   `OsStr`/`OsString`.
+2. In output, filepaths are first converted to UTF-8. Any non-Unicode segments
+   are replaced with Unicode replacement characters (U+FFFD). In Rust, this is
+   `OsStr::to_string_lossy`.
+3. Then, if a filepath contains backslashes (U+005C), newlines (U+000A), or
+   carriage returns (U+000D), these are escaped as `\\`, `\n`, and `\r`.
+4. Finally, any output line containing an escape sequence is prefixed with a
+   single backslash.
+5. When checking, each line is parsed as UTF-8, separated by a newline
+   (U+000A). Invalid UTF-8 is an error.
+6. Then, if a line begins with a backslash, the filepath component is
+   unescaped. Any escape sequence other than `\\`, `\n`, or `\r` is an error.
+   If a line does not begin with a backslash, unescaping is not performed, and
+   any backslashes in the filepath component are interpreted literally.
+   (`b3sum` output never contains unescaped backslashes, but they can occur in
+   checkfiles assembled by hand.)
+7. Finally, if a filepath contains a Unicode replacement character (U+FFFD) or
+   a null character (U+0000), it is an error.
+
+   **Additionally, on Windows only:**
+
+8. In output, all backslashes (U+005C) are replaced with forward slashes
+   (U+002F).
+9. When checking, after unescaping, if a filepath contains a backslash, it is
+   an error.
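The formal rules above are compact enough to sketch in Rust. What follows is a minimal, hypothetical sketch, not b3sum's actual implementation. The helper names `escape_filepath`, `checkfile_line`, `unescape_filepath` and `parse_check_line` are invented for illustration. The sketch assumes a two-space separator between hash and filepath, as in the `md5sum` checkfile format, and it skips hash validation (length, hex digits). Because the escaping section says carriage returns are escaped as `\r`, the sketch handles `\r` alongside `\\` and `\n`.

```rust
use std::ffi::OsStr;

/// Hypothetical helper: turn a filepath into checkfile text (rules 2, 3, 8).
/// Returns whether any escaping happened, plus the rendered text.
fn escape_filepath(path: &OsStr) -> (bool, String) {
    // Rule 2: lossy conversion; non-Unicode segments become U+FFFD.
    let mut s = path.to_string_lossy().into_owned();
    // Rule 8: on Windows, normalize backslashes to forward slashes.
    if cfg!(windows) {
        s = s.replace('\\', "/");
    }
    // Rule 3: escape backslashes first, so the backslashes introduced by the
    // `\n` and `\r` escapes are not doubled afterwards.
    let needs_escape = s.contains(|c: char| c == '\\' || c == '\n' || c == '\r');
    if needs_escape {
        s = s.replace('\\', "\\\\").replace('\n', "\\n").replace('\r', "\\r");
    }
    (needs_escape, s)
}

/// Hypothetical helper: format one checkfile line (rule 4).
fn checkfile_line(hash_hex: &str, path: &OsStr) -> String {
    let (escaped, text) = escape_filepath(path);
    // Rule 4: lines containing escapes get a single leading backslash.
    let prefix = if escaped { "\\" } else { "" };
    format!("{}{}  {}", prefix, hash_hex, text)
}

/// Hypothetical helper: undo rule 3. Unknown or truncated escapes are errors (rule 6).
fn unescape_filepath(text: &str) -> Result<String, String> {
    let mut out = String::with_capacity(text.len());
    let mut chars = text.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('\\') => out.push('\\'),
            Some('n') => out.push('\n'),
            Some('r') => out.push('\r'),
            _ => return Err("invalid backslash escape".to_string()),
        }
    }
    Ok(out)
}

/// Hypothetical helper: parse one checkfile line into (hash hex, filepath).
fn parse_check_line(line: &str) -> Result<(String, String), String> {
    // Rule 6: a leading backslash means the filepath component is escaped.
    let (escaped, rest) = match line.strip_prefix('\\') {
        Some(rest) => (true, rest),
        None => (false, line),
    };
    // Assumption: hash and filepath are separated by two spaces.
    let (hash_hex, raw_path) = rest.split_once("  ").ok_or("missing separator")?;
    // Without the prefix, backslashes are taken literally.
    let path = if escaped {
        unescape_filepath(raw_path)?
    } else {
        raw_path.to_string()
    };
    // Rule 7: replacement characters and NUL are always errors.
    if path.contains('\u{FFFD}') || path.contains('\0') {
        return Err("unrepresentable filepath".to_string());
    }
    // Rule 9: on Windows, a backslash left after unescaping is an error.
    if cfg!(windows) && path.contains('\\') {
        return Err("backslash in path".to_string());
    }
    Ok((hash_hex.to_string(), path))
}

fn main() {
    let hash = "af1349b9f5f9a1a6a0404dea36dcc9499bcb25c9adc112b7cc9a93cae41f3262";
    let line = checkfile_line(hash, OsStr::new("x\nx"));
    println!("{}", line);
    // Round trip: parsing the line recovers the original filepath.
    let (parsed_hash, parsed_path) = parse_check_line(&line).unwrap();
    assert_eq!(parsed_hash, hash);
    assert_eq!(parsed_path, "x\nx");
    // A replacement character in a checkfile line is rejected.
    assert!(parse_check_line(&format!("{}  y\u{FFFD}y", hash)).is_err());
}
```

The escape order is the one design choice that matters here. If newlines were escaped before backslashes, the second pass would double the backslash that `\n` had just introduced, and the result would not round-trip. Running `main` prints the escaped line from the `x[newline]x` example.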