aboutsummaryrefslogtreecommitdiff
path: root/thirdparty/ryml/README.md
blob: ddfa354543c792c4fdd7bd4ff40d989ba75f7751 (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
# Rapid YAML
[![MIT Licensed](https://img.shields.io/badge/License-MIT-green.svg)](https://github.com/biojppm/rapidyaml/blob/master/LICENSE.txt)
[![release](https://img.shields.io/github/v/release/biojppm/rapidyaml?color=g&include_prereleases&label=release%20&sort=semver)](https://github.com/biojppm/rapidyaml/releases)
[![PyPI](https://img.shields.io/pypi/v/rapidyaml?color=g)](https://pypi.org/project/rapidyaml/)
[![Docs](https://img.shields.io/badge/docs-docsforge-blue)](https://rapidyaml.docsforge.com/)
[![Gitter](https://badges.gitter.im/rapidyaml/community.svg)](https://gitter.im/rapidyaml/community)

[![test](https://github.com/biojppm/rapidyaml/workflows/test/badge.svg?branch=master)](https://github.com/biojppm/rapidyaml/actions)
<!-- [![Coveralls](https://coveralls.io/repos/github/biojppm/rapidyaml/badge.svg?branch=master)](https://coveralls.io/github/biojppm/rapidyaml) -->
[![Codecov](https://codecov.io/gh/biojppm/rapidyaml/branch/master/graph/badge.svg?branch=master)](https://codecov.io/gh/biojppm/rapidyaml)


Or ryml, for short. ryml is a C++ library to parse and emit YAML,
and do it fast, on everything from x64 to bare-metal chips without
operating system. (If you are looking to use your programs with a YAML tree
as a configuration tree with override facilities, take a look at
[c4conf](https://github.com/biojppm/c4conf)).

ryml parses both read-only and in-situ source buffers; the resulting
data nodes hold only views to sub-ranges of the source buffer. No
string copies or duplications are done, and no virtual functions are
used. The data tree is a flat index-based structure stored in a single
array. Serialization happens only at your direct request, after
parsing / before emitting. Internally, the data tree representation
stores only string views and has no knowledge of types, but of course,
every node can have a YAML type tag. ryml makes it easy and fast to
read and modify the data tree.

ryml is available as a single header file, or it can be used as a
simple library with cmake -- both separately (ie
build->install->`find_package()`) or together with your project (ie with
`add_subdirectory()`). (See below for examples).

ryml can use custom global and per-tree memory allocators and error
handler callbacks, and is exception-agnostic. ryml provides a default
implementation for the allocator (using `std::malloc()`) and error
handlers (using using `std::abort()` is provided, but you can opt out
and provide your own memory allocation and eg, exception-throwing
callbacks.

ryml does not depend on the STL, ie, it does not use any std container
as part of its data structures), but it can serialize and deserialize
these containers into the data tree, with the use of optional
headers. ryml ships with [c4core](https://github.com/biojppm/c4core), a
small C++ utilities multiplatform library.

ryml is written in C++11, and compiles cleanly with:
* Visual Studio 2015 and later
* clang++ 3.9 and later
* g++ 4.8 and later
* Intel Compiler

ryml is [extensively unit-tested in Linux, Windows and
MacOS](https://github.com/biojppm/rapidyaml/actions). The tests cover
x64, x86, wasm (emscripten), arm, aarch64, ppc64le and s390x
architectures, and include analysing ryml with:
  * valgrind
  * clang-tidy
  * clang sanitizers:
    * memory
    * address
    * undefined behavior
    * thread
  * [LGTM.com](https://lgtm.com/projects/g/biojppm/rapidyaml)

ryml also [runs in
bare-metal](https://github.com/biojppm/rapidyaml/issues/193), and
[RISC-V
architectures](https://github.com/biojppm/c4core/pull/69). Both of
these are pending implementation of CI actions for continuous
validation, but ryml has been proven to work there.

ryml is [available in Python](https://pypi.org/project/rapidyaml/),
and can very easily be compiled to JavaScript through emscripten (see
below).

See also [the changelog](https://github.com/biojppm/rapidyaml/tree/master/changelog)
and [the roadmap](https://github.com/biojppm/rapidyaml/tree/master/ROADMAP.md).

<!-- endpythonreadme -->


------

## Table of contents
* [Is it rapid?](#is-it-rapid)
  * [Comparison with yaml-cpp](#comparison-with-yaml-cpp)
  * [Performance reading JSON](#performance-reading-json)
  * [Performance emitting](#performance-emitting)
* [Quick start](#quick-start)
* [Using ryml in your project](#using-ryml-in-your-project)
  * [Package managers](#package-managers)
  * [Single header file](#single-header-file)
  * [As a library](#as-a-library)
  * [Quickstart samples](#quickstart-samples)
  * [CMake build settings for ryml](#cmake-build-settings-for-ryml)
     * [Forcing ryml to use a different c4core version](#forcing-ryml-to-use-a-different-c4core-version)
* [Other languages](#other-languages)
  * [JavaScript](#javascript)
  * [Python](#python)
* [YAML standard conformance](#yaml-standard-conformance)
  * [Test suite status](#test-suite-status)
* [Known limitations](#known-limitations)
* [Alternative libraries](#alternative-libraries)
* [License](#license)


------

## Is it rapid?

You bet! On a i7-6800K CPU @3.40GHz:
 * ryml parses YAML at about ~150MB/s on Linux and ~100MB/s on Windows (vs2017). 
 * **ryml parses JSON at about ~450MB/s on Linux**, faster than sajson (didn't
   try yet on Windows).
 * compared against the other existing YAML libraries for C/C++:
   * ryml is in general between 2 and 3 times faster than [libyaml](https://github.com/yaml/libyaml)
   * ryml is in general between 10 and 70 times faster than
     [yaml-cpp](https://github.com/jbeder/yaml-cpp), and in some cases as
     much as 100x and [even
     200x](https://github.com/biojppm/c4core/pull/16#issuecomment-700972614) faster.

[Here's the benchmark](./bm/bm_parse.cpp). Using different
approaches within ryml (in-situ/read-only vs. with/without reuse), a YAML /
JSON buffer is repeatedly parsed, and compared against other libraries.

### Comparison with yaml-cpp

The first result set is for Windows, and is using a [appveyor.yml config
file](./bm/cases/appveyor.yml). A comparison of these results is
summarized on the table below:

| Read rates (MB/s)            | ryml   | yamlcpp | compared     |
|------------------------------|--------|---------|--------------|
| appveyor / vs2017 / Release  | 101.5  | 5.3     |  20x / 5.2%  |
| appveyor / vs2017 / Debug    |   6.4  | 0.0844  |  76x / 1.3%  |


The next set of results is taken in Linux, comparing g++ 8.2 and clang++ 7.0.1 in
parsing a YAML buffer from a [travis.yml config
file](./bm/cases/travis.yml) or a JSON buffer from a [compile_commands.json
file](./bm/cases/compile_commands.json). You
can [see the full results here](./bm/results/parse.linux.i7_6800K.md).
Summarizing:

| Read rates (MB/s)           | ryml   | yamlcpp | compared   |
|-----------------------------|--------|---------|------------|
| json   / clang++ / Release  | 453.5  | 15.1    |  30x / 3%  |
| json   /     g++ / Release  | 430.5  | 16.3    |  26x / 4%  |
| json   / clang++ / Debug    |  61.9  | 1.63    |  38x / 3%  |
| json   /     g++ / Debug    |  72.6  | 1.53    |  47x / 2%  |
| travis / clang++ / Release  | 131.6  | 8.08    |  16x / 6%  |
| travis /     g++ / Release  | 176.4  | 8.23    |  21x / 5%  |
| travis / clang++ / Debug    |  10.2  | 1.08    |   9x / 1%  |
| travis /     g++ / Debug    |  12.5  | 1.01    |  12x / 8%  |

The 450MB/s read rate for JSON puts ryml squarely in the same ballpark
as [RapidJSON](https://github.com/Tencent/rapidjson) and other fast json
readers
([data from here](https://lemire.me/blog/2018/05/03/how-fast-can-you-parse-json/)).
Even parsing full YAML is at ~150MB/s, which is still in that performance
ballpark, albeit at its lower end. This is something to be proud of, as the
YAML specification is much more complex than JSON: [23449 vs 1969 words](https://www.arp242.net/yaml-config.html#its-pretty-complex).


### Performance reading JSON

So how does ryml compare against other JSON readers? Well, it's one of the
fastest! 

The benchmark is the [same as above](./bm/parse.cpp), and it is reading
the [compile_commands.json](./bm/cases/compile_commands.json), The `_arena`
suffix notes parsing a read-only buffer (so buffer copies are performed),
while the `_inplace` suffix means that the source buffer can be parsed in
place. The `_reuse` means the data tree and/or parser are reused on each
benchmark repeat.

Here's what we get with g++ 8.2:

| Benchmark             | Release,MB/s | Debug,MB/s  |
|:----------------------|-------------:|------------:|
| rapidjson_arena       |       509.9  |       43.4  |
| rapidjson_inplace     |      1329.4  |       68.2  |
| sajson_inplace        |       434.2  |      176.5  |
| sajson_arena          |       430.7  |      175.6  |
| jsoncpp_arena         |       183.6  |    ? 187.9  |
| nlohmann_json_arena   |       115.8  |       21.5  |
| yamlcpp_arena         |        16.6  |        1.6  |
| libyaml_arena         |       113.9  |       35.7  |
| libyaml_arena_reuse   |       114.6  |       35.9  |
| ryml_arena            |       388.6  |       36.9  |
| ryml_inplace          |       393.7  |       36.9  |
| ryml_arena_reuse      |       446.2  |       74.6  |
| ryml_inplace_reuse    |       457.1  |       74.9  |

You can verify that (at least for this test) ryml beats most json
parsers at their own game, with the only exception of
[rapidjson](https://github.com/Tencent/rapidjson). And actually, in
Debug, [rapidjson](https://github.com/Tencent/rapidjson) is slower
than ryml, and [sajson](https://github.com/chadaustin/sajson)
manages to be faster (but not sure about jsoncpp; need to scrutinize there
the suspicious fact that the Debug result is faster than the Release result).


### Performance emitting

[Emitting benchmarks](bm/bm_emit.cpp) also show similar speedups from
the existing libraries, also anecdotally reported by some users [(eg,
here's a user reporting 25x speedup from
yaml-cpp)](https://github.com/biojppm/rapidyaml/issues/28#issue-553855608). Also, in
some cases (eg, block folded multiline scalars), the speedup is as
high as 200x (eg, 7.3MB/s -> 1.416MG/s).


### CI results and request for files

While a more effective way of showing the benchmark results is not
available yet, you can browse through the [runs of the benchmark
workflow in the
CI](https://github.com/biojppm/rapidyaml/actions/workflows/benchmarks.yml)
to scroll through the results for yourself.

Also, if you have a case where ryml behaves very nicely or not as nicely as
claimed above, we would definitely like to see it! Please submit a pull request
adding the file to [bm/cases](bm/cases), or just send us the files.


------

## Quick start

If you're wondering whether ryml's speed comes at a usage cost, you
need not: with ryml, you can have your cake and eat it too. Being
rapid is definitely NOT the same as being unpractical, so ryml was
written with easy AND efficient usage in mind, and comes with a two
level API for accessing and traversing the data tree.

The following snippet is a quick overview taken from [the quickstart
sample](samples/quickstart.cpp). After cloning ryml (don't forget the
`--recursive` flag for git), you can very
easily build and run this executable using any of the build samples,
eg the [`add_subdirectory()` sample](samples/add_subdirectory/).

```c++
// Parse YAML code in place, potentially mutating the buffer.
// It is also possible to:
//   - parse a read-only buffer using parse_in_arena()
//   - reuse an existing tree (advised)
//   - reuse an existing parser (advised)
char yml_buf[] = "{foo: 1, bar: [2, 3], john: doe}";
ryml::Tree tree = ryml::parse_in_place(yml_buf);

// Note: it will always be significantly faster to use mutable
// buffers and reuse tree+parser.
//
// Below you will find samples that show how to achieve reuse; but
// please note that for brevity and clarity, many of the examples
// here are parsing immutable buffers, and not reusing tree or
// parser.


//------------------------------------------------------------------
// API overview

// ryml has a two-level API:
//
// The lower level index API is based on the indices of nodes,
// where the node's id is the node's position in the tree's data
// array. This API is very efficient, but somewhat difficult to use:
size_t root_id = tree.root_id();
size_t bar_id = tree.find_child(root_id, "bar"); // need to get the index right
CHECK(tree.is_map(root_id)); // all of the index methods are in the tree
CHECK(tree.is_seq(bar_id));  // ... and receive the subject index

// The node API is a lightweight abstraction sitting on top of the
// index API, but offering a much more convenient interaction:
ryml::ConstNodeRef root = tree.rootref();
ryml::ConstNodeRef bar = tree["bar"];
CHECK(root.is_map());
CHECK(bar.is_seq());
// A node ref is a lightweight handle to the tree and associated id:
CHECK(root.tree() == &tree); // a node ref points at its tree, WITHOUT refcount
CHECK(root.id() == root_id); // a node ref's id is the index of the node
CHECK(bar.id() == bar_id);   // a node ref's id is the index of the node

// The node API translates very cleanly to the index API, so most
// of the code examples below are using the node API.

// One significant point of the node API is that it holds a raw
// pointer to the tree. Care must be taken to ensure the lifetimes
// match, so that a node will never access the tree after the tree
// went out of scope.


//------------------------------------------------------------------
// To read the parsed tree

// ConstNodeRef::operator[] does a lookup, is O(num_children[node]).
CHECK(tree["foo"].is_keyval());
CHECK(tree["foo"].key() == "foo");
CHECK(tree["foo"].val() == "1");
CHECK(tree["bar"].is_seq());
CHECK(tree["bar"].has_key());
CHECK(tree["bar"].key() == "bar");
// maps use string keys, seqs use integral keys:
CHECK(tree["bar"][0].val() == "2");
CHECK(tree["bar"][1].val() == "3");
CHECK(tree["john"].val() == "doe");
// An integral key is the position of the child within its parent,
// so even maps can also use int keys, if the key position is
// known.
CHECK(tree[0].id() == tree["foo"].id());
CHECK(tree[1].id() == tree["bar"].id());
CHECK(tree[2].id() == tree["john"].id());
// Tree::operator[](int) searches a ***root*** child by its position.
CHECK(tree[0].id() == tree["foo"].id());  // 0: first child of root
CHECK(tree[1].id() == tree["bar"].id());  // 1: first child of root
CHECK(tree[2].id() == tree["john"].id()); // 2: first child of root
// NodeRef::operator[](int) searches a ***node*** child by its position:
CHECK(bar[0].val() == "2"); // 0 means first child of bar
CHECK(bar[1].val() == "3"); // 1 means second child of bar
// NodeRef::operator[](string):
// A string key is the key of the node: lookup is by name. So it
// is only available for maps, and it is NOT available for seqs,
// since seq members do not have keys.
CHECK(tree["foo"].key() == "foo");
CHECK(tree["bar"].key() == "bar");
CHECK(tree["john"].key() == "john");
CHECK(bar.is_seq());
// CHECK(bar["BOOM!"].is_seed()); // error, seqs do not have key lookup

// Note that maps can also use index keys as well as string keys:
CHECK(root["foo"].id() == root[0].id());
CHECK(root["bar"].id() == root[1].id());
CHECK(root["john"].id() == root[2].id());

// IMPORTANT. The ryml tree uses indexed linked lists for storing
// children, so the complexity of `Tree::operator[csubstr]` and
// `Tree::operator[size_t]` is linear on the number of root
// children. If you use `Tree::operator[]` with a large tree where
// the root has many children, you will see a performance hit.
//
// To avoid this hit, you can create your own accelerator
// structure. For example, before doing a lookup, do a single
// traverse at the root level to fill an `map<csubstr,size_t>`
// mapping key names to node indices; with a node index, a lookup
// (via `Tree::get()`) is O(1), so this way you can get O(log n)
// lookup from a key. (But please do not use `std::map` if you
// care about performance; use something else like a flat map or
// sorted vector).
//
// As for node refs, the difference from `NodeRef::operator[]` and
// `ConstNodeRef::operator[]` to `Tree::operator[]` is that the
// latter refers to the root node, whereas the former are invoked
// on their target node. But the lookup process works the same for
// both and their algorithmic complexity is the same: they are
// both linear in the number of direct children. But of course,
// depending on the data, that number may be very different from
// one to another.

//------------------------------------------------------------------
// Hierarchy:

{
    ryml::ConstNodeRef foo = root.first_child();
    ryml::ConstNodeRef john = root.last_child();
    CHECK(tree.size() == 6); // O(1) number of nodes in the tree
    CHECK(root.num_children() == 3); // O(num_children[root])
    CHECK(foo.num_siblings() == 3); // O(num_children[parent(foo)])
    CHECK(foo.parent().id() == root.id()); // parent() is O(1)
    CHECK(root.first_child().id() == root["foo"].id()); // first_child() is O(1)
    CHECK(root.last_child().id() == root["john"].id()); // last_child() is O(1)
    CHECK(john.first_sibling().id() == foo.id());
    CHECK(foo.last_sibling().id() == john.id());
    // prev_sibling(), next_sibling(): (both are O(1))
    CHECK(foo.num_siblings() == root.num_children());
    CHECK(foo.prev_sibling().id() == ryml::NONE); // foo is the first_child()
    CHECK(foo.next_sibling().key() == "bar");
    CHECK(foo.next_sibling().next_sibling().key() == "john");
    CHECK(foo.next_sibling().next_sibling().next_sibling().id() == ryml::NONE); // john is the last_child()
}


//------------------------------------------------------------------
// Iterating:
{
    ryml::csubstr expected_keys[] = {"foo", "bar", "john"};
    // iterate children using the high-level node API:
    {
        size_t count = 0;
        for(ryml::ConstNodeRef const& child : root.children())
            CHECK(child.key() == expected_keys[count++]);
    }
    // iterate siblings using the high-level node API:
    {
        size_t count = 0;
        for(ryml::ConstNodeRef const& child : root["foo"].siblings())
            CHECK(child.key() == expected_keys[count++]);
    }
    // iterate children using the lower-level tree index API:
    {
        size_t count = 0;
        for(size_t child_id = tree.first_child(root_id); child_id != ryml::NONE; child_id = tree.next_sibling(child_id))
            CHECK(tree.key(child_id) == expected_keys[count++]);
    }
    // iterate siblings using the lower-level tree index API:
    // (notice the only difference from above is in the loop
    // preamble, which calls tree.first_sibling(bar_id) instead of
    // tree.first_child(root_id))
    {
        size_t count = 0;
        for(size_t child_id = tree.first_sibling(bar_id); child_id != ryml::NONE; child_id = tree.next_sibling(child_id))
            CHECK(tree.key(child_id) == expected_keys[count++]);
    }
}


//------------------------------------------------------------------
// Gotchas:
CHECK(!tree["bar"].has_val());          // seq is a container, so no val
CHECK(!tree["bar"][0].has_key());       // belongs to a seq, so no key
CHECK(!tree["bar"][1].has_key());       // belongs to a seq, so no key
//CHECK(tree["bar"].val() == BOOM!);    // ... so attempting to get a val is undefined behavior
//CHECK(tree["bar"][0].key() == BOOM!); // ... so attempting to get a key is undefined behavior
//CHECK(tree["bar"][1].key() == BOOM!); // ... so attempting to get a key is undefined behavior


//------------------------------------------------------------------
// Deserializing: use operator>>
{
    int foo = 0, bar0 = 0, bar1 = 0;
    std::string john;
    root["foo"] >> foo;
    root["bar"][0] >> bar0;
    root["bar"][1] >> bar1;
    root["john"] >> john; // requires from_chars(std::string). see serialization samples below.
    CHECK(foo == 1);
    CHECK(bar0 == 2);
    CHECK(bar1 == 3);
    CHECK(john == "doe");
}


//------------------------------------------------------------------
// Modifying existing nodes: operator<< vs operator=

// As implied by its name, ConstNodeRef is a reference to a const
// node. It can be used to read from the node, but not write to it
// or modify the hierarchy of the node. If any modification is
// desired then a NodeRef must be used instead:
ryml::NodeRef wroot = tree.rootref();

// operator= assigns an existing string to the receiving node.
// This pointer will be in effect until the tree goes out of scope
// so beware to only assign from strings outliving the tree.
wroot["foo"] = "says you";
wroot["bar"][0] = "-2";
wroot["bar"][1] = "-3";
wroot["john"] = "ron";
// Now the tree is _pointing_ at the memory of the strings above.
// That is OK because those are static strings and will outlive
// the tree.
CHECK(root["foo"].val() == "says you");
CHECK(root["bar"][0].val() == "-2");
CHECK(root["bar"][1].val() == "-3");
CHECK(root["john"].val() == "ron");
// WATCHOUT: do not assign from temporary objects:
// {
//     std::string crash("will dangle");
//     root["john"] = ryml::to_csubstr(crash);
// }
// CHECK(root["john"] == "dangling"); // CRASH! the string was deallocated

// operator<< first serializes the input to the tree's arena, then
// assigns the serialized string to the receiving node. This avoids
// constraints with the lifetime, since the arena lives with the tree.
CHECK(tree.arena().empty());
wroot["foo"] << "says who";  // requires to_chars(). see serialization samples below.
wroot["bar"][0] << 20;
wroot["bar"][1] << 30;
wroot["john"] << "deere";
CHECK(root["foo"].val() == "says who");
CHECK(root["bar"][0].val() == "20");
CHECK(root["bar"][1].val() == "30");
CHECK(root["john"].val() == "deere");
CHECK(tree.arena() == "says who2030deere"); // the result of serializations to the tree arena
// using operator<< instead of operator=, the crash above is avoided:
{
    std::string ok("in_scope");
    // root["john"] = ryml::to_csubstr(ok); // don't, will dangle
    wroot["john"] << ryml::to_csubstr(ok); // OK, copy to the tree's arena
}
CHECK(root["john"] == "in_scope"); // OK!
CHECK(tree.arena() == "says who2030deerein_scope"); // the result of serializations to the tree arena


//------------------------------------------------------------------
// Adding new nodes:

// adding a keyval node to a map:
CHECK(root.num_children() == 3);
wroot["newkeyval"] = "shiny and new"; // using these strings
wroot.append_child() << ryml::key("newkeyval (serialized)") << "shiny and new (serialized)"; // serializes and assigns the serialization
CHECK(root.num_children() == 5);
CHECK(root["newkeyval"].key() == "newkeyval");
CHECK(root["newkeyval"].val() == "shiny and new");
CHECK(root["newkeyval (serialized)"].key() == "newkeyval (serialized)");
CHECK(root["newkeyval (serialized)"].val() == "shiny and new (serialized)");
CHECK( ! tree.in_arena(root["newkeyval"].key())); // it's using directly the static string above
CHECK( ! tree.in_arena(root["newkeyval"].val())); // it's using directly the static string above
CHECK(   tree.in_arena(root["newkeyval (serialized)"].key())); // it's using a serialization of the string above
CHECK(   tree.in_arena(root["newkeyval (serialized)"].val())); // it's using a serialization of the string above
// adding a val node to a seq:
CHECK(root["bar"].num_children() == 2);
wroot["bar"][2] = "oh so nice";
wroot["bar"][3] << "oh so nice (serialized)";
CHECK(root["bar"].num_children() == 4);
CHECK(root["bar"][2].val() == "oh so nice");
CHECK(root["bar"][3].val() == "oh so nice (serialized)");
// adding a seq node:
CHECK(root.num_children() == 5);
wroot["newseq"] |= ryml::SEQ;
wroot.append_child() << ryml::key("newseq (serialized)") |= ryml::SEQ;
CHECK(root.num_children() == 7);
CHECK(root["newseq"].num_children() == 0);
CHECK(root["newseq (serialized)"].num_children() == 0);
// adding a map node:
CHECK(root.num_children() == 7);
wroot["newmap"] |= ryml::MAP;
wroot.append_child() << ryml::key("newmap (serialized)") |= ryml::SEQ;
CHECK(root.num_children() == 9);
CHECK(root["newmap"].num_children() == 0);
CHECK(root["newmap (serialized)"].num_children() == 0);
//
// When the tree is mutable, operator[] does not mutate the tree
// until the returned node is written to.
//
// Until such time, the NodeRef object keeps in itself the required
// information to write to the proper place in the tree. This is
// called being in a "seed" state.
//
// This means that passing a key/index which does not exist will
// not mutate the tree, but will instead store (in the node) the
// proper place of the tree to be able to do so, if and when it is
// required.
//
// This is a significant difference from eg, the behavior of
// std::map, which mutates the map immediately within the call to
// operator[].
//
// All of the points above apply only if the tree is mutable. If
// the tree is const, then a NodeRef cannot be obtained from it;
// only a ConstNodeRef, which can never be used to mutate the
// tree.
CHECK(!root.has_child("I am not nothing"));
ryml::NodeRef nothing = wroot["I am nothing"];
CHECK(nothing.valid());   // points at the tree, and a specific place in the tree
CHECK(nothing.is_seed()); // ... but nothing is there yet.
CHECK(!root.has_child("I am nothing")); // same as above
ryml::NodeRef something = wroot["I am something"];
ryml::ConstNodeRef constsomething = wroot["I am something"];
CHECK(!root.has_child("I am something")); // same as above
CHECK(something.valid());
CHECK(something.is_seed()); // same as above
CHECK(!constsomething.valid()); // NOTE: because a ConstNodeRef
                                // cannot be used to mutate a
                                // tree, it is only valid() if it
                                // is pointing at an existing
                                // node.
something = "indeed";  // this will commit to the tree, mutating at the proper place
CHECK(root.has_child("I am something"));
CHECK(root["I am something"].val() == "indeed");
CHECK(something.valid());
CHECK(!something.is_seed()); // now the tree has this node, so the
                             // ref is no longer a seed
// now the constref is also valid (but it needs to be reassigned):
ryml::ConstNodeRef constsomethingnew = wroot["I am something"];
CHECK(constsomethingnew.valid());
// note that the old constref is now stale, because it only keeps
// the state at creation:
CHECK(!constsomething.valid());


//------------------------------------------------------------------
// Emitting:

// emit to a FILE*
ryml::emit_yaml(tree, stdout); // there is also emit_json()
// emit to a stream
std::stringstream ss;
ss << tree;
std::string stream_result = ss.str();
// emit to a buffer:
std::string str_result = ryml::emitrs_yaml<std::string>(tree); // there is also emitrs_json()
// can emit to any given buffer:
char buf[1024];
ryml::csubstr buf_result = ryml::emit_yaml(tree, buf);
// now check
ryml::csubstr expected_result = R"(foo: says who
bar:
- 20
- 30
- oh so nice
- oh so nice (serialized)
john: in_scope
newkeyval: shiny and new
newkeyval (serialized): shiny and new (serialized)
newseq: []
newseq (serialized): []
newmap: {}
newmap (serialized): []
I am something: indeed
)";
CHECK(buf_result == expected_result);
CHECK(str_result == expected_result);
CHECK(stream_result == expected_result);
// There are many possibilities to emit to buffer;
// please look at the emit sample functions below.

//------------------------------------------------------------------
// ConstNodeRef vs NodeRef

ryml::NodeRef noderef = tree["bar"][0];
ryml::ConstNodeRef constnoderef = tree["bar"][0];

// ConstNodeRef cannot be used to mutate the tree, but a NodeRef can:
//constnoderef = "21";  // compile error
//constnoderef << "22"; // compile error
noderef = "21";         // ok, can assign because it's not const
CHECK(tree["bar"][0].val() == "21");
noderef << "22";        // ok, can serialize and assign because it's not const
CHECK(tree["bar"][0].val() == "22");

// it is not possible to obtain a NodeRef from a ConstNodeRef:
// noderef = constnoderef; // compile error

// it is always possible to obtain a ConstNodeRef from a NodeRef:
constnoderef = noderef;    // ok can assign const <- nonconst

// If a tree is const, then only ConstNodeRef's can be
// obtained from that tree:
ryml::Tree const& consttree = tree;
//noderef = consttree["bar"][0];    // compile error
noderef = tree["bar"][0];           // ok
constnoderef = consttree["bar"][0]; // ok

// ConstNodeRef and NodeRef can be compared for equality.
// Equality means they point at the same node.
CHECK(constnoderef == noderef);
CHECK(!(constnoderef != noderef));

//------------------------------------------------------------------
// Dealing with UTF8
ryml::Tree langs = ryml::parse_in_arena(R"(
en: Planet (Gas)
fr: Planète (Gazeuse)
ru: Планета (Газ)
ja: 惑星(ガス)
zh: 行星(气体)
# UTF8 decoding only happens in double-quoted strings,\
# as per the YAML standard
decode this: "\u263A \xE2\x98\xBA"
and this as well: "\u2705 \U0001D11E"
)");
// in-place UTF8 just works:
CHECK(langs["en"].val() == "Planet (Gas)");
CHECK(langs["fr"].val() == "Planète (Gazeuse)");
CHECK(langs["ru"].val() == "Планета (Газ)");
CHECK(langs["ja"].val() == "惑星(ガス)");
CHECK(langs["zh"].val() == "行星(气体)");
// and \x \u \U codepoints are decoded (but only when they appear
// inside double-quoted strings, as dictated by the YAML
// standard):
CHECK(langs["decode this"].val() == "☺ ☺");
CHECK(langs["and this as well"].val() == "✅ 𝄞");

//------------------------------------------------------------------
// Getting the location of nodes in the source:
ryml::Parser parser;
ryml::Tree tree2 = parser.parse_in_arena("expected.yml", expected_result);
ryml::Location loc = parser.location(tree2["bar"][1]);
CHECK(parser.location_contents(loc).begins_with("30"));
CHECK(loc.line == 3u);
CHECK(loc.col == 4u);
```

The [quickstart.cpp sample](./samples/quickstart.cpp) (from which the
above overview was taken) has many more detailed examples, and should
be your first port of call to find out any particular point about
ryml's API. It is tested in the CI, and thus has the correct behavior.
There you can find the following subjects being addressed:

```c++
sample_substr();               ///< about ryml's string views (from c4core)
sample_parse_file();           ///< ready-to-go example of parsing a file from disk
sample_parse_in_place();       ///< parse a mutable YAML source buffer
sample_parse_in_arena();       ///< parse a read-only YAML source buffer
sample_parse_reuse_tree();     ///< parse into an existing tree, maybe into a node
sample_parse_reuse_parser();   ///< reuse an existing parser
sample_parse_reuse_tree_and_parser(); ///< how to reuse existing trees and parsers
sample_iterate_trees();        ///< visit individual nodes and iterate through trees
sample_create_trees();         ///< programatically create trees
sample_tree_arena();           ///< interact with the tree's serialization arena
sample_fundamental_types();    ///< serialize/deserialize fundamental types
sample_formatting();           ///< control formatting when serializing/deserializing
sample_base64();               ///< encode/decode base64
sample_user_scalar_types();    ///< serialize/deserialize scalar (leaf/string) types
sample_user_container_types(); ///< serialize/deserialize container (map or seq) types
sample_std_types();            ///< serialize/deserialize STL containers
sample_emit_to_container();    ///< emit to memory, eg a string or vector-like container
sample_emit_to_stream();       ///< emit to a stream, eg std::ostream
sample_emit_to_file();         ///< emit to a FILE*
sample_emit_nested_node();     ///< pick a nested node as the root when emitting
sample_json();                 ///< JSON parsing and emitting
sample_anchors_and_aliases();  ///< deal with YAML anchors and aliases
sample_tags();                 ///< deal with YAML type tags
sample_docs();                 ///< deal with YAML docs
sample_error_handler();        ///< set a custom error handler
sample_global_allocator();     ///< set a global allocator for ryml
sample_per_tree_allocator();   ///< set per-tree allocators
sample_static_trees();         ///< how to use static trees in ryml
sample_location_tracking();    ///< track node locations in the parsed source tree
```


------

## Using ryml in your project

### Package managers

If you opt for package managers, here's where ryml is available so far
(thanks to all the contributors!):
  * [vcpkg](https://vcpkg.io/en/packages.html): `vcpkg install ryml`
  * Arch Linux/Manjaro:
    * [rapidyaml-git (AUR)](https://aur.archlinux.org/packages/rapidyaml-git/)
    * [python-rapidyaml-git (AUR)](https://aur.archlinux.org/packages/python-rapidyaml-git/)
  * [PyPI](https://pypi.org/project/rapidyaml/)

Although package managers are very useful for quickly getting up to
speed, the advised way is still to bring ryml as a submodule of your
project, building both together. This makes it easy to track any
upstream changes in ryml. Also, ryml is small and quick to build, so
there's not much of a cost for building it with your project.

### Single header file
ryml is provided chiefly as a cmake library project, but it can also
be used as a single header file, and there is a [tool to
amalgamate](./tools/amalgamate.py) the code into a single header
file. The amalgamated header file is provided with each release, but
you can also generate a customized file suiting your particular needs
(or commit):

```console
[user@host rapidyaml]$ python3 tools/amalgamate.py -h
usage: amalgamate.py [-h] [--c4core | --no-c4core] [--fastfloat | --no-fastfloat] [--stl | --no-stl] [output]

positional arguments:
  output          output file. defaults to stdout

optional arguments:
  -h, --help      show this help message and exit
  --c4core        amalgamate c4core together with ryml. this is the default.
  --no-c4core     amalgamate c4core together with ryml. the default is --c4core.
  --fastfloat     enable fastfloat library. this is the default.
  --no-fastfloat  enable fastfloat library. the default is --fastfloat.
  --stl           enable stl interop. this is the default.
  --no-stl        enable stl interop. the default is --stl.
```

The amalgamated header file contains all the function declarations and
definitions. To use it in the project, `#include` the header at will
in any header or source file in the project, but in one source file,
and only in that one source file, `#define` the macro
`RYML_SINGLE_HDR_DEFINE_NOW` **before including the header**. This
will enable the function definitions. For example:
```c++
// foo.h
#include <ryml_all.hpp>

// foo.cpp
// ensure that foo.h is not included before this define!
#define RYML_SINGLE_HDR_DEFINE_NOW
#include <ryml_all.hpp>
```

If you wish to package the single header into a shared library, then
you will need to define the preprocessor symbol `RYML_SHARED` during
compilation.


### As a library
The single header file is a good approach to quickly try the library,
but if you wish to make good use of CMake and its tooling ecosystem,
(and get better compile times), then ryml has you covered.

As with any other cmake library, you have the option to integrate ryml into
your project's build setup, thereby building ryml together with your
project, or -- prior to configuring your project -- you can have ryml
installed either manually or through package managers.

Currently [cmake](https://cmake.org/) is required to build ryml; we
recommend a recent cmake version, at least 3.13.

Note that ryml uses submodules. Take care to use the `--recursive` flag
when cloning the repo, to ensure ryml's submodules are checked out as well:
```bash
git clone --recursive https://github.com/biojppm/rapidyaml
```
If you omit `--recursive`, after cloning you
will have to do `git submodule update --init --recursive`
to ensure ryml's submodules are checked out.

### Quickstart samples

These samples show different ways of getting ryml into your application. All the
samples use [the same quickstart executable
source](./samples/quickstart.cpp), but are built in different ways,
showing several alternatives to integrate ryml into your project. We
also encourage you to refer to the [quickstart source](./samples/quickstart.cpp) itself, which
extensively covers most of the functionality that you may want out of
ryml.

Each sample brings a `run.sh` script with the sequence of commands
required to successfully build and run the application (this is a bash
script and runs in Linux and MacOS, but it is also possible to run in
Windows via Git Bash or the WSL). Click on the links below to find out
more about each sample:

| Sample name        | ryml is part of build?   | cmake file   | commands     |
|:-------------------|--------------------------|:-------------|:-------------|
| [`singleheader`](./samples/singleheader) | **yes**<br>ryml brought as a single header file,<br>not as a library | [`CMakeLists.txt`](./samples/singleheader/CMakeLists.txt) | [`run.sh`](./samples/singleheader/run.sh) |
| [`singleheaderlib`](./samples/singleheaderlib) | **yes**<br>ryml brought as a library<br>but from the single header file | [`CMakeLists.txt`](./samples/singleheaderlib/CMakeLists.txt) | [`run_shared.sh` (shared library)](./samples/singleheaderlib/run_shared.sh)<br> [`run_static.sh` (static library)](./samples/singleheaderlib/run_static.sh) |
| [`add_subdirectory`](./samples/add_subdirectory) | **yes**                      | [`CMakeLists.txt`](./samples/add_subdirectory/CMakeLists.txt) | [`run.sh`](./samples/add_subdirectory/run.sh) |
| [`fetch_content`](./samples/fetch_content)      | **yes**                      | [`CMakeLists.txt`](./samples/fetch_content/CMakeLists.txt) | [`run.sh`](./samples/fetch_content/run.sh) |
| [`find_package`](./samples/find_package)        | **no**<br>needs prior install or package  | [`CMakeLists.txt`](./samples/find_package/CMakeLists.txt) | [`run.sh`](./samples/find_package/run.sh) |

### CMake build settings for ryml
The following cmake variables can be used to control the build behavior of
ryml:

  * `RYML_WITH_TAB_TOKENS=ON/OFF`. Enable/disable support for tabs as
    valid container tokens after `:` and `-`. Defaults to `OFF`,
    because this may cost up to 10% in processing time.
  * `RYML_DEFAULT_CALLBACKS=ON/OFF`. Enable/disable ryml's default
    implementation of error and allocation callbacks. Defaults to `ON`.
  * `RYML_STANDALONE=ON/OFF`. ryml uses
    [c4core](https://github.com/biojppm/c4core), a C++ library with low-level
    multi-platform utilities for C++. When `RYML_STANDALONE=ON`, c4core is
    incorporated into ryml as if it is the same library. Defaults to `ON`.

If you're developing ryml or just debugging problems with ryml itself, the
following cmake variables can be helpful:
  * `RYML_DEV=ON/OFF`: a bool variable which enables development targets such as
    unit tests, benchmarks, etc. Defaults to `OFF`.
  * `RYML_DBG=ON/OFF`: a bool variable which enables verbose prints from
    parsing code; can be useful to figure out parsing problems. Defaults to
    `OFF`.

#### Forcing ryml to use a different c4core version

ryml is strongly coupled to c4core, and this is reinforced by the fact
that c4core is a submodule of the current repo. However, it is still
possible to use a c4core version different from the one in the repo
(of course, only if there are no incompatibilities between the
versions). You can find out how to achieve this by looking at the
[`custom_c4core` sample](./samples/custom_c4core/CMakeLists.txt).


------

## Other languages

One of the aims of ryml is to provide an efficient YAML API for other
languages. JavaScript is fully available, and there is already a
cursory implementation for Python using only the low-level API. After
ironing out the general approach, other languages are likely to
follow (all of this is possible because we're using
[SWIG](http://www.swig.org/), which makes it easy to do so).

### JavaScript

A JavaScript+WebAssembly port is available, compiled through [emscripten](https://emscripten.org/).


### Python

(Note that this is a work in progress. Additions will be made and things will
be changed.) With that said, here's an example of the Python API:

```python
import ryml

# ryml cannot accept strings because it does not take ownership of the
# source buffer; only bytes or bytearrays are accepted.
src = b"{HELLO: a, foo: b, bar: c, baz: d, seq: [0, 1, 2, 3]}"

def check(tree):
    # for now, only the index-based low-level API is implemented
    assert tree.size() == 10
    assert tree.root_id() == 0
    assert tree.first_child(0) == 1
    assert tree.next_sibling(1) == 2
    assert tree.first_sibling(5) == 2
    assert tree.last_sibling(1) == 5
    # use bytes objects for queries
    assert tree.find_child(0, b"foo") == 1
    assert tree.key(1) == b"foo")
    assert tree.val(1) == b"b")
    assert tree.find_child(0, b"seq") == 5
    assert tree.is_seq(5)
    # to loop over children:
    for i, ch in enumerate(ryml.children(tree, 5)):
        assert tree.val(ch) == [b"0", b"1", b"2", b"3"][i]
    # to loop over siblings:
    for i, sib in enumerate(ryml.siblings(tree, 5)):
        assert tree.key(sib) == [b"HELLO", b"foo", b"bar", b"baz", b"seq"][i]
    # to walk over all elements
    visited = [False] * tree.size()
    for n, indentation_level in ryml.walk(tree):
        # just a dumb emitter
        left = "  " * indentation_level
        if tree.is_keyval(n):
           print("{}{}: {}".format(left, tree.key(n), tree.val(n))
        elif tree.is_val(n):
           print("- {}".format(left, tree.val(n))
        elif tree.is_keyseq(n):
           print("{}{}:".format(left, tree.key(n))
        visited[inode] = True
    assert False not in visited
    # NOTE about encoding!
    k = tree.get_key(5)
    print(k)  # '<memory at 0x7f80d5b93f48>'
    assert k == b"seq"               # ok, as expected
    assert k != "seq"                # not ok - NOTE THIS! 
    assert str(k) != "seq"           # not ok
    assert str(k, "utf8") == "seq"   # ok again

# parse immutable buffer
tree = ryml.parse_in_arena(src)
check(tree) # OK

# parse mutable buffer.
# requires bytearrays or objects offering writeable memory
mutable = bytearray(src)
tree = ryml.parse_in_place(mutable)
check(tree) # OK
```
As expected, the performance results so far are encouraging. In
a [timeit benchmark](api/python/parse_bm.py) compared
against [PyYaml](https://pyyaml.org/)
and [ruamel.yaml](https://yaml.readthedocs.io/en/latest/), ryml parses
quicker by generally 100x and up to 400x:
```
+----------------------------------------+-------+----------+----------+-----------+
| style_seqs_blck_outer1000_inner100.yml | count | time(ms) | avg(ms)  | avg(MB/s) |
+----------------------------------------+-------+----------+----------+-----------+
| parse:RuamelYamlParse                  |     1 | 4564.812 | 4564.812 |     0.173 |
| parse:PyYamlParse                      |     1 | 2815.426 | 2815.426 |     0.280 |
| parse:RymlParseInArena                 |    38 |  588.024 |   15.474 |    50.988 |
| parse:RymlParseInArenaReuse            |    38 |  466.997 |   12.289 |    64.202 |
| parse:RymlParseInPlace                 |    38 |  579.770 |   15.257 |    51.714 |
| parse:RymlParseInPlaceReuse            |    38 |  462.932 |   12.182 |    64.765 |
+----------------------------------------+-------+----------+----------+-----------+
```
(Note that the parse timings above are somewhat biased towards ryml, because
it does not perform any type conversions in Python-land: return types
are merely `memoryviews` to the source buffer, possibly copied to the tree's
arena).

As for emitting, the improvement can be as high as 3000x:
```
+----------------------------------------+-------+-----------+-----------+-----------+
| style_maps_blck_outer1000_inner100.yml | count |  time(ms) |  avg(ms)  | avg(MB/s) |
+----------------------------------------+-------+-----------+-----------+-----------+
| emit_yaml:RuamelYamlEmit               |     1 | 18149.288 | 18149.288 |     0.054 |
| emit_yaml:PyYamlEmit                   |     1 |  2683.380 |  2683.380 |     0.365 |
| emit_yaml:RymlEmitToNewBuffer          |    88 |   861.726 |     9.792 |    99.976 |
| emit_yaml:RymlEmitReuse                |    88 |   437.931 |     4.976 |   196.725 |
+----------------------------------------+-------+-----------+-----------+-----------+
```


------

## YAML standard conformance

ryml is close to feature complete. Most of the YAML features are well
covered in the unit tests, and expected to work, unless in the
exceptions noted below.

Of course, there are many dark corners in YAML, and there certainly
can appear cases which ryml fails to parse. Your [bug reports or pull
requests](https://github.com/biojppm/rapidyaml/issues) are very
welcome.

See also [the roadmap](./ROADMAP.md) for a list of future work.


### Known limitations

ryml deliberately makes no effort to follow the standard in the
following situations:

* Containers are not accepted as mapping keys: keys must be scalars.
* Tab characters after `:` and `-` are not accepted tokens, unless
  ryml is compiled with the macro `RYML_WITH_TAB_TOKENS`. This
  requirement exists because checking for tabs introduces branching
  into the parser's hot code and in some cases costs as much as 10%
  in parsing time.
* Anchor names must not end with a terminating colon: eg `&anchor: key: val`.
* Non-unique map keys are allowed. Enforcing key uniqueness in the
  parser or in the tree would cause log-linear parsing complexity (for
  root children on a mostly flat tree), and would increase code size
  through added structural, logical and cyclomatic complexity. So
  enforcing uniqueness in the parser would hurt users who may not care
  about it (they may not care either because non-uniqueness is OK for
  their use case, or because it is impossible to occur). On the other
  hand, any user who requires uniqueness can easily enforce it by
  doing a post-parse walk through the tree. So choosing to not enforce
  key uniqueness adheres to the spirit of "don't pay for what you
  don't use".
* `%YAML` directives have no effect and are ignored.
* `%TAG` directives are limited to a default maximum of 4 instances
  per `Tree`. To increase this maximum, define the preprocessor symbol
  `RYML_MAX_TAG_DIRECTIVES` to a suitable value. This arbitrary limit
  reflects the usual practice of having at most 1 or 2 tag directives;
  also, be aware that this feature is under consideration for removal
  in YAML 1.3.

Also, ryml tends to be on the permissive side where the YAML standard
dictates there should be an error; in many of these cases, ryml will
tolerate the input. This may be good or bad, but in any case is being
improved on (meaning ryml will grow progressively less tolerant of
YAML errors in the coming releases). So we strongly suggest to stay
away from those dark corners of YAML which are generally a source of
problems, which is a good practice anyway.

If you do run into trouble and would like to investigate conformance
of your YAML code, beware of existing online YAML linters, many of
which are not fully conformant; instead, try using
[https://play.yaml.io](https://play.yaml.io), an amazing tool which
lets you dynamically input your YAML and continuously see the results
from all the existing parsers (kudos to @ingydotnet and the people
from the YAML test suite). And of course, if you detect anything wrong
with ryml, please [open an
issue](https://github.com/biojppm/rapidyaml/issues) so that we can
improve.


### Test suite status

As part of its CI testing, ryml uses the [YAML test
suite](https://github.com/yaml/yaml-test-suite). This is an extensive
set of reference cases covering the full YAML spec. Each of these
cases have several subparts:
 * `in-yaml`: mildly, plainly or extremely difficult-to-parse YAML
 * `in-json`: equivalent JSON (where possible/meaningful)
 * `out-yaml`: equivalent standard YAML
 * `emit-yaml`: equivalent standard YAML
 * `events`: reference results (ie, expected tree)

When testing, ryml parses each of the 4 yaml/json parts, then emits
the parsed tree, then parses the emitted result and verifies that
emission is idempotent, ie that the emitted result is semantically the
same as its input without any loss of information. To ensure
consistency, this happens over four levels of parse/emission
pairs. And to ensure correctness, each of the stages is compared
against the `events` spec from the test, which constitutes the
reference. The tests also check for equality between the reference
events in the test case and the events emitted by ryml from the data
tree parsed from the test case input. All of this is then carried out
combining several variations: both unix `\n` vs windows `\r\n` line
endings, emitting to string, file or streams, which results in ~250
tests per case part. With multiple parts per case and ~400 reference
cases in the test suite, this makes over several hundred thousand
individual tests to which ryml is subjected, which are added to the
unit tests in ryml, which also employ the same extensive
combinatorial approach.

Also, note that in [their own words](http://matrix.yaml.io/), the
tests from the YAML test suite *contain a lot of edge cases that don't
play such an important role in real world examples*. And yet, despite
the extreme focus of the test suite, currently ryml only fails a minor
fraction of the test cases, mostly related with the deliberate
limitations noted above. Other than those limitations, by far the main
issue with ryml is that several standard-mandated parse errors fail to
materialize. For the up-to-date list of ryml failures in the
test-suite, refer to the [list of known
exceptions](test/test_suite/test_suite_parts.cpp) from ryml's test
suite runner, which is used as part of ryml's CI process.


------

## Alternative libraries

Why this library? Because none of the existing libraries was quite
what I wanted. When I started this project in 2018, I was aware of these two
alternative C/C++ libraries:

  * [libyaml](https://github.com/yaml/libyaml). This is a bare C
    library. It does not create a representation of the data tree, so
    I don't see it as practical. My initial idea was to wrap parsing
    and emitting around libyaml's convenient event handling, but to my
    surprise I found out it makes heavy use of allocations and string
    duplications when parsing. I briefly pondered on sending PRs to
    reduce these allocation needs, but not having a permanent tree to
    store the parsed data was too much of a downside.
  * [yaml-cpp](https://github.com/jbeder/yaml-cpp). This library may
    be full of functionality, but is heavy on the use of
    node-pointer-based structures like `std::map`, allocations, string
    copies, polymorphism and slow C++ stream serializations. This is
    generally a sure way of making your code slower, and strong
    evidence of this can be seen in the benchmark results above.

Recently [libfyaml](https://github.com/pantoniou/libfyaml)
appeared. This is a newer C library, fully conformant to the YAML
standard with an amazing 100% success in the test suite; it also offers
the tree as a data structure. As a downside, it does not work in
Windows, and it is also multiple times slower parsing and emitting.

When performance and low latency are important, using contiguous
structures for better cache behavior and to prevent the library from
trampling caches, parsing in place and using non-owning strings is of
central importance. Hence this Rapid YAML library which, with minimal
compromise, bridges the gap from efficiency to usability. This library
takes inspiration from
[RapidJSON](https://github.com/Tencent/rapidjson) and
[RapidXML](http://rapidxml.sourceforge.net/).

------
## License

ryml is permissively licensed under the [MIT license](LICENSE.txt).