ArchieML (or "AML") was created at The New York Times to make it easier to write and edit structured text on deadline that could be rendered in web pages, or more specifically, rendered in interactive graphics.
One of the main goals was to make it easy to tag text as data without having to type a lot of special characters. Another goal was to allow the document to contain lots of notes and draft text that would not be read into the data. And finally, because we make extensive use of Google Documents' concurrent-editing features (while working on a graphic, we can have several reporters, editors and developers all pouring information into a single document), we wanted a format that could survive being edited by users who may never have seen ArchieML or any other markup language before.
ArchieML differs from other popular formats like YAML and JSON in several areas that we've found are key to making it easy to use:
Whitespace is not significant to the document structure
In YAML, lines must be indented precisely and variably; the wrong number of spaces to the left of a key invalidates the document, and tabs can't be used. AML ignores all whitespace not within a value. We believe this makes it easier for non-programmers to use, and is essential for use in environments with non-monospaced fonts, like in Google Documents.
Unstructured text is ignored; there is no such thing as a parsing error
AML was designed so that writers could work in a freeform environment. They should be able to add entire paragraphs as scratch work that do not appear in the output. JSON and YAML have strict schemas that forbid text deviating from a pattern. AML doesn't assume text follows any pattern. If it finds text that looks like data, it treats it as data. Otherwise, it moves on.
The notation makes sense to non-programmers
Lists of values are noted with bullet points / asterisks, not hyphens or quoted strings that must be separated with commas. An overriding goal was to have an intuitive format that could be passed to a non-technical user (a reporter, an assigning editor or a copy editor) to edit, and to have the format be clear enough that they could make changes without breaking the parsing of the document. If we were using another format, we'd have to explain indentation rules in YAML, or how to match curly braces or properly escape quotation marks in JSON, and so forth.
For a very simple example, here's a screenshot of the Google Doc that powers a recent graphic about the trick plays used by the New England Patriots and Seattle Seahawks:
To generate the graphic, we load the ArchieML data from the document using the archieml-js npm module, then pass it to an underscore template to render the final markup server-side. This lets the journalists who are focusing on the text and content concentrate on getting the copy in shape independently of the developers working on the graphic.
While this is a very simple example, with only a few bits of text and data and one comment at the end that is ignored, when we're covering a breaking news story, we can have a half-dozen people all contributing to a Google Doc at the same time as we gather all the information we need for a graphic and turn it into the final copy blocks that make their way into the finished piece.
Parsers and tools in (hopefully) your language of choice.
At The New York Times, we normally write ArchieML in Google Documents. Both parsers include quick-start examples for how to download text from Google Docs and run it through the parser. They also include some formatting steps we take, such as converting links to HTML tags.
Examples:
For more fully-fledged integrations with Google Docs, use one of the plugins above.
Click on any ArchieML textarea to try it out yourself, and see how changes affect the output.
Or try out ArchieML in the Sandbox.
Strings can be stored as part of key/value pairs, defined whenever a line in ArchieML begins with a token followed by a colon. Keys can contain any Unicode character, with the exception of whitespace / invisible characters, and a handful of characters that are used within ArchieML ({ } [ ] : . +). The rest of the string is the value.
Whitespace surrounding keys and values is ignored. Indent as you like. Keys are case sensitive.
Lines that don't look like keys or other special commands are ignored:
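As a rough illustration of the key/value rules above, here is a minimal sketch in JavaScript. It is not the archieml-js implementation; the helper name `parseKeyValues` and the regular expression are assumptions made for this example, and only flat keys are handled.

```javascript
// Illustrative only: a hypothetical helper, not the archieml-js parser.
// A key is a run of characters other than whitespace and { } [ ] : . +
const KEY_LINE = /^\s*([^\s{}\[\]:.+]+)[ \t]*:[ \t]*(.*)$/;

function parseKeyValues(text) {
  const result = {};
  for (const line of text.split(/\r?\n/)) {
    const match = KEY_LINE.exec(line);
    if (match) {
      // Surrounding whitespace is dropped; the key keeps its case.
      result[match[1]] = match[2].trim();
    }
    // Lines that do not look like "key: value" are skipped silently.
  }
  return result;
}

const keyValueDoc = [
  'Scratch note for the desk',
  '    headline :   Trick plays   ',
  'Headline: A second, separate key',
  'still no key on this line',
].join('\n');

const keyValueResult = parseKeyValues(keyValueDoc);
console.log(keyValueResult);
// { headline: 'Trick plays', Headline: 'A second, separate key' }
```

The two note lines produce no output and no error, and `headline` and `Headline` are stored as different keys.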
Use dot-notation to create nested objects.
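A small sketch of how dotted keys might expand into nested objects. The helpers `setPath` and `parseDotted` are hypothetical names for this example, and replacing a non-object value that sits in the middle of a path is an assumption rather than something stated above.

```javascript
// Illustrative only: hypothetical helpers, not the archieml-js parser.
// Dots are allowed inside the key here so they can be split into a path.
const DOTTED_KEY_LINE = /^\s*([^\s{}\[\]:+]+)[ \t]*:[ \t]*(.*)$/;

function setPath(target, dottedKey, value) {
  const parts = dottedKey.split('.');
  let node = target;
  for (const part of parts.slice(0, -1)) {
    // Create intermediate objects on demand (assumption: non-objects are replaced).
    if (typeof node[part] !== 'object' || node[part] === null) {
      node[part] = {};
    }
    node = node[part];
  }
  node[parts[parts.length - 1]] = value;
}

function parseDotted(text) {
  const result = {};
  for (const line of text.split(/\r?\n/)) {
    const match = DOTTED_KEY_LINE.exec(line);
    if (match) setPath(result, match[1], match[2].trim());
  }
  return result;
}

const dottedResult = parseDotted(
  'colors.red: #f00\ncolors.green: #0f0\ncolors.blue: #00f'
);
console.log(JSON.stringify(dottedResult));
// {"colors":{"red":"#f00","green":"#0f0","blue":"#00f"}}
```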
You can also use "object" blocks to namespace a group of keys.
Dot notation can be used in object blocks as well:
You can close an object by beginning a line with {}. ArchieML is parsed one line at a time, so you can also close an object by opening a new one.
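The sketch below shows one way the open/close behavior of object blocks could work for top-level blocks. The function name `parseObjectBlocks` is hypothetical, and dot-notation and period-prefixed nested blocks are left out to keep the example short.

```javascript
// Illustrative only: a hypothetical helper, not the archieml-js parser.
const KEY_LINE = /^\s*([^\s{}\[\]:.+]+)[ \t]*:[ \t]*(.*)$/;
const BLOCK_LINE = /^\s*\{[ \t]*([^\s{}\[\]:.+]*)[ \t]*\}/;

function parseObjectBlocks(text) {
  const root = {};
  let scope = root; // where new keys are written
  for (const line of text.split(/\r?\n/)) {
    const block = BLOCK_LINE.exec(line);
    const pair = KEY_LINE.exec(line);
    if (block) {
      const name = block[1];
      if (name === '') {
        scope = root; // "{}" closes the open object
      } else {
        // Opening a block also closes whichever block was open before.
        if (typeof root[name] !== 'object' || root[name] === null) {
          root[name] = {};
        }
        scope = root[name];
      }
    } else if (pair) {
      scope[pair[1]] = pair[2].trim();
    }
  }
  return root;
}

const blockDoc = [
  '{colors}',
  'red: #f00',
  'green: #0f0',
  '{numbers}',
  'one: 1',
  '{}',
  'title: Top level again',
].join('\n');

const blockResult = parseObjectBlocks(blockDoc);
console.log(JSON.stringify(blockResult));
// {"colors":{"red":"#f00","green":"#0f0"},"numbers":{"one":"1"},"title":"Top level again"}
```

Here `{numbers}` ends the `colors` block without an explicit `{}`, and the final `{}` returns to the top level.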
Object blocks with names prepended with a period nest inside of open objects instead of ending them. Beginning a line with {} closes a nested object and returns to the parent.
Groups of keys can be placed inside an array by giving the array a name within brackets. The name of the array can be any valid key, and can use dot-notation. You can optionally end an array with an empty set of brackets, or by beginning a new array.
All keys inside the array are inserted into a single object within the array. The parser remembers the first key it found, and whenever it encounters it again, a new object is started.
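A minimal sketch of the "first key starts a new object" rule, written for illustration only. The function name `parseArraysOfObjects` is hypothetical; array names are stored as flat keys here, and nested arrays are not handled.

```javascript
// Illustrative only: a hypothetical helper, not the archieml-js parser.
const KEY_LINE = /^\s*([^\s{}\[\]:.+]+)[ \t]*:[ \t]*(.*)$/;
const ARRAY_LINE = /^\s*\[[ \t]*([^\s{}\[\]:+]*)[ \t]*\]/;

function parseArraysOfObjects(text) {
  const root = {};
  let items = null;    // array being filled, or null at the top level
  let firstKey = null; // first key seen inside the current array
  let item = null;     // object currently receiving keys
  for (const line of text.split(/\r?\n/)) {
    const open = ARRAY_LINE.exec(line);
    const pair = KEY_LINE.exec(line);
    if (open) {
      // "[name]" starts an array; "[]" ends the current one.
      items = open[1] === '' ? null : (root[open[1]] = []);
      firstKey = null;
      item = null;
    } else if (pair) {
      const key = pair[1];
      const value = pair[2].trim();
      if (items === null) {
        root[key] = value;
        continue;
      }
      if (firstKey === null) firstKey = key;
      if (key === firstKey) {
        // Seeing the first key again begins the next object.
        item = {};
        items.push(item);
      }
      item[key] = value;
    }
  }
  return root;
}

const scoresDoc = [
  '[scores]',
  'name: Amanda',
  'score: 26',
  'name: Tessa',
  'score: 30',
  '[]',
  'updated: today',
].join('\n');

const scoresResult = parseArraysOfObjects(scoresDoc);
console.log(JSON.stringify(scoresResult));
// {"scores":[{"name":"Amanda","score":"26"},{"name":"Tessa","score":"30"}],"updated":"today"}
```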
You can also create "simple" or "flat" arrays of strings. If an asterisk is encountered first within an array, that array will become a simple array, and key/value pairs within it will be ignored. If a key/value pair is encountered first, then asterisk lines will be ignored.
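To make the "first element decides the array type" rule concrete, here is a hedged sketch. The function name `parseArrays` and the mode labels `'simple'` and `'complex'` are choices made for this example, not terms taken from a parser.

```javascript
// Illustrative only: a hypothetical helper, not the archieml-js parser.
const KEY_LINE = /^\s*([^\s{}\[\]:.+]+)[ \t]*:[ \t]*(.*)$/;
const ARRAY_LINE = /^\s*\[[ \t]*([^\s{}\[\]:+]*)[ \t]*\]/;
const BULLET_LINE = /^\s*\*[ \t]*(.*)$/;

function parseArrays(text) {
  const root = {};
  let items = null;    // array being filled, or null outside any array
  let mode = null;     // 'simple' or 'complex', fixed by the first element
  let firstKey = null;
  for (const line of text.split(/\r?\n/)) {
    const open = ARRAY_LINE.exec(line);
    if (open) {
      items = open[1] === '' ? null : (root[open[1]] = []);
      mode = null;
      firstKey = null;
      continue;
    }
    if (items === null) continue; // top-level lines are out of scope here
    const bullet = BULLET_LINE.exec(line);
    const pair = KEY_LINE.exec(line);
    if (bullet) {
      if (mode === null) mode = 'simple';
      if (mode === 'simple') items.push(bullet[1].trim());
    } else if (pair) {
      if (mode === null) {
        mode = 'complex';
        firstKey = pair[1];
      }
      if (mode === 'complex') {
        if (pair[1] === firstKey) items.push({});
        items[items.length - 1][pair[1]] = pair[2].trim();
      }
    }
  }
  return root;
}

const arraysDoc = [
  '[days]',
  '* Monday',
  'note: ignored because this array is already simple',
  '* Tuesday',
  '[]',
  '[people]',
  'name: Amanda',
  '* ignored because this array already holds objects',
  'name: Tessa',
].join('\n');

const arraysResult = parseArrays(arraysDoc);
console.log(JSON.stringify(arraysResult));
// {"days":["Monday","Tuesday"],"people":[{"name":"Amanda"},{"name":"Tessa"}]}
```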
Array elements can contain arrays of their own. To begin an array while inside an array element, prepend its name with a period.
Much like nested object blocks, nested arrays must be "closed" with empty brackets in order to move up to the parent level.
Freeforms are a third type of array that was created to have better control over presentation from within ArchieML.
Unlike regular arrays, which group lines into objects whose values have no order, a freeform preserves the order of its lines. Clients that use ArchieML's output can then use that order to render the values, allowing you to vary the presentation for each array item.
Each line becomes its own object, with a type and value. ArchieML splits these two words into separate objects to make it easier to deal with different types of information; rendering logic can always be based on the content of the type attribute.
Freeforms also allow you to type unstructured lines of text, which are included as items in the array with a type of text. Note that this means that comments do not work within freeforms.
Having full control over order is useful when arrays need to be mixed with other types of data. For example, showing a list of events interspersed with general artwork.
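The following sketch shows how a freeform's ordered `type`/`value` items could be produced and then rendered by a client. It assumes freeforms are opened with `[+name]`; the helper names `parseFreeforms` and `renderItems`, the `image` and `kicker` types, and the HTML produced are all hypothetical. Values are not HTML-escaped here.

```javascript
// Illustrative only: hypothetical helpers, not the archieml-js parser.
const KEY_LINE = /^\s*([^\s{}\[\]:.+]+)[ \t]*:[ \t]*(.*)$/;
const FREEFORM_OPEN = /^\s*\[\+[ \t]*([^\s{}\[\]:+]+)[ \t]*\]/;
const ARRAY_CLOSE = /^\s*\[[ \t]*\]/;

function parseFreeforms(text) {
  const root = {};
  let items = null; // freeform being filled, or null outside one
  for (const line of text.split(/\r?\n/)) {
    const open = FREEFORM_OPEN.exec(line);
    if (open) {
      items = root[open[1]] = [];
      continue;
    }
    if (ARRAY_CLOSE.test(line)) {
      items = null;
      continue;
    }
    if (items === null || line.trim() === '') continue;
    const pair = KEY_LINE.exec(line);
    // Key lines keep their key as the type; everything else is plain text.
    items.push(pair
      ? { type: pair[1], value: pair[2].trim() }
      : { type: 'text', value: line.trim() });
  }
  return root;
}

// A client can branch on "type" alone and keep the document order.
function renderItems(items) {
  return items.map((item) => {
    if (item.type === 'image') return '<img src="' + item.value + '">';
    if (item.type === 'kicker') return '<h3>' + item.value + '</h3>';
    return '<p>' + item.value + '</p>';
  }).join('\n');
}

const freeformDoc = [
  '[+timeline]',
  'An opening paragraph of plain text.',
  'image: chart.png',
  'kicker: Later that day',
  'Another paragraph.',
  '[]',
].join('\n');

const freeformResult = parseFreeforms(freeformDoc);
console.log(renderItems(freeformResult.timeline));
```

Because every line inside the freeform becomes an item, the plain-text lines appear in the output rather than being dropped as comments.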
Values automatically end when a newline is encountered. But all subsequent text is read into a buffer that can be added to that key. Anchor the end of a multi-line value by following the value with a line beginning with ":end". All whitespace within the block is preserved.
Try removing the last line to see how it changes the output:
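A minimal sketch of the buffering behavior, comparing the same text with and without the closing ":end" line. The function name `parseMultiLine` is hypothetical, and trimming trailing whitespace from the finished value is an assumption of this example.

```javascript
// Illustrative only: a hypothetical helper, not the archieml-js parser.
const KEY_LINE = /^\s*([^\s{}\[\]:.+]+)[ \t]*:[ \t]*(.*)$/;
const END_LINE = /^\s*:end\b/i;

function parseMultiLine(text) {
  const result = {};
  let openKey = null; // most recent key, still extendable by ":end"
  let buffer = [];    // lines seen since that key
  for (const line of text.split(/\r?\n/)) {
    const pair = KEY_LINE.exec(line);
    if (END_LINE.test(line)) {
      if (openKey !== null) {
        // Append the buffered lines, keeping their internal whitespace.
        result[openKey] = [result[openKey], ...buffer]
          .join('\n')
          .replace(/\s+$/, '');
      }
      openKey = null;
      buffer = [];
    } else if (pair) {
      // A new key discards whatever was buffered for the previous one.
      openKey = pair[1];
      result[openKey] = pair[2].trim();
      buffer = [];
    } else if (openKey !== null) {
      buffer.push(line);
    }
  }
  return result;
}

const body = 'text: First line\n  indented second line\n\nfourth line';
const withEnd = parseMultiLine(body + '\n:end');
const withoutEnd = parseMultiLine(body);

console.log(JSON.stringify(withEnd.text));
// "First line\n  indented second line\n\nfourth line"
console.log(JSON.stringify(withoutEnd.text));
// "First line"
```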
You can place any text inside of a multi-line value. If one of your lines would be interpreted by the parser as a key or some other special command, you may have to escape that line by adding a backslash to the beginning of it. The backslash won't be included in the value.
Try removing the backslashes from the following lines:
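The sketch below illustrates why the backslash helps. It checks whether a line would be read as a key or a colon command, and shows the leading backslash being stripped before the line is stored. The helper names are hypothetical, treating a backslash as invalid inside a key is an assumption, and bracket or brace lines are not covered.

```javascript
// Illustrative only: hypothetical helpers, not the archieml-js parser.
// Assumption: a backslash cannot start a key, so an escaped line never matches.
const KEY_LINE = /^\s*([^\s{}\[\]:.+\\]+)[ \t]*:[ \t]*(.*)$/;
const COMMAND_LINE = /^\s*:(endskip|ignore|skip|end)\b/i;

// True when a line inside a multi-line value would be read as markup.
function looksLikeMarkup(line) {
  return KEY_LINE.test(line) || COMMAND_LINE.test(line);
}

// Remove one leading backslash, keeping any indentation before it.
function unescapeLine(line) {
  return line.replace(/^(\s*)\\/, '$1');
}

const rawLines = ['note: stays in the value', ':end'];
const escapedLines = rawLines.map((line) => '\\' + line);

console.log(rawLines.map(looksLikeMarkup));     // [ true, true ]
console.log(escapedLines.map(looksLikeMarkup)); // [ false, false ]
console.log(escapedLines.map(unescapeLine));    // [ 'note: stays in the value', ':end' ]
```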
Wrap text between lines that begin with ":skip" and ":endskip" to ignore blocks of text.
There is also a safety mechanism of sorts built in. When the parser encounters a line beginning with ":ignore" (even if it's within a :skip block), parsing immediately stops, and the rest of the document is ignored.
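As a hedged illustration of these two commands, the sketch below filters a document down to the lines a parser would still consider. The function name `visibleLines` is hypothetical, and matching the commands case-insensitively is an assumption.

```javascript
// Illustrative only: a hypothetical helper, not the archieml-js parser.
const SKIP_COMMAND = /^\s*:(endskip|ignore|skip)\b/i;

function visibleLines(text) {
  const kept = [];
  let skipping = false;
  for (const line of text.split(/\r?\n/)) {
    const match = SKIP_COMMAND.exec(line);
    const command = match ? match[1].toLowerCase() : null;
    if (command === 'ignore') break; // stops everything, even inside :skip
    if (command === 'skip') {
      skipping = true;
      continue;
    }
    if (command === 'endskip') {
      skipping = false;
      continue;
    }
    if (!skipping) kept.push(line);
  }
  return kept;
}

const skipDoc = [
  'title: Kept',
  ':skip',
  'draft: dropped while skipping',
  ':endskip',
  'credit: Also kept',
  ':ignore',
  'notes: never reached',
].join('\n');

const skipResult = visibleLines(skipDoc);
console.log(skipResult);
// [ 'title: Kept', 'credit: Also kept' ]
```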
If you use JavaScript or Ruby, we hope you'll try one of the existing ArchieML parsers.
If you want to make a parser yourself (or want the technical details on the format), the full specification is online here.
Questions or concerns? The GitHub repository for this site is available at newsdev/archieml.org, and you can use its Issues page to submit questions or bugs on the spec itself.
Created by Michael Strickland, Archie Tse, Matthew Ericson and Tom Giratikanon / The New York Times
Copyright (c) 2015 The New York Times Company