Writing Formats

Last week I talked about technical writing as a role, and I figured I’d run with that theme for a minute. One of the things I mentioned was that we tend to write in other formats, and I wanted to expand on that. If you’re used to a “What You See Is What You Get” (WYSIWYG) system like Microsoft Word, a Rich Text Format (RTF) editor, or an online editor, you might not think about writing formats much: if you want something bold, you click the bold button. The format is how that bold is represented under the hood. (This is related to, but slightly different from, a file type: multiple file types can use the same format behind the scenes, but be saved as different types of file for various reasons.)

For example, take the sentence “This sentence is daring and bold.” In markdown, that would look like:

This sentence is daring and **bold**.

In asciidoc, it would look like:

This sentence is daring and *bold*.

In docbook, it would look like:

<para>This sentence is daring and <emphasis role="bold">bold</emphasis>.</para>

In HTML, it would look like:

<p>This sentence is daring and <strong>bold</strong>.</p>

In DITA, it would look like:

<p>This sentence is daring and <strong>bold</strong>.</p>

From the above examples, you can see that they’re all basically achieving the same goal, but with slightly different syntax. So why so many formats?

Well, some of it is definitely “not invented here” syndrome, where some pedant twenty years ago decided an existing format wasn’t quite what they wanted for their particular needs, and decided to make their own format instead that would solve everything for everyone. (That xkcd comic is real.)

Sometimes the needs really are different, though. Case in point: markdown was created by John Gruber back in 2004 because he wanted a way to mark up text so that it stayed readable as plain text, with the intent of the markup still clear, but could also be rendered into something more visual. (I’m paraphrasing, but that’s the gist as I understand it.) It built on existing styling that had developed organically in the early days of the internet, when plain text was frequently the only option. It was also somewhat limited for uses outside the most common styling, as he wrote it primarily to serve his own needs (no shade, I can’t imagine he’d expected it to become quite such a widespread de facto standard). So given the constraints of what he was trying to achieve, using HTML or some other more semantically verbose format wasn’t really an option. Because people had different needs, there are actually a lot of different “flavors” of markdown that extend the core definitions Gruber laid out to help address some of those constraints, but generally, they all at least support that core. (A few notable ones: GitHub Flavored Markdown, MultiMarkdown, and CommonMark.)

Asciidoc actually predates markdown (by about two years), but the rationale for making a new format is similar. Originally, it was modeled on docbook, which is a very verbose, semantically complex language; the desire was for something with similar tools for writers that was more human readable. At this point it’s got a life of its own (it’s in the process of becoming an official specification right now, in fact), but that original goal of equivalence with docbook means it has a lot of tools that are useful for technical writers and documentarians that markdown doesn’t. (The tradeoff is that more tools in the toolbox means more complexity, which I think scared a lot of folks off from broader adoption. Also, at the moment there’s basically just one implementation that’s been transpiled to a few other languages, hence the push for making an official specification. Markdown, by contrast, has dozens of processors for basically every language.)

Which leads me to docbook and why that became a thing: basically, O’Reilly (a publisher) and HaL Computer Systems (a late 80s, early 90s computer manufacturer) wanted a format that codified a lot of the common design/layout language needed for writing documentation and technical books. So, things like different types of admonitions and callouts, different types of lists, representing code samples, et cetera. It quickly grew as it gained adoption, and at this point the schema has hundreds of tags (which I suppose isn’t too surprising for a 35-year-old format). DITA is in a similar boat – it’s been around for 20 years, and is a semantically complex language built around the needs of the industries that helped create it. It quickly grew to the point where its schema is actually broken into a base schema plus specializations that extend it depending on which needs you actually have.

Pardon the rabbit-holing. The point is that there are reasons why a lot of these formats were created, and they serve different needs – it’s not like one format is the perfect solution for all cases or all people, and that’s okay.

One thing to note about all of these formats, though: they are all methods to mark up text, to assign some sort of semantic value. It’s all about the content – while some may be more human-readable than others, the general point of the format is to transform that content into something else. It’s about separating the content from the presentation. There’s a lot of value to this: it means you can write things once, and then build that into any number of presentational formats – HTML, PDF, ePUB, man pages, whatever you need.
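To make the “write once, build many” idea concrete, here is a minimal sketch in Python. It assumes a made-up content model (a list of text spans, each flagged bold or not) and a hypothetical `render` function and `FORMATS` table; none of these names come from a real tool, and real converters work from parsed markup rather than hand-built tuples. The point is only that the content stays the same while the presentation rules change.

```python
from html import escape

# Hypothetical content model: (text, is_bold) spans with no markup stored.
SENTENCE = [("This sentence is daring and ", False), ("bold", True), (".", False)]

# Hypothetical per-format rules: how to wrap bold text, how to wrap the
# paragraph, and whether the text needs XML/HTML escaping.
FORMATS = {
    "markdown": {"bold": ("**", "**"), "para": ("", ""), "xml": False},
    "asciidoc": {"bold": ("*", "*"), "para": ("", ""), "xml": False},
    "docbook": {
        "bold": ('<emphasis role="bold">', "</emphasis>"),
        "para": ("<para>", "</para>"),
        "xml": True,
    },
    "html": {"bold": ("<strong>", "</strong>"), "para": ("<p>", "</p>"), "xml": True},
}


def render(spans, fmt):
    """Render the same spans using the rules for one output format."""
    rules = FORMATS[fmt]
    bold_open, bold_close = rules["bold"]
    para_open, para_close = rules["para"]
    parts = []
    for text, is_bold in spans:
        if rules["xml"]:
            # Escape &, < and > so the text is safe inside XML/HTML.
            text = escape(text)
        parts.append(f"{bold_open}{text}{bold_close}" if is_bold else text)
    return para_open + "".join(parts) + para_close


# One piece of content, four presentations.
for name in FORMATS:
    print(f"{name}: {render(SENTENCE, name)}")
```

Adding another output here means adding one entry to the table, not rewriting the sentence, which is the whole appeal of keeping content and presentation apart.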

Technically, even WYSIWYG formats like Microsoft Word are using a markup behind the scenes (a .docx file is a zip archive of XML files; try unzipping one and opening word/document.xml in a text editor). The difference is that Word decides what markup to use or not use, and uses a proprietary format that means you can only export it to the formats they choose to support. (Also, ever fiddle with an image in Word, move it one pixel over and suddenly your entire document is reflowed? That doesn’t happen when you control the markup.)
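If you’d rather peek at that markup from a script, here is a rough sketch using Python’s standard `zipfile` module. It relies on a .docx file being a zip archive, and it assumes the main body lives at the conventional part name `word/document.xml` (which is where Word puts it). The function name `read_main_document_xml` is just something I made up for illustration.

```python
import zipfile


def read_main_document_xml(path):
    """Return the raw XML of the main document part of a .docx file.

    Assumes the conventional part name word/document.xml; raises
    KeyError if the archive does not contain it.
    """
    with zipfile.ZipFile(path) as archive:
        return archive.read("word/document.xml").decode("utf-8")
```

Call it with the path to any .docx you have lying around and print the first few hundred characters. Bold text shows up as a `<w:b/>` element inside a run’s properties, buried in a lot more XML than you’d ever want to write by hand.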

If you want to get into technical writing (or just up your own documentation game), I’d definitely encourage learning some form of markup. Which one is sort of up to you and what you plan to do with it. Like, if you’re a Python developer, it might make sense to learn reST (reStructuredText), since that seems to be the standard most of that community has landed on, and is even baked into their language to some degree (see PEP 287). Markdown probably has the broadest adoption and is easiest to learn, but is also the most limited (without falling back to HTML or supersets like MDX, Markdoc, etc). But they’ll all give you a leg up on controlling your content and writing better documentation.
