HTMLBook Document Model

January 18, 2019


HTMLBook is the core DOM of our documents. Background for this decision can be found in this conference paper and an O’Reilly Radar page.

From the O’Reilly page:

“In this paper, I argue that HTML5 offers unique advantages to authors and publishers in comparison to both traditional word processing and desktop publishing tools like Microsoft Word and Adobe InDesign, as well as other markup vocabularies like DocBook and AsciiDoc. I also consider the drawbacks currently inherent in the HTML5 standard with respect to representing long-form, structured text content, and the challenges O’Reilly has faced in adopting the standard as the new source format for its toolchain. Finally, I discuss how O’Reilly has surmounted these challenges by developing HTMLBook, a new open, HTML5-based XML standard expressly designed for the authoring and production of both print and digital book content.”

For an in-depth description of HTMLBook, please refer to the O’Reilly HTMLBook GitHub page.

Processing HTML5 and CSS

If it is good enough for O’Reilly, it is good enough for me, and possibly for you, too. But with HTML5 as the core document format, there are a number of challenges to overcome to make for a pleasant authoring and publishing experience. And TySE should be all about having fun when writing, shouldn’t it?

HTML is hard to write by hand. How do we support a good authoring experience?
CSS-styling may become tedious for fancy documents, especially if one prepares for e-publishing and print.
HTML/CSS must include micro-typographic hints for any output format.
The engine must be able to select page templates without deviating too much from the HTML standard.
The conventional DOM API is for JavaScript (in a web browser). How do we support extensibility and adaptability with an adequate API?

Let us take a closer look at how to address some of these concerns. Others, like the topic of an API, will have to wait for another article at another time.

Authoring

There is a multitude of online editors around for authoring HTML5. Each of them is different, and none has emerged as a clear winner yet. That is a hint that so far no one has gotten it quite right.

Personally, I am a big fan of simple input formats. I spent a long period of my life hacking in cryptic typesetting commands starting with backslashes and including all kinds of weird brackets. Just the fact that you get used to it does not mean it is pleasant. As I mentioned previously: I love plain TeX and ConTeXt, but I draw the line at LaTeX’s nanny approach (especially as I usually type on a German keyboard layout, which makes it awkward to type ‘\’, ‘[’ and ‘{’). Nowadays I try to do as much as I can with Markdown and other simple-text formats. Markdown makes it easy to author conventionally structured text, especially in technical realms. Nevertheless, Markdown has a lot of shortcomings when it comes to typesetting, where details matter a lot.

At the heart of this discussion lies the following dilemma: Human beings prefer to recognize building blocks of texts as rough visual patterns, whereas a typesetting machine needs very detailed and unambiguous commands about the structure of a document. Humans want leeway, computers are nit-pickers. The good news is that we can ask the machine to help us bridge this chasm.

Authoring Chasm: from visual semantic clues to renderable structure

The first task is to find clues in the user’s input text for an exact semantic structure. Is this meant to be a chapter? Does this illustration belong to this piece of text? Is this a floating-point number or an IP address? This is where all the semantic tags of HTML5 come into play: article, section, figure, etc. One way or another, the user has to make a statement about the semantic context of their text fragments. The better the machine understands the intention of a text fragment, the more easily it can lay it out and style it appropriately. In TySE we will use HTMLBook as the central semantic representation of documents. The markup should be complete down to the most detailed <span> elements.

Whether a markup (or the resulting structure) is really “semantic” is a matter of definition. Microsoft Word allows you to define paragraph styles named something like “Chapter-Heading”, which clearly has a semantic intent, and “Large Bold Text”, which does not. You can create nice-looking printouts with both. But consider the case when someone (you!) has to convert the text to a web page or to a digital book. It is fairly clear what to do with “Chapter-Heading”, but what about “Large Bold Text”? Systems like LaTeX/ConTeXt offer semantic markup, which shields the user from the complexities of the underlying processing instructions, but the fact that the TeX engine they are based on is purely a processor of formatting instructions cannot always be hidden. DocBook tries even harder, but the problem persists, and this holds for TySE, too.

The trade-off may be put like this: The computer needs a detailed structural representation of a document. People are not good at providing this (even though users of LaTeX or MS Word go to great lengths trying), so the machine had better infer as much structure as possible from visual clues. And authors had better respect that machines (and other people) will not be able to read their minds, so authors are required to clarify their intent in some cases. TySE strives to meet authors in their half of the playing field by accepting Markdown input in a very lenient way.
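As a minimal sketch of what inferring structure from visual clues can look like, the following Python snippet turns a tiny Markdown-like subset into HTMLBook-style markup. It is not TySE's implementation: the function name `markdown_to_htmlbook` and the supported subset (a line starting with `# ` opens a chapter, a blank line ends a paragraph) are assumptions made purely for illustration. Only the `data-type="chapter"` convention is taken from HTMLBook.

```python
from html import escape


def markdown_to_htmlbook(text: str) -> str:
    """Convert a tiny Markdown-like subset to HTMLBook-style markup.

    Hypothetical helper for illustration only: '# Title' opens a chapter,
    blank lines separate paragraphs, everything else is paragraph text.
    """
    out = []            # output lines
    paragraph = []      # lines of the paragraph being collected
    in_chapter = False  # True while a chapter <section> is open

    def flush_paragraph():
        if paragraph:
            out.append("<p>" + escape(" ".join(paragraph)) + "</p>")
            paragraph.clear()

    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("# "):
            # Visual clue: a leading '#' is read as a chapter heading.
            flush_paragraph()
            if in_chapter:
                out.append("</section>")
            out.append('<section data-type="chapter">')
            out.append("<h1>" + escape(stripped[2:].strip()) + "</h1>")
            in_chapter = True
        elif not stripped:
            # Visual clue: a blank line ends the current paragraph.
            flush_paragraph()
        else:
            paragraph.append(stripped)

    flush_paragraph()
    if in_chapter:
        out.append("</section>")
    return "\n".join(out)


source = """# Authoring

Humans want leeway,
computers are nit-pickers.
"""
print(markdown_to_htmlbook(source))
```

The author only supplies rough visual patterns (a hash sign, a blank line); the machine fills in the explicit element structure. Anything the heuristics cannot decide would still need a clarifying hint from the author.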

Styling and Layout

HTMLBook (and HTML5 in general) separates structure from visual appearance by externalizing the latter to CSS. The idea behind CSS is to style things in a declarative way, as opposed to, e.g., PostScript, which styles things by describing drawing procedures. The declarative approach is often easier for non-specialists to apply, but has its downsides, too. One of those is the danger that the number of declarative “styles” keeps expanding, which is exactly what is happening with CSS.

descriptive		procedural
I have a square with rounded corners. It is 100 pixels wide and tall. The corners are of radius 20. The line color is red and the fill color is a light greenish-blue.		Pick a red pen with a 2 pixel wide nib. Start at point (0,80) of the canvas. Draw a straight line to (0,20), then curve in a quarter circle to (20,0). Another straight line to (80,0) and another curve to (100,20). Repeat this for (100,80) and (80,100), and for (20,100) and (0,80). Close the path and fill the interior with an RGB value of CCDAE1 hex.

It is not incidental that the descriptive notation in this example is shorter—it generally is. But all of a sudden someone takes a bite out of your square and leaves you to describe this figure:

Most people would retreat to a procedural notation, i.e. a recipe of how to draw to get a result like this. Clinging to a descriptive notation would probably involve the invention of funny names for this kind of outline. And that accurately reflects the trade-off between simplicity for standard cases versus flexibility for non-standard ones.
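The relationship between the two notations can be sketched in code: a descriptive record (size and corner radius) is expanded mechanically into a procedural recipe like the one in the table above. The snippet is only an illustration. The names `RoundedSquare` and `to_path` are invented here, and SVG path syntax is used as a stand-in for a procedural drawing language.

```python
from dataclasses import dataclass


@dataclass
class RoundedSquare:
    """Descriptive notation: what the shape is, not how to draw it."""
    size: int = 100
    radius: int = 20


def to_path(shape: RoundedSquare) -> str:
    """Procedural notation: SVG path commands that draw the shape.

    The path starts at (0, size - radius) and visits the corners in the
    same order as the recipe in the text.
    """
    s, r = shape.size, shape.radius
    steps = [
        f"M 0 {s - r}",                  # move the pen to the start point
        f"L 0 {r}",                      # straight line along one edge
        f"A {r} {r} 0 0 1 {r} 0",        # quarter circle around the corner
        f"L {s - r} 0",
        f"A {r} {r} 0 0 1 {s} {r}",
        f"L {s} {s - r}",
        f"A {r} {r} 0 0 1 {s - r} {s}",
        f"L {r} {s}",
        f"A {r} {r} 0 0 1 0 {s - r}",
        "Z",                             # close the path
    ]
    return " ".join(steps)


print(to_path(RoundedSquare()))
```

The descriptive record stays two numbers long, while the recipe needs one step per edge and corner. A square with a bite taken out has no obvious field in `RoundedSquare`, but it is just a different list of steps in the procedural form.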

But that’s not the only trade-off to consider when dealing with layout and design. HTMLBook is all about the semantics of a book’s content: authors and publishers define elements like chapters, figures, tables of contents, etc. HTMLBook does not define how these elements are supposed to be displayed, i.e. it does not say anything about layout or design. This is left to other technologies, one of which may be CSS. Let’s leave the topic of styling aside for now and focus on layout.

Bringing semantics and layout together to create pages

Suppose your document has three semantic elements: a body text (2), a kind of aside with some annotations (1) and a figure (3). As an author, let us assume you totally trust your designer and do not (currently) care about the visual appearance of your semantic blocks. Your only concern is: the reader must have easy access to (1) and (3) when arriving at their reference points in (2).

An experienced layout designer has a good sense of how to place semantic entities to support the reader in understanding the document. Usually there are a whole lot of constraints to consider when deciding on layout, especially if output media are manifold. But one thing has not changed very much over the centuries when talking about design and layout: it is about the placement of boxes (or frames or whatever you want to call them). TeX and InDesign are all about boxes, HTML and CSS as well. We will have to create boxes somehow to do layout. Other concepts for layout have been tried, but so far have not been successful. Human brains seem badly suited for thinking in, e.g., constraints. And one should never forget: humans want to fiddle with visual details. It is in the genes of anyone having to do with design.

HTML/CSS lets you create boxes by the following means:

The user agent (browser) follows an algorithm for generating boxes for HTML elements (plus some anonymous boxes).
CSS may be used to alter the boxes and to generate some new ones (based on the existing ones).
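To make the first point concrete, here is a heavily simplified sketch of box generation, loosely modelled on the CSS rule that inline content sitting next to block-level siblings is wrapped in an anonymous block box. The `Node` and `Box` classes and the `generate_boxes` function are hypothetical names for this illustration. A real user agent handles far more cases.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A minimal stand-in for a DOM node (hypothetical, for illustration)."""
    name: str                     # tag name, or "#text" for text nodes
    display: str = "block"        # "block" or "inline"
    children: list = field(default_factory=list)


@dataclass
class Box:
    """A generated box; anonymous boxes have no originating element."""
    kind: str                     # "block", "inline" or "anonymous"
    source: str = ""              # name of the originating node, if any
    children: list = field(default_factory=list)


def generate_boxes(node: Node) -> Box:
    """Generate a box tree for a node and its descendants."""
    box = Box(kind=node.display, source=node.name)
    child_boxes = [generate_boxes(child) for child in node.children]
    has_block = any(b.kind == "block" for b in child_boxes)
    if not has_block:
        # Only inline content (or none): no wrapping needed.
        box.children = child_boxes
        return box
    # Mixed content: wrap each run of inline boxes in an anonymous block box.
    run = []
    for child in child_boxes:
        if child.kind == "block":
            if run:
                box.children.append(Box(kind="anonymous", children=run))
                run = []
            box.children.append(child)
        else:
            run.append(child)
    if run:
        box.children.append(Box(kind="anonymous", children=run))
    return box


section = Node("section", children=[
    Node("#text", display="inline"),
    Node("p", children=[Node("#text", display="inline")]),
])
tree = generate_boxes(section)
print([child.kind for child in tree.children])
```

The author wrote one section containing some loose text and a paragraph, yet the box tree contains an extra box nobody asked for. Boxes appear as a by-product of the algorithm, not as something the designer places.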

From a designer’s point of view this is bad news: the creation of boxes, the things she cares about most urgently, is outsourced to an opaque machine. She is only allowed to do damage control afterwards, by writing brilliant CSS code to re-position the boxes. InDesign allows her to do the exact opposite: think about boxes first and then fill in the textual content like a liquid. TeX users are no strangers to filling boxes either, creating things like \hbox{…} and \vbox{…}. And that is the trade-off here: maximizing designers’ control over the creation and placement of boxes versus keeping a strict separation of semantics and layout. HTMLBook emphasizes the latter, leaving it to systems like TySE to invent strategies for the former.

Page Templates

Up to now, when we talked about layout, we were often thinking about page layout. After all, with TySE we are concerned with creating documents. But of course print on paper is not the only output device we are targeting. Ideally our document should be suited to be published on the web as well. In the best case we would be able to maintain a single semantic content source (HTMLBook) and a single layout + design source (CSS) to handle all possible output media. In real life, however, that will be next to impossible. There is a fundamental difference between layout on unconstrained media and paged media. The web community is in the midst of a vivid discussion about the future of web layout. The ongoing argument about the CSS Regions feature and (the lack of) its browser support is a symptom of the aforementioned (unresolved) trade-offs, as well as of the different stakes large companies like Google and Adobe have in web technology. TySE will not be too opinionated, but rather focus on creating beautiful documents now.

However, there is one “design principle” TySE will live by: Machines are too dumb to lay out printed documents autonomously; they need help from human beings. That may change in the foreseeable future, but as I just said: we are concerned with the present. And you remember the other theorem? Humans want to fiddle! To ease the interaction between human and machine for layout, i.e. the placing of boxes, there is a very effective, tried and tested means: page templates. They are a very important feature of TySE and I will talk about them in depth in another article.