how we went from plain text file to PDF/UA

1 year ago by Packbat

Oh my stars, we typed </ul> instead of </ol> at the end of the table of contents, that’s why the formatting got messed up in the PDF the first time.

Sorry, lemme start again.

Hi! We wrote, like, seventeen-ish mini tabletop roleplaying games in a year and a half, and when we went to make a collection out of them, we had to deal with the consequences of three choices we have made about how we want to exist as game designers:

1. We really like plain text. It’s super easy to edit - the programs we use to do it load, like, instantly - and on a technical level it’s extremely basic. Almost any program that displays text can read it, and almost anyone who can read a piece of paper can read the displayed text.
2. We really like semantic HTML. We don’t know much CSS - our education came in the 1990s, we’re way behind the curve on things - but we’re all about em tags and strong tags and kbd tags. (All the URLs are enclosed in kbd tags, because you type them into your browser.)
3. We want the little accessibility tester thing in LibreOffice to say we did good and didn’t make any mistakes. More seriously, PDFs are famously awful for people with screenreaders and suchlike, so we want our PDFs to be as un-terrible in that regard as we can manage, being chumps who don’t even know how to use one.

So, we had to get from files that look like this:

== Facts Phase

Each oracle card gets one notecard briefly outlining the corresponding truth your protagonist uncovers. Revise outlines as needed throughout. Draw two protagonist cards as bookends - Who They Are and Who They Thought They Were - then as many additional cards as feels right to fill out a rich story. Some possible ideas:
- Someone who arrived.
 = ...who disappeared.
 = ...who stayed.
- A city/town/village they visit.
 = Why they were brought there.
- A specific location.
- A local event.
- An astronomical event.
- A memory.
- A wrong guess.

to something that looks like this:

[Image: a screenshot of a page of the PDF. The section header and bullet list are there, with the two different levels of indents intact.]

So, what are the steps? How do you do this? For us, it went through three stages:

1. Reinterpreting our plain text as HTML.
2. Importing the HTML into LibreOffice (you could use something else, but LibreOffice is free and cross-platform) and polishing up the formatting to fit.
3. Exporting from LibreOffice to a Universal Accessibility PDF.

Figuring out HTML

So, the short version is that we have the HTML Living Standard bookmarked - we type html into our location bar and it goes directly there. We have it there so we can jump there and get answers to questions like “is it still cool to use the tt tag?”. (Answer: no, they would rather you picked a tag which said why it’s teletype-style characters.) We also have the ANDI accessibility testing tool as a bookmarklet that we can open any time. But there are some basic principles we know right out:

Headings. There should be one h1 tag near the top with the big name-of-the-document header. From there it should be a tree: each section marked with an h2, each subsection marked with an h3, each sub-sub-section marked with h4, and so on down to h6. Don’t choose header tags based on wanting specific formatting! We were wrong in the past when we used h2 for the subtitle - you’re supposed to put an hgroup around the h1 tag, put a p tag inside the hgroup for the subtitle, and then format it with some CSS so both your audience and their computers can tell that there’s only one header there and the subtitle is something else.
Navigation. If you have something like <h2 id="space-conspiracy">conspiracy in reverse</h2>, you can have a link to #space-conspiracy elsewhere and it’ll go directly to that header. (This is so much better than the nineties - we remember doing <a name="space-conspiracy"></a> just before the header tag, which is just wrong.) Then, you can have a <nav> and </nav> tag around your Table of Contents header and whatever kind of list you use for your contents, and then link to each of the ids you made up for your headings in that list. (We also included links back up to the table of contents - #space-nav - under each h2 header.)
Semantic formatting. This is the big thing we have the HTML standard bookmarked for: how do we do the inspired-by section some games have? (An aside tag, formatted with CSS.) How do we represent a filename? (We … don’t know but maybe samp, because it’s what the computer would tell you if you checked the directory listing - hence, sample of output?) Do we need table header cells? (Yes, you should have them, and use the scope= attribute to say which cells they’re the header for.) This is a big thinky part of the process, but there’s usually some way to say something close to what we mean.
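
For anyone who would rather have a machine double-check the first two principles, here is a minimal sketch in Python, using only the standard library. It is an illustration, not part of the process described in this post: the names OutlineChecker and check_outline are made up for this example, and the three rules it checks (exactly one h1, no skipped heading levels, every #link has a matching id) are a simplified reading of the principles above.

```python
from html.parser import HTMLParser


class OutlineChecker(HTMLParser):
    """Collects heading levels, id attributes, and #fragment links."""

    def __init__(self):
        super().__init__()
        self.levels = []     # heading levels in document order, e.g. [1, 2, 3]
        self.ids = set()     # every id="..." value seen on any tag
        self.fragments = []  # targets of links written as href="#something"

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self.levels.append(int(tag[1]))
        if attrs.get("id"):
            self.ids.add(attrs["id"])
        href = attrs.get("href") or ""
        if tag == "a" and href.startswith("#") and len(href) > 1:
            self.fragments.append(href[1:])


def check_outline(html):
    """Return a list of readable problems; an empty list means none found."""
    checker = OutlineChecker()
    checker.feed(html)
    checker.close()
    problems = []
    h1_count = checker.levels.count(1)
    if h1_count != 1:
        problems.append(f"expected exactly one h1, found {h1_count}")
    # Compare each heading with the one before it to spot skipped levels.
    for previous, current in zip(checker.levels, checker.levels[1:]):
        if current > previous + 1:
            problems.append(f"h{current} follows h{previous}, skipping a level")
    # Links are checked last so they may point forward or backward.
    for target in checker.fragments:
        if target not in checker.ids:
            problems.append(f"link to #{target} has no matching id")
    return problems
```

Calling check_outline on the text of an HTML file returns a list of problems. It says nothing about whether the tags are the right ones semantically, which is still the thinky part.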

This is the really long and tedious part, going line by line and writing each tag … but it’s also satisfying? When we look at:

<h3>Facts Phase</h3>

<p>Each oracle card gets one notecard briefly outlining the corresponding truth your protagonist uncovers. Revise outlines as needed throughout. Draw two protagonist cards as bookends &ndash; Who They Are and Who They Thought They Were &ndash; then as many additional cards as feels right to fill out a rich story. Some possible ideas:</p>
<ul>
<li>Someone who arrived.
<ul>
<li>&hellip;who disappeared.</li>
<li>&hellip;who stayed.</li>
</ul></li>
<li>A city/town/village they visit.
<ul>
<li>Why they were brought there.</li>
</ul></li>
<li>A specific location.</li>
<li>A local event.</li>
<li>An astronomical event.</li>
<li>A memory.</li>
<li>A wrong guess.</li>
</ul>

…we can see all the little decisions we made, like enclosing the list for the sub-bullet points inside the list-item tag for the main bullet point. Nobody’s going to look at this, probably, but if they did, they could see how we thought of the material we wrote, conceptually. And that makes us really proud.

And also it means that, if some rando we never heard of doing a project we never thought of decides to care about this and writes a program that interprets it, it will be right. And that’s how this kind of accessibility works, as far as we can tell: we use the standard to give truthful information, so a machine which knows nothing but the standard will have the truthful information it needs.
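
The conversion above was done by hand, line by line. Purely as an illustration of the nesting decision, here is a rough Python sketch that handles only the bullet lines. It assumes, based on the one excerpt shown, that "- " starts a bullet and " = " starts a sub-bullet. The function name bullets_to_html is hypothetical, and the sketch does not touch headings, paragraphs, or substitutions like &hellip; and &ndash;.

```python
from html import escape


def bullets_to_html(lines):
    """Turn '- item' and ' = sub-item' lines into nested <ul> markup."""
    out = ["<ul>"]
    open_item = False  # is a top-level <li> still waiting for its </li>?
    open_sub = False   # is a nested <ul> open inside that <li>?
    for line in lines:
        if line.startswith("- "):
            # Finish the previous top-level item before starting a new one.
            if open_sub:
                out.append("</ul></li>")
            elif open_item:
                out[-1] += "</li>"
            out.append("<li>" + escape(line[2:]))
            open_item, open_sub = True, False
        elif line.startswith(" = "):
            if not open_item:
                raise ValueError(f"sub-bullet before any bullet: {line!r}")
            # The nested list goes inside the parent <li>, not after it.
            if not open_sub:
                out.append("<ul>")
                open_sub = True
            out.append("<li>" + escape(line[3:]) + "</li>")
        else:
            raise ValueError(f"not a bullet line: {line!r}")
    # Close whatever is still open at the end of the list.
    if open_sub:
        out.append("</ul></li>")
    elif open_item:
        out[-1] += "</li>"
    out.append("</ul>")
    return "\n".join(out)
```

Fed the lines "- Someone who arrived.", " = ...who disappeared.", " = ...who stayed.", and "- A memory.", it produces the same shape as the hand-written markup: the inner ul sits inside the li for "Someone who arrived." and that li closes only after the inner list does.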

Right, so, that was the hard step. Next, we have to make the PDF.

Importing into LibreOffice

LibreOffice can just open HTML files.

…but it doesn’t quite understand them the way HTML works outside LibreOffice. So: make a new copy, start editing.

LibreOffice separates titles and subtitles into their own space, independent of headers. So, delete the h1 tag, turn it into a p or something, and then go through each other header and bump them up a level - find-and-replace h2 to h1, then h3 to h2, and so on if you have a so on.
LibreOffice has its own table of contents system that generates based on headers and adds page numbers. So, delete your table of contents. Also, probably delete all the “Back to contents” links - those don’t make sense in a printout, and a PDF is built for printing out.
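
The heading shuffle in the first step works fine as find-and-replace in a text editor, as long as it goes in the order given (h2 to h1 first, then h3 to h2). As an alternative, here is a small Python sketch that does the whole thing in one pass, so the order cannot go wrong. The function name promote_headings is made up for this example, and it assumes the headings are written as ordinary h1 to h6 tags. It leaves the table of contents and the back links for you to delete by hand.

```python
import re


def promote_headings(html):
    """Turn h1 into p, then move h2..h6 up one level each, in a single pass."""

    def shift(match):
        slash, level = match.group(1), int(match.group(2))
        name = "p" if level == 1 else f"h{level - 1}"
        return f"<{slash}{name}"

    # Matches the start of <h1 ...>, </h1>, <h2 ...>, </h2>, and so on,
    # but not <hr>, <hgroup>, <head>, or <html>. Attributes such as id
    # come after the matched part and are left untouched.
    return re.sub(r"<(/?)h([1-6])\b", shift, html, flags=re.IGNORECASE)
```

Run it on the copy, not the original, since the original is still the version that makes sense in a browser.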

Right. Now open that in LibreOffice.

At this point, most of the styling you want is there. strong tags are bold … or actually, no, they’re not. strong tags are Strong Emphasis style, and likewise for other tags LibreOffice understands. And that’s kind of the point we want to highlight: paragraph and character styles are the semantic markup of LibreOffice. Both within the program - that’s how Table of Contents works, it looks for however many levels of heading style we tell it to - and outside of it - that’s how it makes those (hopefully-)accessible PDFs. So if we need to add some formatting, we do some mixture of marking text with styles and editing the styles. But most of what we want is there, because LibreOffice read it out of the HTML, and a lot of things (e.g. starting each game on a new page) are easy because we have the structure of the HTML to use (e.g. by saying the “Heading 1” style should have a page break before it).

Oh, speaking of headings, don’t forget to turn your title and subtitle into the Title and Subtitle paragraph styles.

Exporting to PDF

This is the easiest part.

1. You make it look good.
2. You go Tools/Accessibility Check and fix any problems it highlights. (Did you know footnotes are discouraged? That’s why all the URLs we included so they’d be there in the printout are inline.)
3. File/Export As/Export as PDF, click “Universal Accessibility (PDF/UA)”, and go.

Are these the prettiest PDFs you’ve ever seen? …probably not, but that’s not our goal. We just want folks to be able to read it.

And if you have to make a PDF, and your priorities are like ours, maybe this will help you do it.

But, like, don’t mess up your tags. LibreOffice interprets misformatted HTML differently than Firefox, so when we did a </ul> instead of </ol> or <kbd> instead of </kbd>, the damage was contained in Firefox and spilled out over the entire rest of the document in LibreOffice. The Ctrl+U View Source page in Firefox does highlight broken syntax, for what it’s worth - we should probably have checked that before we hit upload.
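
Besides View Source, a few lines of Python can catch this kind of slip before LibreOffice does. The following is a minimal sketch, not a real validator: the names TagBalance and find_tag_problems are invented here, and it assumes every non-void element is closed explicitly (as in the markup shown earlier), which is stricter than HTML itself requires.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class TagBalance(HTMLParser):
    """Tracks open elements on a stack and records mismatched closes."""

    def __init__(self):
        super().__init__()
        self.stack = []     # (tag, line) for each element that is still open
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        line = self.getpos()[0]
        if tag not in [name for name, _ in self.stack]:
            self.problems.append(
                f"line {line}: </{tag}> closes nothing that is open")
            return
        # Pop back to the matching element; anything above it was left open.
        while self.stack:
            name, opened = self.stack.pop()
            if name == tag:
                break
            self.problems.append(
                f"line {opened}: <{name}> is still open at </{tag}> on line {line}")


def find_tag_problems(html):
    """Return a list of readable problems; an empty list means none found."""
    checker = TagBalance()
    checker.feed(html)
    checker.close()
    for name, opened in checker.stack:
        checker.problems.append(f"line {opened}: <{name}> is never closed")
    return checker.problems
```

On a list that opens with ol and closes with </ul>, it reports both the stray </ul> and the ol that never closes. On a paragraph where <kbd> was typed in place of </kbd>, it reports two kbd elements still open at the </p>.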
