XMQ - a new language for xml/html (+json)

by Fredrik Öhrström (last updated 2024-05-03 21:51) oehrstroem@gmail.com

Sometimes it seems like new developers are gravitating towards JSON and other key=value based languages for data storage, leaving XML behind. This is somewhat understandable since the visual appearance of XML (when used for data storage) can be verbose, to say the least. The discombobulated handling of whitespace does not help either. It would be sad to see XML fade away simply because of its visual appearance.

But what if there were a way to render XML as a key-value language and fix the whitespace at the same time? This would be the same XML as before, just viewed/edited in a different way. XMQ is such a language: it is XML, just with Q instead of L.

The XMQ format is easier for a human to read and write than XML/HTML yet it captures exactly the XML/HTML content. It can always be safely pretty printed without introducing significant whitespace. The file-suffix HTMQ is used when working with HTML in the XMQ format. There is even a reasonable mapping between JSON and XMQ.

We can use XML for markup of texts. We can use XMQ for data storage. We can always convert between them since they are in fact the same thing.

What does it look like? Click on the examples:
[this page as htmq] [pom] [rss] [svg] [mame cart driver] [android layout] [jabber] [docx] [odt] [json] [xsd schema definition] [xslt transform] [soap response] [java pojo jackson] [saml idp metadata] [saml authn_response] [WIPO ST.26_sequence listing xmq] [Imagemagick thresholds xmq]


shiporder {
    id   = 889923
    type = container
    shipto(sailing = '')
    {
        address = 'The Vasa Museum
                   Galärvarvsvägen 14
                   115 21 Stockholm
                   Sweden'
        // Remember to verify coord.
        coord = '''59°19'41.0"N 18°05'29.0"E'''
    }
    rules
}

<shiporder>
  <id>889923</id>
  <type>container</type>
  <shipto sailing="">
    <address>The Vasa Museum
Galärvarvsvägen 14
115 21 Stockholm
Sweden</address>
    <!-- Remember to verify coord. -->
    <coord>59°19'41.0"N 18°05'29.0"E</coord>
  </shipto>
  <rules/>
</shiporder>

You can use the standalone xmq tool to convert between xmq/xml/htmq/html/json, or include xmq.h/xmq.c to use xmq directly from your program. In the future you will also be able to link to a prebuilt libxmq. With the xmq tool you can syntax highlight and pretty print XMQ (and thus indirectly XML and HTML), apply XSLT transformations, replace entities with other XMQ/XML/JSON or quoted content, write raw text output and more...

[Download] [Github] [Grammar] [Forum/Discussions]

########## Pretty print as xmq using colors on terminal
xmq pom.xml
xmq data.json
xmq index.html

########## Convert to xmq and store in file
xmq pom.xml > pom.xmq
xmq data.json > data.xmq

########## Use the built in pager.
xmq pom.xml pa
cat data.json | xmq pa
xmq index.html delete //script delete //style pa
curl -s 'https://dummyjson.com/todos?limit=20'  | xmq pa

########## View in your default browser.
xmq docbook.xml br
xmq data.json br
curl https://slashdot.org | xmq delete //script delete //style br

########## Convert xmq to xml,html or json
xmq pom.xmq to-xml > pom.xml
xmq data.xmq to-json > data.json
xmq index.htmq to-html > index.html

Background

XML can be human readable/editable if it is used for markup of longer human language texts, i.e. books, articles and other documents. In these cases the xml-tags represent a minor part of the whole xml-file.

However XML is often used for data storage and configuration files (e.g. pom.xml). In such files the xml-tags represent a major part of the whole xml-file. This makes the data storage and config files hard to read and edit directly by hand. Today, the tags are a major part of html files as well, which is one reason why html files are hard to read and edit.

The other reason why XML/HTML is hard to edit by hand is that they have a complicated approach to dealing with significant whitespace. The truth is that you cannot always pretty print the XML/HTML code as you would like since it might introduce significant whitespace. In fact proper pretty printing even requires a full understanding of the DTD/XSD/CSS!

Solution

XMQ solves the verbosity of tags by using braces to avoid closing xml-tags and parentheses to surround the attributes. XMQ solves the whitespace confusion by requiring all intended whitespace to be quoted. There are a lot of details, of course, but let us begin with an example.

Download/build the xmq executable for your platform, then download: shiporder.xmq shiporder.xml (Note that the xml is manually pretty printed.)

To convert the XML to XMQ and pretty print with colors:
xmq shiporder.xml

Use the page command if the source file is large:
xmq shiporder.xml page
Press q or esc to exit pager. You can shorten the command to just pa.

Use the browse command to start your default browser to view the content:
xmq shiporder.xml browse
You can shorten the command to just br. Set the environment variable XMQ_THEME=darkbg or XMQ_THEME=lightbg to match your background color, or XMQ_THEME=mono for no color.

Use shell redirection to store the XMQ output in a file:
xmq shiporder.xml to-xmq > test.xmq
(The to-xmq command is the default command and can be left out here.)

To convert the XMQ to XML:
xmq shiporder.xmq to-xml > test.xml

shiporder {
    id   = 889923
    type = container
    shipto(sailing = '')
    {
        address = 'The Vasa Museum
                   Galärvarvsvägen 14
                   115 21 Stockholm
                   Sweden'
        // Remember to verify coord.
        coord = '''59°19'41.0"N 18°05'29.0"E'''
    }
    rules
}

<shiporder>
  <id>889923</id>
  <type>container</type>
  <shipto sailing="">
    <address>The Vasa Museum
Galärvarvsvägen 14
115 21 Stockholm
Sweden</address>
    <!-- Remember to verify coord. -->
    <coord>59°19'41.0"N 18°05'29.0"E</coord>
  </shipto>
  <rules/>
</shiporder>

The hierarchical style should look familiar, but note:

XMQ files are always UTF8 encoded.
Safe values after = can be stored as plain text (see 889923 container), no quoting needed!
Unsafe values (after =) with newlines, whitespace or ( ) { } ' " or leading = & // /* must be quoted.
Two single quotes always mean the empty string (see sailing).
In multiline quotes, the incidental indentation is removed (see address).
Quotes containing quotes are quoted using n+1 single quotes (see coord). Note that two quotes are reserved for the empty string. You will therefore see a single quote ' or three quotes ''' or more quotes.
Single line comments use // and multi line comments use /* */.
Comments containing comments are commented using n+1 slashes (eg ///* *///).

This means that you can quote any block of text (except invisible spaces near newlines) with enough single quotes and you can comment any block of text with enough slashes.
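
The quoting rules listed above can be sketched in a few lines of Python. This is only an illustration of the rules as stated on this page, not the code used by the xmq tool; the function names (is_safe, longest_quote_run, xmq_quote) are made up for this example, and values that begin or end with a single quote are deliberately rejected since they need a compound value or newline padding.

```python
def is_safe(value: str) -> bool:
    """True if the value may follow '=' without quotes, per the rules above."""
    if value == "":
        return False  # the empty string is always written as ''
    if any(ch.isspace() for ch in value):
        return False  # spaces and newlines must be quoted
    if any(ch in "(){}'\"" for ch in value):
        return False
    return not value.startswith(("=", "&", "//", "/*"))


def longest_quote_run(value: str) -> int:
    """Length of the longest run of consecutive single quotes in the value."""
    longest = current = 0
    for ch in value:
        current = current + 1 if ch == "'" else 0
        longest = max(longest, current)
    return longest


def xmq_quote(value: str) -> str:
    """Quote with n+1 single quotes, never using exactly two of them."""
    if value == "":
        return "''"
    if is_safe(value):
        return value
    if value.startswith("'") or value.endswith("'"):
        # Assumption of this sketch: such values are out of scope here.
        raise ValueError("leading/ending quotes need a compound value")
    n = longest_quote_run(value) + 1
    if n == 2:
        n = 3  # two quotes are reserved for the empty string
    return "'" * n + value + "'" * n


print(xmq_quote("889923"))
print(xmq_quote("0 0 * * MON-FRI"))
print(xmq_quote("59°19'41.0\"N 18°05'29.0\"E"))
```

Note that the sketch returns multiline values without adding any indentation; a pretty printer would also have to align the continuation lines.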

The incidental indentation removal and n+1 quotes ideas originated in the expert group for JEP 378 Java Text Blocks, a collaborative effort led by Jim Laskey and Brian Goetz. The seed of the idea to separate desired whitespace from incidental (accidental) whitespace was planted by Kevin Bourrillion and the idea for n+1 quotes came from John Rose.

XMQ as a configuration file

XMQ permits multiple root nodes which means that if you use XMQ as your software config file format then the first iteration of your config file can be as simple as this: config.xmq

server = 192.0.2.62
port   = 143
file   = payroll.dat
cron   = '0 0 * * MON-FRI'

Every xmq file can be printed in compact form on a single line where whitespace between tokens is minimized.

server=192.0.2.62 port=143 file=payroll.dat cron='0 0 * * MON-FRI'

The only permitted whitespace characters between tokens are space (ascii 32) and newlines (ascii 10 or 13 or 13 10). All other whitespace (including tabs) must be quoted. Let us take a look at shiporder in compact form. (Any linebreaks below are due to your browser.)

shiporder{id=889923 type=container shipto(sailing=''){address=('The Vasa Museum'&#10;'Galärvarvsvägen 14'&#10;'115 21 Stockholm'&#10;'Sweden')/*Remember to verify coord.*/coord='''59°19'41.0"N 18°05'29.0"E'''}rules}

You can see character entities like &#10; for newlines and compound values like address=('...'&#10;'...'), which normally is a multiline quote but where escaped newlines are intermingled with quotes to create the compact form.
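
As a rough illustration of the compound value described above, the following Python sketch turns a multiline value into its compact form. The function name compact_value is hypothetical, and the sketch assumes the text contains no single quotes and always quotes each line, even where the real tool might leave a safe value unquoted.

```python
def compact_value(text: str) -> str:
    """Render a value after '=' on a single line, using &#10; for newlines."""
    if text == "":
        return "''"
    if "'" in text:
        # Assumption of this sketch: embedded quotes are not handled.
        raise ValueError("this sketch does not handle embedded single quotes")
    # Each non-empty line becomes its own quote; empty lines contribute
    # nothing but the newline entity between their neighbours.
    parts = [f"'{line}'" if line else "" for line in text.split("\n")]
    joined = "&#10;".join(parts)
    # Several quotes/entities after '=' must be grouped with parentheses.
    return f"({joined})" if "\n" in text else joined


address = "The Vasa Museum\nGalärvarvsvägen 14\n115 21 Stockholm\nSweden"
print("address=" + compact_value(address))
```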

You can read the config file using a simple C api. (You can also use the full libxml2 api if you like and future programming languages APIs will be coming.)

XMQDoc *doc = xmqNewDoc();
ok = xmqParseFile(doc, "config.xmq", "myconf"); assert(ok);
server = xmqGetString(doc, NULL, "/myconf/server");
port = xmqGetInt(doc, NULL, "/myconf/port");
file = xmqGetString(doc, NULL, "/myconf/file");
cron = xmqGetString(doc, NULL, "/myconf/cron");

As you can see, XMQ can be trivial, which is nice for your first config file, but when your program grows in complexity, so can your config file. You do not have to convert your config file to xml, but if you want to then you can supply your implicit root (in this case myconf):
xmq --root=myconf config.xmq to-xml > config.xml

<?xml version="1.0" encoding="utf8"?>
<myconf><server>192.0.2.62</server><port>143</port><file>payroll.dat</file><cron>0 0 * * MON-FRI</cron></myconf>
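
To make the implicit root idea concrete, here is a small Python sketch that reads the flat key = value subset shown in config.xmq and wraps the pairs in a root element. It is not how the xmq tool is implemented: the regular expression only understands unquoted values and values inside a single pair of single quotes, and the names parse_flat and wrap_in_root are invented for this example.

```python
import re
import xml.etree.ElementTree as ET

# key = value, where value is either 'single quoted' or an unquoted token.
PAIR = re.compile(r"([A-Za-z_][\w.-]*)\s*=\s*(?:'([^']*)'|([^\s']+))")


def parse_flat(text):
    """Return (key, value) pairs from a flat XMQ config (tiny subset only)."""
    pairs = []
    for match in PAIR.finditer(text):
        key, quoted, plain = match.groups()
        pairs.append((key, quoted if quoted is not None else plain))
    return pairs


def wrap_in_root(pairs, root_name):
    """Place every pair as a child element below the implicit root."""
    root = ET.Element(root_name)
    for key, value in pairs:
        ET.SubElement(root, key).text = value
    return ET.tostring(root, encoding="unicode")


CONFIG = """\
server = 192.0.2.62
port   = 143
file   = payroll.dat
cron   = '0 0 * * MON-FRI'
"""

print(wrap_in_root(parse_flat(CONFIG), "myconf"))
```

The same parse_flat call also accepts the compact single-line form, since only spaces and newlines separate the tokens.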

Web pages and whitespace

Now let us try some htmq/html: welcome_traveller.htmq welcome_traveller.html (Note that the html is manually pretty printed.)

xmq welcome_traveller.htmq pager

!DOCTYPE = html
html {
    body {
        h1 = Welcome!
        'Rest here weary traveller and s'
        a(href = https://a.b.c) = lee
        'p until morning.
         Say '
        &nabla;
    }
}

<!DOCTYPE html>
<html>
    <body>
        <h1>Welcome!</h1>
        Rest here weary traveller and s<a
        href="https://a.b.c">lee</a>p
        until morning.
        Say
        &nabla;
    </body>
</html>

Text that does not immediately follow an equal sign = is called a standalone quote (see 'Rest here ...' and 'p until ...') and must always be quoted. If you do not quote them, they will be interpreted as elements (see html body a h1).
XMQ pretty printing is straightforward whereas the html line breaks are weird to prevent spaces inside the word sleep.
XMQ entities like &nabla; (∇) must be outside of the quotes.
In the xmq it is obvious that there is exactly a single space between Say and the nabla.

If you convert from htmq to html:
xmq welcome_traveller.htmq to-html
Then you will see that xmq does not pretty print since it wants to preserve the xmq whitespace exactly as it was written. (Any linebreaks below are due to your browser.) Since you know exactly what whitespace you are feeding the browser (html) and other tools (xml) it will be easier to control their behaviour.

<!DOCTYPE html>
<html><body>
<h1>Welcome!</h1>Rest here weary traveller and s<a href="https://a.b.c">lee</a>p until morning.
Say &nabla;</body></html>

If you convert from the original manually pretty printed html above to htmq:
xmq welcome_traveller.html to-htmq
Then you will see that the xmq tool by default trims whitespace using its own heuristic. It keeps the original linebreaks but removes incidental indentation and leading/ending whitespace if the leader/ender contains newlines. This is the same rule xmq uses for trimming multiline quotes and comments. This heuristic usually works well but might in some situations remove significant spaces from XML sources which were not written with this in mind.

!DOCTYPE = html
html {
    body {
        h1 = Welcome!
        'Rest here weary traveller and s'
        a(href = https://a.b.c) = lee
        'p
         until morning.
         Say
         ∇'
    }
}
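
The trimming heuristic described above can be approximated with the Python sketch below. It is a reading of the rule as stated on this page, not the actual implementation: the name trim_incidental is hypothetical, only spaces are counted as indentation, and the first line is assumed to be excluded from the indentation measurement when the text starts directly after a tag (as with the p after the closing a tag).

```python
WHITESPACE = " \t\r\n"


def trim_incidental(text: str) -> str:
    """Drop newline-containing leader/ender and the common indentation."""
    first_line_counts = False
    lead = len(text) - len(text.lstrip(WHITESPACE))
    if "\n" in text[:lead]:
        # Keep the indentation of the first real line so it can be measured.
        text = text[text.rfind("\n", 0, lead) + 1:]
        first_line_counts = True
    tail = len(text.rstrip(WHITESPACE))
    if "\n" in text[tail:]:
        text = text[:tail]
    lines = text.split("\n")
    measured = lines if first_line_counts else lines[1:]
    indents = [len(l) - len(l.lstrip(" ")) for l in measured if l.strip()]
    cut = min(indents) if indents else 0
    trimmed = []
    for index, line in enumerate(lines):
        if index == 0 and not first_line_counts:
            trimmed.append(line)  # starts right after a tag, keep as is
        else:
            trimmed.append(line[cut:])
    return "\n".join(trimmed)


body_text = "p\n        until morning.\n        Say\n        ∇\n    "
print(trim_incidental(body_text))
```

Text without newlines in its leader or ender, such as '  alfa  ', passes through unchanged, which matches the statement that only leaders/enders containing newlines are removed.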

You can also see that the &nabla; was replaced with the actual ∇. This happened because it was a character entity, which is just another kind of quote. If you want to preserve all whitespace and restore the html entities then do:
xmq --trim=none welcome_traveller.html to-htmq --escape-non-7bit

!DOCTYPE = html
html {
    &#10;
    '    '
    body {
        &#10;
        '        '
        h1 = Welcome!
        &#10;
        '        '
        'Rest here weary traveller and s'
        a(href = https://a.b.c) = lee
        'p
                 until morning.
                 Say'
        &#10;
        '        '
        &nabla;
        &#10;
        '    '
    }
    &#10;
}

As you can see there is quite a lot of whitespace in xml/html, which might or might not be significant/ignorable depending on your css and other settings. If you really want this whitespace then xmq will make it obvious.

Compact XMQ with multiline comments

The opposite of xmq pretty printing is xmq compact printing with no indentation and no newlines:
xmq welcome_traveller.htmq to-htmq --compact

!DOCTYPE=html html{body{h1=Welcome!'Rest here weary traveller and s'a(href=https://a.b.c)=lee'p until morning.'&#10;'Say '&nabla;}}

Even multiline comments can be printed as compact XMQ since */* means a newline. This is not possible with XML since there is no standardized way to escape newlines inside html/xml comments. multi.xmq

type {
    name = number
    /* After the type we define all the
       necessary --regex-- patterns to
       detect numbers. */
    pattern = [0-9]+
    pattern = [0-9]+.[0-9]
}

xmq multi.xmq to-xmq --compact

type{name=number /*After the type we define all the*/*necessary --regex-- patterns to*/*detect numbers.*/pattern=[0-9]+ pattern=[0-9]+.[0-9]}
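
The compact comment above can be reproduced with a short Python sketch. The helper names compact_comment and expand_comment are invented here, and the sketch assumes the comment lines are already trimmed and contain no */ sequence of their own.

```python
def compact_comment(comment: str) -> str:
    """Join the lines of a multiline comment with */* instead of newlines."""
    return "/*" + "*/*".join(comment.split("\n")) + "*/"


def expand_comment(compact: str) -> str:
    """Inverse of compact_comment: turn */* back into newlines."""
    if not (compact.startswith("/*") and compact.endswith("*/")):
        raise ValueError("not a /* ... */ comment")
    return compact[2:-2].replace("*/*", "\n")


comment = (
    "After the type we define all the\n"
    "necessary --regex-- patterns to\n"
    "detect numbers."
)
print(compact_comment(comment))
```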

Note that, for historical reasons, xml/html does not permit two or more consecutive dashes inside a comment! This is quite a showstopper if you just want to comment out some large part of your document. As you can see two dashes are permitted in xmq-comments and the xmq tool works around this problem when converting to xml/html by adding a very specific char (␐) in such a way that there are no two consecutive dashes in the xml. When loading from such xml, the char (␐) is instead removed to restore the two dashes.
xmq multi.xmq to-xml

<?xml version="1.0" encoding="utf8"?>
<type><name>number</name><!--After the type we define all the
necessary -␐-regex-␐- patterns to
detect numbers.--><pattern>[0-9]+</pattern><pattern>[0-9]+.[0-9]</pattern></type>
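
The dash workaround shown above can be sketched in Python as follows. This assumes that the marker char is U+2410 (the ␐ printed on this page) and that it is inserted between every pair of adjacent dashes; the function names are made up, and the real tool may handle additional cases such as a dash at the very end of a comment.

```python
import re

MARKER = "\u2410"  # assumed to be the char shown as ␐ above


def escape_dashes(comment: str) -> str:
    """Insert the marker after every dash that is followed by another dash."""
    return re.sub(r"-(?=-)", "-" + MARKER, comment)


def unescape_dashes(comment: str) -> str:
    """Remove markers that sit between two dashes, restoring the original."""
    return re.sub("-" + MARKER + r"(?=-)", "-", comment)


line = "necessary --regex-- patterns to"
escaped = escape_dashes(line)
print(escaped)
print(unescape_dashes(escaped) == line)
```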

Quoting and entities inside attributes

For elements, a key=value is syntactic sugar for: key{'value'}
For attributes, you can only write: key=value
In the example below all content is: 123
If the xmq tool detects that all children of an element are either text or entities, then it will present the element as a key value pair.

content {
    value = 123
    value = '123'
    value { '123' }

    element(value = 123
            value = '123')
}

XMQ is designed with the assumption that we rarely need significant leading/ending whitespace/quotes. However sometimes you have to have that. For an element value, you can express such whitespace in different ways:

// Leading and ending two spaces
spaces = '  alfa  '
spaces = ( &#32;&#32; 'alfa' &#32;&#32; )
spaces { &#32;&#32; 'alfa' &#32;&#32; }

// Leading and ending single quotes
apos = '''
       'alfa'
       '''
apos = ( &#39; 'alfa' &#39; )
apos { &#39; 'alfa' &#39; }

// Leading and ending newlines
newlines = ( &#10; 'alfa' &#10; )
newlines { &#10; 'alfa' &#10; }

You can see the magenta colored parentheses ( ) after the equal = sign. This is a compound value which can only consist of quotes and entities. Compound values are mandatory for the attribute values that need multiple quotes/entities since braces { } cannot be used inside an attribute value.

content(newlines = ( &#10; 'alfa' &#10; ))

Viewing large html pages

The xmq tool is useful to decode large html pages. Let us assume that you downloaded a large html page: index.html

xmq index.html delete /html/head delete //style delete //script pager

This command will delete the head node and all style and script nodes, before using the pager to show you the htmq. The argument to delete is an xpath expression.

There are other commands to modify the xmq. In particular you can see how this web page is constructed by replacing entities with text files or with xmq files. index.htmq

JSON

We can use the xmq tool to convert shiporder to json:
xmq shiporder.xmq to-json | jq .
You can see that the xml element name is folded as the key "_":"shiporder" and attributes are folded as children prefixed with underscores.

{
  "_": "shiporder",
  "id": 889923,
  "type": "container",
  "shipto": {
    "_sailing": "",
    "address": "The Vasa Museum\nGalärvarvsvägen 14\n115 21 Stockholm\nSweden",
    "//": "Remember to verify coord.",
    "coord": "59°19'41.0\"N 18°05'29.0\"E"
  },
  "rules": {}
}

If an XMQ value is a valid JSON number, true, false or null, then it is converted to the proper JSON value (no quotes) see "id":889923 . If you want to force a number to a JSON string, then add the S attribute: speed(S)=123 will translate into "speed":"123" .
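
The value conversion rule above can be illustrated with this Python sketch. The number pattern follows the JSON grammar; the function name to_json_value and its force_string parameter (standing in for the S attribute) are assumptions made for the example.

```python
import json
import re

# JSON number grammar: optional minus, integer part without leading zeros,
# optional fraction, optional exponent.
JSON_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")
LITERALS = {"true": True, "false": False, "null": None}


def to_json_value(text: str, force_string: bool = False):
    """Map an XMQ text value to a JSON value, keeping strings as strings."""
    if force_string:
        return text  # corresponds to speed(S)=123
    if text in LITERALS:
        return LITERALS[text]
    if JSON_NUMBER.fullmatch(text):
        return json.loads(text)
    return text


print(json.dumps({
    "id": to_json_value("889923"),
    "type": to_json_value("container"),
    "speed": to_json_value("123", force_string=True),
}))
```

A value such as 007 is not a valid JSON number, so in this sketch it stays a string.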

We can also use the xmq tool to parse json.
curl -s 'https://dummyjson.com/todos?skip=4&limit=2' | xmq
or
xmq todos.json
todos.xmq todos.json

A limitation of JSON is painfully visible as underscore element names. Javascript objects lack types, which means that JSON objects lack type information, and this results in the corresponding XMQ elements getting the anonymous name/type underscore _. If the JSON object contains a child _ with a name, then xmq will use this as the element name instead. This enables proper back-and-forth conversion between XMQ and JSON.

_ {
    todos(A)
    {
        _ {
            id        = 5
            todo      = 'Solve a cube'
            completed = false
            userId    = 31
        }
        _ {
            id        = 6
            todo      = 'Bake pastries'
            completed = false
            userId    = 39
        }
    }
    total = 150
    skip  = 4
    limit = 2
}

{
  "todos": [
    {
      "id": 5,
      "todo": "Solve a cube",
      "completed": false,
      "userId": 31
    },
    {
      "id": 6,
      "todo": "Bake pastries",
      "completed": false,
      "userId": 39
    }
  ],
  "total": 150,
  "skip": 4,
  "limit": 2
}

XMQ can read and generate json. The workflow is intended to be: 1) convert json to xmq 2) work on the data as xmq 3) possibly write back to json. Some clean xml files (like pom.xml files) can be converted to clean json, worked on in json and then converted back to xmq. But if you convert generic xml/html with standalone text nodes and attributes to json, then the result is probably rather confusing since keys must be unique in json.

The xmq tool solves this by detecting non-unique element names and suffixing them with [0], [1], [2] etc.

html {
    body {
        h1 = Introduction
        h2 = Todo
        p  = 'All work and no play.'
        p  = 'All work and no play.'
        p  = 'All work and no play.'
        h2 = Done
        p  = 'Not much here.'
    }
}

{
  "_": "html",
  "body": {
    "h1": "Introduction",
    "h2[0]": "Todo",
    "p[0]": "All work and no play.",
    "p[1]": "All work and no play.",
    "p[2]": "All work and no play.",
    "h2[1]": "Done",
    "p[3]": "Not much here."
  }
}
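
The suffixing shown above can be sketched in Python like this. The function name unique_keys is hypothetical and the children are given as a plain list of (name, value) pairs; the sketch only demonstrates the numbering, not the rest of the JSON conversion.

```python
from collections import Counter


def unique_keys(children):
    """Suffix repeated element names with [0], [1], ... in document order."""
    totals = Counter(name for name, _ in children)
    seen = Counter()
    result = {}
    for name, value in children:
        if totals[name] > 1:
            key = f"{name}[{seen[name]}]"
            seen[name] += 1
        else:
            key = name  # unique names are left untouched
        result[key] = value
    return result


body = [
    ("h1", "Introduction"),
    ("h2", "Todo"),
    ("p", "All work and no play."),
    ("p", "All work and no play."),
    ("p", "All work and no play."),
    ("h2", "Done"),
    ("p", "Not much here."),
]
for key, value in unique_keys(body).items():
    print(key, "=", value)
```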

XSLT transforms

XSLT transforms can be terribly (horribly) complicated when you have to deal with whitespace. Again the XMQ format solves this part of the XSLT complexity because the whitespace in XMQ is explicit. Let us convert the curled JSON directly into an HTML page using xmq and an xslq transform. todos.xslq todos.xslt
xmq todos.json transform todos.xslq to-html > todos.html

xsl:stylesheet(version   = 1.0
               xmlns:xsl = http://www.w3.org/1999/XSL/Transform)
{
    xsl:output(method         = html
               doctype-system = about:legacy-compat
               encoding       = utf-8
               indent         = yes)
    xsl:template(match = _/todos)
    {
        html {
            body {
                table(border = 1)
                {
                    xsl:for-each(select = _)
                    {
                        tr {
                            td {
                                xsl:value-of(select = todo)
                            }
                        }
                    }
                }
            }
        }
    }
    xsl:template(match = total)
    xsl:template(match = skip)
    xsl:template(match = limit)
}

And this is the generated HTMQ. todos.html

!DOCTYPE = html
html {
    body {
        table(border = 1)
        {
            tr {
                td = 'Solve a cube'
            }
            tr {
                td = 'Bake pastries'
            }
        }
    }
}

We can generate whitespace exact text output and still have a pretty printed and readable xslq-transform. The to-text command will output only text nodes and substituted entities.
xmq todos.json transform todosframed.xslq to-text
todosframed.xslq

xsl:stylesheet(version   = 1.0
               xml:space = preserve
               xmlns:xsl = http://www.w3.org/1999/XSL/Transform
               xmlns:fo  = http://www.w3.org/1999/XSL/Format)
{
    xsl:output(method = text)
    xsl:template(match = _/todos)
    {
        '┌─────────────┐'
        &#10;
        xsl:for-each(select = _)
        {
            '│'
            xsl:value-of(select = '''substring(concat(todo,'             '), 1, 13)''')
            '│'
            &#10;
        }
        '└─────────────┘'
        &#10;
    }
    xsl:template(match = total)
    xsl:template(match = skip)
    xsl:template(match = limit)
}

And this is the generated text file. todos.text

┌─────────────┐
│Solve a cube │
│Bake pastries│
└─────────────┘
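
The padding trick in the transform, substring(concat(todo,'             '), 1, 13), appends 13 spaces and then keeps the first 13 characters. The Python sketch below mirrors that expression to show why every row ends up the same width; the function names pad_to and framed are made up for the illustration.

```python
def pad_to(text: str, width: int) -> str:
    """Same effect as XPath substring(concat(text, spaces), 1, width)."""
    return (text + " " * width)[:width]


def framed(todos, width=13):
    """Build the framed text using the same quotes and newlines as the xslq."""
    lines = ["┌" + "─" * width + "┐"]
    lines += ["│" + pad_to(todo, width) + "│" for todo in todos]
    lines.append("└" + "─" * width + "┘")
    return "\n".join(lines) + "\n"


print(framed(["Solve a cube", "Bake pastries"]), end="")
```

A todo longer than 13 characters would be cut off by the same expression, which keeps the frame intact at the cost of truncating the text.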

DTD:s and XSD:s

The inline DTD is the value assigned to the !DOCTYPE. There are no changes to the DTD language.

!DOCTYPE = 'goo [
            <!ENTITY copy "&#169;">
            <!ENTITY alfa "ALFA&copy;">
            <!ENTITY banana "<x>BaNaNa</x>">
            ]'
goo = ( 'XMQ<'
        &alfa;
        '>minions say:'
        &banana; )

<!DOCTYPE goo [<!ENTITY copy "&#169;">
               <!ENTITY alfa "ALFA&copy;">
               <!ENTITY banana "<x>BaNaNa</x>"> ]>
<goo>XMQ&lt;&alfa;&gt;minions say:&banana;</goo>

Since XSD:s are normal xml they are rendered as XMQ in the same way as other XML.

schema(xmlns = http://www.w3.org/2001/XMLSchema)
{
    element(name = shiporder)
    {
        complexType {
            sequence {
                element(name = id
                        type = string)
                element(name = type
                        type = string)
                element(name = shipto)
                {
                    complexType {
                        sequence {
                            element(name = address
                                    type = string)
                            element(name = coord
                                    type = string)
                        }
                    }
                }
                attribute(name = sailing
                          type = string
                          use  = required)
            }
        }
    }
}

Conclusions

With XMQ the assumed dichotomy between mark-up languages (like XML) and key-value store languages (like JSON) has been (perhaps surprisingly) removed. We can now use XML for mark-up situations and XMQ for key-value store situations. They are interchangeable and all the years of effort going into XSLT/XSD and other tools can still be used with XMQ.

There are of course still bugs to fix in xmq and improvements to how the specification works. Please let me know if you find bugs or other improvements.