Background
Work in progress: definition of the Grammar Markup Language (GML). The current official version is GML 1.0 (see below).
General Syntax
To avoid claiming ‘common’ characters, GML uses three graphic characters: the left and right floor symbols (⌊ and ⌋) and the broken vertical bar (¦). The left and right delimiters must always be paired with a one-character ‘element’ name, e.g. ‘/’ for italics. To allow not only nesting but also partially overlapping spans (as we expect might be seen in HTML or LaTeX sources, for example), there is both an opening and a matching closing tag; both carry the element name, e.g.
⌊/text/⌋
The reserved characters can be embedded in GML text using the following escape conventions:
⌊⌋⌋
⌊⌊⌋
⌊¦⌋
These conventions mean that tags cannot be empty, and that there must always be some content between the opening or closing bracket (⌊ or ⌋) and the separator (¦).
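The escape conventions above can be sketched in Python as follows. This is a minimal illustration, assuming that each of the three escape sequences stands for the literal character between the brackets (⌊⌊⌋ for ⌊, ⌊⌋⌋ for ⌋, ⌊¦⌋ for ¦); the function names escape_gml and unescape_gml are hypothetical and not part of any existing GML tooling.

```python
import re

# The three characters reserved by GML.
RESERVED = "⌊⌋¦"

def escape_gml(text: str) -> str:
    """Wrap every reserved character in ⌊…⌋ (assumed reading of the conventions)."""
    return re.sub(f"[{RESERVED}]", lambda m: f"⌊{m.group(0)}⌋", text)

def unescape_gml(text: str) -> str:
    """Turn each three-character escape sequence back into its literal character."""
    return re.sub(f"⌊([{RESERVED}])⌋", r"\1", text)

escaped = escape_gml("a ¦ b")
print(escaped)                # the bare separator is now written as ⌊¦⌋
print(unescape_gml(escaped))  # round-trips to the original string
```

Escaping is done in a single pass so that the brackets introduced by one escape are not escaped again. Note that unescape_gml is only meant for plain text that has been escaped this way; it does not parse element tags.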
What about self-closing tags (for instance, images)? The following should work okay:
⌊✎⌋?
(Outdated) List of Element Types (in GML 0.9)
Name | GML markup | Comment | Mediawiki markup |
Document | ⌊document¦title¦document⌋ | root node | |
Heading | ⌊=¦text¦=⌋ | level attribute | = level1 =, == level2 ==, … or <h1>level1</h1>, <h2>level2</h2>, … |
Link | ⌊>¦text¦>⌋ | do we need different types? optional target attribute? | [[target|text]], <a>text</a>, or a URL |
Template | ⌊x¦text¦template-name¦par1¦par2¦x⌋ | | {{name|arg1|arg2}} |
Source code | ⌊ƒ¦text¦ƒ⌋ | | <code>public static void main</code> |
List | ⌊1¦⌊#¦item1¦#⌋⌊#¦item2¦#⌋¦1⌋ and ⌊•¦⌊#¦foo¦#⌋⌊#¦bar¦#⌋¦•⌋ ? | numbered and unnumbered; do we need parameters? | <ul>…</ul> or <ol>…</ol> |
List item | ⌊#¦item¦#⌋ | | <li>item</li> or # item or * item |
Bold | ⌊*¦text¦*⌋ | | <b>text</b>, '''text''', <strong>text</strong> |
Strike through | ⌊-¦text¦-⌋ | | <del>text</del>, <strike>text</strike> |
Tele-typed | ⌊t¦text¦t⌋ | | <tt>text</tt> |
Quote | ⌊”¦text¦”⌋ | | <blockquote>text</blockquote> |
Abbreviation | ⌊.¦text¦extended term¦.⌋ | | <abbr title="extended term">text</abbr> |
Italics | ⌊/¦text¦/⌋ | | <i>text</i> or ''text'' |
Underline | ⌊_¦text¦_⌋ | | <u>text</u> or <ins>text</ins> |
Superscript | ⌊^¦text¦^⌋ | | <sup>text</sup> |
Subscript | ⌊,¦text¦,⌋ | | <sub>text</sub> |
Small text | ⌊↓¦text¦↓⌋ ? | | <small>text</small> |
Big text | ⌊↑¦text¦↑⌋ ? | | <big>text</big> |
Paragraph | ⌊p¦text¦p⌋ | | <p>text</p> or double newline |
Definition term | ⌊:¦term¦:⌋ | The term is not obligatory in mediawiki, the definition-description (:) is often used to indent text. | ;term or <dt>term</dt> |
Definition Description (indented text) | ⌊⇥¦description¦⇥⌋ | | :description |
Variable | ⌊ƒ¦text¦ƒ⌋ | Merge with source code ? | <var>text</var> |
Math | ⌊ƒ¦text¦ƒ⌋ | Merge with source code ? | <math>LaTeX</math> |
Citation | ⌊’¦text¦’⌋ | | <cite>text</cite> |
Image | ⌊i⌋ | what about captions? | [[File:image.jpg]] |
Preformatted text | ⌊pre¦text¦pre⌋ | | <pre>text</pre>, <poem>text</poem> or a line starting with whitespace |
Div | ⌊p¦text¦p⌋ | Merge with Paragraph? | <div>text</div> |
For tags observed in the collected HTML, see WeSearch/DataCollection. The only elements I can see missing are those related to tables, which won’t be included as they’re outside the scope of linguistic relevance? (XHTML spec.)
Towards GML 1.0
One-letter element names (preferably staying away from ‘regular’ ASCII characters, except where following established conventions, e.g. ⌊/ … /⌋); one committee member is concerned about scalability (seeing the limited range of available characters in Unicode). No need for a middle delimiter following or preceding element names, except in conjunction with attributes (which stick to the closing tag).
Paragraph and sentence boundaries are not marked up using GML elements, but rather through single (sentence boundary) and double (paragraph) newlines, somewhat like in LaTeX (if you will).
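As a rough illustration of the newline convention, the following Python sketch recovers paragraphs and sentences from GML text. It assumes the text uses plain "\n" line endings and that markup inside sentences can be ignored for this purpose; the function name split_paragraphs is hypothetical.

```python
def split_paragraphs(text: str) -> list[list[str]]:
    """Return a list of paragraphs, each given as a list of sentences."""
    paragraphs = []
    for block in text.split("\n\n"):        # double newline: paragraph boundary
        sentences = [line for line in block.split("\n") if line.strip()]
        if sentences:                       # skip blocks that are only whitespace
            paragraphs.append(sentences)
    return paragraphs

sample = "First sentence.\nSecond ⌊/sentence/⌋.\n\nA new paragraph."
for paragraph in split_paragraphs(sample):
    print(paragraph)
```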
Redefined elements:
- bold ∗
- document δ
- paragraph ¶ (not used)
- math ×
- code ◊ (including var)
- teletype τ
- abbreviation µ
- cite ⌊<Solberg, 2012<⌋
- pre π
- img ✎
- template λ
- linebreak ⌊↵⌋ (not used)
Name | GML markup | Comment | Mediawiki markup |
Document | ⌊δtitleδ⌋ | root node | |
Heading | ⌊=text¦2=⌋ | level attribute | = level1 =, == level2 ==, … or <h1>level1</h1>, <h2>level2</h2>, … |
Link | ⌊>text>⌋ | | [[target|text]], <a>text</a>, or a URL |
Template | ⌊λexpanded-text¦par1¦par2¦template-nameλ⌋ | | {{template-name|par1|par2}} |
Source code | ⌊◊text◊⌋ | | <code>public static void main</code>, <var>n</var>, or <kbd>ls -l</kbd> |
Numbered List | ⌊➊⌊#item1#⌋⌊#item2#⌋➊⌋ | | <ol>…</ol> |
Unnumbered list | ⌊•⌊#foo#⌋⌊#bar#⌋•⌋ | | <ul>…</ul> |
List item | ⌊#item#⌋ | | <li>item</li> or # item (numbered) or * item (unnumbered) |
Bold | ⌊∗text∗⌋ | | <b>text</b>, '''text''', <strong>text</strong> |
Strike through | ⌊-text-⌋ | | <del>text</del>, <strike>text</strike> |
Tele-typed | ⌊τtextτ⌋ | | <tt>text</tt> |
Quote | ⌊”text”⌋ | | <blockquote>text</blockquote> |
Abbreviation | ⌊µtext¦extended term (optional)µ⌋ | | <abbr title="extended term">text</abbr> or <acronym>WDC</acronym> |
Italics | ⌊/text/⌋ | | <i>text</i> or ''text'' |
Underline | ⌊_text_⌋ | | <u>text</u> or <ins>text</ins> |
Superscript | ⌊^text^⌋ | | <sup>text</sup> |
Subscript | ⌊,text,⌋ | | <sub>text</sub> |
Small text | ⌊↓text↓⌋ | | <small>text</small> |
Big text | ⌊↑text↑⌋ | | <big>text</big> |
Definition term | ⌊:term:⌋ | The term is not obligatory in mediawiki, the definition-description (:) is often used to indent text. | ;term or <dt>term</dt> |
Definition Description (indented text) | ⌊⇥description⇥⌋ | | :description |
Math | ⌊×text×⌋ | | <math>LaTeX</math> |
Citation | ⌊<text<⌋ | | <cite>text</cite> |
Image | ⌊✎⌋ | | [[File:image.jpg]] |
Preformatted text | ⌊πtextπ⌋ | | <pre>text</pre>, <poem>text</poem> or a line starting with whitespace |
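The GML 1.0 patterns in the table can be generated mechanically: the opening tag is ⌊ plus the element name, and the closing tag is any attributes (each preceded by ¦), the element name again, and ⌋. The following Python sketch is illustrative only: the helper gml_element and the HTML_TO_GML mapping are hypothetical names, the mapping covers just a few rows of the table, and the text is assumed to be already escaped.

```python
# A few HTML tag names and the one-character GML 1.0 element names
# given for them in the table (illustrative subset, not exhaustive).
HTML_TO_GML = {
    "i": "/",
    "b": "∗",
    "u": "_",
    "sup": "^",
    "sub": ",",
    "tt": "τ",
    "pre": "π",
}

def gml_element(name: str, text: str, *attributes: str) -> str:
    """Wrap text in a GML 1.0 element; attributes attach to the closing tag."""
    closing = "".join(f"¦{attr}" for attr in attributes) + name + "⌋"
    return "⌊" + name + text + closing

print(gml_element(HTML_TO_GML["i"], "text"))     # ⌊/text/⌋
print(gml_element("=", "text", "2"))             # ⌊=text¦2=⌋
print(gml_element("µ", "WDC", "extended term"))  # ⌊µWDC¦extended termµ⌋
```

Self-closing elements such as the image ⌊✎⌋ are not covered by this helper.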
Revision History
GML has evolved through three revisions so far (as of February 2013): A first and incomplete version 0.1 was presented in Ytrestøl et al. (2010); a greatly extended version was developed in Solberg (2012), and shortly after moderately refined (for increased readability) as the current version 1.0 (documented above).
Last update: 2020-07-16 by StephanOepen