Bigger, faster, smarter - the monster returns.
Objectives
Create a JavaScript program to convert LaTeX math formulas into responsive MathML. In order of importance:
- User friendly LaTeX-like math input.
- Mobile phone friendly « responsive » math web pages output.
- Clean, compliant, browser-supported MathML generated.
- Fast execution, small footprint.
- OpenType Math fonts and Unicode mathematical symbols.
- Compatibility with MathJax and KaTeX.
Reference Material
- TeXZilla 1 - FontDrop! - Wakamai Fondue
- MathML Core - Mathematical Alphanumeric Symbols: Unicode, MathML, Wikipedia
- Unicode Planes - Math Characters - Standardized Variants - Named Character References - Stix
- Latex / Plain TeX - Mathematics - Advanced Mathematics - AMS Package - CTAN Packages - Maths Symbols - Comprehensive LaTeX Symbol List
- Mathematics With TeX - Tex Tutorial - TeXeR online LaTeX renderer
- TeX Commands Available in MathJax - MathJax Supported TeX/LaTeX commands - KaTeX Support Table
Issues
In hindsight, the main obstacles to implementation have not been the TeX syntax itself but rather the following:
- The lack of a definitive set of LaTeX commands to implement.
  Because LaTeX is an amalgam of hundreds, if not thousands, of packages, each built from low-level TeX
  commands and somewhat ad hoc in nature, it has been hard to determine what to implement.
  Up to this point the TeXZilla command set is made up of (almost) all the math commands from the TeXbook,
  the commands from MathJax 2, the commands from the
  base and ams packages, and other useful commands from the unicode and mathtools packages. A few gaps, such as missing Latin-like capital Greek letters, various fences, brackets and a few other symbols, have also been filled.
- The need to completely avoid layout computation logic, because the intention is to defer that to the browser's MathML layout engine.
- What was also unexpected was the release of the MathML 3 Core spec, which deprecates
  many MathML 4 features and instead asks that they be implemented using CSS.
  MathML 3 Core is the spec implemented by Chrome and Edge, and it is the spec targeted by TeXZilla.
  Because of that, the major blockers to widespread TeXZilla browser support are now Firefox and Safari, which is rather ironic since those browsers have up until now been the leaders in providing MathML support. That is not to say that Chrome does a better job than Firefox - far from it. There are many minor, inexplicable issues in Chrome which affect some math fonts but not others, and which simply do not occur in Firefox.
  But the big issue in Firefox is the lack of CSS support for <mrow> elements. It is in clear contravention of the MathML 3 Core spec in this regard.
- What has not been an issue is the availability of several very nice open source OpenType Math fonts.
  The main issue with the fonts (in Chrome) is that most of them fail to work properly with prime marks.
  Many also have issues with horizontally stretchy glyphs.
  There is no font-independent way to enable some rarely used special glyph variants.
  And most lack full support for both the chancery and roundhand script alphabets.
  Because of that, TeXZilla always uses the STIX Two Math font for those particular features.
  In addition, it should be noted that TeXZilla has gone down the Unicode math path where, as far as possible, all glyphs have their own unique Unicode code points and come from a single OpenType Math font.
- The following glyph mapping convention has been adopted because of this (and because
  the current schemes in LaTeX seem excessively complicated).
  Each font name has the form \math(bold)(family)(italic), where
  - bold = bf or blank, which means normal font weight
  - family = sf, bb, cal, frak, scr, tt or blank, which means serif
  - italic = it, rm, up or blank, which means automatic according to the glyph's alphabetic category

  Only Roman letters, numerals and Greek letters which have been entered via the Greek commands
  \alpha and \Alpha etc. have mapping rules applied to them. Other character codes, such as those entered directly as Unicode characters or as commands like \mitR or \mbfAlpha, have no glyph mapping applied.
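The \math(bold)(family)(italic) naming convention can be illustrated with a small parser. This is only a sketch: the function name `parseFontCommand` and the shape of the returned object are assumptions for illustration, not TeXZilla's actual code, and it reads the bold part as "bf" to match command names such as \mathbfsfit.

```javascript
// Hypothetical helper: split a font command name into its
// (bold)(family)(italic) parts. Each part is optional.
const FONT_COMMAND = /^\\math(bf)?(sf|bb|cal|frak|scr|tt)?(it|rm|up)?$/;

function parseFontCommand(name) {
  const match = FONT_COMMAND.exec(name);
  // Reject non-matching names and the bare prefix "\math".
  if (!match || name === "\\math") return null;
  const [, bold, family, italic] = match;
  return {
    bold: bold === "bf",        // blank means normal weight
    family: family || "serif",  // blank means serif
    italic: italic || "auto",   // blank means automatic
  };
}
```

For example, `parseFontCommand("\\mathbfsfit")` yields a bold, sans-serif, italic descriptor, while a name outside the convention such as `"\\mathnormal"` yields `null` and would need separate handling.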
Implementation
Input
All input is Unicode; it is not limited to ASCII. TeXZilla interprets its input as a stream of tokens:
- Whitespace: tab, newline, carriage return and space characters. Whitespace is mostly ignored.
- Command characters: { } _ ^ & represent begin group, end group, subscript, superscript and alignment respectively.
- Hyphen: '-' represents the mathematical minus sign.
- Colon: ':' represents the mathematical logical colon or ratio.
- Dashes: ', '', ''' etc. represent single, double, triple primes etc.
- Commands: \xxx where xxx are letters a-z or A-Z, or a single special character. For example \frac.
- Arguments: #n where n is a single digit 1-9. These are the names of macro arguments and mark insertion points during macro expansion.
- Comments: characters between a % and the end of the line. These characters are discarded.
- Other characters: any single Unicode code point excluding those above represents a symbol or operator.
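The token classes listed above can be sketched as a small tokeniser. This is an illustrative sketch rather than TeXZilla's implementation: the function name `tokenise`, the `{ type, value }` token shape and the type names are all assumptions, and whitespace is always skipped here even though the real rules only ignore it "mostly".

```javascript
// Sketch of a tokeniser for the token classes described above.
function tokenise(input) {
  const chars = Array.from(input); // split by code point, not UTF-16 unit
  const tokens = [];
  let i = 0;
  while (i < chars.length) {
    const ch = chars[i];
    if (ch === " " || ch === "\t" || ch === "\n" || ch === "\r") {
      i += 1; // whitespace is skipped in this sketch
    } else if (ch === "%") {
      // Comment: discard everything up to the end of the line.
      while (i < chars.length && chars[i] !== "\n") i += 1;
    } else if (ch === "\\") {
      // Command: backslash followed by letters, or by one special character.
      let j = i + 1;
      while (j < chars.length && /^[A-Za-z]$/.test(chars[j])) j += 1;
      if (j === i + 1 && j < chars.length) j += 1;
      tokens.push({ type: "command", value: chars.slice(i, j).join("") });
      i = j;
    } else if (ch === "#" && /^[1-9]$/.test(chars[i + 1] ?? "")) {
      tokens.push({ type: "argument", value: Number(chars[i + 1]) });
      i += 2;
    } else if (ch === "'") {
      // Consecutive apostrophes become one token holding the prime count.
      let j = i;
      while (chars[j] === "'") j += 1;
      tokens.push({ type: "primes", value: j - i });
      i = j;
    } else if (["{", "}", "_", "^", "&"].includes(ch)) {
      tokens.push({ type: "control", value: ch });
      i += 1;
    } else if (ch === "-") {
      tokens.push({ type: "minus", value: "\u2212" }); // U+2212 MINUS SIGN
      i += 1;
    } else if (ch === ":") {
      tokens.push({ type: "colon", value: ch });
      i += 1;
    } else {
      tokens.push({ type: "symbol", value: ch }); // any other code point
      i += 1;
    }
  }
  return tokens;
}
```

Splitting with `Array.from` keeps characters outside the Basic Multilingual Plane, such as the mathematical alphabets, as single tokens instead of two surrogate halves.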
Mathematical Symbols
Almost all mathematical symbols have been assigned Unicode code points in the extended plane
and are supported in the common MathML fonts.
So there is no need for any other special fonts at all.
Commands like \mathfrak and \mathcal are implemented simply by mapping A-Z etc. into this plane.
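As an illustration of this mapping, the sketch below converts ASCII letters to their Fraktur counterparts in the Mathematical Alphanumeric Symbols block. The function name `toFraktur` is hypothetical. Note that five capital Fraktur letters (C, H, I, R, Z) were encoded earlier in the Letterlike Symbols block, so they need a small exception table.

```javascript
// Capital Fraktur letters that live in the Letterlike Symbols block
// instead of the Mathematical Alphanumeric Symbols block.
const FRAKTUR_EXCEPTIONS = new Map([
  ["C", "\u212D"],
  ["H", "\u210C"],
  ["I", "\u2111"],
  ["R", "\u211C"],
  ["Z", "\u2128"],
]);

// Hypothetical helper: map one ASCII letter to its Fraktur code point.
function toFraktur(ch) {
  if (FRAKTUR_EXCEPTIONS.has(ch)) return FRAKTUR_EXCEPTIONS.get(ch);
  const code = ch.codePointAt(0);
  if (code >= 0x41 && code <= 0x5a) {
    return String.fromCodePoint(0x1d504 + (code - 0x41)); // A-Z
  }
  if (code >= 0x61 && code <= 0x7a) {
    return String.fromCodePoint(0x1d51e + (code - 0x61)); // a-z
  }
  return ch; // anything else is left unchanged
}
```

The other alphabets follow the same pattern with different base offsets and their own exception tables.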
Macros
Macros are user-defined commands that perform text substitution using strings of tokens. Arguments for macros are marshalled by looking ahead in the input stream for the next token, or for the string of tokens contained between a balanced pair of { }. As a result of this rule, macro argument values always contain balanced sets of braces. The outer pair of braces is not considered to be part of the macro value. In TeX there is no implicit boxing of command arguments. However, many of the AMS commands appear to get their arguments automatically boxed in braces. I'm guessing the reason for this discrepancy is that the AMS commands were once macros which boxed their arguments internally.
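The marshalling rule can be sketched as follows. Tokens are represented as plain strings for brevity, and the names `readArgument` and `expandMacro` are hypothetical; this is a minimal sketch of the rule as described, not TeXZilla's code.

```javascript
// Read one macro argument starting at tokens[start]: either a single
// token, or the tokens between a balanced pair of braces (the outer
// braces are not part of the value).
function readArgument(tokens, start) {
  if (start >= tokens.length) throw new Error("missing argument");
  if (tokens[start] !== "{") {
    return { value: [tokens[start]], next: start + 1 };
  }
  let depth = 0;
  for (let i = start; i < tokens.length; i += 1) {
    if (tokens[i] === "{") depth += 1;
    else if (tokens[i] === "}") depth -= 1;
    if (depth === 0) {
      return { value: tokens.slice(start + 1, i), next: i + 1 };
    }
  }
  throw new Error("unbalanced braces");
}

// Substitute the marshalled arguments at the #n insertion points.
function expandMacro(body, args) {
  return body.flatMap((tok) =>
    /^#[1-9]$/.test(tok) ? args[Number(tok[1]) - 1] ?? [] : [tok]
  );
}
```

Because the outer braces are stripped and nothing re-boxes the value, a body such as `\frac #1 #2` would see only the first token of a multi-token argument, which is the kind of discrepancy with the AMS commands noted above.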
Design (in a state of flux)
Tokeniser
Input text is converted to tokens and macro substitution is performed resulting in a stream of tokens with all macro commands and arguments resolved to other tokens.
- Implement \def
Interpreter
Tokeniser's output is interpreted and converted to a stream of MathML rendering instructions. It is here where context-specific parsing rules are implemented to ensure compatibility with LaTeX or MathJax etc. There are a number of differences in interpretation between the two.
- Add more boxing to match AMS LaTeX.
Renderer
The renderer processes the instructions from the interpreter and builds a MathML element tree. The tree structure is obvious and quite easy to achieve. The correct styling of the tree elements is not so easy, and still a work in progress.
- Improve the rendering of primes.
- Improve LaTeX / MathJax compatibility by addressing styling issues.
- Add support for many more commands.
Accents
Despite what AI says, using combining diacritics for accents does not work well in MathML in Chrome. This is because Chrome typesets the combining diacritic at a fixed position for all glyphs. As a result, the accent often does not appear in the expected position.
For example, \not X is resolved by the interpreter as follows.
If X is a glyph which has a Unicode negated form, then it is converted to that form.
Otherwise it is rendered as a sub-formula overlaid with a near-vertical line through it.
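The two-step rule for \not can be sketched as a lookup with a fallback. The table below lists only a handful of well-known negated forms, and the function name `resolveNot` and the returned `{ kind, value }` shape are assumptions made for illustration.

```javascript
// A few operators that have a precomposed negated form in Unicode.
const NEGATED_FORMS = new Map([
  ["=", "\u2260"],      // not equal to
  ["<", "\u226E"],      // not less-than
  [">", "\u226F"],      // not greater-than
  ["\u2208", "\u2209"], // not an element of
  ["\u2261", "\u2262"], // not identical to
  ["\u2282", "\u2284"], // not a subset of
  ["\u223C", "\u2241"], // not tilde
]);

// Hypothetical resolver: prefer the precomposed glyph, otherwise ask
// the renderer to overlay a slash on the operand.
function resolveNot(operand) {
  const negated = NEGATED_FORMS.get(operand);
  if (negated !== undefined) return { kind: "glyph", value: negated };
  return { kind: "overlay", value: operand };
}
```

How the overlay is actually drawn is left to the renderer and is not shown here.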
Math Style
The following user selectable italicization styles have been implemented.
| Math Style | Roman Upper | Roman Lower | Greek Upper | Greek Lower | Numbers |
|---|---|---|---|---|---|
| ISO | italic | italic | italic | italic | upright |
| TeX | italic | italic | upright | italic | upright |
| French | upright | italic | upright | upright | upright |
| Upright | upright | upright | upright | upright | upright |
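The table can be read as a lookup from math style and letter category to a shape. The sketch below assumes the five value columns are, in order, Roman upper case, Roman lower case, Greek upper case, Greek lower case and numbers. The names `MATH_STYLES`, `letterCategory` and `shapeFor` are hypothetical.

```javascript
// Italicization per math style, transcribed from the table above.
const MATH_STYLES = {
  ISO:     { romanUpper: "italic",  romanLower: "italic",  greekUpper: "italic",  greekLower: "italic",  number: "upright" },
  TeX:     { romanUpper: "italic",  romanLower: "italic",  greekUpper: "upright", greekLower: "italic",  number: "upright" },
  French:  { romanUpper: "upright", romanLower: "italic",  greekUpper: "upright", greekLower: "upright", number: "upright" },
  Upright: { romanUpper: "upright", romanLower: "upright", greekUpper: "upright", greekLower: "upright", number: "upright" },
};

// Classify a single character; returns null for anything unmapped.
function letterCategory(ch) {
  if (/^[A-Z]$/.test(ch)) return "romanUpper";
  if (/^[a-z]$/.test(ch)) return "romanLower";
  if (/^[\u0391-\u03A9]$/.test(ch)) return "greekUpper";
  if (/^[\u03B1-\u03C9]$/.test(ch)) return "greekLower";
  if (/^[0-9]$/.test(ch)) return "number";
  return null;
}

// Look up the shape ("italic" or "upright") for a character.
function shapeFor(style, ch) {
  const category = letterCategory(ch);
  return category ? MATH_STYLES[style][category] : null;
}
```

For instance, a capital Gamma is upright under the TeX style but italic under ISO, which matches the usual difference between the two conventions.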
LaTeX Math Fonts
The math fonts are designed to conform to the Mathematical Alphanumeric Symbols table. They do this by translating the glyph characters into the symbol table, so each standard LaTeX character has its own unique Unicode code point. Because of this, these alphabets differ slightly from their LaTeX versions. In particular they provide lowercase calligraphic characters, and consistent and universal control of italicization. In the table below Italic auto means the italicization is determined by the Math Style.
The standard math font can be left as the system default math font or defined in a CSS style sheet like this:
math {
font-family: STIX Two Math;
font-size: 120%;
}
For consistency across multiple platforms the font should be loaded from your web site in the style sheet like this:
@font-face {
font-family: 'STIX Two Math';
src: url('myfonts/STIXTwoMath-Regular.woff2') format('woff2');
}
For better performance some fonts can be loaded from Google Fonts by adding code like this to <head>.
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Caveat+Brush&display=swap" rel="stylesheet">
| Font | Bold | Family | Italic | Alphabets |
|---|---|---|---|---|
| \mathnormal | | serif | mixed | roman, greek, numbers |
| \mathrm | | serif | mixed | roman, greek, numbers |
| \mathup | | serif | upright | roman, greek, numbers |
| \mathit | | serif | italic | roman, greek |
| \mathbfit | bold | serif | italic | roman, greek |
| \mathbf | bold | serif | upright | roman, greek, numbers |
| \mathsf | | sans-serif | upright | roman, greek, numbers |
| \mathsfit | | sans-serif | italic | roman, greek |
| \mathbfsf | bold | sans-serif | upright | roman, greek, numbers |
| \mathbfsfit | bold | sans-serif | italic | roman, greek |
| \mathbb | | double-struck (blackboard) | upright | roman, numbers |
| \mathbbit | | double-struck (blackboard) | italic | roman |
| \mathcal | | chancery script (calligraphic) | | roman |
| \mathbfcal | bold | chancery script (calligraphic) | | roman |
| \mathfrak | | fraktur | | roman |
| \mathbffrak | bold | fraktur | | roman |
| \mathscr | | roundhand script | | roman |
| \mathbfscr | bold | roundhand script | | roman |
| \mathtt | | monospace (teletype) | | roman, numbers |
| \oldstyle | | cursive (old style numbers) | | numbers |
The current math font is used to provide almost all the glyphs for the above alphabets.
But because math fonts are inconsistent in providing the glyphs for \mathcal and \mathscr
those commands always use the STIX Two Math font.
In addition a system-selected font is used to provide \oldstyle numerals because they are not available in most math fonts.
User Defined Fonts
You can define your own font using this LaTeX code
$ \newfontfamily{\mathmkr}{mathmkr} $
at the start of the page. Then in a style sheet add a CSS font family and class, for example
@font-face {
font-family: 'Caveat Brush';
src: url('myfonts/CaveatBrush-Regular.ttf');
}
.mathmkr {
font-family: Caveat Brush;
font-size: 120%;
}
You can then use it like other LaTeX fonts in the TeX code, for example \mathmkr{x} = ...
Version 2.1
LaTeX Packages
TeXZilla does not load and process LaTeX packages. Instead it has a library of about 1000 commands taken from widely used LaTeX packages. These commands are converted directly to MathML by imitating the intended effect of the LaTeX command.
The advantage of this approach is that package and user macro conflicts are completely avoided. It also means some commands can be implemented with more natural syntax. And there is no user configuration required.
The disadvantage is that the TeXZilla code is not 100% compatible with LaTeX. But this is no big deal because there are already quite a few compatibility "issues" within LaTeX and also with MathJax and KaTeX. If you fully bracket command arguments and avoid "low-level" and "non-math" LaTeX commands you will get a high degree of compatibility.
Argument Marshalling
TeXZilla uses the TeX rules for marshalling macro and command arguments.
But command arguments which are expected to be "sub-formula" are marshalled slightly differently, more in line with MathJax. Each unbracketed argument extends until the end of the "sub-formula".
Fonts
TeXZilla uses a single OpenType Math base font to provide almost all glyphs. The base font to be used is specified in the configuration. STIX Two Math is used as a fallback to provide glyphs missing from the base font. If a glyph is missing from both the base font and STIX, it is sourced from a system font.
Most OpenType Math fonts do not include complete sets of Chancery and Roundhand script characters, so some or all of these character sets are sourced from STIX (which is why they often look the same when the base font is changed). Glyphs which do not have assigned Unicode code points are sourced from the STIX private use area. These include "blackboard italic" Latin glyphs, "sans serif" Greek letters and a small number of "negated" relations.
Oldstyle numbers are not available in STIX Two Math. If the base font doesn't provide them, they are sourced from the system cursive font.
Greek Letters
TeX does not include the Latin-like Greek capital letters, it simply uses the Latin equivalents.
To better integrate with unicode TeXZilla includes them as separate letters with the names \Alpha etc.
Also, to align with the Mathematical Alphanumeric Symbols Unicode block, it includes two additional capital Greek letters, \Digamma and \Thetasym.
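A command table for the capital Greek letters could look like the sketch below. The code points are the standard Unicode ones, but the function name `greekCapital`, the exact list of command names, and the choice of U+03DC for \Digamma and U+03F4 (the capital theta symbol) for \Thetasym are assumptions about how such a table might be built, not a description of TeXZilla's internals.

```javascript
// Capital Greek letter names in Unicode order, starting at U+0391.
const GREEK_CAPITALS = [
  "Alpha", "Beta", "Gamma", "Delta", "Epsilon", "Zeta", "Eta", "Theta",
  "Iota", "Kappa", "Lambda", "Mu", "Nu", "Xi", "Omicron", "Pi", "Rho",
  "Sigma", "Tau", "Upsilon", "Phi", "Chi", "Psi", "Omega",
];

// Hypothetical lookup from a command such as "\Alpha" to a character.
function greekCapital(command) {
  const name = command.replace(/^\\/, "");
  if (name === "Digamma") return "\u03DC";
  if (name === "Thetasym") return "\u03F4";
  const index = GREEK_CAPITALS.indexOf(name);
  if (index < 0) return null;
  // U+03A2 is unassigned, so letters from Sigma onwards shift by one.
  return String.fromCodePoint(0x0391 + index + (index >= 17 ? 1 : 0));
}
```

Keeping \Alpha distinct from the Latin A means the glyph mapping rules can treat it as a Greek letter when applying the selected math style.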
Troubleshooting
Drawing overlines and underlines has many issues in Chrome: Firstly, it is not clear from the MathML spec which Unicode character should be used to render stretchy horizontal lines. Secondly, almost all the possible candidates fail to stretch properly, and the small number that do vary from font to font. There are three possible workarounds:
- Set the font to STIX and use the low line and overline characters. Implemented and looks ok.
- Draw the line using an SVG element embedded inside an mtext element. Implemented and looks very good - but seems like overkill.
- Set a CSS top / bottom border on the base element. Should work ok - but a bit hacky.
Some arrow glyphs in some fonts stretch properly in Firefox but not Chrome: What appears to be happening here is that the font glyphs do not have the correct OpenType tables to allow them to stretch. But Firefox detects that condition and substitutes in a glyph from a working fallback font, whereas Chrome does not. Normal CSS font fallback mechanisms do not help because they only fallback when the glyph does not exist.
Wide accents like tilde etc. stretch properly in Firefox but not Chrome: What appears to be happening is that these accents do not have the correct OpenType tables in any font to allow them to stretch. But Firefox detects this and applies a scaleX transform, whereas Chrome does not. Safari also detects this and generates a substitute SVG. Possible workarounds:
- Generate glyph using SVG. Implemented and works ok but graphic quality is mediocre.
- Stretch glyph using CSS transform. Open question: is it possible to get the width?
- Extend the syntax to specify an optional width. Render glyph 1 .. 7 from one of the fonts which have arrays of such glyphs, e.g. XCharter. This produces the best quality but is not user friendly and not compatible with other LaTeX programs.
OpenType Math Fonts
| | STIX | ASAN | CONC | GARA | GFS | LATM | LETE | NCM | NOTO | XCH | XITS |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Calligraphic | def | no | def | ss03 | def | def | def | def | ss01* | no | ss01 |
| Script | ss01 | def | no | def | no | no | no | ss01 | def | def | def |
| Blackboard Italic | priv | no | no | no | no | no | no | no | no | no | ss06 |
| Blackboard Bold | no | no | no | no | no | no | priv | ss03 | no | priv | ss05 |
| Sans Serif Greek | priv | no | priv | no | no | no | priv | priv | no | priv | ss02 |
| Full size primes | ss04 | def | ssty | no | no | no | ssty | no | ssty | ssty | no |
| Wide hat etc. | no | 10ffa6 | e520 | no | no | no | e3e1 | no | no | e520 | no |
| Upright Integrals | priv | def | ss03 | ss07 | no | no | ss08 | ss02 | ss02 | ss03 | ss08 |
| Mathematical g | ss02 | no | no | no | no | no | cv11 | no | no | no | no |
| Oldstyle Numerals | no | yes | yes | no | priv | no | yes | no | no | yes | no |
Other Font Issues
- GFS Neohellenic has em square / bounding box 5 times higher than it should be.
- GFS Neohellenic has clockwise and anti-clockwise line integrals u2232 and u2233 switched.
- Neo Euler italic "h" is upright, but other letters are ok.
- TeX Gyre integral signs do not stretch vertically as much as they should.
- Noto Sans and XITS: vertical lines and double vertical lines sometimes stretch only half what they should in Chrome (look at 4x4 determinants). Other brackets like (), [] and {} are ok. Other fonts and browsers are ok.
- Libertinus left and right angle brackets u27e8 don't stretch in Chrome and Safari.
- STIX Two and Libertinus: Text above \xrightarrow etc. is typeset far too high in Chrome and Safari. All other fonts are typeset correctly.