Bigger, faster, smarter - the monster returns.
Objectives
Create a JavaScript program to convert LaTeX math formulas into responsive MathML. In order of importance:
- User friendly LaTeX-like math input.
- Mobile phone friendly « responsive » math web pages output.
- Clean, compliant, browser-supported MathML generated.
- Fast execution, small footprint.
- Support for all OpenType math fonts and Unicode mathematical symbols.
- Compatibility with MathJax and KaTeX.
Reference Material
- TeXZilla 1 - FontDrop! - Wakamai Fondue
- MathML Core - Mathematical Alphanumeric Symbols: Unicode, MathML, Wikipedia
- Unicode Planes - Math Characters - Standardized Variants - Named Character References - Stix
- Latex / Plain TeX - Mathematics - Advanced Mathematics - AMS Package - CTAN Packages - Maths Symbols - Comprehensive LaTeX Symbol List
- Mathematics With TeX - Tex Tutorial - TeXeR online LaTeX renderer
- TeX Commands Available in MathJax - MathJax Supported TeX/LaTeX commands - KaTeX Support Table
Design
LaTeX Packages
Like the original, TeXZilla 2 does not load and process LaTeX packages. Instead it has a library of about 900 commands taken from widely used LaTeX packages. These commands are directly converted to MathML by imitating the intended effect of the LaTeX command.
The advantage of this approach is that dependencies on LaTeX packages are completely avoided, as are conflicts with user macros. It also means some commands can be implemented with more natural syntax and there is very little user configuration required.
The disadvantage is that the TeXZilla 2 code is not 100% compatible with LaTeX. This is no big deal because there are already quite a few compatibility "issues" within LaTeX and also with MathJax and KaTeX. If a high degree of compatibility is required, fully bracket command arguments and avoid "low-level" and "non-math" LaTeX commands.
Argument Marshalling
TeXZilla uses rules similar to TeX for marshalling macro arguments. An argument is either a single token or a group of tokens between curly brackets.
However rules for marshalling command arguments sometimes differ.
Each unbracketed argument extends until its "natural" end token.
For example, like TeX, a length dimension extends until its unit field.
But unlike TeX a subformula starting with a \left command extends until the corresponding \right command.
And a number extends until whitespace or a non-numeric token is encountered.
So for example TeXZilla interprets 12^34 as 12 to the power 34, not as 1 followed by 2 to the power 3, followed by 4.
To get full LaTeX compatibility for the two different interpretations write {12}^{34} or 1 2^3 4.
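The number rule above can be illustrated with a minimal sketch. The helper names `readArgument` and `readAll` are hypothetical and not taken from the TeXZilla source, and the sketch only handles digits, braces and single characters (no commands, lengths or `\left` ... `\right` pairs).

```javascript
// Hypothetical sketch of unbracketed argument marshalling (not TeXZilla's code).
// Assumes `pos` points at a non-whitespace character of `src`.
function readArgument(src, pos) {
  const ch = src[pos];
  if (ch === "{") {
    // A braced group: scan to the matching closing brace.
    let depth = 0;
    for (let i = pos; i < src.length; i++) {
      if (src[i] === "{") depth++;
      else if (src[i] === "}" && --depth === 0) {
        return { value: src.slice(pos + 1, i), next: i + 1 };
      }
    }
    throw new Error("unbalanced braces");
  }
  if (/[0-9]/.test(ch)) {
    // A number extends until whitespace or a non-numeric character.
    // Simplification: digits only, decimal points are not handled here.
    let i = pos;
    while (i < src.length && /[0-9]/.test(src[i])) i++;
    return { value: src.slice(pos, i), next: i };
  }
  // Anything else is treated as a single-character token.
  return { value: ch, next: pos + 1 };
}

// Split a whole string into arguments, skipping whitespace between them.
function readAll(src) {
  const out = [];
  let pos = 0;
  while (true) {
    while (pos < src.length && /\s/.test(src[pos])) pos++;
    if (pos >= src.length) return out;
    const arg = readArgument(src, pos);
    out.push(arg.value);
    pos = arg.next;
  }
}

console.log(readAll("12^34"));     // 12 and 34 are each read as one number
console.log(readAll("1 2^3 4"));   // whitespace ends each number
console.log(readAll("{12}^{34}")); // braces give the same grouping as the first case
```

Under these assumptions `12^34` and `{12}^{34}` marshal to the same three arguments, while `1 2^3 4` yields five.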
Layers
TeXZilla 2 has four layers of processing. The first layer walks the DOM tree finding strings of TeX code that need converting to MathML. It calls the other layers to convert them into strings of MathML code which it then injects back into the DOM tree.
The second layer takes a TeX string and converts it into a stream of tokens according to the venerable TeX syntax rules, expanding macros along the way.
The stream of tokens is passed to the third layer, the interpreter, which carries out any special syntax parsing required. It reduces the stream of about 900 different TeX tokens down to a more manageable 30-odd MathML tree building commands.
The fourth layer takes the stream of commands and builds a MathML-like tree. It then serializes the tree to a string of MathML code and passes the results back to the first layer.
Input
All input is Unicode; it is not limited to ASCII. TeXZilla interprets its input as a stream of tokens.
- Whitespace: tab, newline, carriage return and space characters. Whitespace is mostly ignored.
- Command characters: { } _ ^ & represent begin group, end group, subscript, superscript and alignment respectively.
- Hyphen: '-' represents the mathematical minus sign.
- Colon: ':' represents the mathematical logical colon or ratio.
- Dashes: ' '' ''' etc. represent single, double, triple primes etc.
- Commands: \xxx where xxx are letters a-z or A-Z or a single special character. For example \frac. All trailing space is ignored.
- Arguments: #n where n is a single digit 1-9. These are the names of macro arguments and mark insertion points during macro expansion.
- Comments: characters between a % and the end of line. These characters are discarded.
- Other Characters: any single Unicode code point excluding those above represents a symbol or operator.
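The token classes listed above can be sketched as a small tokenizer. This is an illustration only: the function name `tokenize` and the token type names are assumptions, not TeXZilla's actual API, and macro expansion is left out.

```javascript
// Single-character tokens with a fixed meaning (type names are illustrative).
const COMMAND_CHARS = new Map([
  ["{", "beginGroup"], ["}", "endGroup"], ["_", "subscript"],
  ["^", "superscript"], ["&", "alignment"], ["-", "minus"], [":", "colon"],
]);

function tokenize(src) {
  const chars = Array.from(src); // split by code point, not by UTF-16 unit
  const tokens = [];
  let i = 0;
  while (i < chars.length) {
    const ch = chars[i];
    if (/\s/.test(ch)) { i++; continue; } // whitespace is ignored
    if (ch === "%") {
      // Comment: discard everything up to the end of the line.
      while (i < chars.length && chars[i] !== "\n") i++;
      continue;
    }
    if (ch === "\\" && i + 1 < chars.length) {
      // Command: backslash followed by letters, or by one special character.
      let j = i + 1;
      while (j < chars.length && /^[a-zA-Z]$/.test(chars[j])) j++;
      if (j === i + 1) j++;
      tokens.push({ type: "command", value: chars.slice(i, j).join("") });
      i = j;
      continue;
    }
    if (ch === "#" && /^[1-9]$/.test(chars[i + 1] ?? "")) {
      // Macro argument marker #1 .. #9.
      tokens.push({ type: "argument", value: ch + chars[i + 1] });
      i += 2;
      continue;
    }
    if (ch === "'") {
      // A run of apostrophes becomes one primes token holding the count.
      let j = i;
      while (chars[j] === "'") j++;
      tokens.push({ type: "primes", value: j - i });
      i = j;
      continue;
    }
    // Command character, or any other code point as a symbol or operator.
    tokens.push({ type: COMMAND_CHARS.get(ch) ?? "symbol", value: ch });
    i++;
  }
  return tokens;
}

console.log(tokenize("\\frac{a}{b} % note\n x''"));
```

Splitting with `Array.from` keeps characters outside the Basic Multilingual Plane as single tokens, which matters because the input is not limited to ASCII.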
Greek Letters
TeX does not include the Latin-like Greek capital letters; it simply uses the Latin equivalents.
To better integrate with Unicode, TeXZilla includes them as separate letters with the names \Alpha etc.
Also, to align with the Mathematical Alphanumeric Symbols Unicode block, it includes two additional Greek capital letters, \Digamma and \Thetasym.
Mathematical Symbols
Almost all mathematical symbols have been assigned Unicode code points in the extended plane
and are supported in the common MathML fonts.
So there is no need for any other special fonts at all.
Commands like \mathfrak and \mathcal are implemented simply by mapping A-Z etc. into this plane.
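As an illustration of this kind of mapping, here is a minimal sketch for Fraktur letters. The function name `toFraktur` is hypothetical. The code points come from the Unicode Mathematical Alphanumeric Symbols block, where five Fraktur capitals (C, H, I, R, Z) are not encoded in the block itself and live in the older Letterlike Symbols block instead. How TeXZilla itself handles these exceptions is not stated here.

```javascript
// Fraktur capitals that are encoded in Letterlike Symbols rather than
// in the Mathematical Alphanumeric Symbols block.
const FRAKTUR_EXCEPTIONS = new Map([
  ["C", 0x212d], ["H", 0x210c], ["I", 0x2111], ["R", 0x211c], ["Z", 0x2128],
]);

// Hypothetical helper: map ASCII letters to their Fraktur code points.
function toFraktur(text) {
  return Array.from(text, (ch) => {
    if (FRAKTUR_EXCEPTIONS.has(ch)) {
      return String.fromCodePoint(FRAKTUR_EXCEPTIONS.get(ch));
    }
    if (ch >= "A" && ch <= "Z") {
      // Mathematical Fraktur Capital A is U+1D504.
      return String.fromCodePoint(0x1d504 + ch.charCodeAt(0) - 65);
    }
    if (ch >= "a" && ch <= "z") {
      // Mathematical Fraktur Small A is U+1D51E.
      return String.fromCodePoint(0x1d51e + ch.charCodeAt(0) - 97);
    }
    return ch; // anything else is left unchanged
  }).join("");
}

console.log(toFraktur("g")); // the Fraktur small g, U+1D524
```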
Fonts
TeXZilla uses a single OpenType Math base font to provide almost all glyphs. The base font to be used is specified in the configuration. It uses STIX Two Math as a fallback to provide glyphs missing from the base font. If a glyph is missing from both the base font and STIX it will be sourced from a system font.
Most OpenType Math fonts do not include complete sets of Chancery and Roundhand script characters, so some or all of these character sets are sourced from STIX (which is why they often look the same when the base font is changed). Glyphs which do not have assigned Unicode code points are sourced from the STIX private use area. These include "blackboard italic" Latin glyphs, "sans serif" Greek letters and a small number of "negated" relations.
Oldstyle numbers are not available in STIX Two Math. If the base font doesn't provide them they are sourced from the system cursive font.
Handwriting Fonts
There are three handwriting fonts available: neat, untidy and marker pen. These fonts are synthesized by combining glyphs from an OpenType math base font and a TrueType handwriting font.
Macros
Macros are user-defined commands that perform text substitution using strings of tokens. Arguments for macros are marshalled by looking ahead in the input stream for the next token, or for a string of tokens contained between a balanced pair of { }. As a result of this rule, macro argument values always contain balanced sets of braces. The outer pair of braces is not considered to be part of the argument value. In TeX there is no implicit boxing of command arguments. However, many of the AMS commands appear to get their arguments automatically boxed in braces. I'm guessing the reason for this discrepancy is that the AMS commands were once macros which boxed their arguments internally.
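The look-ahead and substitution described above might be sketched as follows. The names `grabArgument` and `expandMacro`, and the `{ params, body }` macro shape, are assumptions made for this example. It works on raw strings rather than a token stream and does not treat escaped braces such as \{ specially.

```javascript
// Hypothetical sketch: fetch one macro argument starting at `pos`.
// The argument is either a single token or a balanced { } group.
function grabArgument(src, pos) {
  while (pos < src.length && /\s/.test(src[pos])) pos++;
  if (pos >= src.length) throw new Error("missing macro argument");
  if (src[pos] === "{") {
    let depth = 0;
    for (let i = pos; i < src.length; i++) {
      if (src[i] === "{") depth++;
      else if (src[i] === "}" && --depth === 0) {
        // The outer pair of braces is not part of the value.
        return { value: src.slice(pos + 1, i), next: i + 1 };
      }
    }
    throw new Error("unbalanced braces");
  }
  // A command such as \alpha counts as one token, otherwise one character.
  const command = /^\\(?:[a-zA-Z]+|.)/.exec(src.slice(pos));
  const token = command ? command[0] : src[pos];
  return { value: token, next: pos + token.length };
}

// Expand `macro` using arguments read from `src` starting at `pos`.
function expandMacro(macro, src, pos) {
  const args = [];
  for (let k = 0; k < macro.params; k++) {
    const arg = grabArgument(src, pos);
    args.push(arg.value);
    pos = arg.next;
  }
  // Each #n in the body marks the insertion point of argument n.
  const text = macro.body.replace(/#([1-9])/g, (m, n) => args[Number(n) - 1] ?? m);
  return { text, next: pos };
}

const half = { params: 2, body: "\\frac{#1}{#2}" };
console.log(expandMacro(half, "{x+1}y", 0).text); // \frac{x+1}{y}
```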
Math Style
The following user-selectable italicization styles have been implemented.
| Math Style | Roman Upper | Roman Lower | Greek Upper | Greek Lower | Numbers |
|---|---|---|---|---|---|
| ISO | italic | italic | italic | italic | upright |
| TeX | italic | italic | upright | italic | upright |
| French | upright | italic | upright | upright | upright |
| Upright | upright | upright | upright | upright | upright |
When a handwriting font is selected, Roman letters and numbers are neither italic nor upright; they have a "hand written" style.
When the Euler Math font is selected, all characters are upright (this is a characteristic of the font itself).
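The table of math styles can be expressed as a simple lookup. The object layout and the names `MATH_STYLES`, `classify` and `slantFor` are assumptions for illustration. The sketch covers only basic Latin letters, basic Greek letters and ASCII digits, and ignores the handwriting and Euler special cases.

```javascript
// The italicization table above, as data (key names are illustrative).
const MATH_STYLES = {
  ISO:     { romanUpper: "italic",  romanLower: "italic",  greekUpper: "italic",  greekLower: "italic",  numbers: "upright" },
  TeX:     { romanUpper: "italic",  romanLower: "italic",  greekUpper: "upright", greekLower: "italic",  numbers: "upright" },
  French:  { romanUpper: "upright", romanLower: "italic",  greekUpper: "upright", greekLower: "upright", numbers: "upright" },
  Upright: { romanUpper: "upright", romanLower: "upright", greekUpper: "upright", greekLower: "upright", numbers: "upright" },
};

// Classify one character into a table column, or null if none applies.
function classify(ch) {
  if (/^[A-Z]$/.test(ch)) return "romanUpper";
  if (/^[a-z]$/.test(ch)) return "romanLower";
  if (/^[\u0391-\u03A9]$/.test(ch)) return "greekUpper"; // Alpha .. Omega
  if (/^[\u03B1-\u03C9]$/.test(ch)) return "greekLower"; // alpha .. omega
  if (/^[0-9]$/.test(ch)) return "numbers";
  return null;
}

// Look up "italic" or "upright" for a character under a given math style.
function slantFor(style, ch) {
  const column = classify(ch);
  return column ? MATH_STYLES[style][column] : null;
}

console.log(slantFor("TeX", "\u0393"));    // capital Gamma under the TeX style
console.log(slantFor("French", "x"));      // lowercase Roman under the French style
```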
Unsupported LaTeX Commands
The following low-level LaTeX class commands are not supported because there is no reliable way to translate them to MathML:
\mathopen \mathclose \mathop \mathbin \mathord \mathpunct \mathrel
It's usually possible to use some high-level command instead. If there is no alternative the following MathML-like commands taken from TeXZilla 1 can be used.
\mi \mn \mo \mtext
Each takes one text argument, which becomes the MathML tag content. For example, to create a double increment operator or a number in scientific format:
\mo{++} \mn{−1.234E18}
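Since each of these commands wraps its argument in the matching MathML token element, the translation can be sketched in a few lines. The function names `tokenElement` and `escapeXml` are hypothetical, and escaping the content for XML is an assumption of this sketch rather than documented TeXZilla behaviour.

```javascript
// Map each MathML-like command to its element name.
const TOKEN_TAGS = new Map([
  ["\\mi", "mi"], ["\\mn", "mn"], ["\\mo", "mo"], ["\\mtext", "mtext"],
]);

// Escape characters that are special in XML text content.
function escapeXml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Hypothetical helper: turn a command and its text argument into MathML.
function tokenElement(command, content) {
  const tag = TOKEN_TAGS.get(command);
  if (!tag) throw new Error("not a MathML-like command: " + command);
  return `<${tag}>${escapeXml(content)}</${tag}>`;
}

console.log(tokenElement("\\mo", "++"));         // <mo>++</mo>
console.log(tokenElement("\\mn", "−1.234E18"));  // <mn>−1.234E18</mn>
```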
To Do
- Determine best way to handle non-math text.
- Implement responsive layout features.
Other possibilities:
- Allow different math styles in same web page.
- Comprehensive bold feature using multiple fonts.
OpenType Math Fonts
List of features supported by OpenType Math fonts.
| Feature | STIX | ASAN | CAMB | CONC | EREW | EULR* | FIRA | GARA | GFSN | LATM | LETE | LIBR | NCMM | NOTO | XCHR | XITS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Calligraphic | def | - | def | def | priv | def | - | ss03 | def | def | def | - | def | ss01 | - | ss01 |
| Bold Calligraphic | def | - | def | def | priv | def | - | ss03 | - | def | def | - | def | - | - | ss01 |
| Script | ss01 | def | - | - | def | - | - | def | - | - | - | def | ss01 | def | def | def |
| Blackboard Italic | priv | - | - | - | - | - | - | - | - | - | - | - | - | - | - | ss06 |
| Blackboard Bold | - | - | - | - | priv | priv | - | - | - | - | priv | - | ss03 | - | priv | ss05 |
| Sans Serif Greek | priv | - | - | priv | priv | priv | - | - | - | - | priv | - | priv | - | priv | ss02 |
| Full size primes | ss04 | def | - | ssty | ssty | ssty | - | - | - | - | ssty | ssty† | - | ssty | ssty | - |
| Wide hat etc. | - | 10ffa6 | - | e520 | e520 | e520 | - | - | - | - | e3e1 | - | - | - | e520 | - |
| Upright Integrals | priv | def | def | ss03 | ss03 | def | ss01 | ss07 | - | - | ss08 | def | ss02 | ss02 | ss03 | ss08 |
| Mathematical g | ss02 | - | - | - | - | def | - | - | - | - | cv11 | - | - | - | - | - |
| Oldstyle Numerals | - | yes | yes | yes | yes | yes | - | - | - | - | yes | - | - | - | yes | - |
* all glyphs upright † not quadruple prime
Common Font Issues
- Overlines and underlines don't stretch properly in most fonts in Chrome and also a small number in Safari. It's disappointing how such a basic feature is not working properly. A generic workaround has been implemented.
- The following issue affects all fonts on Chrome but none on Firefox and Safari. When brackets stretch vertically around 3 times or more, they suddenly move to the far right side of their bounding box. This causes bracket spacing on large matrices to be uneven and is especially noticeable for the \vert bracket. A workaround for \vert has been implemented.
- Primes in most fonts render in the wrong position and/or wrong size in Chrome. To preserve the look and feel font-specific workarounds are implemented.
- Square roots have unwanted trailing space which varies from font to font in Chrome. This issue does not occur in Firefox and Safari. Font-specific workarounds are implemented.
- Almost all fonts are missing either Chancery or Roundhand style script characters. And those that do implement them, do so by differing methods. To preserve the look and feel font-specific workarounds are implemented.
- Wide accents like \widehat and \widetilde stretch badly in Firefox and not at all in Chrome. A generic workaround has been implemented for all browsers. Several fonts appear to support stretchy tildes but they do not work in any browser, so this must be a LaTeX-specific font feature.
- Some arrow glyphs in some fonts stretch properly in Firefox but not Chrome. Not a serious issue. Someone should work out whether this is a font issue or browser issue and fix accordingly.
- Exotic math glyphs are missing from most fonts. Workaround is to substitute them from STIX Two Math.
- About half the fonts are missing old style numerals. Workaround is to substitute them from the system cursive font because STIX Two Math doesn't have them either.
- Many fonts appear to support upright integrals but turning on that feature does not work in any browser. Must be another LaTeX-specific font feature.
- The MathML specification suggests using glyph selector codes for distinguishing between Chancery and Roundhand glyphs. However most fonts don't support this, and glyph selectors seem to be out of favour. The fix is to use font feature settings instead.
- In general combining diacritics don't work well with MathML. The fix is to avoid using them.
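As a sketch of the font feature settings approach mentioned above, the helper below builds a CSS `font-feature-settings` declaration from OpenType feature tags such as the ss01 entries in the font table. The function name `fontFeatureCss` is hypothetical, and whether TeXZilla applies features in exactly this way is an assumption.

```javascript
// Hypothetical helper: build a CSS declaration enabling OpenType features.
// Each tag must be a four-character OpenType feature tag such as "ss01".
function fontFeatureCss(tags) {
  for (const tag of tags) {
    if (!/^[A-Za-z0-9]{4}$/.test(tag)) {
      throw new Error("invalid OpenType feature tag: " + tag);
    }
  }
  if (tags.length === 0) return "font-feature-settings: normal;";
  const list = tags.map((tag) => `"${tag}" 1`).join(", ");
  return `font-feature-settings: ${list};`;
}

// Example: stylistic set 1, listed for Script under STIX in the table above.
console.log(fontFeatureCss(["ss01"])); // font-feature-settings: "ss01" 1;
```

The resulting declaration could be attached to the element that holds the script letters, leaving the rest of the formula on the font's default glyphs.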
Other Font Issues
- Asana bold sans-serif italic capital I u1d644 is too slanted. Could substitute Asana bold sans-serif italic capital Iota which does not have the issue.
- Asana brackets fail to stretch more than about 3 times on Chrome. Could substitute brackets from STIX Two Math instead.
- Cambria has ugly layout for square roots especially nested ones. Someone should fix it.
- Erewhon, integral signs do not stretch in Chrome and Safari. But they do stretch properly in Firefox. Is this a font bug or a browser bug?
- Erewhon and GFS Neohellenic have bad line height in some circumstances, about 4 times higher than it should be. Fix is uncertain.
- GFS Neohellenic has clockwise and anti-clockwise line integrals u2232 and u2233 switched. Could swap them - but no-one has noticed this issue in 15 years!
- GFS Neohellenic is missing bold calligraphic, lowercase fraktur and bold fraktur. Workaround is to substitute from STIX Two Math.
- Lete Sans has a very extravagant text style lowercase italic g which doesn't look good in formulas. Quick fix is to enable math style g when font is loaded. Should implement a better fix where both styles are enabled as appropriate.
- Libertinus left and right angle brackets u27e8 and u27e9 don't stretch in Chrome and Safari. Fixed in latest version of font.
- Neo Euler is missing all the Roman and Greek italic letters in the Mathematical Alphanumeric Symbols Unicode block. Workaround is to use Euler font instead.
- Noto Sans and XITS: vertical lines and double vertical lines sometimes stretch only half what they should in Chrome (look at 4x4 determinants). Other brackets like (), [] and {} are ok. Other fonts and browsers are ok. Could maybe implement an SVG workaround or substitute these glyphs from STIX Two Math. Possibly it's a Chrome issue which will eventually be fixed.
- STIX Two and Libertinus: Text above \xrightarrow etc. is typeset far too high in Chrome and Safari. All other fonts are typeset correctly. Someone should fix it.
- TeX Gyre xxx fonts have integral signs which do not stretch vertically as much as they should in both Firefox and Chrome. Only workaround would be to substitute integral glyphs from STIX Two Math.
Testing
Browsers: Chrome, Firefox, Safari, Edge. Also Samsung Internet and Opera.
Platforms: macOS, Windows, Android, iOS.
Hardware: MacBook, Windows Desktop, Samsung Phone, iPad.