#header,next=TEST_CASES.html,prev=./FXY2.html

During COVID I wrote up some math notes in HTML, taking advantage of the wide-screen format of desktop systems to display large formulae. Now that I am back to commuting by train, it would be nice to display those notes on my mobile phone, but the desktop mode of the original HTML does a poor job of displaying on such a device. I have therefore designed and implemented a script preprocessor which converts TeX-like math code into responsive HTML code.

There are some obvious things that can be done to improve the display of math on small touch screens. From most desirable to least

While those things are fine for inline math, they are not good enough for display equations. It would be nice to have something similar to text flow but more sophisticated, which rearranges the equation to fit into the available space much as a human would on a blackboard. This is where the preprocessor comes in.

Preprocessor

The preprocessor converts TeX code to responsive HTML code. These are the steps in the process:

Algorithms

flow

The flow algorithm is very similar to normal text flow and is most suitable for homogeneous text like polynomials. The left parameter tells the algorithm to break just before plus and minus signs (normally it breaks just after). The indent parameter tells the algorithm to indent every line after the first to the level of the first equals sign.

#flow,left,indent
\Delta = 256a^3e^3 - 192a^2bde^2 - 128a^2c^2e^2 + 144a^2cd^2e - 27a^2d^4 + 144ab^2ce^2 - 6ab^2d^2e - 80abc^2de + 18abcd^3 + 16ac^4e - 4ac^3d^2 - 27b^4e^2 + 18b^3cde - 4b^3d^3 - 4b^2c^3e + b^2c^2d^2

This is the above code rendered with an equation label.

#flow,left,indent \Delta = 256a^3e^3 - 192a^2bde^2 - 128a^2c^2e^2 + 144a^2cd^2e - 27a^2d^4 + 144ab^2ce^2 - 6ab^2d^2e - 80abc^2de + 18abcd^3 + 16ac^4e - 4ac^3d^2 - 27b^4e^2 + 18b^3cde - 4b^3d^3 - 4b^2c^3e + b^2c^2d^2

To see the math respond to screen size change on a mobile device, rotate the device to switch between portrait and landscape mode.
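The break rule of the flow algorithm can be sketched as below. This is only an illustration and not the preprocessor's actual code: the function name `splitAtSigns` and its handling of brackets and unary signs are my assumptions. It cuts a formula into chunks at top-level plus and minus signs, either before the sign (the `left` behaviour) or after it (the default), so that each chunk could then be emitted as an unbreakable inline block.

```javascript
// Hypothetical helper, not part of the original preprocessor.
// Splits TeX-like text into chunks at top-level binary + and - signs.
function splitAtSigns(tex, left = false) {
  const chunks = [];
  let current = "";
  let depth = 0; // nesting depth of {} and ()
  for (const ch of tex) {
    if (ch === "{" || ch === "(") depth++;
    if (ch === "}" || ch === ")") depth--;
    const isSign = (ch === "+" || ch === "-") && depth === 0;
    const prev = current.trim();
    // Assumption: a sign at the very start, or straight after "=", is unary.
    const binary = isSign && prev !== "" && !prev.endsWith("=");
    if (binary && left) {
      chunks.push(prev); // break just before the sign
      current = ch;
    } else if (binary) {
      chunks.push(prev + " " + ch); // break just after the sign
      current = "";
    } else {
      current += ch;
    }
  }
  if (current.trim() !== "") chunks.push(current.trim());
  return chunks;
}

const leftChunks = splitAtSigns("a = b + c - d", true);
const rightChunks = splitAtSigns("a = b + c - d");
```

With `left` set the sign travels with the term that follows it, which is what makes a wrapped polynomial read naturally. A sign inside a superscript such as `x^-1` is not handled by this sketch.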

fold

The fold algorithm is a text flow algorithm in which the line breaks occur at predetermined positions and in a predetermined order. It is better for inhomogeneous text like matrices and integrals. For a large equation which won't fit on one line, it usually looks best to make the first break after the top-level equals sign and then, if needed, another break around the middle of what remains.

#fold
\begin{vmatrix}
1 & x_1 & x_1^3 & x_1^4 \\
1 & x_2 & x_2^3 & x_2^4 \\
1 & x_3 & x_3^3 & x_3^4 \\
1 & x_4 & x_4^3 & x_4^4 \\
\end{vmatrix} = $
\big(x_1x_2 + x_1x_3 + x_1x_4 + x_2x_3 + $
x_2x_4 + x_3x_4\big) \cdot \prod_{i\lt j} (x_j-x_i)

This is the above code rendered with an equation label.

#fold2 \begin{vmatrix} 1 & x_1 & x_1^3 & x_1^4 \\ 1 & x_2 & x_2^3 & x_2^4 \\ 1 & x_3 & x_3^3 & x_3^4 \\ 1 & x_4 & x_4^3 & x_4^4 \\ \end{vmatrix} = $$ \big(x_1x_2 + x_1x_3 + $ x_1x_4 + x_2x_3 + $ x_2x_4 + x_3x_4\big) \cdot $ \prod_{i\lt j} (x_j-x_i)
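One way to read the fold source is sketched below. It rests on an assumption that the text does not state: that a run of `$` characters marks a possible break point and that a longer run (such as the `$$` after the equals sign) is broken first. The function name `parseFold` and the `priority` field are hypothetical.

```javascript
// Hypothetical parser for fold source; the meaning given to "$" and "$$"
// is an assumption read off the examples, not a documented rule.
function parseFold(src) {
  // Splitting on a captured group keeps the markers in the result array:
  // [text, marker, text, marker, ..., text]
  const pieces = src.split(/(\$+)/);
  const segments = [];
  for (let i = 0; i < pieces.length; i += 2) {
    const tex = pieces[i].trim();
    const marker = pieces[i + 1] || ""; // no marker after the last piece
    if (tex !== "") segments.push({ tex, priority: marker.length });
  }
  return segments;
}

const segments = parseFold("A = $$ B + $ C");
```

Each segment records the break that may follow it, so a layout step could open the highest-priority breaks first as the container narrows.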

stack

A stack is a set of equations which can be written on one line, separated by commas, if there is room. Otherwise they are stacked and aligned vertically at the equals signs. The width parameter is the container width which triggers the transition from horizontal to vertical format.

#stack,width=500
x_3 = a \cdot \frac {x_1y_1 + x_2y_2} {x_1x_2 + y_1y_2}
y_3 = a \cdot \frac {x_1y_1 - x_2y_2} {x_1y_2 - x_2y_1}

This is the above code rendered with an equation label.

#stack,width=500 x_3 = a \cdot \frac {x_1y_1 + x_2y_2} {x_1x_2 + y_1y_2} y_3 = a \cdot \frac {x_1y_1 - x_2y_2} {x_1y_2 - x_2y_1}
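The directive line and its width parameter could be handled along the following lines. This is a sketch under assumptions: `parseDirective` and `stackLayout` are hypothetical names, and treating a container exactly as wide as the threshold as horizontal is my choice, not something the text specifies.

```javascript
// Hypothetical helpers, not the preprocessor's own code.
// Parses a line such as "#stack,width=500" or "#flow,left,indent".
function parseDirective(line) {
  if (!line.startsWith("#")) throw new Error("Not a directive: " + line);
  const [name, ...rest] = line.slice(1).trim().split(",");
  const params = {};
  for (const item of rest) {
    const [key, value] = item.split("=");
    // A bare flag like "left" becomes true; "width=500" keeps its text value.
    params[key.trim()] = value === undefined ? true : value.trim();
  }
  return { name: name.trim(), params };
}

// Chooses the layout for a stack from the container width in pixels.
// Assumption: at exactly the threshold the stack stays horizontal.
function stackLayout(containerWidth, params) {
  return containerWidth >= Number(params.width) ? "horizontal" : "vertical";
}

const directive = parseDirective("#stack,width=500");
const wide = stackLayout(600, directive.params);
const narrow = stackLayout(400, directive.params);
```

In a browser the same decision would more likely be made in CSS or from a resize callback; the pure function here only shows the rule.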

train

A train is a list of expressions coupled together with equals signs. On a wide screen they look good written on one line, but on a narrow screen they look better written as a vertical stack with the equals signs vertically aligned. The width parameter is the container width which triggers the transition from horizontal to vertical format.

#train,width=700
g_2
= -4(\epsilon_1\epsilon_2 + \epsilon_2\epsilon_3 + \epsilon_3\epsilon_1)
= \tfrac {1} {24} (A^2 + B^2 + C^2)
= \tfrac {1} {12} (12ae - 3bd + c^2)

This is the above code rendered with an equation label.

#train,width=700 g_2 = -4 (\epsilon_1\epsilon_2 + \epsilon_2\epsilon_3 + \epsilon_3\epsilon_1) = \tfrac {1} {24} (A^2 + B^2 + C^2) = \tfrac {1} {12} (12ae - 3bd + c^2)
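Before a train can be laid out either way, it has to be cut into its carriages. A minimal sketch of that step, assuming the cut is made at every equals sign that is not nested inside braces (`splitTrain` is a hypothetical name):

```javascript
// Hypothetical helper: splits a train at top-level equals signs so that
// every part after the first starts with "=".
function splitTrain(tex) {
  const parts = [];
  let current = "";
  let depth = 0; // brace nesting depth
  for (const ch of tex) {
    if (ch === "{") depth++;
    if (ch === "}") depth--;
    if (ch === "=" && depth === 0 && current.trim() !== "") {
      parts.push(current.trim());
      current = "";
    }
    current += ch;
  }
  if (current.trim() !== "") parts.push(current.trim());
  return parts;
}

const parts = splitTrain("g_2 = -4(a + b) = \\tfrac {1} {2} c");
```

The parts can be joined with spaces for the horizontal form, or placed one per row with the leading equals signs in a shared column for the vertical form.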

Where To Next?

The above set of algorithms is a good starting point for implementing responsive layout features in a TeX to HTML JavaScript library. Implementing natively would have the following advantages

I think it unlikely that the developers of MathJax and KaTeX would want to implement these more complex responsive layout features, because those packages are primarily focused on reproducing the exact rendering of LaTeX, which does not include such features.

Given the 2023 adoption of MathML Core support across all major browsers, I think the best way forward is to create a new software package which does TeX to MathML conversion directly, incorporating responsive layout features. It should be possible to mitigate all the undesirable effects of chopping math formulae into small pieces, as is required for responsive layout. While the rendering quality of MathML is not as good as that of those two libraries, it is strategically superior and, given enough time, the conversion process is moderately simple to implement.

Prototype MathML

The following equations are manually crafted MathML examples. They represent the ideal output of a TeX math to MathML converter which generates "responsive" HTML.

#flow

\Delta = 256a^3e^3 - 192a^2bde^2 - 128a^2c^2e^2 + 144a^2cd^2e - 27a^2d^4 + 144ab^2ce^2 - 6ab^2d^2e - 80abc^2de + 18abcd^3 + 16ac^4e - 4ac^3d^2 - 27b^4e^2 + 18b^3cde - 4b^3d^3 - 4b^2c^3e + b^2c^2d^2 \tag{5}

#fold

\begin{vmatrix} 1 & x_1 & x_1^3 & x_1^4 \\ 1 & x_2 & x_2^3 & x_2^4 \\ 1 & x_3 & x_3^3 & x_3^4 \\ 1 & x_4 & x_4^3 & x_4^4 \\ \end{vmatrix} = \big(x_1x_2 + x_1x_3 + x_1x_4 + x_2x_3 + x_2x_4 + x_3x_4\big) \cdot \prod_{i\lt j} (x_j-x_i) \tag{6}

#stack

x_3 = a \cdot \frac {x_1y_1 + x_2y_2} {x_1x_2 + y_1y_2}, \quad y_3 = a \cdot \frac {x_1y_1 - x_2y_2} {x_1y_2 - x_2y_1} \tag{7}

#train

g_2 = -4(\epsilon_1\epsilon_2 + \epsilon_2\epsilon_3 + \epsilon_3\epsilon_1) = \tfrac {1} {24} (A^2 + B^2 + C^2) = \tfrac {1} {12} (12ae - 3bd + c^2) \tag{8}

These examples demonstrate how to apply responsive styling directly to the MathML <mrow> elements, thereby maintaining the semantic integrity of the <math> object tree.

MathML Issues

To evaluate the current state of MathML I have added KaTeX MathML and TeXZilla to the list of supported converters. They can be selected from the control panel, which is activated by clicking on the web page title. Although there are a few issues, they provide a good yardstick to gauge the current state of MathML support in the mainstream browsers. I have also added Lexer, a gamma version of the new converter. Current known issues are:


Prototype TeX-Like Math To MathML Converter

Objectives

Create a JavaScript program to convert TeX-like math formulae into responsive MathML. The objectives in order of importance

*Excluding bugs, for a good online LaTeX renderer see TeXeR.

Tokens

All input is Unicode; it is not limited to ASCII. Lexer interprets it as a stream of tokens.

Mathematical Symbols

Almost all mathematical symbols have been assigned Unicode code points in the extended plane and are supported in the common MathML fonts, so there is no need for special fonts at all. Commands like \mathfrak and \mathcal are implemented simply by mapping A-Z etc. into this plane. The symbols can also be embedded directly in the input.
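As an illustration of such a mapping, here is a sketch for \mathfrak. It is not taken from Lexer; the names `toFraktur` and `FRAKTUR_EXCEPTIONS` are hypothetical. The code points are those of the Unicode Mathematical Alphanumeric Symbols block, where capital fraktur letters start at U+1D504 and small ones at U+1D51E. Five capitals (C, H, I, R, Z) were encoded earlier in the Letterlike Symbols block, so they need a small exception table.

```javascript
// Hypothetical mapping helper, not Lexer's actual code.
// Fraktur capitals that live in the Letterlike Symbols block.
const FRAKTUR_EXCEPTIONS = new Map([
  ["C", 0x212d],
  ["H", 0x210c],
  ["I", 0x2111],
  ["R", 0x211c],
  ["Z", 0x2128],
]);

// Maps one ASCII letter to its fraktur counterpart; anything else is
// returned unchanged.
function toFraktur(ch) {
  if (FRAKTUR_EXCEPTIONS.has(ch)) {
    return String.fromCodePoint(FRAKTUR_EXCEPTIONS.get(ch));
  }
  if (/^[A-Z]$/.test(ch)) {
    return String.fromCodePoint(0x1d504 + ch.charCodeAt(0) - 65);
  }
  if (/^[a-z]$/.test(ch)) {
    return String.fromCodePoint(0x1d51e + ch.charCodeAt(0) - 97);
  }
  return ch;
}

const frakturX = toFraktur("x");
```

`String.fromCodePoint` is needed rather than `String.fromCharCode` because these code points lie outside the Basic Multilingual Plane.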

Macros

Macros are user-defined commands that perform text substitution using strings of tokens. Arguments for macros are marshalled by looking ahead in the input stream for the next token, or for a string of tokens contained between a balanced pair of { }. As a result of this rule, macro argument values always contain balanced sets of braces. The outer pair of braces is not considered to be part of the argument value.
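The argument rule can be sketched as follows. This is a minimal illustration which, for simplicity, represents tokens as plain strings in an array; `readArg` is a hypothetical name and not Lexer's own function.

```javascript
// Hypothetical helper: reads one macro argument starting at tokens[pos].
// Returns the argument's tokens and the position just past the argument.
function readArg(tokens, pos) {
  if (pos >= tokens.length) throw new Error("Missing macro argument");
  // Case 1: the argument is the single next token.
  if (tokens[pos] !== "{") return { arg: [tokens[pos]], next: pos + 1 };
  // Case 2: the argument is a balanced { ... } group.
  let depth = 0;
  for (let i = pos; i < tokens.length; i++) {
    if (tokens[i] === "{") depth++;
    if (tokens[i] === "}") depth--;
    if (depth === 0) {
      // The outer braces are dropped; inner braces stay balanced.
      return { arg: tokens.slice(pos + 1, i), next: i + 1 };
    }
  }
  throw new Error("Unbalanced braces in macro argument");
}

const grouped = readArg(["{", "a", "{", "b", "}", "}", "c"], 0);
const single = readArg(["x", "y"], 0);
```

Because the scan only stops when the depth returns to zero, any braces left inside the argument value are necessarily paired.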

Implementation (Javascript)

Tokeniser

During input processing, the input text is converted to tokens and macro substitution is performed, resulting in a stream of tokens with all whitespace removed and with macro commands and arguments resolved to other tokens. The input processor is conceptually a generator function which yields tokens.

Each token is an object of the form { type, value, r } where type is one of CMD, CHR, BEG, END, MAC, ARG or EOF: respectively command, character, begin delimiter, end delimiter, macro, macro argument and end of input. The value is the token in textual form. The r is a resolver function which is dispatched to process the token.
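A tokeniser of this general shape might look like the sketch below. It is deliberately reduced and is not Lexer's implementation: it has no command table, no macro expansion and no resolver field, and the function name `tokenise` and the regular expression are my assumptions.

```javascript
// Hypothetical, reduced tokeniser written as a generator function.
// Yields { type, value } objects and finishes with an EOF token.
function* tokenise(text) {
  // A token is a \name command, a backslash-escaped character, or any
  // single non-whitespace character; whitespace is skipped.
  const pattern = /\\[A-Za-z]+|\\.|\S/gu;
  for (const [value] of text.matchAll(pattern)) {
    let type = "CHR";
    if (value === "{") type = "BEG";
    else if (value === "}") type = "END";
    else if (value.startsWith("\\") || value === "^" || value === "_") type = "CMD";
    yield { type, value };
  }
  yield { type: "EOF", value: "" };
}

const listing = [...tokenise("\\mathfrak x^{10}")].map((t) => t.type + ":" + t.value);
```

Writing it as a generator means the consumer pulls tokens one at a time with `next()`, which suits resolvers that need to read ahead for their operands.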

Commands are looked up in a table, and extra fields are added to the token or it may be converted into a character token. These extra fields include resolver functions. So \Delta becomes {CHR, Δ, rChar}, ^ becomes {CMD, ^, rSup} and \Large becomes {CMD, \Large, rSize, 2}. When resolver functions are dispatched, their this pointer is bound to the Lexer and the token is passed as a parameter. This gives them direct access to the state of the Lexer and to any additional fields that have been attached to the token.
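The lookup and dispatch can be illustrated with a small sketch. The table contents mirror the two examples in the text, but the field name `size`, the `lookup` function and the stand-in `lexer` object are hypothetical.

```javascript
// Hypothetical resolvers; "this" is the lexer they are dispatched on.
function rChar(token) {
  this.out.push(token.value);
}
function rSize(token) {
  this.size = token.size; // "size" is an assumed name for the extra field
}

// Hypothetical command table with two entries taken from the text.
const COMMANDS = new Map([
  ["\\Delta", { type: "CHR", value: "Δ", r: rChar }],
  ["\\Large", { type: "CMD", value: "\\Large", r: rSize, size: 2 }],
]);

// Returns a fresh token so that table entries are never mutated.
function lookup(name) {
  const entry = COMMANDS.get(name);
  if (!entry) throw new Error("Unknown command: " + name);
  return { ...entry };
}

// Stand-in for the Lexer state.
const lexer = { size: 0, out: [] };
for (const name of ["\\Large", "\\Delta"]) {
  const token = lookup(name);
  token.r.call(lexer, token); // bind "this" to the lexer, pass the token
}
```

Using `call` keeps the resolvers as plain functions stored in a table while still letting them read and update the lexer's state.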

The EOF tokens are generated at the end of the input text and at the end of macros to simplify the processing logic. Their resolver function will never be called unless there is an unexpected end of input condition, in which case it throws an error. Similarly, the resolver function of an END token is never called unless there are unbalanced delimiters, in which case it also throws an error.

The tokeniser can also output its stream of tokens to a token "printer" for debugging.

MathML Renderer

The current state is held in the member variables of the Lexer. The resolver functions are effectively members of the Lexer, and when they execute they utilise the current state. When a state change happens, the current state is saved in local variables on the JavaScript call stack and then the state member variable(s) are updated.

The state tracks font size and style. It is used to implement commands like \large and \mathfrak. The graphics state is saved and restored by the associated token's resolver function. For example, a {CHR, A, rLetter} token's resolver will look at the state and transform a Unicode A into a Unicode fraktur A when the fraktur font style is in effect.
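The save and restore pattern might look like this in miniature. The class `MiniLexer`, its `style` and `log` members and the way the operand is passed in are all assumptions made for the sketch; only the idea of keeping the old state in a local variable comes from the text.

```javascript
// Hypothetical miniature of the state handling, not Lexer itself.
class MiniLexer {
  constructor() {
    this.style = "normal"; // current font style
    this.log = []; // records which style each letter was resolved in
  }

  rLetter(ch) {
    this.log.push(this.style + ":" + ch);
  }

  // Resolves an operand (here simply a string of letters) in fraktur.
  rMathfrak(operand) {
    const saved = this.style; // saved on the JavaScript call stack
    this.style = "fraktur";
    try {
      for (const ch of operand) this.rLetter(ch);
    } finally {
      this.style = saved; // restored even if a resolver throws
    }
  }
}

const mini = new MiniLexer();
mini.rLetter("a");
mini.rMathfrak("xy");
mini.rLetter("b");
```

Because the saved value lives in the resolver's own stack frame, nested style changes unwind in the right order without an explicit state stack.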

The resolver functions also build the MathML output tree. For example, rSub will pull the next operand from the input token stream. Then it will replace the previous node in the MathML tree with a new <msub> node representing the previous node with a subscript generated from the next operand.
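The tree manipulation performed by the subscript resolver can be sketched with plain objects standing in for MathML nodes. The node shape `{ tag, text, children }` and the helper `toMathML` are hypothetical, and the operand is passed in ready-made instead of being pulled from the token stream.

```javascript
// Hypothetical sketch of the subscript step on a list of sibling nodes.
function rSub(children, operand) {
  const base = children.pop(); // the previous node becomes the base
  if (base === undefined) throw new Error("Subscript has no base");
  children.push({ tag: "msub", children: [base, operand] });
}

// Serialises a node tree to a MathML string.
function toMathML(node) {
  const inner =
    typeof node.text === "string" ? node.text : node.children.map(toMathML).join("");
  return `<${node.tag}>${inner}</${node.tag}>`;
}

const row = [{ tag: "mi", text: "x" }];
rSub(row, { tag: "mn", text: "1" });
const markup = toMathML({ tag: "mrow", children: row });
```

In MathML the first child of <msub> is the base and the second is the subscript, which is why the popped node goes first.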

Example : \mathfrak x^{10}

The TeX string is converted to the tokens {CMD, \mathfrak, rMathfrak} {CHR, x, rLetter} {CMD, ^, rSup} {BEG, {, rBegin} {CHR, 1, rNumber} {CHR, 0, rNumber} {END, }, rEnd} which are converted to MathML. The current tokeniser is called in a loop, this.tok.next(), until it returns an EOF. The resolver for each token is called, and it may itself call the current tokeniser.

Then the main loop resumes.
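A reduced version of this loop is sketched below for the x^{10} part of the example only (the \mathfrak command is left out). It is an illustration under assumptions, not Lexer's code: the class `LoopLexer`, the methods `resolveUntil` and `operand`, and the use of strings for output nodes are all hypothetical, and adjacent digits are not merged into a single <mn>.

```javascript
// Hypothetical resolvers; "this" is bound to the lexer when dispatched.
function rLetter(token) { this.row.push(`<mi>${token.value}</mi>`); }
function rNumber(token) { this.row.push(`<mn>${token.value}</mn>`); }
function rBegin() { this.row.push(this.resolveUntil("END")); }
function rEnd() { throw new Error("Unbalanced delimiter"); }
function rEof() { throw new Error("Unexpected end of input"); }
function rSup() {
  const base = this.row.pop(); // previous node becomes the base
  const exponent = this.operand(); // resolver pulls tokens itself
  this.row.push(`<msup>${base}${exponent}</msup>`);
}

class LoopLexer {
  constructor(tokens) {
    this.tok = tokens[Symbol.iterator](); // the current tokeniser
    this.row = []; // output nodes of the group being built
  }
  // Main loop: dispatch resolvers until a token of type `stop` arrives.
  resolveUntil(stop) {
    const saved = this.row;
    this.row = [];
    for (;;) {
      const token = this.tok.next().value;
      if (token.type === stop) break;
      token.r.call(this, token);
    }
    const nodes = this.row;
    this.row = saved;
    return nodes.length === 1 ? nodes[0] : `<mrow>${nodes.join("")}</mrow>`;
  }
  // One operand: a braced group or a single resolved token.
  operand() {
    const token = this.tok.next().value;
    if (token.type === "BEG") return this.resolveUntil("END");
    const saved = this.row;
    this.row = [];
    token.r.call(this, token);
    const nodes = this.row;
    this.row = saved;
    return nodes.join("");
  }
  run() { return `<math>${this.resolveUntil("EOF")}</math>`; }
}

const tokens = [
  { type: "CHR", value: "x", r: rLetter },
  { type: "CMD", value: "^", r: rSup },
  { type: "BEG", value: "{", r: rBegin },
  { type: "CHR", value: "1", r: rNumber },
  { type: "CHR", value: "0", r: rNumber },
  { type: "END", value: "}", r: rEnd },
  { type: "EOF", value: "", r: rEof },
];
const result = new LoopLexer(tokens).run();
```

Note that the END and EOF resolvers are only reached when a loop is not waiting for that token type, which is why they can simply throw.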

Example: \sqrt[3] 2

This string converts to {CMD, \sqrt, rSqrt} {BEG, [, rBegin} {CHR, 3, rNumber} {END, ], rEnd} {CHR, 2, rNumber}. The square brackets are marked as delimiters by the tokeniser because the command table entry for \sqrt tells it to expect an optional argument after the command. It is resolved as follows: the main loop calls this.tok.next(), which retrieves a token and calls its resolver

Example: {x \over 2}

This string converts to {BEG, {, rBegin} {CHR, x, rLetter} {CMD, \over, rOver} {CHR, 2, rNumber} {END, }, rEnd}. Note that although \over is a command, it is interpreted as a fence. It is resolved as follows:

Example: \begin{pmatrix} x ... \end{pmatrix}

The tokeniser invokes special processing rules when it sees a \begin or \end command and produces the following token string {BEG, \begin{pmatrix}, rMatrix} {CHR, x, rLetter} ... {END, \end{pmatrix}, rEndEnv} which is resolved as follows: