From LaTeX to MathML and Back with TeX4ht and PassiveTeX

Eitan M. Gurari
Ohio State University
gurari@cis.ohio-state.edu
and
Sebastian Rahtz
Oxford University
sebastian.rahtz@oucs.ox.ac.uk

MathML International Conference
October 20-21, 2000
Urbana-Champaign, IL

http://www.cis.ohio-state.edu/~gurari/docs/mml-00/mml-00.html

Contents

Bird View
LaTeX Foundations for TeX4ht
User Interfaces
Setting User Interfaces
Seeding Hooks
Configuring Hooks
Configuring Beyond Hooks
Reversibility
Tasks for Postprocessors
Limitations
Historic Perspective
PassiveTeX
References

Bird View

             +----------------+
LaTeX  ----> |     TeX4ht     |
             +----------------+
                     |
                XML + MathML
                     |
                     v
             +----------------+
             |   PassiveTeX   | ----> PDF
             +----------------+

XML: XHTML, TEI, DocBook, ...

A LaTeX source:

 Gaussian of the form :
 \begin{eqnarray*}
 f( \epsilon , \delta s)
 \approx
 \frac
   {1}
   {\xi
    \sqrt{
      \frac{2 \pi}{\kappa}
      \left( 1 - \beta^2/2 \right)
   }}
 \exp \left [
   \frac
     {( \epsilon - \bar{\epsilon} )^2}
     {2}
   \frac
     {\kappa}
     {\xi^2 (1- \beta^2/2)}\right ]
 \end{eqnarray*}
 thus implying

The TEI/MathML output of TeX4ht:

 <!--l. 5--><div type="p"><p>Gaussian of the form :
 
 <!--l. 6--><formula notation="mathml"
 rend="block"><math xmlns="http://www.w3.org/1998/Math/MathML"
 display="block"> <mtable class="eqnarray-star"><mtr><mtd
 class="eqnarray-1"> <mi>f</mi><mrow
 ><mo>(</mo><mi>&#x03B5;</mi><mo>,</mo>
 <mi>&#x03B4;</mi><mi>s</mi><mo>)</mo></mrow> <mo>&#x2248;</mo>
 <mfrac><mrow ><mn>1</mn></mrow> <mrow
 ><mi>&#x03BE;</mi><msqrt><!--<mi>&#x2218;</mi>
    --><mfrac><mrow ><mn>2</mn><mi>&#x03C0;</mi></mrow>
  <mrow ><mi>&#x03BA;</mi></mrow></mfrac>   <mfenced
 open='(' close=')' ><mn>1</mn> <mo>&#x2212;</mo> <msup
 ><mi>&#x03B2;</mi><mrow ><mn>2</mn></mrow></msup
 ><mo>/</mo><mn>2</mn></mfenced></msqrt></mrow></mfrac><mo
 form="prefix" class="csname">exp</mo> <mfenced
 open='[' close=']' ><mfrac><mrow ><msup ><mrow
 ><mo>(</mo><mi>&#x03B5;</mi> <mo>&#x2212;</mo> <munderover
 accent='true'><mrow ><mi>&#x03B5;</mi></mrow><mrow ></mrow><mrow
 ><mo>&#x0304;</mo></mrow></munderover><mo>)</mo></mrow><mrow
 ><mn>2</mn></mrow></msup ></mrow>     <mrow
 ><mn>2</mn></mrow></mfrac>            <mfrac><mrow
 ><mi>&#x03BA;</mi></mrow> <mrow ><msup
 ><mi>&#x03BE;</mi><mrow ><mn>2</mn></mrow></msup ><mrow
 ><mo>(</mo><mn>1</mn> <mo>&#x2212;</mo> <msup
 ><mi>&#x03B2;</mi><mrow ><mn>2</mn></mrow></msup
 ><mo>/</mo><mn>2</mn><mo>)</mo></mrow></mrow></mfrac></mfenced></mtd><mtd
 class="eqnarray-2"> </mtd><mtd class="eqnarray-3"> </mtd><mtd
 class="eqnarray-4"> <mtext class="eqnarray"></mtext></mtd> </mtr></mtable>
 </math></formula>
 thus implying   </p></div>

The PassiveTeX Output:

[Picture: the formula above as typeset by PassiveTeX]

LaTeX Foundations for TeX4ht

TeX4ht as a layer built on top of LaTeX:

   +----------------------+        +---------+
   | foo.tex              | -----> |  LaTeX  | -----> foo.dvi
   | \usepackage{amsmath} |        +---------+
   +----------------------+
              |
              v
   LaTeX classes and packages:  article | book | amsart | amsmath (loaded)

User Interfaces

             +----------------+
foo.tex ---> |     TeX4ht     | ---> XML + MathML
             +----------------+

Input: Standard LaTeX source file foo.tex

Output: Configurable, with a few built-in configurations provided.

command line      output file   output format
-------------------------------------------------
xhmlatex foo      foo.html      XHTML + MathML
teimlatex foo     foo.xml       TEI + MathML
dbmlatex foo      foo.xml       DocBook + MathML
xhlatex foo       foo.html      XHTML

pdflatex foo      foo.pdf       PDF
latex foo         foo.dvi       DVI
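
To try the commands in the table above, a complete input file is needed. The following is a minimal sketch of such a foo.tex, wrapping the Gaussian formula shown earlier; the document class and the preamble are assumptions, since the slides show only the body of the source.

```latex
% foo.tex: a minimal, complete input file.
% The class and preamble are assumptions; only the body comes from the slides.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
Gaussian of the form:
\begin{eqnarray*}
f(\epsilon, \delta s)
  \approx
  \frac{1}{\xi \sqrt{\frac{2\pi}{\kappa} \left(1 - \beta^2/2\right)}}
  \exp\left[
    \frac{(\epsilon - \bar{\epsilon})^2}{2}
    \frac{\kappa}{\xi^2 (1 - \beta^2/2)}
  \right]
\end{eqnarray*}
thus implying
\end{document}
```

The same file can then be passed unchanged to any of the listed commands, for instance latex foo for DVI or teimlatex foo for TEI + MathML.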

Setting User Interfaces

A command line invokes LaTeX with appropriate configuration files.

xhmlatex     mathml.4ht + html4.4ht + html4-mml.4ht
teimlatex    mathml.4ht + tei.4ht + tei-mml.4ht
dbmlatex     mathml.4ht + docbook.4ht + docbook-mml.4ht
xhlatex      html4.4ht + html4-math.4ht

[Figure: a command such as xhmlatex selects among the configurations (mathml, html4, math, docbook, xhtml, ebook, tei); the selected configurations fill the hooks seeded into the LaTeX classes and packages (article, book, amsart, amsmath) used by a source file containing \usepackage{amsmath}.]

Seeding Hooks

Native definition

\def\frac#1#2{{\begingroup#1\endgroup\over#2}}

Modified definition

\def\frac#1#2{{\a:frac \begingroup#1\endgroup\b:frac \over\c:frac #2\d:frac}}

Declared hooks

\NewConfigure{frac}{4}
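
The seeding idea can be imitated in a few lines of ordinary LaTeX. The sketch below is a toy illustration only, not TeX4ht's implementation: \ToyConfigureFrac is a hypothetical stand-in for \Configure{frac}, and the four empty \def's play the role of \NewConfigure{frac}{4}.

```latex
\documentclass{article}
% Make ':' a letter so that \a:frac reads as one control sequence.
\catcode`\:=11
% Toy stand-in for \NewConfigure{frac}{4}: four hooks, initially empty.
\def\a:frac{}\def\b:frac{}\def\c:frac{}\def\d:frac{}
% Hypothetical stand-in for \Configure{frac}{..}{..}{..}{..}: fill the hooks.
\def\ToyConfigureFrac#1#2#3#4{%
  \def\a:frac{#1}\def\b:frac{#2}\def\c:frac{#3}\def\d:frac{#4}}
% The seeded definition of \frac, as in the modified definition above.
\def\frac#1#2{{\a:frac \begingroup#1\endgroup\b:frac \over\c:frac #2\d:frac}}
\catcode`\:=12 % restore ':' to its usual category
\begin{document}
$\frac{A}{B}$   % empty hooks: typesets like the native \frac

\ToyConfigureFrac{[}{]}{[}{]}
$\frac{A}{B}$   % hooks now wrap numerator and denominator in brackets
\end{document}
```

With all hooks empty, the seeded macro behaves like the native one, so a seeded definition costs nothing until a configuration fills the hooks.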

Configuring Hooks

Source:

\frac{A}{B}

Configuration:

\Configure{frac}
    {} { / } {} {}

Output:

A / B


Configuration:

\Configure{frac}
    {\HCode{<mfrac><mrow>}}
    {\HCode{</mrow>}}
    {\HCode{<mrow>}}
    {\HCode{</mrow></mfrac>}}

Output:

<mfrac>
  <mrow>
     A
  </mrow>
  <mrow>
     B
  </mrow>
</mfrac>


Configuration:

\Configure{frac}
    {\HPage{numerator}} {\EndHPage{}/}
    {\HPage{denominator}} {\EndHPage{}}

Output (each word is a link to a separate page):

numerator / denominator


Configuration:

\Configure{frac}
    {\Picture+{}\bgroup} {}
    {} {\egroup\EndPicture}

Output:

a bitmap picture of the fraction

Configuring Beyond Hooks

Reversibility

Tasks for Postprocessors

Limitations

• Logical structures may pass through without markup, or with improper markup, if their definitions are not configured for TeX4ht.

• Content MathML requires new user-friendly TeX notation, or a postprocessor.

• The native TeX math model allows abuses, and might be in conflict with MathML.

$$\vbox{...}$$              unclear intention: math? centering?
$R=\{x|x$ is real $\}$      ‘broken’ math

Fixes (typically) are:

• TeX4ht benefits enormously from its access to

but could have been marginally improved by the addition of a few features to the TeX compiler:

Don Knuth had already noted the issue of subscripts and superscripts in 1986 [4].
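
As an illustration of the ‘broken’ math example above, the sketch below contrasts the fragile form with one possible repair. It assumes the amsmath package for \text; the repair is a suggestion in the spirit of the slide, not a fix taken from it.

```latex
\documentclass{article}
\usepackage{amsmath} % assumed here only for \text
\begin{document}
% Fragile: the set is split into two math fragments with prose in between.
$R=\{x|x$ is real $\}$

% One possible repair: a single formula, with the prose marked as text.
$R=\{\, x \mid x \text{ is real} \,\}$
\end{document}
```

In the second form a converter sees one formula with matching braces, which should map more naturally onto a single MathML math element.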

Historic Perspective

LaTeX ==>
  1. BibTeX2HTML, Gellmu, HTMX, HTeX, Hevea, HyperLaTeX, HyperTeX, LaTeX2HTML, LaTeX2hyp, Ltoh, Ltx2x, Math2HTML, TechExplorer, TeX2HTML, TeX2RTF, Texi2HTML, Tth, Vulcanize, WebEq
  2. Elsevier LaTeX2SGML, TeX4ht
  3. MicroPress TeXpider, Omega
==> *ML
  1. Stand-alone systems (limited to a small subset of LaTeX)
  2. Seeding the dvi output, and extracting the text
  3. Altering the TeX engine and adding new primitives (of very little value without TeX4ht-like configurations; non-portable)

PassiveTeX

A macro library in TeX for typesetting XSL Formatting Objects.

  1. Convert the XML code to XSL-FO

    TEI + MathML ---+
                    |     +------------------+
                    +---> |  XSLT processor  | ---> XSL-FO + MathML
                    |     +------------------+
    tei.xsl --------+

    The MathML <math> elements pass through unchanged

  2. Process the XSL-FO file, with its embedded MathML, using XMLTeX under the PassiveTeX macros.

    XSL-FO + MathML --------+
                            |     +----------+
                            +---> |  XMLTeX  | ---> pdf
                            |     +----------+
    fotex.fmt + fotex.sty --+

    David Carlisle's presentation, Typesetting MathML with XMLTeX, discusses XMLTeX.

References

[1]   David Carlisle, XmlTeX: A non validating (and not 100% conforming) namespace aware XML parser implemented in TeX, ftp://ftp.tex.ac.uk/tex-archive/macros/xmltex/base/manual.html.

[2]   Michel Goossens and Sebastian Rahtz with Eitan M. Gurari, Ross Moore, and Robert S. Sutor, The LaTeX Web Companion: Integrating TeX, HTML, and XML, Addison Wesley, 1999, ISBN 0-201-43311-7.

[3]   Eitan M. Gurari, TeX4ht: LaTeX and TeX for hypertext, http://www.cis.ohio-state.edu/~gurari/TeX4ht/mn.html.

[4]   Donald Knuth, Personal communication with Sebastian Rahtz regarding the Elsevier LaTeX to SGML conversion, 1986.

[5]   Sebastian Rahtz, PassiveTeX, http://users.ox.ac.uk/~rahtz/passivetex/.