
TEX


TeX is a typesetting system created by Donald Knuth. The name is an abbreviation of τέχνη (ΤΕΧΝΗ, technē), Greek for "art" and "craft", which is also the source word of "technical". English speakers often pronounce it "tek", like the first syllable of "technology".

When the first volume of Knuth's The Art of Computer Programming was published in 1969, it was typeset using Monotype, a technology from the 19th century which produced a "good classic style" appreciated by Knuth. When the second edition of the second volume was published, in 1976, the whole book had to be typeset again because the Monotype technology had been largely replaced by photographic techniques, and the original fonts were no longer available. However, when Knuth received the galley proofs of the new book on 30 March 1977, he found them awful. Around that time, Knuth saw the output of a high-quality digital typesetting system for the first time and became interested in digital typography. The disappointing galley proofs gave him the final motivation to solve the problem once and for all by designing his own typesetting system. On 13 May 1977, he wrote a memo to himself describing the basic features of TeX.

Together with the Metafont language for font description and the Computer Modern typeface, TeX was designed with two main goals: to allow anybody to produce high-quality books with a reasonable amount of effort, and to provide a system that would give exactly the same results on all computers, now and in the future. It is free and is popular in academia, especially in the mathematics, physics, computer science, and engineering communities. It has largely displaced Unix troff, the other favored formatter, in many Unix installations, which use both for different purposes. TeX is generally considered to be the best way to typeset complex mathematical formulas but, especially in the form of LaTeX and other template packages, it is now also used for many other typesetting tasks.

 

A new version of TeX, rewritten from scratch and called TeX82, was published in 1982. Among other changes, the original hyphenation algorithm was replaced by a new algorithm written by Frank Liang. TeX82 also uses fixed-point arithmetic instead of floating-point arithmetic, to ensure that results are reproducible across different computer hardware, and, following intense lobbying by Guy Steele, includes a real, Turing-complete programming language.
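The benefit of fixed-point arithmetic can be illustrated with a minimal Python sketch. TeX stores dimensions as whole numbers of "scaled points", with 65536 scaled points to the point; the helper name `pt_to_sp` and the rounding rule used here are assumptions made for illustration and are not TeX's own conversion routine.

```python
from fractions import Fraction

SP_PER_PT = 65536  # 1 pt = 2**16 scaled points (sp)

def pt_to_sp(points_text: str) -> int:
    """Convert a decimal string such as '0.1' to an integer number of sp.

    Illustrative helper: the value is rounded to the nearest integer,
    which is not necessarily how TeX itself rounds decimal input.
    """
    return round(Fraction(points_text) * SP_PER_PT)

# Integer sums are exact, so every machine computes the same layout.
a, b = pt_to_sp("0.1"), pt_to_sp("0.2")
print(a, b, a + b)        # 6554 13107 19661

# Binary floating point cannot represent 0.1 exactly, so rounding
# error creeps into even the simplest sums.
print(0.1 + 0.2 == 0.3)   # False
```

Once every length is an integer, addition and comparison behave identically on any hardware, which is the reproducibility property the paragraph above describes.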

 

The base TeX system understands about 300 commands, called primitives. However, these low-level commands are rarely used directly by users, and most functionality is provided by format files. Knuth's original default format, which adds about 600 commands, is Plain TeX. The most widely used format is LaTeX, originally developed by Leslie Lamport, which incorporates document styles for books, letters, slides, etc., and adds support for referencing and automatic numbering of sections and equations. Another widely used format, AMS-TeX, is produced by the American Mathematical Society and provides many more user-friendly commands, which can be altered by journals to fit their house style. Most of the features of AMS-TeX can be used in LaTeX through the AMS "packages"; this combination is referred to as AMS-LaTeX. Other formats include ConTeXt, used primarily for desktop publishing and written mostly by Hans Hagen at Pragma.

TeX commands commonly start with a backslash and are grouped with curly braces. However, almost all of TeX's syntactic properties can be changed on the fly, which makes TeX input hard to parse by anything but TeX itself. TeX is a macro- and token-based language: many commands, including most user-defined ones, are expanded on the fly until only unexpandable tokens remain, which are then executed. Expansion itself is practically free of side effects. Tail recursion of macros takes no memory, and if-then-else constructs are available. This makes TeX a Turing-complete language even at the expansion level.
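The idea of expanding macros until only unexpandable tokens remain can be sketched in a few lines of Python. This is a toy model under strong assumptions, not TeX's actual expansion mechanism: macros take no arguments, braces are ordinary tokens, and the macro names `\hello` and `\name` are hypothetical.

```python
import re
from collections import deque

# A control sequence is a backslash followed by letters, or by one
# other character; anything else is a single-character token.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|\\.|[^\\]", re.DOTALL)

def tokenize(text):
    """Split input text into a list of tokens (toy tokenizer)."""
    return TOKEN_RE.findall(text)

def expand(tokens, macros, max_steps=10_000):
    """Replace macro tokens by their replacement text until none remain."""
    pending = deque(tokens)
    output = []
    steps = 0
    while pending:
        tok = pending.popleft()
        if tok in macros:
            steps += 1
            if steps > max_steps:          # guard against endless recursion
                raise RuntimeError("expansion limit reached")
            # Push the replacement text back onto the front of the input,
            # so that it is itself examined for further expansion.
            pending.extendleft(reversed(macros[tok]))
        else:
            output.append(tok)             # unexpandable: pass through
    return output

# Hypothetical macros, roughly \def\name{TeX} and \def\hello{Hi, \name!}
macros = {
    r"\name": tokenize("TeX"),
    r"\hello": tokenize(r"Hi, \name!"),
}
print("".join(expand(tokenize(r"\hello"), macros)))   # Hi, TeX!
```

Because the replacement text is pushed back onto the input rather than handled by a nested call, a macro whose replacement ends in another macro does not grow any stack, which is in the spirit of the remark above that tail recursion takes no memory.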

The system can be divided into four stages. In the first, characters are read from the input file and assigned a category code. Combinations of a backslash followed by letters or a single other character are replaced by a control sequence token. In this sense the stage resembles lexical analysis, although it does not form numbers from digits. In the second stage, expandable control sequences are replaced by their replacement text. The input for the third stage is then a stream of characters and unexpandable control sequences. Here characters are assembled into a paragraph. TeX's paragraph-breaking algorithm works by optimizing breakpoints over the whole paragraph. The fourth stage breaks the vertical list of lines and other material into pages.

The TeX system has precise knowledge of the sizes of all characters and symbols and, using this information, computes the optimal arrangement of letters per line and lines per page. It then produces a DeVice Independent (DVI) file containing the final locations of all characters. This DVI file can be printed directly given an appropriate printer driver, or it can be converted to other formats. Nowadays, pdfTeX is often used, which bypasses DVI generation altogether.
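What it means to optimize breakpoints over the whole paragraph can be shown with a much simplified Python sketch. It is not TeX's actual algorithm, which also handles stretchable spaces, penalties and hyphenation. The assumptions here are that every character has width 1, that the cost of a line is the square of its unused space, and that the last line is free; the function names are illustrative.

```python
def break_greedy(words, width):
    """First-fit breaking: put as many words on each line as will fit."""
    lines, current = [], []
    for word in words:
        if current and len(" ".join(current + [word])) > width:
            lines.append(current)
            current = [word]
        else:
            current.append(word)
    if current:
        lines.append(current)
    return lines

def break_optimal(words, width):
    """Choose breakpoints that minimize the total cost of the paragraph."""
    n = len(words)
    best = [float("inf")] * (n + 1)   # best[i]: minimal cost of words[i:]
    nxt = [n] * (n + 1)               # nxt[i]: where the line starting at i ends
    best[n] = 0
    for i in range(n - 1, -1, -1):
        length = -1                   # cancels the space before the first word
        for j in range(i + 1, n + 1):
            length += len(words[j - 1]) + 1
            if length > width:
                break
            cost = 0 if j == n else (width - length) ** 2
            if cost + best[j] < best[i]:
                best[i], nxt[i] = cost + best[j], j
    if n and best[0] == float("inf"):
        raise ValueError("a word is wider than the line")
    lines, i = [], 0
    while i < n:
        lines.append(words[i:nxt[i]])
        i = nxt[i]
    return lines

words = "aaa bb cc ddddd".split()
print(break_greedy(words, 6))    # [['aaa', 'bb'], ['cc'], ['ddddd']]
print(break_optimal(words, 6))   # [['aaa'], ['bb', 'cc'], ['ddddd']]
```

The first-fit version fills the first line completely and is then forced to leave "cc" alone on a nearly empty line. The optimizing version accepts a slightly looser first line in exchange for a better second one, which is the kind of trade-off that only a whole-paragraph view can make.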