[See also bugs.txt]

2006/01/02
~~~~~~~~~~
Before tackling the task of writing the selector of "features", I've decided
to stabilize a few things. The otp part has been rewritten from scratch and
now it works better. The macros have been redesigned, too:

- \DeclareLanguageProcessGroup is like the previous \DeclareLanguageProcess.
- \AddLanguageProcess has changed radically. It is now a set of "slots" for
  specific tasks; each task can be assigned to a process with
  \SetLanguageProcess (or left empty).
- \DeclareProcess groups several physical OTPs as a single logical process
  (eg, in t1.ed through \SetEncodingProcess).
- \SelectProcesses selects the current processes defined with the previous
  commands.
- \ShowProcesses shows the current processes (for debugging).

As before, commands with "Mem" in place of "Language" refer to processes
that are not assigned automatically to a language. For example:

  \DeclareMemProcessGroup{1000}{case}  % new group
  \AddMemProcess{case}{case}           % new task
  % By default, it does a \SetMemProcess{case}{case}
  % but this is not what we want, so:
  \SetMemProcess{case}{}
  \DeclareRobustCommand{\MakeUppercase}{\mem@uppercase}
  \DeclareRobustCommand{\MakeLowercase}{\mem@lowercase}
  \providecommand\mem@uppercase[1]{%
    {\SetMemProcess{case}{uppercase}\SelectProcesses#1}}
  \providecommand\mem@lowercase[1]{%
    {\SetMemProcess{case}{lowercase}\SelectProcesses#1}}

Finally, \DeclareLanguageProcess is an afterthought to allow unaccented
uppercasing (see the French style).

As Mem is still a work in progress, it traces by default how processes are
built and selected (this is another new feature).

A file named <lang-code><encoding>.id is loaded if it exists and the
corresponding language/encoding pair exists. This is somewhat experimental,
as I don't think this is the right way to translate macro names into the
language. Since it is just an experiment, only one file is provided:
espisolat1.id (used in yatest.tex).

(Note.
I've just upgraded to TeX Live 2005 and it seems Greek fonts are not set up
correctly. I'll try to fix my installation.
-> 2006/01/25 Fixed. Now greek.pdf looks fine.)

========================================================

Introduction
~~~~~~~~~~~~
This is Mem, a multilingual environment for Lamed/Lambda. The name derives
from the letter that comes after Lamedh (because Mem should go after Lamedh)
and from Multilingual EnvironMent. Its aim is to make it possible to write
multilingual documents. It also aims to provide a framework where User
Groups and/or other interested developers can add new languages easily.

This package would not be possible without the previous work by Yannis
Haralambous and John Plaice.

Note that the name Lambda still appears in some places. I expect it will be
removed soon.

This package is not intended for real use, only for testing.

A selection of previous versions of the readme file follows, with some
modifications to reflect the latest changes. Changes in the previous release
are marked with ****04. Many parts of the readme have been moved to the
manual, too.

Javier Bezos
2004/10/07

Requirements
~~~~~~~~~~~~
Currently Mem requires several ocp files from other sources, namely:

- from Omega: uppercase, lowercase, cuni2oar. The latter apparently mixes
  contextual analysis and font encodings, but until a better otp is devised
  it can be used to test Arabic.

=========================================================

Some remarks.

First of all, will it work?
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Well, some parts will and others will not. For example, automatic selection
of fonts is still at a very early stage (to be generous), and it will not be
correctly synchronized with running heads.

****04 There is a first experiment with bidirectional writing named
arabic.tex. However, Aleph apparently does not reverse the direction of all
elements properly. This requires further investigation. See below.
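Automatic font selection, mentioned above as being at an early stage, relies
on fd files. If an fd file exists for a given encoding/family combination,
that encoding is selected, with certain preferences (see the experiments
under Random remarks). The Python sketch below only models that rule. The
function names, the preference order and the set of installed fd files are
hypothetical and are not Mem's actual code, which is written in TeX. Because
only existence is checked, the sketch inherits the pitfall noted later:
t1cmr.fd is found even though it points to another font.

```python
from typing import Iterable, Optional, Set


def fd_name(encoding: str, family: str) -> str:
    """Build the name of a LaTeX font definition file: <encoding><family>.fd."""
    return f"{encoding}{family}.fd".lower()


def select_encoding(family: str, preferred: Iterable[str],
                    installed: Set[str]) -> Optional[str]:
    """Return the first preferred encoding that has an fd file for `family`.

    Only the existence of the fd file is checked, not which font it loads.
    """
    for encoding in preferred:
        if fd_name(encoding, family) in installed:
            return encoding
    return None  # no usable combination; the caller must fall back


# Hypothetical installation and preference order, for illustration only.
installed = {"ot1cmr.fd", "t1cmr.fd", "ot1omlgc.fd", "lomomlgc.fd"}
preferences = ["LOM", "T1", "OT1"]

print(select_encoding("omlgc", preferences, installed))  # LOM
print(select_encoding("cmr", preferences, installed))    # T1
print(select_encoding("ptm", preferences, installed))    # None
```

Given the t1cmr and ot1omlgc cases, a real selector would probably also have
to look at what each fd file actually loads, not just whether it exists.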
I would like to note that I implemented all this as fast as possible in
order to have a working package in Tsukuba. The resulting code is somewhat
chaotic and unstable (and sometimes naive). I'm fixing it, though, and I
hope it will be good enough to begin doing simple experiments.

Files
~~~~~
As you can guess, mem.sty is the kernel of the system. There are files
describing languages, named with the ISO three-letter code (esp.ld, eng.ld,
fra.ld, ell.ld), and files describing scripts, named with the ISO two-letter
code (la.sd and el.sd). Regarding TeX, there is a further file with the
configuration of the system: mem.cfg.

****04 Added rus.ld, ara.ld, ar.sd and cy.sd (incomplete).

Then come the otp files. isolat1, isoell, macstd, etc., can escape to utf8
and ucs16. However, after experimenting a little, I found that escaping to
utf8 is fairly complicated with arbitrary text.
****04 Added a script to generate these files.

inputtex defines TeX input conventions. fratext defines (visual) text
transformations for French. The files for Greek are those by Yannis and John
with new names beginning with ell: this is a proposal to systematize names.

OT1.otp, T1.otp and LOM.otp provide translation from Unicode to the
corresponding font encodings. They are very quick and dirty, and in fact T1
is the same file as LOM with a few lines added! Accents above work and may
be stacked, accents below [...]
****04 Accents below now work with some limitations: up to three accents in
total, at most 2 above and at most 1 below. This is enough for most cases,
but it should be improved in the future.

Finally, a little package named spguill adds spaces before and after
guillemets in non-French text. It requires spguill.otp and demonstrates the
possibilities of the scheme.

mem.tex explains most of the macros, but some of them are not documented
yet. However, I think their names are mostly self-explanatory.

Samples
~~~~~~~
greek.tex contains both French and Greek text.
The Greek text has been taken from the Greek TeX Group, so you will also
learn how to become a member of it :-). Note that \MakeUppercase doesn't
work correctly in some places (eg, the running head with French text should
be unaccented). The problem here is pretty simple: when \MakeUppercase is
called, it does not know that the corresponding ocp will be changed by the
French settings. Thus, in the future \frenchtext must see a "case status"
set by \MakeUppercase and behave accordingly. Only modern monotonic Greek!

yatest.tex prints the date in Spanish, English and US English.
****04 yatest has some additional tests with ligatures and accents.

testmisc.tex contains miscellaneous tests.

spguill.tex provides an example for spguill.

****04 russian.tex demonstrates how encoding selection works (LOM/omlgc vs.
T2A/cmr) and how to transliterate from Latin to Cyrillic.

****04 arabic.tex shows bidirectional text, but unfortunately the
bidirectional mechanism of Omega and Aleph is problematic. In the sample you
can see that page layout elements (such as sections and lists) are not
handled properly by Aleph. In particular, you cannot change the direction to
mix Arabic and English sections.

Python scripts ****04
~~~~~~~~~~~~~~
I'm using Python scripts to perform some tasks automatically. I think it
would be useful to make these scripts available to the TeX community.

charset2otp.py creates ocp files for several input encodings.

mtp2ocp.py is like otp2ocp, but it replaces special characters on the fly
with characters in the Private Use Area (PUA) that have special catcodes.
Eg, \ becomes @"F000, whose catcode is set to 0 (escape) by mem.sty.

Random remarks
~~~~~~~~~~~~~~
- Scripts will have a default dummy language. This way, actions specific to
  a script are possible even if the main language uses a different script.
- Currently languages only have one script. However, some languages can be
  written with several scripts (eg, Azeri [Latin, Arabic, Cyrillic] or
  Spanish [Latin, Hebrew]).
- I'm now studying how to implement macros that depend on scripts, namely
  for fonts, case, and so on.
- I'm also studying how to replace the two-level system with a three-level
  one (document, paragraph/block, text).
- Many "auxiliary" files are far from complete, but I will continue adding
  more code only when we have decided on the "right way".
- Currently, the code includes some experiments I've done, mainly:
  - Automatic selection of font encoding based on fd files: if there is an
    fd file for some combination, then select it (with certain preferences).
    However, it turns out that t1cmr exists but points to _another_ font,
    and that ot1omlgc points to a UT1-encoded font.
  - An escaping mechanism in input encoding otp's. It would allow entering
    Unicode text (ucs16 or utf8) without changing the current ocp list;
    otherwise ligatures and kerning could be killed. It works fine when
    applied to a single char. However, I didn't manage to extend it to
    arbitrary text (including non-expandable primitives), because ocp
    states are not saved.
- There are lots of open questions, and no doubt more will appear when
  discussing Mem.

___________________________________________________________
Javier Bezos                 | TeX y tipografia
jbezos at wanadoo dot es     | http://perso.wanadoo.es/jbezos/
...........................................................
CervanTeX  http://www.cervantex.org