[See also bugs.txt]

2006/01/02
~~~~~~~~~~~~

Before tackling the task of writing the selector of
"features" I've decided to stabilize a few things.
The otp part has been rewritten from scratch and now
it works better. The macros have been redesigned, too:
- \DeclareLanguageProcessGroup is like the previous
  \DeclareLanguageProcess
- \AddLanguageProcess has changed radically. Now it is
  a set of "slots" for specific tasks; each task can
  be assigned to a process with \SetLanguageProcess
  (or left empty).
- \DeclareProcess groups several physical OTPs as a
  single logical process (eg, in t1.ed through
  \SetEncodingProcess).
- \SelectProcesses selects the current processes
  defined with the previous commands.
- \ShowProcesses shows the current processes (for
  debugging)
As before, commands replacing "Language" by "Mem" refer
to processes not assigned automatically to a language.

For example:
\DeclareMemProcessGroup{1000}{case} % new group

\AddMemProcess{case}{case} % new task

% By default, it does a \SetMemProcess{case}{case}
% but this is not what we want, so:

\SetMemProcess{case}{} 

\DeclareRobustCommand{\MakeUppercase}{\mem@uppercase}
\DeclareRobustCommand{\MakeLowercase}{\mem@lowercase}

\providecommand\mem@uppercase[1]{%
  {\SetMemProcess{case}{uppercase}\SelectProcesses#1}}
\providecommand\mem@lowercase[1]{%
  {\SetMemProcess{case}{lowercase}\SelectProcesses#1}}

Finally, \DeclareLanguageProcess is an afterthought to
allow unaccented uppercasing (see the French style).

As Mem is still a work in progress, it traces by default
how processes are built and selected (this is another new
feature).

A file named <lang-code><encoding>.id is loaded if it
exists and the corresponding pair language/encoding
exists. This is somewhat experimental as I think this
is not the right way to translate macro names to the
language. Being just an experiment, only one file is
provided: espisolat1.id (used in yatest.tex).

(Note. I've just upgraded to TeXLive2005 and it seems Greek
fonts are not set up correctly. I'll try to fix my
installation.
-> 2006/01/25 Fixed. Now greek.pdf looks fine.)



========================================================

Introduction
~~~~~~~~~~~~~~~~~~~~~~

This is Mem, a multilingual environment for Lamed/Lambda.
The name derives from the letter that comes after Lamedh
--because Mem should go after Lamedh-- and from Multilingual
EnvironMent. Its aim is to make it possible to write
multilingual documents and to provide a framework where new
languages can be added easily by User Groups and/or
developers interested in doing that.

This package would not be possible without the previous work
by Yannis Haralambous and John Plaice.

Note that in some places the name Lambda still appears. I
expect it will be removed soon.

This package is not intended for real use but just to
make tests.

A selection of previous versions of the readme file follows,
with some modifications to reflect the latest changes.
Changes in the previous release are marked with ****04.
Many parts of the readme have been moved to the manual, too.

Javier Bezos
2004/10/07

Requirements
~~~~~~~~~~~~

Currently Mem requires several ocp files from other
sources, namely:
- from Omega: uppercase, lowercase, cuni2oar. The
  latter apparently mixes contextual analysis and
  font encodings, but until a better otp is devised
  it can be used to test Arabic.
  
=========================================================

Some remarks.

Firstly of all, will it work?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Well, some parts will and some others will not. For example,
automatic selection of fonts is still at a very early stage
(to be generous) and it will not be correctly synchronized
with running heads.

****04 There is a first experiment with bidirectional
writing named arabic.tex. However, apparently Aleph does
not reverse the direction in all elements properly.
This requires further investigation. See below.

I would like to note that I implemented this as fast as
possible in order to have a working package in Tsukuba.  The
resulting code is somewhat chaotic and unstable (and
sometimes naive), but I'm fixing it and hope it will be
enough to begin doing simple experiments.

Files
~~~~~

As you can guess, mem.sty is the kernel of the system.
There are files describing languages, named with the
ISO three letter code (esp.ld, eng.ld, fra.ld, ell.ld),
and files describing scripts, named with the ISO two letter
code (la.sd and el.sd). Regarding TeX, there is a further
file with the configuration of the system: mem.cfg.

****04 Added rus.ld, ara.ld, ar.sd and cy.sd, all incomplete.

Then come otp files. isolat1, isoell, macstd, etc., can
escape to utf8 and ucs16. However, after experimenting a little,
I found that escaping to utf8 is fairly complicated with
arbitrary text.

****04 Added a script to generate these files.

inputtex defines TeX input conventions.  fratext defines (visual) text 
transformation for French.  The files for Greek are those by Yannis 
and John with new names beginning with ell: this is a proposal to 
systematize names.

OT1.otp, T1.otp and LOM.otp provide translation from
Unicode to the corresponding font encodings. They are very
quick and dirty, and in fact T1 is the same file as LOM with
a few lines added! Accents above work and may be stacked, accents
below [...]

****04 Accents below now work with some limitations:
up to three accents in total, max. 2 above, max. 1 below.
This is enough for most cases, but in the future it
should be improved.
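
The limits above can be illustrated with a small Python
check. This is only a sketch and not part of Mem: it assumes
that "above" and "below" correspond to the Unicode canonical
combining classes 230 and 220, which may differ from what the
otp files actually recognize. The function name is made up
for this example.

```python
import unicodedata

# Canonical combining classes defined by Unicode (assumed here to
# match what "above" and "below" mean in the text).
ABOVE = 230
BELOW = 220

def fits_accent_limits(cluster, max_total=3, max_above=2, max_below=1):
    """Return True if the combining marks in `cluster` respect the limits."""
    decomposed = unicodedata.normalize("NFD", cluster)
    # Keep only the combining marks (class 0 means a base character).
    classes = [unicodedata.combining(ch) for ch in decomposed]
    marks = [c for c in classes if c != 0]
    above = sum(1 for c in marks if c == ABOVE)
    below = sum(1 for c in marks if c == BELOW)
    return (len(marks) <= max_total
            and above <= max_above
            and below <= max_below)
```

For instance, a base letter with an acute accent and a dot
below is within the limits, while a letter with two accents
below is not.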

Finally, a little package named spguill adds spaces before
and after guillemets in non-French text. It requires
spguill.otp and demonstrates the possibilities of the scheme.

mem.tex explains most of the macros, but some of them
are not documented yet. However, I think their names
are mostly self-explanatory.

Samples
~~~~~~~

greek.tex contains both French and Greek text. The Greek text
has been taken from the Greek TeX Group, so in addition you will
learn how to become a member of it :-). You should note that
\MakeUppercase doesn't work correctly in some places (eg, the
running head with French text should be unaccented; the problem here
is pretty simple: when \MakeUppercase is called it does not know that
the corresponding ocp will be changed by french.  Thus,
\frenchtext must in the future see a "case status" set by \MakeUppercase
and behave accordingly). Only modern monotonic Greek!

yatest.tex prints the date in Spanish, English and US
English.

****04 yatest has some additional tests with ligatures and
accents.

testmisc.tex contains miscellaneous tests.

spguill.tex provides an example for spguill.

****04
russian.tex demonstrates how encoding selection works
(LOM/omlgc vs.  T2A/cmr) and how to transliterate from Latin
to Cyrillic.

****04
arabic.tex shows bidirectional text, but unfortunately the
bidirectional mechanism of Omega and Aleph is problematic.
In the sample you can see that page layout (including
elements like sections and lists) is not handled properly
by Aleph, particularly because you cannot change the direction
to mix Arabic and English sections.

Python scripts  ****04
~~~~~~~~~~~~~~

I'm using Python scripts to perform some tasks
automatically. I think it would be useful for the TeX
community to make these scripts available.

charset2otp.py creates ocp files for several input
encodings.
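
To give an idea of what such a generator involves, here is a
minimal Python sketch. It is not the actual charset2otp.py:
the function name is made up, the charset tables are assumed
to come from Python's built-in codecs, and the escaping
mechanism is left out. It only emits otp source text, which
would still have to be compiled with otp2ocp.

```python
def charset_to_otp(codec):
    """Build otp source mapping the upper half of an 8-bit charset to Unicode."""
    lines = ["input: 1;", "output: 2;", "", "expressions:", ""]
    for byte in range(0x80, 0x100):
        try:
            char = bytes([byte]).decode(codec)
        except UnicodeDecodeError:
            continue  # code not assigned in this charset: emit no rule
        # One rule per assigned code: 8-bit input number => Unicode number.
        lines.append('@"%02X => @"%04X;' % (byte, ord(char)))
    # Anything not matched above (ASCII, in practice) passes unchanged.
    lines.append(". => \\1;")
    return "\n".join(lines) + "\n"
```

Calling charset_to_otp("iso8859-7") gives, among others, a
rule sending @"C1 to @"0391 (capital Alpha).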

mtp2ocp.py is like otp2ocp but replaces on the fly
special characters by characters in the PUA area with
special catcodes. Eg, \ becomes @"F000 whose catcode
is set by mem.sty to 0 (escape).
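
The replacement just described can be sketched in Python as
follows. This is an illustration, not the code of mtp2ocp.py:
only the mapping of \ to @"F000 comes from the text above;
the two brace entries and both function names are
hypothetical.

```python
# TeX special characters and their Private Use Area replacements.
# Only the backslash entry is taken from the readme; the brace
# entries are hypothetical values chosen for illustration.
PUA_MAP = {
    "\\": 0xF000,  # escape (catcode 0), as stated in the text
    "{": 0xF001,   # hypothetical
    "}": 0xF002,   # hypothetical
}

def to_pua(text):
    """Replace every special character in `text` by its PUA counterpart."""
    return text.translate({ord(ch): code for ch, code in PUA_MAP.items()})

def as_otp_chars(text):
    """Write `text` as a sequence of otp character numbers."""
    return " ".join('@"%04X' % ord(ch) for ch in to_pua(text))
```

With this table, as_otp_chars applied to the two characters
\a yields @"F000 @"0061, so the backslash reaches TeX as a
character that only mem.sty treats as an escape.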

Random remarks
~~~~~~~~~~~~~~

- Scripts will have a default dummy language. This way, specific
actions for this script are possible even if the main language
uses a different script.
- Currently languages only have one script. However, some languages
can be written with several scripts (eg, Azeri [Latin, Arabic, Cyrillic]
or Spanish [Latin, Hebrew]).
- I'm now studying how to accomplish macros depending on scripts,
namely for fonts, case, and so on.
- I'm studying as well how to replace the two level system by a
three level one (document, paragraph/block, text).
- Many "auxiliary" files are far from complete. In fact, they are
fairly incomplete, but I will continue adding more code only once
we have decided the "right way".
- Currently, the code includes some experiments I've done, mainly:
   - Automatic selection of font encoding based on fd files--if there
     is an fd file for some combination then select it (with certain
     preferences). However, it turns out that t1cmr exists but pointing
     to _another_ font, and that ot1omlgc points to an ut1 encoded
     font. 
   - An escaping mechanism in input encoding otp's, which will allow
     to enter Unicode text (ucs16 or utf8) without changing the
     current ocp list (otherwise ligatures and kerning could be
     killed). It works fine when applied to a single char,
     but I didn't manage to extend it to arbitrary text (including
     non expandable primitives--ocp states are not saved).
- There are lots of open questions, and no doubt they will appear
when discussing Mem.

___________________________________________________________
Javier Bezos              | TeX y tipografia
jbezos at wanadoo dot es  | http://perso.wanadoo.es/jbezos/
...........................................................
CervanTeX   http://www.cervantex.org