This document gives a tutorial on how to type texts in ATF, the ASCII Transliteration Format used by the Cuneiform Digital Library Group for data capture and archiving.
We provide a simple introduction to typing ATF texts, describing the more common features first and filling in the details later. Before explaining any specifics, here is a simple, typical example of an ATF text:
&P555555 = Some Publication 32
@obverse
1. 1(disz) udu ba-ug7
$ reverse blank
This example illustrates four of the most common types of line in an ATF text:
- &-lines ("and-lines"), which normally give the CDLI P-identifier and should also have a human-readable name after an '=' sign.
- Lines beginning with the @ sign ("at-lines"). These are used to indicate object types, surfaces, divisions and columns.
- $-lines ("dollar lines"). These look like ordinary sequences of words, but they may be subject to strict rules.
- Lines of transliteration, which begin with a line label (typically a number) followed by a period and a space.
The other quite common type of line in an ATF file begins with the hash sign (#). There are two kinds of #-line: protocols and comments.
Protocols are statements which are interpreted or stored by the ATF processor but are not part of the text edition proper. Protocols are all named and may trigger special processing within the ATF processor.
Protocols are indicated in ATF by a line beginning with the hash
character (#), a known protocol name and a colon character
(:).
The details of protocols are beyond the scope of this tutorial; for now, it is enough to know that they look like this:
&P123321 = Some Akkadian Text
#atf: lang akk
1. i-na AN-e
#note: This is a contrived note.
Most protocols are a single line and do not require
a blank line after them to separate them from a following protocol
(the one exception is #note:).
More information on protocols, what they are, where they are allowed and the rules about ordering of protocols is available in the protocols manual.
Comments are asides which are not part of the text edition or the annotation; they are useful for keeping odd bits of information in the file without it getting in the way of the text edition or annotation.
Comments are indicated in ATF by one or more lines beginning with
the hash character (#).
Comments look like protocols in that they begin with a hash-sign, but they may not begin with the sequence hash-name-colon. Comments may be included within text transliterations but not before the first text in a file. Comments must always follow any protocols which occur adjacent to them.
A sequence of lines beginning with hash signs is a multi-line comment. To attach several separate comments to the same line, separate them with a blank line in the ATF file.
1. a
#a simple comment
2. a
#a longer comment which somewhat artificially extends
#over multiple lines
3. a
#one comment to line 3.

#another comment to line 3.
4. a
#Comments look a bit like protocols but there is no chance of
#confusion: the ATF processor's scanning rules take care of that.
5. a
#lem: a[water]
#note: If you want a comment to appear in the displayed text-edition
#use the '#note:' protocol instead.

#and note that any comment must follow any other protocol, including
#'#note:'.
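The following minimal Python sketch shows one way a tool might sort #-lines. It is not the ATF processor's code. It assumes a hypothetical, deliberately tiny set of known protocol names (`KNOWN_PROTOCOLS`, which holds only the three names used in this tutorial) and treats every other #-line as a comment. That is why a line such as `#confusion: ...` still counts as a comment. The helper `group_comments`, also hypothetical, applies the rules given above:

- consecutive comment lines form one comment;
- a blank line starts a new comment;
- lines following `#note:` belong to the note until a blank line.

```python
import re
from typing import List, Optional

# Hypothetical subset of protocol names: only the three used in this tutorial.
KNOWN_PROTOCOLS = {"atf", "lem", "note"}

# A hash sign, a name and a colon, e.g. "#atf: lang akk".
PROTOCOL_RE = re.compile(r"^#([a-z]+):")


def classify_hash_line(line: str) -> Optional[str]:
    """Return 'protocol', 'comment', or None if the line does not start with '#'."""
    if not line.startswith("#"):
        return None
    m = PROTOCOL_RE.match(line)
    if m and m.group(1) in KNOWN_PROTOCOLS:
        return "protocol"
    # Unknown names (e.g. '#confusion: ...') fall through to comments.
    return "comment"


def group_comments(lines: List[str]) -> List[List[str]]:
    """Group comment lines into comments, with the leading '#' removed."""
    groups: List[List[str]] = []
    current: List[str] = []
    in_note = False  # True while inside a multi-line '#note:'
    for line in lines:
        kind = classify_hash_line(line)
        if kind == "comment" and not in_note:
            current.append(line[1:])
            continue
        # Any other line closes the comment being collected.
        if current:
            groups.append(current)
            current = []
        if kind == "protocol" and line.startswith("#note:"):
            in_note = True
        elif not line.strip():
            in_note = False  # a blank line ends a '#note:'
    if current:
        groups.append(current)
    return groups
```

A real implementation would take the full list of protocol names from the protocols manual rather than a hard-coded set.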
&-lines are used to introduce a new text and consist of two parts: the ID and the name.
For transliterations of exemplars, the ID is a 'P' followed by six digits, e.g., P123456. This ID is assigned by CDLI and is the reference ID of the object in the main CDLI catalog; to get IDs for objects not in the CDLI catalog send an e-mail to cdli@cdli.ucla.edu.
The name of the text should be identical with the 'Designation' field in the CDLI main catalog; the ATF processor detects mismatches and reports the correct name. This mechanism is designed to provide a check that the P-number in the ID actually references the text the transliterator intends.
In ATF the two parts of an &-line are separated by space-equals-space, like this:
&P000001 = ATU 3, pl. 011, W 6435,a
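The &-line format lends itself to a simple well-formedness check. The sketch below is a hypothetical Python helper, not part of the ATF tools. It:

- accepts only exemplar &-lines (a 'P' plus exactly six digits);
- requires the space-equals-space separator and a non-empty name;
- does not attempt the comparison with the catalog 'Designation' described above, which needs access to the CDLI catalog.

```python
import re
from typing import Tuple

# '&', a P-number ('P' plus exactly six digits), ' = ', then the name.
AMP_LINE_RE = re.compile(r"^&(P[0-9]{6}) = (\S.*)$")


def parse_amp_line(line: str) -> Tuple[str, str]:
    """Split an exemplar &-line into (ID, name); raise ValueError otherwise."""
    m = AMP_LINE_RE.match(line.rstrip("\n"))
    if m is None:
        raise ValueError(f"not a well-formed exemplar &-line: {line!r}")
    return m.group(1), m.group(2)
```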
@-lines are used for structural tags. Several kinds of structure
may be indicated using this mechanism: physical structure, e.g., objects,
surfaces; manuscript structure, i.e., columns; and document structure,
e.g., divisions and colophons. For clarity, we describe here only the
structural features which are permitted in object transliterations,
i.e., texts with an ID beginning with P. Documentation
of structural conventions for composite texts is given in the composites manual.
The kind of object on which the inscription being transliterated is written is designated using one of the following tags:
@tablet
@envelope
@prism
@bulla
@fragment, followed by a label for each fragment, like this:
&P212121 = Some Fragmentary Object
@fragment a
1. a
@fragment b
1. a
@object, followed by the kind of object, e.g., @object Stone wig.
A transliteration of the text inscribed on a physical seal object
should be handled using the @object tag:
&P333444 = Some Seal
@object seal
1. da-da
2. dumu du-du
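Surfaces are principally the physical surfaces: @obverse, @reverse, @left, @right, @top and @bottom.
Faces, e.g., of a prism, are indicated with @face followed by a label:
&P123321 = Some Prism
@prism
@face a
1. a
@face b
1. e
Other surfaces can be indicated with @surface followed by a description, e.g., @surface shoulder; @surface side a. Edges are indicated with @edge, which is used in the same way as @face.
A transliteration of a sealing should be handled using the
@seal tag, included like a surface after the
transliteration of the object on which the sealing occurs: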
&P343434 = Some Sealed Tablet
1. a
$ seal 1
@seal 1
1. du-du
The use of $ seal anticipates the discussion of
$-lines below; this mechanism can be used to indicate which sealings
occur where on an object.
Columns are indicated with the @column tag, which may
be omitted for single-column texts. Column numbers must be given in
arabic numerals:
&P545454 = Some Columnar Text
@column 1
1. a
@column 2
1. e
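To see how @-lines set the context for the numbered lines that follow them, here is a hypothetical Python sketch. For each line of transliteration, it records the object, surface and column in force. The tag sets are taken from the tags listed in this tutorial. The reset rules are assumptions made for illustration:

- a new object resets the surface and column;
- a new surface resets the column.

Status flags such as ? are not handled.

```python
import re
from typing import List, Optional, Tuple

# Tag sets assumed from the object and surface tags described in this tutorial.
OBJECT_TAGS = {"tablet", "envelope", "prism", "bulla", "fragment", "object"}
SURFACE_TAGS = {"obverse", "reverse", "left", "right", "top", "bottom",
                "face", "edge", "surface", "seal"}

# A label (non-space characters) followed by a period and a space.
TEXT_LINE_RE = re.compile(r"^(\S+)\. ")

Location = Tuple[Optional[str], Optional[str], Optional[str], str]


def locate_lines(lines: List[str]) -> List[Location]:
    """Return (object, surface, column, label) for each transliteration line."""
    obj: Optional[str] = None
    surface: Optional[str] = None
    column: Optional[str] = None
    located: List[Location] = []
    for line in lines:
        if line.startswith("&"):
            obj = surface = column = None  # a new text starts with a clean context
        elif line.startswith("@"):
            tag, _, rest = line[1:].partition(" ")
            value = f"{tag} {rest}".strip()  # keep labels, e.g. 'face a'
            if tag in OBJECT_TAGS:
                obj, surface, column = value, None, None
            elif tag in SURFACE_TAGS:
                surface, column = value, None
            elif tag == "column":
                column = rest
            # Other @-lines (milestones, @date, ...) do not change the location.
        elif line.startswith(("$", "#")):
            continue
        else:
            m = TEXT_LINE_RE.match(line)
            if m:
                located.append((obj, surface, column, m.group(1)))
    return located
```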
The status of some of the features indicated with @-lines can be indicated in a manner similar to that of graphemes; the notation is intended to be natural and to follow Assyriological conventions:
@obverse?
Meaning: status of obverse/reverse uncertain
@reverse!*
Meaning: collated; reverse correct despite designation in publication
Primes can be used where this makes sense:
@face a'
@column 3'
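Because these flags and primes follow a fixed pattern, an @-line can be split mechanically into tag, label and flags. This Python sketch is illustrative only. The regular expression and the glosses in the hypothetical `FLAG_MEANINGS` table are assumptions drawn from the two examples above:

- ? for uncertain;
- ! for a correction;
- * for collated.

It recognizes only these three flags.

```python
import re
from typing import List, Optional, Tuple

# '@', a tag, an optional label (which may carry primes, e.g. a' or 3'),
# then any trailing status flags.
AT_STATUS_RE = re.compile(r"^@([a-z]+)(?: ([^\s?!*]+))?([?!*]*)$")

# Hypothetical glosses inferred from the examples above.
FLAG_MEANINGS = {"?": "uncertain", "!": "corrected", "*": "collated"}


def parse_at_status(line: str) -> Tuple[str, Optional[str], List[str]]:
    """Split an @-line into (tag, label, [flag meanings])."""
    m = AT_STATUS_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised @-line: {line!r}")
    tag, label, flags = m.groups()
    return tag, label, [FLAG_MEANINGS[f] for f in flags]
```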
For technical reasons it is impossible to interweave physical structure (of the kind described above for transliterated objects) and document structure (e.g., paragraph divisions). This limitation is resolved by recourse to milestones.
Documentary divisions in a transliterated object are given using
the @m tag, with the milestone type given after an equals
sign and the division type following; an optional division name or
number may follow the division type:
@m=division paragraph 1
@m=division colophon
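A milestone line has a fixed shape: the milestone type after the equals sign, then the division type, then an optional name or number. It therefore splits cleanly into fields. Below is a minimal Python sketch using a hypothetical `Milestone` record. It covers only the @m= form, not the shorthands described next.

```python
import re
from typing import NamedTuple, Optional


class Milestone(NamedTuple):
    kind: str            # milestone type, e.g. 'division'
    division: str        # division type, e.g. 'paragraph' or 'colophon'
    name: Optional[str]  # optional division name or number


# '@m=', the milestone type, a space, the division type, then an optional name.
MILESTONE_RE = re.compile(r"^@m=(\S+) (\S+)(?: (.+))?$")


def parse_milestone(line: str) -> Milestone:
    """Parse an '@m=' line into its fields; raise ValueError otherwise."""
    m = MILESTONE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a milestone line: {line!r}")
    return Milestone(*m.groups())
```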
Simple support for discourse elements in administrative texts is
provided using shorthands which are also implemented as
milestones. These shorthands are @date, @summary and
@witnesses:
&P787878 = Some Administrative Text
1. 1(disz) udu
2. da-da
3. szu ba-ti
@date
4. u4 1-kam
@left
@summary
1. 1(disz) udu
$-lines are used to indicate information about the state of the text or object, or to describe features on the object which are not part of the transliteration proper. They come in two flavours: strict and loose.
Strict $-lines are subject to the restrictions in the table below; strict $-lines can be interpreted in their entirety by the ATF processor and the interpreted information can then be used by other programs. Strict $-lines are the best practice.
Loose $-lines are indicated by putting parentheses around the contents of the $-line. This is a facility provided to enable annotation of features which are not covered by the strict $-line specification. If the ATF processor detects that a loose $-line actually meets the criteria defined for strict $-lines it gives an advisory notice that the parentheses should be removed.
$-lines and comments are two quite different facilities, but experience has shown that transliterators can confuse the two. Comments are for information which does not belong in the transliteration and description of the text; comments are not displayed when the text is formatted for display or print. $-lines are for information which is integral to an understanding of the textual data; $-lines are included when the text is displayed or printed.
A particular use of $-lines is to indicate that a seal is used on an object; the form is:
$ seal <N>
Where N is a number indicating which seal is used;
if a transliteration of the seal is also given using the
@seal heading, the number following $ seal
should correspond to the number following @seal. See the
example above.
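This correspondence rule can be checked mechanically. The following Python helper is hypothetical; its name is illustrative and not part of the ATF tools. It reports every @seal N that has no matching $ seal N. Giving a transliteration of a seal is optional, so a $ seal N without an @seal N is not reported.

```python
import re
from typing import List, Set

SEAL_USE_RE = re.compile(r"^\$ seal ([0-9]+)$")  # '$ seal N': seal N occurs here
SEAL_TEXT_RE = re.compile(r"^@seal ([0-9]+)$")   # '@seal N': transliteration of seal N


def unmatched_seal_transliterations(lines: List[str]) -> Set[str]:
    """Return the N of every '@seal N' that has no corresponding '$ seal N'."""
    used: Set[str] = set()
    transliterated: Set[str] = set()
    for line in lines:
        line = line.strip()
        use = SEAL_USE_RE.match(line)
        if use:
            used.add(use.group(1))
        text = SEAL_TEXT_RE.match(line)
        if text:
            transliterated.add(text.group(1))
    # A '$ seal N' without an '@seal N' is fine: the transliteration is optional.
    return transliterated - used
```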
Most $-lines are used to give information about the state of the object being transliterated. The conventions for this can be summarized as follows:
| Qualification | Extent¹ | Scope | State |
|---|---|---|---|
| at least, at most, about | n, several, some, NUMBER, RANGE, rest of, start of, beginning of, middle of, end of | OBJECT², SURFACE³, column, columns, line, lines, case, cases, surface | blank, broken, effaced, illegible, missing, traces |

¹ NUMBER is a number such as 1 or 5; a RANGE gives two numbers separated by a hyphen, e.g., 3-5.
² OBJECT is any object specifier as described above, e.g., tablet, object etc.
³ SURFACE is any surface specifier as described above, e.g., obverse, left etc.
$-lines are also used to indicate noteworthy rulings on the tablet; ordinary case- or line-ruling should not be indicated with a $-line, but where a scribe has used a ruling to give additional information about the document structure this should be noted as:
(single | double | triple) ruling
Strict $-lines look like this:
$ 3 lines blank
$ rest of obverse missing
A loose $-line looks like this:
$ (head of statue broken)
A ruling $-line looks like this:
$ double ruling
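The strict $-line conventions are regular enough to check with a pattern. The Python sketch below is an assumption-laden illustration, not the ATF processor's grammar:

- It assumes the table's columns combine in the order qualification, extent, scope, state, with qualification and extent optional (as in $ reverse blank).
- It expands OBJECT and SURFACE to the tags listed earlier.
- It labels a parenthesized line that would also pass as strict with the hypothetical result 'loose-but-strict', mirroring the advisory notice described above.

```python
import re

# Vocabulary taken from the table above; OBJECT and SURFACE are expanded to
# the tags listed earlier in this tutorial (an assumption).
QUALIFICATION = r"(?:at least|at most|about)"
EXTENT = (r"(?:n|several|some|[0-9]+-[0-9]+|[0-9]+"
          r"|rest of|start of|beginning of|middle of|end of)")
OBJECT = r"(?:tablet|envelope|prism|bulla|fragment|object)"
SURFACE = r"(?:obverse|reverse|left|right|top|bottom|face|edge)"
SCOPE = rf"(?:{OBJECT}|{SURFACE}|columns?|lines?|cases?|surface)"
STATE = r"(?:blank|broken|effaced|illegible|missing|traces)"

# Assumed order: [qualification] [extent] scope state
STRICT_RE = re.compile(rf"^(?:{QUALIFICATION} )?(?:{EXTENT} )?{SCOPE} {STATE}$")
RULING_RE = re.compile(r"^(?:single|double|triple) ruling$")
SEAL_RE = re.compile(r"^seal [0-9]+$")
IMAGE_RE = re.compile(r"^\(image [0-9]+[a-z]* = .+\)$")


def classify_dollar_line(line: str) -> str:
    """Classify a $-line as 'image', 'ruling', 'seal', 'strict', 'loose',
    'loose-but-strict' (parentheses could be removed) or 'invalid'."""
    text = line.strip()
    if not text.startswith("$"):
        raise ValueError(f"not a $-line: {line!r}")
    body = text[1:].strip()
    if IMAGE_RE.match(body):
        return "image"
    for kind, pattern in (("ruling", RULING_RE), ("seal", SEAL_RE), ("strict", STRICT_RE)):
        if pattern.match(body):
            return kind
    if body.startswith("(") and body.endswith(")"):
        inner = body[1:-1].strip()
        if STRICT_RE.match(inner) or RULING_RE.match(inner):
            return "loose-but-strict"
        return "loose"
    return "invalid"
```

Keeping the vocabulary in separate pattern fragments makes it easy to extend the sketch if the real specification allows more terms than this tutorial lists.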
Inline images can be specified using the form:
$ (image N = <text>)
Where N is an image number consisting of digits followed by optional lowercase letters from a to z, and <text> is free text, giving a label for the image (which is copied through to the XHTML 'alt' attribute on the <img> tag).
$ (image 1 = numbered diagram of triangle)
At present, the implementation only works for XHTML which is
produced within a project. The ATF processor constructs a file name
consisting of the text ID and the image's N value, joined by an at
sign (e.g., P123456@1). The XHTML producer then emits an
<img> tag with the src attribute set to
/<PROJECT>/<FILENAME>.png.
Thus, in the present implementation, there must exist an
appropriately named file in the PNG graphics format residing in the
project's images directory. The implementation is
expected to support a more sophisticated locator mechanism in the
future.
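The naming rule above can be expressed in a few lines. In this Python sketch, `image_tag` is a hypothetical helper and the project name is simply passed in as a parameter. It builds only the <img> tag described above and does not check that the PNG file actually exists in the project's images directory.

```python
import html
import re

# '$ (image N = <text>)', where N is digits plus optional lowercase letters.
IMAGE_LINE_RE = re.compile(r"^\$ \(image ([0-9]+[a-z]*) = (.+)\)$")


def image_tag(project: str, text_id: str, line: str) -> str:
    """Build an <img> tag for an image $-line, following the naming rule above."""
    m = IMAGE_LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not an image $-line: {line!r}")
    n, label = m.groups()
    filename = f"{text_id}@{n}"  # text ID and N joined by an at sign
    src = f"/{project}/{filename}.png"
    return f'<img src="{html.escape(src)}" alt="{html.escape(label)}"/>'
```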
Lines of transliterated text begin with a sequence of non-space characters followed by a period and a space (these are typically numbers, but that is not a requirement):
1. a
a+1. e
2'. i
In ATF, lines containing only spaces are ignored; lines beginning with a space are continuation lines and the newline and leading spaces are dropped by the ATF processor:
1. a a a a
   a a a
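The two whitespace rules and the line-label rule can be sketched in Python as follows. The helpers are hypothetical. The text above says only that the newline and leading spaces are dropped. The single space this sketch keeps between joined pieces, so that words do not run together, is therefore an assumption.

```python
import re
from typing import List, Tuple

# A label: non-space characters, then a period and a space.
LABEL_RE = re.compile(r"^(\S+)\. (.*)$")


def join_continuations(raw: List[str]) -> List[str]:
    """Drop space-only lines and fold continuation lines into the preceding line."""
    joined: List[str] = []
    for line in raw:
        line = line.rstrip("\n")
        if not line.strip(" "):
            continue  # lines containing only spaces are ignored
        if line.startswith(" ") and joined:
            # Assumption: one space is kept between the joined pieces.
            joined[-1] += " " + line.lstrip(" ")
        else:
            joined.append(line)
    return joined


def split_label(line: str) -> Tuple[str, str]:
    """Split a transliteration line into (label, content)."""
    m = LABEL_RE.match(line)
    if m is None:
        raise ValueError(f"no line label found: {line!r}")
    return m.group(1), m.group(2)
```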
The content of lines is defined principally by the Grapheme Description Language, but there are some line-related ATF features which are not necessary for many users and which are dealt with in the advanced documentation.
Questions about this document may be directed to Steve Tinney (stinney at sas dot upenn dot edu).