atfsplit.plx

Steve Tinney (stinney@sas.upenn.edu)


NAME

atfsplit.plx -- split up ATF files into their constituent PQ-files


SYNOPSIS

atfsplit.plx [options] [file]


OPTIONS

-base dir
Use dir as the base directory into which to split the files. By default, the files are split into subdirectories named D000, D001, etc.

-cat
Write the output straight to STDOUT, as Unix cat does, instead of creating files. Use with -list to extract a sub-corpus from a larger file.

-dir dir
Create the files in dir, which is appended to the -base directory if one is given. To split the files into the current directory with no subdirectories, use '-dir .'. If the directory name ends in a digit, it is incremented every thousand files, as with the default directory names D000, D001, D002, etc.

-dryrun
Just print the names of the files which would be generated; don't create any files.

-except
Use with -list; output everything except the texts given in the list.

-install
Install the individual PQ-files into the cdl/texts tree.

-list filename
Read a list of P/Q IDs from filename and output only those texts.

-shallow
When building pathnames, do not include mid-level directories of the form P/P000xxx, Q/Q100xxx, etc.

-show-updates
Produce a list of updated texts.

-update
Only produce the ATF file for a text if the current version is different from what is in the archive being split.

-verbose
Print the names of files as they are generated.
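The -list and -except options amount to a set-membership filter on text IDs. The Python sketch below is only an illustration, not the Perl code in atfsplit.plx; it assumes the list file holds one P/Q ID per line, and the helper names read_id_list and select_texts are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def read_id_list(lines: Iterable[str]) -> Set[str]:
    """Collect IDs from a list file, assumed to hold one P/Q ID per line."""
    return {line.strip() for line in lines if line.strip()}


def select_texts(
    texts: Iterable[Tuple[str, str]], ids: Set[str], except_mode: bool = False
) -> List[Tuple[str, str]]:
    """Keep (id, text) pairs whose ID is in ids (like -list),
    or whose ID is not in ids (like -list combined with -except)."""
    if except_mode:
        return [(tid, body) for tid, body in texts if tid not in ids]
    return [(tid, body) for tid, body in texts if tid in ids]
```

Combined with -cat, the selected texts would simply be concatenated onto STDOUT to form the sub-corpus.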
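The mid-level directories that -shallow suppresses can be sketched as below. This assumes, from the P/P000xxx and Q/Q100xxx examples, that the directory is the ID's initial letter followed by the ID with its last three digits replaced by 'xxx'; the '.atf' file extension and the helper names mid_level_dir and text_path are hypothetical.

```python
import posixpath


def mid_level_dir(text_id: str) -> str:
    """Map e.g. P000123 to P/P000xxx.

    Assumed pattern: initial letter, then the ID with its last
    three digits replaced by 'xxx'.
    """
    return posixpath.join(text_id[0], text_id[:-3] + "xxx")


def text_path(root: str, text_id: str, shallow: bool = False) -> str:
    """Path for one text under root; the '.atf' extension is an assumption."""
    filename = text_id + ".atf"
    if shallow:
        # Like -shallow: no P/P000xxx-style mid-level directories.
        return posixpath.join(root, filename)
    return posixpath.join(root, mid_level_dir(text_id), filename)
```

Grouping by the leading digits keeps each mid-level directory to at most a thousand texts, in the same spirit as the D000, D001 grouping.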
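One way to read -update is "rewrite a text's file only when its content has changed". A minimal sketch under that assumption, using a hypothetical helper write_if_changed; the IDs for which it returns True are the sort of list that -show-updates might report.

```python
from pathlib import Path


def write_if_changed(path: Path, content: str) -> bool:
    """Write content to path unless the file already holds exactly that text.

    Returns True if the file was created or rewritten, False if it was
    left untouched because nothing changed.
    """
    if path.exists() and path.read_text(encoding="utf-8") == content:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    return True
```

Comparing full contents is the simplest correct check; a real tool might instead compare checksums or timestamps, which is also an assumption here.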


DESCRIPTION

atfsplit reads a file that may contain more than one transliteration and splits it into one file per transliteration. The output is grouped into directories of at most 1000 files each, named D000, D001, etc. With the -install option, the files are split directly into the cdl/texts tree.
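The splitting and grouping described above can be sketched as follows. This is an illustration in Python rather than the script's own Perl, and it assumes each text begins with an ATF '&'-line carrying its P/Q ID (for example '&P000001 = ...'); lines before the first such line are ignored. The directory helper output_dir is hypothetical and assumes the zero-padded width of a trailing number is preserved, as in D000 to D001.

```python
import re
from typing import Iterable, Iterator, Tuple

FILES_PER_DIR = 1000  # at most 1000 files per output directory

# Assumption: a new text starts at an '&'-line such as "&P000001 = ...".
AMP_LINE = re.compile(r"^&([PQ]\d{6})\b")


def split_atf(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (text_id, text) pairs, one per transliteration in the input."""
    current_id, buf = None, []
    for line in lines:
        m = AMP_LINE.match(line)
        if m:
            # A new &-line closes the previous text, if any.
            if current_id is not None:
                yield current_id, "".join(buf)
            current_id, buf = m.group(1), []
        if current_id is not None:
            buf.append(line)
    if current_id is not None:
        yield current_id, "".join(buf)


def output_dir(dir_name: str, file_index: int) -> str:
    """Directory for the file_index-th output file (0-based).

    A trailing number in dir_name advances once per FILES_PER_DIR files,
    keeping its zero-padded width; other names are used unchanged.
    """
    m = re.match(r"^(.*?)(\d+)$", dir_name)
    if m is None:
        return dir_name
    prefix, digits = m.groups()
    number = int(digits) + file_index // FILES_PER_DIR
    return f"{prefix}{number:0{len(digits)}d}"
```

For example, enumerating split_atf over an open file and passing each index i to output_dir('D000', i) places the first 1000 texts in D000, the next 1000 in D001, and so on; a name without a trailing digit, such as '.', is used for every file.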


AUTHOR

Steve Tinney (stinney@sas.upenn.edu)


COPYRIGHT

Copyright (c) Steve Tinney 2004.

Released under the GNU General Public License (http://www.gnu.org/copyleft/gpl.html).