otl
=====================

[[TOC]]


a brief otl overview
---------------------------

otl is a text processor for generating custom markup from plain text. otl supports complex structures such as nested ordered lists, headers and footers, and tables. The distribution includes configuration files for output in LaTeX or HTML. Users can also define other output formats and other input formats, for example if the default input format (roughly similar to that of asciidoc) does not meet their needs.

otl performs conversions with Perl regular expressions, which adds considerable flexibility and power to the conversion process.

The primary goals driving otl development are to:

	- allow the user to compose a document in a plain text format which is user-defined and is highly readable
	- facilitate conversion of the plain text document to a standard typesetting or web page markup language
	- allow inclusion of XML tags in the original document


executing otl
----------------------------

invoking otl
~~~~~~~~~~~~~~~~~~
	 
###
	 otl options filename(s)
###

filenames
^^^^^^^^^^^^^^^

- name or names of files for otl to process, mandatory command-line argument
	
options
^^^^^^^^^^^^^^^^

--help 		- print a help message
	
--debug		- give verbose output
	
--descend	- descend into all subdirectories relative to the current directory (where otl was executed) and execute otl on [source file(s)], if present, in each subdirectory
	
	notes:
		- unless the filename is in single-quotes (e.g., 'filename.txt'), the shell will probably glob the filename
		- with --descend, if you want globbing in each directory, put the filename in single quotes; when --descend is set, otl will try to glob names passed from the command line in each directory if they contain the * or $ characters
	
--nopretty
	- don't call otlsub

--no-tags	- ignore lines beginning with XML tags
	
--pfile [parameter-file]
	- user-specified parameter file; if this argument isn't given, otl will look for ~/.otl/otl and, if not present, will write a default version of ~/.otl/otl to use for processing
		
--style <my-style>
	- allows the user to specify a stylesheet to use in processing the document. otl looks for <my-style>.css in ~/otl. If such a file is specified, its content is automatically added at the second-to-last position in the 'head' component of the file output by otl.
		
	note: a set of sample css files is included in the source code in the 'css' directory
	
		
customizing otl
---------------

Conversions from text to LaTeX or HTML are included with the otl distribution. However, otl is customizable: the source document obeys the syntax YOU specify and the output syntax is the syntax YOU specify. One example: if you want to write an XHTML document quickly, it's easier to type --joe-- than to use XHTML tags with CSS information to make joe bold. For that matter, it's easier to type --joe-- than to do several mouse clicks to get it bold. With the default parameter file, otl will recognize the --joe-- and convert it to the appropriate XHTML.

Customize otl by editing the otl configuration file (by default, $HOME/.otl/otl). For example, if you want to use BBB instead of -- to bracket items which should be rendered bold, find this line in the config file:

###
	--	|<span style="font-weight: bold;">|	|</span>|
###

Change it to:

###
	BBB	|<span style="font-weight: bold;">|	|</span>|
###

Now, when otl processes your favorite file, "BBBjoebobBBB is big" will be converted to "<span style="font-weight: bold;">joebob</span> is big". The "|" character in the ~/.otl/otl line is reserved: it marks the start and end of each piece of markup which is substituted for the text used to bracket the item.

otl can be used for many types of conversions since not only the syntax of the source file but also the markup tags in the output are user-defined in the otl configuration file. For example, if you wanted to use troff instead of html as the output, edit the config file changing

###
	--	|<span style="font-weight: bold;">|	|</span>|
###
	to
###
	--	|\fB|	|\fR|
###

Now, "--joebob-- is big" will be converted to "\fBjoebob\fR is big".
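
The bracket substitution described above can be sketched in a few lines. This is only an illustration in Python, not otl's actual Perl code: the function name apply_bracket and its arguments are hypothetical, and the marker is escaped so that it is matched literally (otl itself treats the marker as a Perl regex, so its behavior may differ).

```python
import re

def apply_bracket(text, marker, start_tag, end_tag):
    """Replace marker...marker pairs in text with start_tag...end_tag."""
    m = re.escape(marker)  # assumption: match the marker literally
    pattern = re.compile(m + r"(.+?)" + m)
    # A function is used as the replacement so that backslashes in the
    # tags (e.g. \fB for troff) are copied through unchanged.
    return pattern.sub(lambda mo: start_tag + mo.group(1) + end_tag, text)

# troff-style output, as in the example above
print(apply_bracket("--joebob-- is big", "--", r"\fB", r"\fR"))
```

Swapping the two tag arguments is all that is needed to target a different output language, which mirrors the single-line edit to the config file.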


otl uses perl regex
-------------------

The otl syntax is powerful and flexible because it is based on Perl regular expressions, but there are caveats. In particular, exercise care with the perl metacharacters (such as \ | ^ . $ () [] {n} ? + * ).

For example, why not use "//" as a bracket to indicate italics? You'll get unexpected results with a line like "You should check out http://www.cnn.com for info. Sometimes http://www.slashdot.org also has good stuff." Why? Perl uses / as a component of the syntax for regex search and replace commands (e.g., $string =~ s/regex/replacement/g;).
	
A second example: using "[]" as a bracketing character set, with a line in ~/.otl/otl like
		
	bracket []	|<span style="font-variant: small-caps;">|	|</span>|

wouldn't work the way you may have expected since [ and ] are perl metacharacters.
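
A rough way to see both problems, using Python's re module as a stand-in for Perl (an assumption: the two engines agree on the metacharacters used here, but otl's actual behavior may differ in detail). The helper name compiles is hypothetical.

```python
import re

def compiles(pattern):
    """Return True if pattern is a valid regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Unescaped, "[]" opens a character class that is never closed.
print(compiles("[]"))
# Escaped, it matches the two literal characters.
print(compiles(re.escape("[]")), re.escape("[]"))

# A naive "//...//" bracket pairs the slashes of two different URLs,
# so the text between them is captured as if it were bracketed.
line = ("You should check out http://www.cnn.com for info. "
        "Sometimes http://www.slashdot.org also has good stuff.")
print(re.findall(r"//(.+?)//", line))
```

In a parameter file, the equivalent precaution is to backslash-escape any metacharacter that is meant literally.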


how otl processes the input file
--------------------------------

otl processes the source/input file based on

	1. a fixed set of rules: items which otl always looks for
	2. the transformations specified in ~/.otl/otl (or another parameter file as specified on the otl command line). The format of ~/.otl/otl and the fixed set of rules are described below.


fixed rules
~~~~~~~~~~~~~~~~~~~~~

the NoProcess directive
^^^^^^^^^^^^^^^^^^^^^^^^^^

By default, the "##" string at the start of a line, followed only by whitespace, indicates that subsequent lines will not be processed until another line beginning with ## is reached. In other words, ## can be considered a "do not process" directive. The "##" must be at the very start of the line; a tab or spacing in front of it will cause it to be processed in the normal fashion rather than as a special directive.

Note that this should not be considered the equivalent of a 'comment' (use the markup-specific comment tags inline for comments).  Material between the two lines beginning with ## will be included in the output file verbatim. The "##" strings will be deleted. This facilitates inclusion of 'complicated stuff' in an input file (e.g., including MathML markup in an otl document being used to generate XHTML). The '##' directive allows the content to appear untouched in the output file (excepting the deletion of the "##" strings). This approach makes it easier to bracket large blocks of text (versus a situation where one would place ## at the start of every single line to be excluded from processing).
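
The toggle behavior can be sketched as follows. This is a minimal Python illustration of the rule as described here, not otl's own code; the names NOPROCESS and split_noprocess are hypothetical.

```python
import re

# "##" at the very start of the line, followed only by whitespace
NOPROCESS = re.compile(r"^##\s*$")

def split_noprocess(lines):
    """Yield (line, verbatim) pairs; the "##" marker lines are dropped."""
    verbatim = False
    for line in lines:
        if NOPROCESS.match(line):
            verbatim = not verbatim  # each marker line flips the state
            continue
        yield line, verbatim

source = ["a --bold-- line", "##", "<math>x</math>", "##", "\t## indented"]
for line, verbatim in split_noprocess(source):
    print("verbatim" if verbatim else "process ", repr(line))
```

Note that the pattern does not match a line of three # characters, and that an indented ## falls through to normal processing.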

Note that the "##" is not simply the equivalent of the HTML "pre" tag. If you want the enclosed text to not be processed by otl ;;nor;; by the HTML parser (or tex parser or whatever...), you should use "##" followed by a &lt;pre&gt; string, or use the VerbatimPre tag.
	

the VerbatimPre tag
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Synopsis: the equivalent of the HTML 'pre' tag or the LaTeX verbatim environment.
	
If a line begins with ### followed by only whitespace, the tags specified by the "preequivalent" line in the ~/.otl/otl file will be used (by default, a pair of HTML 'pre' tags or a \begin{verbatim} ... \end{verbatim} environment). The first tag will be used at the ###. The next tag will be used at the following ### (note that this ### must also be at the start of a line and succeeded only by whitespace on that line). 

In smart mode with an HTML or TeX document, otl will try to identify and deal with any "problem characters" between the ### strings.
	

line continuation
^^^^^^^^^^^^^^^^^^^^

An @ at the end of a line indicates that the line should not be processed as a default line and that processing should jump to the next line. For example, if the line would have received a &lt;/p&gt; as the last line in a default paragraph, the next default line will receive it instead.
	- If the next line is not a default line, then the @ is ignored


ignoring xml tags
^^^^^^^^^^^^^^^^^^^^^^

otl will ignore xml tags already in place -- thus, it is straightforward to include links, etc. in your text. For example, a line such as

###
	The big dog likes to watch <a href="http://www.cnn.com">cnn</a>.
###
	would be processed by otl in the same fashion as otl would process any other line. Using the default HTML output, the line would be rendered

	The big dog likes to watch <a href="http://www.cnn.com">cnn</a>.


The feature of ignoring xml tags also makes it straightforward to include other html features. For example, the default configuration of otl doesn't include a 'comment' feature since 

###
<!-- my comment -->
###

 is straightforward to use.


special characters
^^^^^^^^^^^^^^^^^^^^
	
otl looks for special characters which appear to be used in a way that might interfere with interpretation of a LaTeX or HTML document, and asks the user what is intended for these characters.
Characters which are interpreted in a special fashion:

###
	   HTML: & < >  ...probably others...
	   tex: \ & $ % ^ _ { }  ...and probably others...
	   perl:  ^ \ + ? [ ] { } * ...and probably others...  
###


Thus, it is best to exercise care when including special characters in text or using special characters in definitions in the parameter file. For example, a parameter file line

###
under \^\^\^\^+	 |\paragraph{| 	  |}| 
###

probably will not work as expected for a tex file. perl will scan the file looking for a ^^^^^^^^^^^ series but will never encounter it. Why? If --assume-yes is used with TeX output, all caret characters in the source file will be converted to $\mathchar"1356$ before the document is scanned for under items.

Note that, although troublesome, the less-than and greater-than characters are not dealt with by otl when generating an HTML document. Because tags can span multiple lines, it is difficult to identify whether a less-than or greater-than symbol is part of a tag or should be converted to &lt; or &gt;. At this point, the user must use &amp;lt; and &amp;gt; in the source document if the intent is to print or render a less-than or greater-than symbol.


including a table of contents
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This feature is currently implemented only for tex output; html support is hoped for later.

A line starting with [[TOC]] will trigger insertion of a table of contents into the text at that point. The remainder of the line is ignored.


tables
^^^^^^^^^^^^^
	
otl looks for a set of two or more consecutive lines, each of which contains a series of at least two sequential spaces at a position following some sort of alphanumeric string material (and preceding additional alphanumeric string material?). The spaces are interpreted to represent breaks between columns and the set of lines is interpreted as a table. Tab characters are not interpreted as spaces. At this point, conversion is to HTML but, in the future, it should be straightforward to do this flexibly via the parameter file so tables in the source file can be converted to other markup languages.
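
The heuristic can be approximated as below. This is a sketch under assumptions, not otl's implementation: it assumes the gap must have alphanumeric text on both sides (the description above is unsure about this), and the names COLUMN_GAP and find_table_rows are hypothetical.

```python
import re

# two or more literal spaces (not tabs) between word characters
COLUMN_GAP = re.compile(r"(?<=\w) {2,}(?=\w)")

def find_table_rows(lines):
    """Return the cells of the first run of 2+ consecutive candidate lines."""
    run = []
    for line in lines:
        if COLUMN_GAP.search(line):
            run.append(COLUMN_GAP.split(line.strip()))
        else:
            if len(run) >= 2:
                return run
            run = []  # a lone candidate line is not a table
    return run if len(run) >= 2 else []

print(find_table_rows(["intro text", "name  age", "joe   40", "", "after"]))
```

A sketch like this also shows why a stray double space in two adjacent lines of ordinary prose can be mistaken for a table.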


the parameter file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
	
suffix
^^^^^^^^^^

otl uses the suffix of the output file, specified in the parameter file, to determine the language of the output file. If the file suffix is html, otl will look for URLs and convert them to HTML hyperlinks.

If you want otl to output a file with a specific suffix, specify that using a line beginning with "suffix" in the parameter file. For example, a line in the parameter file of

suffix	html

would cause "test.txt" to be processed and output to a file named "test.html". Note that only the first suffix line in the parameter file will be used - others will be ignored. The suffix must consist only of letters and/or numbers.


bracketed text
^^^^^^^^^^^^^^^^

Text bracketed by a set of symbols can be transformed by otl. [Must the bracketing items be on the same line?] An example with the default HTML parameter file:

###
	Joe Bob went to a home where he found things __underlined__ and in --bold--.
###
	would be replaced by

###	
	Joe Bob went to a home where he found things <span style="text-decoration: underline;">underlined</span> and in <span style="font-weight: bold;">bold</span>.
###	
	
Bracketed text is specified in the parameter file on a single line beginning with the word 'bracket':

	bracket	bracket-text	|start replacement|	|end replacement|

Example parameter file line(s):

###
	bracket	//	|<pre>|		|</pre>|
	bracket	__	|<b>|		|</b>|
###

notes:

- use specialized otl characters and words with care:

	- | is used to define the start and end of the start replacement
	- | is used to define the start and end of the end replacement
	- no spaces are allowed in bracket-text; bracket-text can't be "start" unless formatted as a regex
	- | will throw things off if used indiscriminately anywhere else in the parameter file except as above

- use perl regex characters with care:

	- [ or ], { or }, ...


start
^^^^^^^^^^^^^^^^^

A line which begins with a specified 'start text' can be transformed by otl. Typically, these are used to indicate items which are members of a list. For example, with the default HTML output, a numeric digit followed by a period character represents a 'start' pattern that triggers processing of list items.

	Joe is
	1. big
	2. fat
	3. tiny
	
	is converted to

###	
	Joe is
	<ol>
		<li>big</li>
		<li>fat</li>
		<li>tiny</li>
	</ol>
###


A "start" item is specified in the parameter file by a single line with the syntax:

	start	trigger-text	|start replacement|	|end replacement|	|tag at start of list| |tag at end of list| list-type

An example parameter file line where trigger-text is "-":

###
	start -	|<li>|	|</li>|	|<ol>| |</ol>|	o
###


trigger-text
^^^^^^^^^^^^^^^^^^

The item which is searched for. Presence of this item at the start of a line (1st non-whitespace item) triggers substitution. Whitespace is not permitted as a component of the trigger-text string.
	
|start replacement|
	The character string which replaces trigger-text
	
|end replacement|
	The character string added to the end of line which began with trigger-text
	
|tag at start of list|
	If the item is a list, the string to add before the first list item
	
|tag at end of list|
	If the item is a list, the string to add after the last list item

list-type	
	o (ordered)
	n (not a list item)
		note: |tag-at-start-of-list| and |tag-at-end-of-list| are not used for type n items
	u (unordered)  


notes:

- termination of a list: how does otl decide when to terminate a list?
	possibilities:
		- an empty line? two empty lines? 
			- for readability of plain text, it's nice to be able to have an empty line
			- for simplicity of code, it's easiest to use an empty line as the flag to terminate a list
		- the next list item is out of sequence? (not useful for an unordered list since we can't follow numeric or alphabetic sequence)
		
	current implementation:
		- a list is terminated when an empty or non-triggered line is encountered

- TABS are used to indicate whether we are moving inwards or outwards with respect to nesting - e.g.,

		1. joe bob is a big boy
			- item 2
				1. joe is 2

			versus

		1. joe bob is a big boy
			- item 2
		2. joe bob is a little boy

- if the tab number doesn't match the nesting level, then otl will add a set of tags around the item using the "trigger-tab" directive in the parameter file. For example, the first list below is an unnested list. The second example also contains an unnested list, but it is indented in the original source.

1. Joe is a big boy
2. Joe is a big boy
3. Joe is a big boy

versus

The list below can be processed differently: it is also not nested, but it is indented. When the nesting level and the tab level do not match, otl will [optionally] use the tabbing information to generate different markup (see trigger-tab below).

	1. Joe is a big boy
	2. Joe is a big boy
	3. Joe is a big boy
	
If the parameter file contains a line beginning with the string "trigger-tab" in the format indicated below, then "start-string" and "end-string" will be added before and after the start and end tags for the list, respectively. If multiple trigger-tab lines are listed, the number of tabs will indicate which trigger-tab line is employed.

Format:

trigger-tab |start-string| |end-string|


Example:

###
trigger-tab |<div style="margin-left: 40em;">| |</div>|
###

under
^^^^^^^^^^^^

A line followed by a line containing a specific text pattern can also trigger processing by otl. This type of processing is specified by a line beginning with the string "under" in the parameter file. For example, by default,

	the main heading
	====

would be replaced by

###
	<h1>the main heading</h1>
###
	
The parameter file line defining this:

###
	under	====	|<h1>| |</h1>|
###
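
The "under" rule can be sketched as a look-ahead over pairs of lines. This Python fragment is an illustration under assumptions, not otl's code: the name apply_under is hypothetical, the underline pattern is treated as a regex that must fill the whole line, and the underline line is assumed to be dropped from the output.

```python
import re

def apply_under(lines, underline, open_tag, close_tag):
    """Wrap a line in tags when the next line matches underline (a regex)."""
    pattern = re.compile(r"^\s*" + underline + r"\s*$")
    out, i = [], 0
    while i < len(lines):
        has_next = i + 1 < len(lines)
        if has_next and lines[i].strip() and pattern.match(lines[i + 1]):
            out.append(open_tag + lines[i].strip() + close_tag)
            i += 2  # skip the underline itself
        else:
            out.append(lines[i])
            i += 1
    return out

print(apply_under(["the main heading", "====", "body text"],
                  "====", "<h1>", "</h1>"))
```

With a pattern such as "====+" the same sketch would accept underlines of four or more characters.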


blank lines
^^^^^^^^^^^^^^^

otl will treat blank lines in a specified manner if this is defined in the parameter file. An example line in the parameter file which would cause blank lines to be converted into "empty paragraphs":

###
	blank |<p style="margin: 0em; padding: 0em;"><br /></p>|
###


a "substitution" table can be specified in the parameter file 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

note: this is currently not implemented in otl.pl; a separate script, otlsub, does this

The parameter file may contain a set of lines with the following format:

###
##SUB##
a b
c d
e f
g h
##ENDSUB##
###

Details:
	##SUB## - indicates start of substitution table
	##ENDSUB## - indicates end of substitution table
	a,c,e,... - a perl regular expression which will be searched for
	b,d,f,... - the replacement text which will be substituted for a,c,e,... (respectively)
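
Reading and applying such a table might be sketched as follows. This is a Python illustration, not the otlsub script: the function names are hypothetical, the two fields on each line are assumed to be separated by whitespace, and the substitutions are assumed to be applied in the order listed.

```python
import re

def parse_sub_table(param_lines):
    """Collect (pattern, replacement) pairs between ##SUB## and ##ENDSUB##."""
    pairs, inside = [], False
    for line in param_lines:
        stripped = line.strip()
        if stripped == "##SUB##":
            inside = True
        elif stripped == "##ENDSUB##":
            inside = False
        elif inside and stripped:
            parts = stripped.split(None, 1)  # assumed field separator: whitespace
            if len(parts) == 2:
                pairs.append((parts[0], parts[1]))
    return pairs

def apply_subs(text, pairs):
    """Apply each substitution in turn; earlier results feed later ones."""
    for pattern, replacement in pairs:
        text = re.sub(pattern, replacement, text)
    return text

pairs = parse_sub_table(["##SUB##", "a b", "c d", "##ENDSUB##", "e f"])
print(pairs)
print(apply_subs("abcd", pairs))
```

Because the substitutions run in sequence, the order of lines in the table can change the result.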

	
a "default" format is specified in the parameter file
^^^^^^^^^^^^^^^^^^^^^^^^^^^

The 'default' format is specified with one or more lines in the parameter file, each of which has the format:

|starttag| |endtag|


example:
###
	|<p>|	|</p>|
###

example:
###
	|<p style="margin-left: 2em;">| |</p>|
###

Such parameter file lines specify the default(s) for a line which does not have tags in front of it. otl processes the source file one line at a time; if a line does not fall into the "list", "under", or "ignore" categories, otl will apply this default format to the line, provided such a default is specified in the parameter file.

notes:

- tabs in front of lines which are in the default format will indicate that otl should use the subsequent "default" line in the parameter file for formatting


additional information the parameter file provides
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Apart from the syntaxes associated with the substitutions described above, several other items can be defined in the parameter file:

I. ITEMS TO SIMPLY ADD TO THE OUTPUT FILE at the beginning or the end of the file

- these items must be placed at the top of the parameter file

head - puts stuff at top of file

	example:

###	
	head
	<html>
	<head>
	<title>joebob's paper</title>
	</head>
	<body>
	head
###

foot - puts stuff at bottom of file

	example:

###	
	foot
	</body>
	</html>
	foot
###

the default rule set
--------------------

- see the end of otl.pl


examples
--------

additional examples of otl applications are described in the examples directory


issues/problems
---------------

1. otl seems to be trying to insert a table into a "random" location in the file
	- is there a set of two or more spaces succeeded and preceded by alphanumeric text at some point nearby? If so, it's probably triggering table generation. Turn table generation off with --no-tables [not yet implemented] or delete the extra space.
