.. image:: http://sflogo.sourceforge.net/sflogo.php?group_id=134329&type=7
    :height: 62
    :width: 210
    :alt: SourceForge.net Logo
    :target: http://sourceforge.net

OOoPy: Modify OpenOffice.org documents in Python
================================================

OpenOffice.org (OOo) documents are ZIP archives containing several XML
files, which makes them easy to inspect, create, or modify. OOoPy is a
Python library for these tasks. Rather than reinventing the wheel, OOoPy
uses an existing XML library, ElementTree_ by Fredrik Lundh: it is a
thin wrapper around ElementTree_ that uses Python's ZipFile to read and
write OOo documents.
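
As a rough illustration of the underlying idea (not of OOoPy's own API),
the following sketch uses only the Python standard library to build a
tiny stand-in archive in memory and to parse one of its XML members. The
helper names ``make_demo_archive``, ``list_members``, and
``parse_member`` are made up for this example, the demo archive is not a
complete document, and the code assumes a current Python 3 interpreter::

 import io
 import zipfile
 import xml.etree.ElementTree as ET

 OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
 TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

 def make_demo_archive():
     """Build a tiny stand-in for an OOo file in memory (hypothetical helper)."""
     content = (
         '<office:document-content xmlns:office="%s" xmlns:text="%s">'
         '<office:body><office:text><text:p>Hello</text:p></office:text>'
         '</office:body></office:document-content>' % (OFFICE_NS, TEXT_NS)
     )
     buf = io.BytesIO()
     with zipfile.ZipFile(buf, "w") as z:
         z.writestr("mimetype", "application/vnd.oasis.opendocument.text")
         z.writestr("content.xml", content)
     return buf.getvalue()

 def list_members(data):
     """Return the names of the files stored in the archive."""
     with zipfile.ZipFile(io.BytesIO(data)) as z:
         return z.namelist()

 def parse_member(data, name):
     """Parse one XML member of the archive into an ElementTree element."""
     with zipfile.ZipFile(io.BytesIO(data)) as z:
         return ET.fromstring(z.read(name))

 doc = make_demo_archive()
 print(list_members(doc))
 root = parse_member(doc, "content.xml")
 print(root.tag)

A real document contains further members, such as ``meta.xml``,
``styles.xml``, and ``settings.xml``. OOoPy wraps this kind of ZipFile
and ElementTree access for you.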

.. _ElementTree: http://effbot.org/zone/element-index.htm

In addition to being a wrapper for ElementTree_, OOoPy contains a
framework for applying XML transforms to OOo documents. Several
Transforms for OOo documents exist, e.g., for changing OOo fields (OOo
Insert-Fields menu) or using OOo fields for a mail merge application.
Some other transformations for modifying OOo settings and meta
information are also given as examples.

Tools like this come in handy where calling native OOo is not an
option, e.g., in server-side Web applications.

Don't be alarmed by the alpha status of the software: reading and
writing of OOo documents is stable, as are most transforms.

The only problematic transform is mailmerge: the OOo format is well
documented, but there are ordering constraints in the body of an OOo
document, and I've not yet figured out all the tags and their order in
the OOo body. Another known shortcoming of OOoPy's mailmerge is the
renumbering of body parts of an OOo document. Individual parts (e.g.,
frames, sections, tables) need to have their own unique names, and
after a mailmerge there are duplicate names for some items. So far I'm
renumbering only frames, sections, and tables; see the renumber objects
at the end of ooopy/Transforms.py. If you encounter missing parts in a
mailmerged document, check whether some renumberings are missing or
send me a `bug report`_.
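
The renumbering idea can be sketched independently of OOoPy. The
following is a simplified illustration, not the renumber objects from
ooopy/Transforms.py: the function ``renumber``, the un-namespaced
element names, and the plain ``name`` attribute are assumptions made for
the example (real OOo documents use namespaced tags and attributes), and
the code assumes a current Python 3 interpreter::

 import xml.etree.ElementTree as ET

 def renumber(root, tag, prefix):
     """Give every <tag> element a unique name: prefix1, prefix2, ...

     Returns the number of elements that were renamed.
     """
     count = 0
     for count, elem in enumerate(root.iter(tag), 1):
         elem.set("name", "%s%d" % (prefix, count))
     return count

 # Two copies of the same template body, as they might look after a
 # merge: the table and frame names are duplicated.
 body = ET.fromstring(
     '<body>'
     '<table name="Table1"/><frame name="Frame1"/>'
     '<table name="Table1"/><frame name="Frame1"/>'
     '</body>'
 )

 renumber(body, "table", "Table")
 renumber(body, "frame", "Frame")
 names = [elem.get("name") for elem in body]
 print(names)  # ['Table1', 'Frame1', 'Table2', 'Frame2']

Any kind of named part that is not renumbered this way keeps its
duplicate names, which is the situation described above.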

Another reason for the alpha status is the stability of the API: I may
still change it slightly. There were already some slight API changes
when support for the OpenDocument format introduced with OOo 2.0 was
added.

.. _`bug report`: http://ooopy.sourceforge.net/#reporting-bugs

There is currently not much documentation except for the Python doctests
in OOoPy.py and Transformer.py and the command-line utilities_.
To run these tests after installing ooopy (assuming here that you
installed using python2.4 into /usr/local)::

 cd /usr/local/share/ooopy
 python2.4 run_doctest.py /usr/local/lib/python2.4/site-packages/ooopy/Transformer.py
 python2.4 run_doctest.py /usr/local/lib/python2.4/site-packages/ooopy/OOoPy.py

Both should report no failed tests.
To run the doctest on python2.3 with the metaclass trickery of
autosuper, see the file run_doctest.py. In later versions of Python the
bug in doctest is already fixed.

Usage
-----

See the built-in documentation via Python's interactive help, e.g.::

 % python
 >>> from ooopy.OOoPy import OOoPy
 >>> help (OOoPy)
 >>> from ooopy.Transformer import Transformer
 >>> help (Transformer)

Help, I'm getting an AssertionError traceback from Transformer, e.g.::

 Traceback (most recent call last):
   File "./replace.py", line 17, in ?
     t = Transformer(Field_Replace(replace = replace_dictionary))
   File "/usr/local/lib/python2.4/site-packages/ooopy/Transformer.py", line 1226, in __init__
     assert (mimetype in mimetypes)
 AssertionError

The API changed slightly when handling of different versions of OOo
files was implemented. The first parameter you pass to the Transformer
constructor is now the mimetype of the OpenOffice.org document you
intend to transform. The assertion fails when that first argument is not
one of the known mimetypes, as in the call above, which passes a
transform first. The mimetype can be fetched from another opened OOo
document, e.g.::

  ooo = OOoPy (infile = 'test.odt', outfile = 'test_out.odt')
  t = Transformer(ooo.mimetype, ...
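
Inside the archive, the mimetype is stored in a member called
``mimetype``. The following standard-library sketch reads it directly
from a small in-memory stand-in archive. ``read_mimetype`` is a
hypothetical helper, and that OOoPy's ``mimetype`` attribute is filled
in exactly this way is an assumption. The code assumes a current
Python 3 interpreter::

 import io
 import zipfile

 def read_mimetype(fileobj):
     """Return the content of the archive's 'mimetype' member as text."""
     with zipfile.ZipFile(fileobj) as z:
         return z.read("mimetype").decode("ascii").strip()

 # Build a stand-in archive that contains only the mimetype member.
 buf = io.BytesIO()
 with zipfile.ZipFile(buf, "w") as z:
     z.writestr("mimetype", "application/vnd.oasis.opendocument.text")
 buf.seek(0)

 mimetype = read_mimetype(buf)
 print(mimetype)  # application/vnd.oasis.opendocument.text

With OOoPy itself you do not need this: use the ``mimetype`` attribute
of the opened document, as shown above.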

There are also command-line _`utilities` now:

- ooo_cat for concatenating several OOo files into one
- ooo_fieldreplace for replacing fields in an OOo document
- ooo_mailmerge for doing a mailmerge from a template OOo document and a
  CSV (comma separated values) input
- ooo_as_text for getting the text from an OOo file (e.g., for doing a
  "grep" on the output)

Resources
---------

Project information and downloads are available from the
`Sourceforge main page`_.

.. _`Sourceforge main page`: http://sourceforge.net/projects/ooopy/

You need at least version 2.3 of Python.

To use OOoPy with Python versions below 2.5, you need to download and
install the `ElementTree Library`_ by Fredrik Lundh. For documentation
about the OOo XML file format, see the book by J. David Eisenberg called
`OASIS OpenDocument Essentials`_, which is under the GNU Free
Documentation License and will probably be printed soon. For a reference
document you may want to check out the `XML File Format Specification`_
(PDF) by OpenOffice.org.

A German page for OOoPy exists at `runtux.com`_.

.. _`ElementTree Library`: http://effbot.org/downloads/#elementtree
.. _`OASIS OpenDocument Essentials`: http://books.evc-cit.info/
.. _`XML File Format Specification`: http://xml.openoffice.org/xml_specification.pdf
.. _`runtux.com`: http://www.runtux.com/ooopy.html

Reporting Bugs
--------------
Please use the `Sourceforge Bug Tracker`_ and

 - attach the OOo document that reproduces your problem
 - give a short description of what you think is the correct behaviour
 - give a description of the observed behaviour
 - tell me exactly what you did.

.. _`Sourceforge Bug Tracker`:
    http://sourceforge.net/tracker/?group_id=134329&atid=729727

Changes
-------

Version 1.4: Minor bugfixes

Fix doctest so that it hopefully runs on Windows. Thanks to Dani
Budinova for testing thoroughly under Windows.

 - Open output files in "wb" mode instead of "w" in doctest to not
   create corrupt OOo documents on Windows.
 - Use double quotes for arguments when calling system, single quotes
   don't seem to work on Windows.
 - Don't use redirection when calling system, use the -i option for the
   input file instead. Redirection seems to be a problem on Windows.
 - Explicitly call the Python interpreter, running a script directly is
   not supported on Windows.

Version 1.3: Minor bugfixes

Regression-test failed because some files were not distributed.
Fixes SF Bugs 1970389 and 1972900.

 - Fix MANIFEST.in to include all files needed for regression test
   (doctest).

Version 1.2: Major feature enhancements

Add ooo_fieldreplace, ooo_cat, ooo_mailmerge command-line utilities. Fix
ooo_as_text to allow specification of output-file. Note that handling of
non-seekable input/output (pipes) for command-line utils will work only
starting with python2.5. Minor bug-fix when concatenating documents. 

 - Fix _divide (used for dividing body into parts that must keep
   sequence). If one of the sections was empty, body parts would change
   sequence.
 - Fix handling of cases where we don't have paragraph elements (only
   list elements)
 - Implement ooo_cat
 - Fix ooo_as_text to include more command-line handling
 - Fix reading/writing stdin/stdout for command-line utilities, this
   will work reliably (reading/writing non-seekable input/output like,
   e.g., pipes) only with python2.5
 - Implement ooo_fieldreplace and ooo_mailmerge

Version 1.1: Minor bugfixes

Small Documentation changes

 - Fix css stylesheet
 - Link to SF logo for Homepage
 - Link to other information updated
 - Version numbers in documentation fixed
 - Add some checks for new API -- first parameter of Transformer is checked now
 - Ship files needed for running the doctest and explain how to run it
 - Usage section

Version 1.0: Major feature enhancements

Now works with version 2.X of OpenOffice.org. Minor API changes.

 - Tested with python 2.3, 2.4, 2.5
 - OOoPy now works for OOo version 1.X and version 2.X
 - New attribute mimetype of OOoPy -- this is automatically set when
   reading a document, and should be set when writing one.
 - renumber_all, get_meta, set_meta are now factory functions that take
   the mimetype of the OpenOffice.org document as a parameter.
 - Since renumber_all is now a function it will (correctly) restart
   numbering for each new Attribute_Access instance it returns.
 - Built-in elementtree support from python2.5 is used if available
 - Fix bug in optimisation of original document for concatenation
