Proposal for a document web publishing structure

Ismael Olea

Revision History
Revision 0.1    2004/02/09
First version of the proposal

Table of Contents

Structured publishing:
General notes:
Unstructured publishing:
Example:

Abstract

A proposal for structured web publishing of documentation.

Because there are no assistant tools that make structured publishing really easy, there are two classes of library: the structured, managed one and the unstructured one. We will call the first the «library» and the second the «drafts library». We use «draft» because a document published there has not gone through the quality assurance process, so it is considered a draft.

Each document belongs to one of the two main classes, and within its class all documents hang at the same level. This means there is no classification hierarchy of directories, so all the documents in a library share the same namespace.

Each document will have its own directory hierarchy. The name of its top-level directory will be the document's code name in the CMS management process.

Each document will be accessible by a URL. This means that all the files and directories hanging from the main document directory will be accessible too.
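The mapping from a document to its directory and URL can be sketched as follows. This is only an illustration: the directory names «library» and «drafts_library» are taken from the example tree at the end of this proposal, while the base URL and the function names are hypothetical.

```python
from pathlib import PurePosixPath

# Directory names as they appear in the example tree of this proposal.
LIBRARIES = ("library", "drafts_library")


def document_path(library: str, codename: str) -> PurePosixPath:
    """Return the main directory of a document, relative to the web root."""
    if library not in LIBRARIES:
        raise ValueError(f"unknown library: {library!r}")
    # No classification hierarchy: a codename is a single path component.
    if not codename or "/" in codename:
        raise ValueError(f"invalid codename: {codename!r}")
    return PurePosixPath(library) / codename


def document_url(base_url: str, library: str, codename: str) -> str:
    """Return the URL of the main directory of a document."""
    return f"{base_url.rstrip('/')}/{document_path(library, codename)}/"
```

For instance, document_url("http://example.org/", "library", "doc-como-hacer_la_o") gives "http://example.org/library/doc-como-hacer_la_o/", where example.org is a placeholder host.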

Structured publishing:

This is the more complex of the two. The idea is to create this structure automatically using publishing tools like «imprenta-e». Building it by hand could be a nightmare and very error-prone.

Each document has its own hierarchy, named with a codename in the same style. Obviously, different documents can't share the same name.

Hanging from the main directory there will be one directory per version. The name of each directory will be the publishing date in ISO format (for example, 20040102).

Obviously, if there is only one version, there will be only one publishing date directory.

Hanging from the same main directory will be an index.html link to the most recent version of the document. It can be done with an HTTP redirection or with a filesystem link. (Note: this may change in the future, as experience may favour one of the two methods.)
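A minimal sketch of the filesystem link variant follows. It assumes the publishing date directories use the eight-digit form shown in the example tree (so that sorting the names also sorts the dates) and a filesystem that supports symbolic links. The function names are hypothetical and are not part of imprenta-e.

```python
import os
import re
from pathlib import Path

# Assumption: version directories are named YYYYMMDD, as in the example tree.
VERSION_DIR = re.compile(r"\d{8}")


def latest_version(doc_dir: Path) -> str:
    """Return the name of the most recent publishing date directory."""
    versions = sorted(
        p.name
        for p in doc_dir.iterdir()
        if p.is_dir() and VERSION_DIR.fullmatch(p.name)
    )
    if not versions:
        raise ValueError(f"no published versions in {doc_dir}")
    return versions[-1]


def link_latest(doc_dir: Path) -> Path:
    """(Re)create doc_dir/index.html as a relative symbolic link."""
    link = doc_dir / "index.html"
    target = Path(latest_version(doc_dir)) / "index.html"
    # Remove a previous link (even a dangling one) before creating the new one.
    if link.is_symlink() or link.exists():
        link.unlink()
    link.symlink_to(target)
    return link
```

The link is relative on purpose, so the whole document hierarchy can be moved without breaking it.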

From each publishing date directory will hang the «reproduction formats». On the web, these are usually HTML, PDF and sources. (Note: it could be interesting to consider how to publish XML files directly, replacing or complementing the HTML ones.) There will be an index.html file redirecting (using a technique similar to the previous one) to the HTML version.

The HTML version will hang from a directory named «codename-publishing_date» and will contain the HTML files, the document's illustrations and an index.html.

The PDF version will be named «codename-publishing_date» plus the .pdf extension.

The sources share the same codenaming, in an archive file such as tar.gz or zip. (Note: the restrictions on these formats are not settled yet; they will probably be determined by the capabilities of imprenta-e.)
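The naming rules above can be summarised in a short sketch that computes where each reproduction format of one version would live. The function name, the dictionary keys and the default archive extension are assumptions made for the illustration.

```python
from datetime import date
from pathlib import PurePosixPath


def version_layout(codename: str, published: date, archive_ext: str = "tar.gz") -> dict:
    """Return the expected paths of one version, relative to the library."""
    stamp = published.strftime("%Y%m%d")      # publishing date, e.g. 20040102
    base = f"{codename}-{stamp}"              # «codename-publishing_date»
    version_dir = PurePosixPath(codename) / stamp
    return {
        "redirect": version_dir / "index.html",          # link to the HTML version
        "html_dir": version_dir / base,
        "html_index": version_dir / base / "index.html",
        "pdf": version_dir / f"{base}.pdf",
        "sources": version_dir / f"{base}.{archive_ext}",
    }
```

With the codename «doc-como-hacer_la_o» and the date 2004-01-02, the "pdf" entry is doc-como-hacer_la_o/20040102/doc-como-hacer_la_o-20040102.pdf.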

General notes:

It could be very interesting to host the HTML files directly in the publishing version directory, avoiding the use of a new subdirectory. This would offer cleaner URLs. It should be studied on a working prototype.

Using index.html files hides the reproduction formats other than HTML. To solve this, the real index.html of each version (that is, the one in the HTML format, not those that work as links) should be created using a style sheet that adds a small header to the document with links to the other versions and, maybe, to other document-related information such as the translation status, the original version (if this is a translation) or the QA status.
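As an illustration of what such a header could contain, the sketch below builds the links from the real index.html to the PDF and the sources of the same publishing date. It assumes that «other versions» refers to the other reproduction formats. The proposal expects a style sheet to do this work, so the Python function, its name and the «formats» class are purely hypothetical.

```python
from html import escape


def formats_header(codename: str, stamp: str, archive_ext: str = "tar.gz") -> str:
    """Return an HTML fragment linking to the other reproduction formats."""
    base = f"{codename}-{stamp}"
    # The real index.html lives in <stamp>/<base>/, so the other
    # formats are one directory up.
    links = [
        ("PDF", f"../{base}.pdf"),
        ("Sources", f"../{base}.{archive_ext}"),
    ]
    items = " | ".join(
        f'<a href="{escape(href, quote=True)}">{escape(label)}</a>'
        for label, href in links
    )
    return f'<div class="formats">Other formats: {items}</div>'
```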

The idea of having different publishing date directories is to avoid creating dead links in the future. It also works as a history of the document's life.

Unstructured publishing:

This is much simpler. Because we are not sure the document has been written and composed following our publishing recommendations, we can't be sure it can be managed automatically with our tools.

In the same way, each document has its own hierarchy, named with a codename in the same style. Obviously, different documents can't share the same name.

Hanging from the main directory there will be one directory per version. The name of each directory will be the publishing date in ISO format. Hanging from the publishing date directory are all the uploaded files.

Obviously, if there is only one version, there will be only one publishing date directory.

There are no index.html files or links.

The idea behind this is that, if we can't do structured publishing for a document, we can at least put it on the web in a clear way, accessible to web crawlers and spiders, so that it can be found through web search tools like Google, Altavista, etc.
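Unstructured publishing is simple enough to sketch in a few lines: the uploaded files are copied, as they are, into a new publishing date directory. The function name and its parameters are hypothetical, and the «drafts_library» directory name comes from the example tree.

```python
import shutil
from datetime import date
from pathlib import Path


def publish_draft(root, codename, files, published=None):
    """Copy uploaded files into drafts_library/<codename>/<YYYYMMDD>/."""
    stamp = (published or date.today()).strftime("%Y%m%d")
    target = Path(root) / "drafts_library" / codename / stamp
    # Assumption: an already published date is never overwritten, so this
    # raises FileExistsError if the directory is already there.
    target.mkdir(parents=True, exist_ok=False)
    for f in files:
        shutil.copy2(f, target / Path(f).name)
    # No index.html files or links are created here.
    return target
```

Refusing to reuse an existing date directory is a design choice of this sketch, in line with the goal of not creating dead links.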

Example:

This is an example of how a publishing tree following this recommendation could look:

    library/
        doc-como-hacer_la_o/
            index.html->20040102/index.html
            20040102/
                doc-como-hacer_la_o-20040102.pdf
                doc-como-hacer_la_o-20040102.tgz
                doc-como-hacer_la_o-20040102/
                    index.html
                    file1.html
                    file2.html
                index.html->doc-como-hacer_la_o-20040102/index.html
            20031202/
                doc-como-hacer_la_o-20031202.pdf
                doc-como-hacer_la_o-20031202.zip
                doc-como-hacer_la_o-20031202/
                    index.html
                    file1.html
                    file2.html
                index.html->doc-como-hacer_la_o-20031202/index.html
    drafts_library/
        doc-documento-guarreras/
            20040301/
                file1
                file2
            20031231/
                file1
                file2
        doc-sucio-documento/
            20040202/
                file1