How Anastasia works: the processing model
This is what happens when the reader asks Anastasia to process some XML.
A call is sent from the browser to the Anastasia server. The call has the form "http://www.myserver/AnaServer?MyBook+0+start.anv". The call is received by the host Apache server "www.myserver" (which in CD-ROM mode will be running on the local machine):
- The Apache server recognizes that the AnaServer prefix means the request should be passed to the Anastasia module loaded into Apache
- Anastasia then receives the request MyBook+0+start.anv
- Anastasia checks that it knows about the book "MyBook". This must be an Anastasia book compiled from the XML using the Anastasia GroveMaker tool; the compiled book is designed to permit fast, real-time access to any part of the underlying XML
- Anastasia interprets the "0+start.anv" as saying: go to element 0 in the book and start processing it, using the instructions in the file "start.anv"
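As a rough illustration of the routing just described, the Python sketch below splits a call into its book name, element identifier and .anv file, and rejects books it does not know. It is not Anastasia's own code (Anastasia runs as a module inside Apache); the set `KNOWN_BOOKS` and the function `parse_anastasia_call` are hypothetical names chosen for the example.

```python
from urllib.parse import urlsplit

# Hypothetical registry of compiled books (in Anastasia, books built with GroveMaker).
KNOWN_BOOKS = {"MyBook"}


def parse_anastasia_call(url):
    """Split 'http://host/AnaServer?Book+Element+file.anv' into (book, element id, anv file)."""
    parts = urlsplit(url)
    if not parts.path.endswith("/AnaServer"):
        raise ValueError("not an AnaServer request")
    book, element_id, anv_file = parts.query.split("+")
    if book not in KNOWN_BOOKS:
        raise LookupError(f"unknown book: {book}")
    # Element 0 is, by definition, the whole book.
    return book, int(element_id), anv_file


print(parse_anastasia_call("http://www.myserver/AnaServer?MyBook+0+start.anv"))
# ('MyBook', 0, 'start.anv')
```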
Processing one page of text
What happens next depends on the starting element and what is found in the 'anv' file. Here is a fairly typical scenario, for any page of a manuscript transcription presented in the Miller's Tale on CD-ROM (see further the SDE website):
- The starting point is the beginning of a page, identified by the <pb/> element corresponding to the Anastasia identifier number in the call (by definition, "0" is the whole book, which in TEI terms is the root <TEI.2> element)
- First, we have to write to the browser various starting materials, to set up the display to follow. We do this through a "begin" process defined in the .anv file. This is used to write out the starting <HTML> code, and any appropriate JavaScript and metadata. Then, we have to write out various hypertext links: in particular, we have to test whether there is a next and previous page and put in links to them. We also want to tell the reader what text is on the page. To do this, we need to traverse to the first and last lines on the page and calculate how they should be identified. This can be tricky! There might be two or more different texts on the page: we need to identify the ranges of all these and connect them into a sensible string
- Then, Anastasia starts reading the book, beginning at this nominated <pb/> element. It generates an event for every XML element, and for every fragment of text data contained within the elements. Actually, for content elements it generates three events: one when it meets the start of the element, one when it meets the content, and one when it meets the end of the element. For each event, Anastasia looks into the .anv file and asks: is there a procedure defined for this event?
- Thus: when Anastasia meets a <hi> element, it will call the procedure named "hi", with one parameter set to "before" and another to the Anastasia identifier of the element. Inside the "hi" procedure is some code which says: if I am called with the parameter "before", and there is a "rend" attribute to this element set to "ital", then write <I> out to the browser. Anastasia sends another call to the procedure when it reads the content of the element (with the parameter set to "content"): you could use this to hide the element content. Finally, a call is sent when Anastasia meets the closing </hi>: this time, the "hi" procedure writes </I> out to the browser, and we are done.
- A page of transcription of a Miller's Tale manuscript is organized into <l> elements, containing the words themselves within <w> elements. Each time Anastasia starts reading an <l> element, a procedure in the .anv file does all this:
  - Starts a new row in the html output
  - Checks whether the line number (held in an "n" attribute on the <l> element) is divisible by 5. If it is, it writes the line number into the html, inside a hypertext link which has Anastasia call a function to calculate what this line number is in the three major Canterbury Tales numbering systems
  - Checks whether an ornamental capital is being written across these rows: if so, it skips writing a cell in the table to make room for it
  - At the end of the <l> element, checks whether there is an entry in the stemmatic commentary on any reading in this line: if there is, it writes a hypertext link to it
  - Also at the end of the <l> element, checks whether the <l> contained any <note> elements. If it did, Anastasia writes a hypertext link to them in the browser before closing the row containing the line
- The main processing is done on each <w> element within the line. Elsewhere in the XML there is a massive textual apparatus. This declares, for every word or phrase in every manuscript of the 730 or so lines of the Miller's Tale and accompanying link, just what the variant readings are for that word in every other manuscript. So for each <w> element, a "w" procedure:
  - Looks up the apparatus in the XML to find the variants it records for this word
  - Depending on the choices the user has made, either generates a hypertext link to a screen giving information about the variants at this point, or abstracts this information, in summary form, from the apparatus and puts it in a pop-up that appears over the word when the mouse moves over the link
  - If the variant place extends over several words, sets a flag variable
  - At each closing </w>, checks this flag, and postpones closing the link until all the words in this place of variation have been written.
- So Anastasia carries on reading and processing the lines and the words in them until either it meets the start of the next page (signalled by another <pb/> element) or the end of all the text in this manuscript. When it does, the appropriate procedure in the .anv file generates a "finish" event. This stops Anastasia reading the XML -- in mid-word, perhaps!
- Anastasia then calls an "end" procedure in the .anv file. This closes out the html being written to the browser, sends the rest of what Anastasia has written to the browser, and then closes the connection to the browser.
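The page-label step in the "begin" process (working out which text or texts, and which lines, fall on the page) amounts to grouping consecutive lines by text and reporting each group's first and last line. The Python sketch below assumes the lines on the page have already been collected as (text name, line number) pairs; `page_label` and its input format are hypothetical, and in Anastasia this logic would live in Tcl inside the .anv file.

```python
from itertools import groupby


def page_label(lines):
    """Summarize (text name, line number) pairs, in page order, as 'Text a-b; ...'."""
    parts = []
    # groupby merges only *consecutive* lines from the same text, which is what
    # we want when one text ends on the page and another begins.
    for text, group in groupby(lines, key=lambda pair: pair[0]):
        numbers = [n for _, n in group]
        first, last = numbers[0], numbers[-1]
        parts.append(f"{text} {first}" if first == last else f"{text} {first}-{last}")
    return "; ".join(parts)


print(page_label([("Text A", 74), ("Text A", 75), ("Text A", 76), ("Text B", 1), ("Text B", 2)]))
# Text A 74-76; Text B 1-2
```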
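The event model described in this walk-through can be sketched in Python with the standard `xml.etree.ElementTree` parser. This is a simplified model under stated assumptions, not Anastasia's implementation: each element yields "before", "content" and "after" events, text fragments become separate events, procedures are looked up by element name in a dictionary (standing in for the .anv file), and reading stops at the next `<pb/>`, playing the role of the "finish" event. The `PageRenderer` class and the `hi` handler are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape


class PageRenderer:
    """Hypothetical model of Anastasia's begin / event / finish / end cycle."""

    def __init__(self, procedures):
        # Element name -> handler(renderer, phase, element); stands in for the .anv file.
        self.procedures = procedures
        self.out = []

    def write(self, text):
        self.out.append(text)

    def events(self, elem):
        # Three events per element, plus a separate event for each text fragment.
        yield elem.tag, "before", elem
        yield elem.tag, "content", elem
        if elem.text:
            yield "#text", "data", elem.text
        for child in elem:
            yield from self.events(child)
            if child.tail:
                yield "#text", "data", child.tail
        yield elem.tag, "after", elem

    def render_page(self, root, start_pb):
        self.out = []
        self.write("<HTML><BODY>")              # the "begin" process
        started = False
        for name, phase, payload in self.events(root):
            if name == "pb" and phase == "before":
                if started:
                    break                       # next page reached: "finish"
                started = payload is start_pb
                continue
            if not started:
                continue
            if name == "#text":
                self.write(escape(payload))
            elif name in self.procedures:
                self.procedures[name](self, phase, payload)
        self.write("</BODY></HTML>")            # the "end" procedure
        return "".join(self.out)


def hi(renderer, phase, elem):
    # Like the "hi" procedure in the text: italics when rend="ital".
    if elem.get("rend") == "ital":
        if phase == "before":
            renderer.write("<I>")
        elif phase == "after":
            renderer.write("</I>")


doc = ET.fromstring(
    "<TEI.2><body>"
    "<pb n='1'/><l>plain <hi rend='ital'>italic</hi> text</l>"
    "<pb n='2'/><l>next page</l>"
    "</body></TEI.2>"
)
print(PageRenderer({"hi": hi}).render_page(doc, doc.find(".//pb")))
# <HTML><BODY>plain <I>italic</I> text</BODY></HTML>
```

Handlers here simply ignore the "content" phase; a fuller model could let a handler suppress the element's text, as the "hi" description suggests.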
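The per-line work that the .anv file does for each `<l>` element can be modelled as a function that builds one HTML table row. The Python sketch below is hypothetical throughout: the column layout, the `[C]` and `[N]` link labels, and the `lineref?`, `comment?` and `note?` link targets are placeholders for whatever the real .anv file writes.

```python
def render_line_row(n, words_html, under_capital=False, commentary_ids=(), note_ids=()):
    """Build one table row for an <l> element (hypothetical layout and links)."""
    cells = ["<TR>"]                                    # start a new row
    # Line-number cell: only lines divisible by 5 get a visible, linked number.
    if n % 5 == 0:
        cells.append(f'<TD><A HREF="lineref?{n}">{n}</A></TD>')
    else:
        cells.append("<TD></TD>")
    # Capital column: skipped when an ornamental capital spanning several
    # rows (assumed to be written earlier with ROWSPAN) already occupies it.
    if not under_capital:
        cells.append("<TD></TD>")
    cells.append(f"<TD>{words_html}")
    # End of the <l>: links to stemmatic commentary and notes, then close the row.
    for cid in commentary_ids:
        cells.append(f' <A HREF="comment?{cid}">[C]</A>')
    for nid in note_ids:
        cells.append(f' <A HREF="note?{nid}">[N]</A>')
    cells.append("</TD></TR>")
    return "".join(cells)


print(render_line_row(10, "sample words"))
# <TR><TD><A HREF="lineref?10">10</A></TD><TD></TD><TD>sample words</TD></TR>
```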
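The "w" procedure's handling of variant places that span several words comes down to a flag recording where the open link must be closed. In this hypothetical Python sketch, the apparatus is reduced to a dictionary keyed by the first word of each variant place, a `TITLE` attribute stands in for the pop-up, the `variants?` link target is a placeholder, and overlapping variant places are not handled.

```python
from html import escape


def render_words(words, apparatus, popup=False):
    """Render (word id, text) pairs, linking each variant place to its apparatus entry.

    apparatus maps the first word id of a variant place to a dict with keys
    "app" (entry id), "last" (last word id in the place) and "summary".
    """
    out = []
    open_until = None            # the flag: last word of the currently open place
    for wid, text in words:
        entry = apparatus.get(wid) if open_until is None else None
        if entry:
            if popup:
                out.append(f'<A TITLE="{escape(entry["summary"])}">')
            else:
                out.append(f'<A HREF="variants?{entry["app"]}">')
            open_until = entry["last"]
        out.append(escape(text))
        # At the closing </w>: close the link only after the place's last word.
        if open_until == wid:
            out.append("</A>")
            open_until = None
        out.append(" ")
    return "".join(out).rstrip()


words = [("w1", "one"), ("w2", "two"), ("w3", "three"), ("w4", "four")]
apparatus = {"w2": {"app": "app7", "last": "w3", "summary": "variants for two three"}}
print(render_words(words, apparatus))
# one <A HREF="variants?app7">two three</A> four
```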
Many other things can happen in the course of reading the page. Because it is a medieval text, there will be many non-standard characters. Each time Anastasia meets one, it generates an SDATA event and checks whether there is an SDATA procedure for that character defining how it should be written to the browser. Or, the page may be called following a search, requiring that some of the text be highlighted. Anastasia will generate events for the start and end of each hit, and again look in the .anv file for a "found" procedure defining how the hits should be presented.
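The SDATA and "found" procedures are essentially lookups and string wrapping. The Python sketch below assumes a small, hypothetical table mapping character names to HTML numeric character references (thorn and yogh are shown), and assumes search hits arrive as sorted, non-overlapping (start, end) character offsets within a text fragment; neither format is taken from Anastasia itself.

```python
from html import escape

# Hypothetical SDATA table: character name -> HTML numeric character reference.
SDATA = {"thorn": "&#254;", "yogh": "&#541;"}


def sdata(name):
    # Fall back to a visible placeholder when no procedure is defined for the character.
    return SDATA.get(name, f"[{escape(name)}]")


def found(text, hits, before="<B>", after="</B>"):
    """Wrap each (start, end) hit in markup; hits must be sorted and non-overlapping."""
    out, pos = [], 0
    for start, end in hits:
        out.append(escape(text[pos:start]))
        out.append(before + escape(text[start:end]) + after)
        pos = end
    out.append(escape(text[pos:]))
    return "".join(out)


print(sdata("thorn"), found("the miller and the reeve", [(4, 10), (19, 24)]))
# &#254; the <B>miller</B> and the <B>reeve</B>
```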
How is this all managed?
The keys to all this, from the developer's perspective, are the .anv files for the electronic book. These are written in Tcl ("Tool Command Language"): a very widely supported scripting language with implementations on all major computer platforms. Tcl is easy to learn and easy to use (some would say, too easy). In Tcl you can set up loops, if/then/else and switch/case statements, do string searches and replacements, have global and local variables, look up values in other files and include other files, and much more. Anastasia extends the standard Tcl environment by providing some thirty extension commands, which allow you to look up any element within the XML, examine its properties, navigate around it, and search for it or for any text, all from within any procedure in the .anv file. A typical Anastasia statement in an .anv file looks like this:
if {[attr $me rend] == "ital"} {set text "<I>"}
This checks the value of the "rend" attribute on the current element being processed ($me): if it is "ital", it sets "<I>" as the text to be written out to the browser.
By design, Anastasia does all this in real time, in direct response to a user request. Therefore, it has to be able to get what it wants from the XML very fast, however large the XML and wherever in it the information lies. It does this by interrogating not the source XML but a set of binary files (an "AnaGrove") optimized for the fastest possible access to information about every aspect of the source XML.
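Why precompiled binary files help can be seen with fixed-size records: if every element's record has the same length, the record for element N starts at byte N × record size, so a single seek reaches any element regardless of document size. The Python sketch below uses a hypothetical record layout (parent, first child, next sibling, tag index) packed with the standard `struct` module; it illustrates the principle only and is not the actual AnaGrove file format, which this page does not describe.

```python
import io
import struct

# Hypothetical fixed-size record (NOT the real AnaGrove format): parent,
# first child, next sibling (-1 means none) and an index into a tag table.
RECORD = struct.Struct("<iiii")
TAGS = ["TEI.2", "pb", "l"]


def build_grove(records):
    """Pack records into an in-memory binary 'file'; element N is record N."""
    buf = io.BytesIO()
    for rec in records:
        buf.write(RECORD.pack(*rec))
    return buf


def read_element(grove, element_id):
    # One seek reaches any element: its record starts at element_id * RECORD.size.
    grove.seek(element_id * RECORD.size)
    parent, first_child, next_sibling, tag = RECORD.unpack(grove.read(RECORD.size))
    return {"parent": parent, "first_child": first_child,
            "next_sibling": next_sibling, "tag": TAGS[tag]}


def children(grove, element_id):
    # Navigate via first-child / next-sibling links, reading only the records needed.
    child = read_element(grove, element_id)["first_child"]
    while child != -1:
        yield child
        child = read_element(grove, child)["next_sibling"]


grove = build_grove([(-1, 1, -1, 0), (0, -1, 2, 1), (0, -1, -1, 2)])
print(read_element(grove, 2)["tag"], list(children(grove, 0)))
# l [1, 2]
```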
In summary...
You could characterize Anastasia as an event-driven procedural environment for handling XML document collections. As such, it is very different from tools such as XSLT. See also "How Anastasia differs from XSLT".