Indexes
SADE ships with preconfigured indexes, which are defined in the collection.xconf files. For larger data sets, a good index configuration helps to keep load times acceptable.
Attention: To take effect, changes must be made in the corresponding resources under /db/system/config/. The post-installation script copies the files there. The copies inside the application only serve to preserve the configuration in the EXPath package of this application (and in the git repo).
Range Index
The range index is used for fast key/value lookups. You will find more information in the eXist-db documentation. Queries that provide paths may be optimized by automatic index look-ups.
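As a rough illustration of what such a configuration can look like, the following sketch declares a range index on two TextGrid metadata elements that are mentioned further down this page. It assumes eXist-db's newer range index syntax (a `range` wrapper around `create` elements). The namespace URI for the `tgmd` prefix and the chosen types are assumptions and should be checked against the collection.xconf files actually shipped with SADE.

```xml
<!-- Sketch only: the tgmd namespace URI and the xs:string types are assumptions -->
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:tgmd="http://textgrid.info/namespaces/metadata/core/2010">
        <range>
            <!-- supports exact-match lookups on the full TextGrid URI -->
            <create qname="tgmd:textgridUri" type="xs:string"/>
            <!-- supports lookups by title -->
            <create qname="tgmd:title" type="xs:string"/>
        </range>
    </index>
</collection>
```

With an index like this in place, a comparison on `tgmd:textgridUri` inside a path expression can be answered from the index instead of scanning every document. The collection has to be reindexed after the configuration changes.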
Lucene Fulltext Index
For all text nodes in the XML resources Lucene provides an index to query for words and phrases. It is used by the faceted search module. With a standard installation of SADE, a customized and configurable String Analyzer is available. It is installed as a XAR library. For more information on the built-in Lucene, please look at the eXist manual.
Current Configurations
/collection.xconf | Within the root collection, the configuration contains only the RESTXQ trigger, which enables REST annotations on functions to provide paths and convert folders to parameters. This is currently not used by SADE. All paths beyond the collections and resources stored in the database are provided by the URL rewriter at /controller.xql.
textgrid/agg/collection.xconf and textgrid/rdf/collection.xconf | Enables the RDF index. This is an optional feature that requires the RDF index module from a separate package.
textgrid/tile/collection.xconf | Configures the range index for objects generated with TextGrid's text-image-link editor. Indexes are configured for link targets and shapes.
textgrid/meta/collection.xconf | All TextGrid metadata is stored in this collection. Queries usually target the complete URI in the tgmd:textgridUri element and the title in tgmd:title.
textgrid/data/collection.xconf | Configures the Lucene index and a range index.
Lucene
Character mappings and a synonym list are stored in separate files on the file system. The publish GUI puts text files (text/plain) there. This means that you can maintain these files from outside the database, via the Lab or any other external tool.
Charmaps
When an object named charmap.txt comes in via the Publisher, the file is stored on the file system at $EXIST_HOME. It has to comply with the Lucene standard.
Synonyms
The same applies to a synonyms.txt.
TEI data
To define the nodes for which a text index should be prepared, use <text qname="tei:TEI" analyzer="fontane"/>. To index words that are split across different text nodes because of inline elements, declare these elements with <inline qname="tei:g"/> (optional). Elements whose text nodes must not appear in the index can be declared with <ignore qname="tei:del"/> (optional).
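Put together, the three declarations above might sit in a collection.xconf roughly as follows. This is a minimal sketch, not the file shipped with SADE. The `analyzer` element with `id="fontane"` is needed so that the `analyzer="fontane"` reference resolves, but the class used here (Lucene's StandardAnalyzer) is only a stand-in for SADE's own configurable analyzer, whose class name is not given on this page.

```xml
<collection xmlns="http://exist-db.org/collection-config/1.0">
    <index xmlns:tei="http://www.tei-c.org/ns/1.0">
        <lucene>
            <!-- Stand-in class (assumption): replace with the analyzer installed by SADE -->
            <analyzer id="fontane" class="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
            <!-- index the text below every tei:TEI element with the analyzer declared above -->
            <text qname="tei:TEI" analyzer="fontane"/>
            <!-- tei:g does not split a word into separate tokens -->
            <inline qname="tei:g"/>
            <!-- text inside tei:del is left out of the index -->
            <ignore qname="tei:del"/>
        </lucene>
    </index>
</collection>
```

Declaring `inline` and `ignore` directly under `lucene` applies them to all text indexes in the collection. They can also be nested inside a single `text` element if they should only affect that index.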
Range
The entries here just serve as an example. They are used within the Fontane-Notizbücher project but should usually not interfere with custom data.