Server Status (raw)
The following status information describes the underlying DDC server process.

key | : value | description |
---|---|---|
name | : server:gei_digital | symbolic name for the underlying DDC server process (syslog label) |
version | : 2.2.8 | DDC library version for the underlying DDC server process |
compat | : 2.2.8 | server compatibility mode |
started | : 2024-02-20 14:58:36+0100 | time of last server re-start |
uptime | : 38 days, 1 hour, 7 minutes, and 51 seconds | time since last server re-start |
nrequests | : 1.57 K | total number of client requests processed by the underlying DDC server |
nqueries | : 1.47 K | total number of query requests processed by the underlying DDC server |
nerrors | : 73 | total number of failed client requests (e.g. due to parse errors) |
nslow | : 1 | total number of client requests that were slow and logged as such |
qtavg | : 219 ms | running average query processing time |
nworkers | : 8 | number of concurrent client worker thread(s) |
mem | : 6.78 GB | memory resident set size used by the underlying DDC server (total non-swapped physical memory used) |
navcachesize | : 0 | current size of internal navigation hint cache |
corpora | : 4 logical ~ 4 physical | number of sub-corpora contributing to this corpus |
mmap | : 4 / 4 | number of memory-mapped physical sub-corpora |
hitstrings | : serial | evaluation mode for hit-string retrieval (serial or parallel) |
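The abbreviated counts and the uptime in the table above can be turned back into numbers with a few lines of Python. This is a minimal sketch and not part of DDC: the names `parse_count` and `SUFFIXES` are hypothetical, and reading `K` and `M` as decimal multipliers (1,000 and 1,000,000) is an assumption. Because the displayed values are rounded, the derived error rate is only approximate.

```python
from datetime import datetime, timedelta

# Assumption: "K" and "M" are decimal multipliers; the status page does not say so.
SUFFIXES = {"K": 1_000, "M": 1_000_000}

def parse_count(text: str) -> float:
    """Convert a displayed count such as '1.57 K' or '73' to a number."""
    parts = text.split()
    value = float(parts[0])
    if len(parts) == 2:
        value *= SUFFIXES[parts[1]]
    return value

# Values copied from the status table.
nrequests = parse_count("1.57 K")
nerrors = parse_count("73")
error_rate = nerrors / nrequests  # approximate, since nrequests is rounded

# Adding the uptime to the start time gives the moment the status was reported.
started = datetime.strptime("2024-02-20 14:58:36+0100", "%Y-%m-%d %H:%M:%S%z")
uptime = timedelta(days=38, hours=1, minutes=7, seconds=51)
snapshot = started + uptime

print(f"error rate: {error_rate:.1%}")
print(f"status reported at: {snapshot.isoformat()}")
```

Adding `uptime` to `started` dates this status snapshot to 2024-03-29, which is useful when comparing it with the index timestamps further down.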
Index Information (raw)
Collection
The physical subcorpora supplied the following collection information conforming to D* build system conventions.

collection | description |
---|---|
gei_digital | GEI-Digital (ca. 1700-1920), Georg-Eckert-Institut - Leibniz-Institut für internationale Schulbuchforschung |
Basic Information
The following basic information was estimated by aggregation over all physical leaf nodes of the current corpus.

key | : value | description |
---|---|---|
indexed | : 2021-01-29 21:17:57+0100 | timestamp of youngest physical sub-corpus (*._con mtime) |
nfiles | : 5.04 K | total number of indexed files |
nsources | : 0 | total number of source files during index compilation |
nmasked | : 0 | total number of masked corpus files (indexed but not displayed) |
ntokens | : 544 M | total number of indexed tokens |
Version Information
The physical subcorpora supplied the following version information conforming to D* build system conventions.

key | : value(s) | description |
---|---|---|
build-cab | : de-dta-2020-12-05 | version identifier(s) for D* CAB resources used during corpus build |
build-index | : gei_digital-ddc-index-2021-01-29 | version identifier(s) for most recent D* DDC index update |
build-index.orig | : gei_digital-ddc-index-2020-12-17 | version identifier(s) for initial D* DDC index build |
build-src | : gei_digital-src-xml-2020-12-16 | version identifier(s) for D* TEI-XML corpus sources |
build-timestamp | : 2021-01-31T21:48:21+0100 | timestamp(s) of D* DDC index compilation |
curator | : scheel@gei.de | corpus content curator(s) |
maintainer | : jurish@bbaw.de | corpus infrastructure maintainer(s) |
Token Attributes
The following token attribute indices (s_index) are available for use in single-token queries, contextual sort operators and count-key expressions.

name | alias(es) | visible | size | description |
---|---|---|---|---|
$Token | $w $Utf8 $u | yes | 10.2 M | token surface text |
$CanonicalToken | $v | yes | 9.34 M | DTA::CAB-normalized modern equivalent wordform |
$Pos | $p | yes | 255 | part-of-speech tag for the source token (typically STTS) |
$Lemma | $l | yes | 8.53 M | lemma for the source token as returned by CAB+moot+TAGH; typically the default attribute for bareword queries |
$Page | $page | yes | 4.84 K | page (scan) identifier, not to be confused with DDC-internal page_ counter |
$WordSep | $ws | yes | 8 | Boolean attribute, '1' (one) if the source token text is immediately preceded by whitespace, otherwise '0' (zero) |
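The `$WordSep` attribute makes it possible to rebuild running text from a token sequence. The following is a minimal sketch, assuming tokens arrive as `(text, ws)` pairs in which `ws` is the `$WordSep` value as a string. The function name `detokenize` and the sample tokens are made up for illustration; they are not taken from the corpus or from any DDC API.

```python
from typing import Iterable, Tuple

def detokenize(tokens: Iterable[Tuple[str, str]]) -> str:
    """Join token texts, inserting a space before tokens whose $WordSep is '1'."""
    pieces = []
    for text, ws in tokens:
        # Skip the separator before the very first token to avoid a leading space.
        if ws == "1" and pieces:
            pieces.append(" ")
        pieces.append(text)
    return "".join(pieces)

# Hypothetical sample; each pair is ($Token, $WordSep).
sample = [("Die", "1"), ("Schule", "1"), ("ist", "1"), ("alt", "1"), (".", "0")]
print(detokenize(sample))
```

The final period carries `'0'` because no whitespace precedes it in the source text, so it is attached directly to the preceding word.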
Bibliographic Metadata Attributes
The following bibliographic metadata attributes and aliases (s_field) are available for use in metadata filters, metadata sort operators, and count-key expressions.

name | visible | portability | size | description |
---|---|---|---|---|
orig | no | ★★★★★ | 0 | bibliographic descriptor for original source document (not properly searchable) |
scan | no | ★★★★★ | 0 | bibliographic descriptor for scanned source document (not properly searchable) |
date | no | ★★★★★ | 0 | source document date, reported as YYYY[-MM[-DD]] |
page | no | ★★★★★ | 0 | page counter integer offset |
author | yes | ★★★★ | 2.16 K | document author(s) |
avail | yes | ★★★★ | 4 | constant: availability code matching regex [MO]R[0-9][WS] |
availability | yes | ★★★ | 4 | constant: human-readable licensing conditions |
basename | yes | ★★★★ | 5.04 K | basename of indexed source file (document identifier) |
bibl | yes | ★★★★ | 5.03 K | human-readable (short) bibliographic reference string for display |
bildungslevel | yes | | 44 | --undocumented-- |
collection | yes | ★★★★ | 4 | constant: symbolic label for the (sub)corpus collection |
dokumenttyp | yes | | 34 | --undocumented-- |
editor | yes | | 426 | --undocumented-- |
empty | no | ★★ | 4 | constant: empty string constant |
flags | yes | ★★★★ | 4 | constant: colon- or space-separated list of Boolean flags |
geiclass | yes | | 37 | --undocumented-- |
geicode | yes | | 299 | --undocumented-- |
land | yes | | 9 | --undocumented-- |
person | yes | | 326 | --undocumented-- |
place | yes | | 609 | --undocumented-- |
ppn | yes | | 5.04 K | --undocumented-- |
publisher | yes | | 1.2 K | --undocumented-- |
schulform | yes | | 36 | --undocumented-- |
textClass | yes | ★★★★ | 4 | constant: colon-separated list of symbolic document genre(s), primary genre first |
timestamp | yes | ★★★★ | 30 | source document timestamp in UTC ISO-8601 format {YYYY}-{MM}-{DD}T{HH}:{MM}:{SS}Z |
title | yes | ★★★★ | 4.98 K | document title |
unterrichtsfach | yes | | 32 | --undocumented-- |
url | yes | ★★★ | 5.04 K | source URL for this document, if available |
zeitspanne | yes | | 12 | --undocumented-- |
corpus | no | ★ | 4 | constant: deprecated, prefer flags |
dtadir | no | ★ | 5.04 K | deprecated alias for basename |
textClassDWDS | no | ★ | 4 | constant: deprecated alias for textClass |
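Several of the descriptions above specify value formats: `date` as YYYY[-MM[-DD]], `avail` as the regex [MO]R[0-9][WS], and `timestamp` as a UTC ISO-8601 string. A minimal sketch of how such values could be checked in Python follows. The pattern names, the helper `is_valid`, and all sample values are hypothetical; the patterns only check the shape of a value (digits in the right places), not whether months or days are in range.

```python
import re

# Patterns transcribed from the attribute descriptions; digit-only fields are an assumption.
DATE_RE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")  # YYYY[-MM[-DD]]
AVAIL_RE = re.compile(r"[MO]R[0-9][WS]")  # availability code
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")  # UTC ISO-8601

def is_valid(pattern: re.Pattern, value: str) -> bool:
    """Return True if the whole value matches the pattern."""
    return pattern.fullmatch(value) is not None

# Hypothetical sample values, not taken from the corpus.
for value in ["1850", "1850-03", "1850-03-07", "1850-3"]:
    print(value, is_valid(DATE_RE, value))
print("OR0W", is_valid(AVAIL_RE, "OR0W"))
print("1850-03-07T00:00:00Z", is_valid(TIMESTAMP_RE, "1850-03-07T00:00:00Z"))
```

Using `fullmatch` rather than `match` ensures that trailing characters, as in a date followed by stray text, cause the check to fail.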
Break Collections
The following break collections (s_break) are available for use in anchor queries and the #WITHIN query operator.

name | alias | size | description |
---|---|---|---|
sentence | s | 25.7 M | single sentence or sentence-like unit |
paragraph | p | 1.27 M | single paragraph or paragraph-like unit |
file | | 5.04 K | single input document (e.g. article, volume) |
Operator-Dependent Defaults
Query operators lacking an explicit index specification (s_index or s_break, respectively) select a default token attribute or break collection depending on the query operator, as follows:

operator | attribute(s) | description |
---|---|---|
_ | $Lemma | default attribute for bareword search terms (qw_bareword, qw_set_infl) |
@_ | $Token | default attribute for exact match search terms (qw_exact, qw_set_exact) |
/_/ | $Token | default attribute for regular expression search terms (qw_regex, qw_prefix, qw_suffix, qw_infix, qw_prefix_set, qw_suffix_set, qw_infix_set) |
%_ | $Lemma | default attribute for lemma search terms (qw_lemma) |
. | sentence | default break collection for anchor queries (qw_anchor) |
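The token-attribute defaults in the table above amount to a small dispatch on the form of the search term. The sketch below illustrates that dispatch in Python; it is not DDC's parser. The function name `default_attribute` and the sample terms are hypothetical, and the sketch recognizes only the marker forms shown in the operator column (`@_`, `/_/`, `%_`, and bare words). Terms with an explicit attribute, set-valued terms, and prefix, suffix, or infix terms are not handled, although the table states that the latter also default to `$Token`.

```python
def default_attribute(term: str) -> str:
    """Pick the default token attribute for a term, following the table above."""
    if term.startswith("@"):
        return "$Token"  # exact match search term
    if term.startswith("%"):
        return "$Lemma"  # lemma search term
    if len(term) >= 2 and term.startswith("/") and term.endswith("/"):
        return "$Token"  # regular expression search term
    return "$Lemma"  # bareword search term

# Hypothetical sample terms.
for term in ["Schule", "@Schule", "/schul.*/", "%Schule"]:
    print(term, "->", default_attribute(term))
```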
Term Expanders
The following term expanders (s_expander) are available for use in term expansion pipelines for bareword and set-valued term queries.

name | description |
---|---|
CanonicalToken | default expansion chain for the $CanonicalToken attribute, typically an alias for eqlemma |
Lemma | default expansion chain for the $Lemma attribute, typically an alias for lemma |
Lemmas | typically an alias for lemmata |
Lemmata | typically an alias for lemmata |
Pos | default expansion chain for the $Pos attribute, typically an alias for case |
Token | default expansion chain for the $Token attribute, typically an alias for eqlemma |
Utf8 | default expansion chain for the $Utf8 attribute, typically an alias for eqlemma |
WebLemma | synchronic lemmatizer, typically an alias for lemma |
cab | union of eqlemma, pho, and rw expanders |
case | upper-/lower-case variant expander (modulo "McKinsey" et al.) |
eql | typically an alias for eqlemma |
eqlemma | finds all lemma-equivalent surface forms using an external CAB server with auxiliary vocabulary database |
germanet | alias for gn-asi |
gn-asi | maps lemmata to all GermaNet hyponyms of any depth (synonyms or subclasses) |
gn-asi1 | maps lemmata to all depth≤1 GermaNet hyponyms (synonyms or immediate subclasses) |
gn-asi2 | maps lemmata to all depth≤2 GermaNet hyponyms (synonyms, first-, or second-order subclasses) |
gn-isa | maps lemmata to all GermaNet hyperonyms of any depth (synonyms or superclasses) |
gn-isa1 | maps lemmata to all depth≤1 GermaNet hyperonyms (synonyms or immediate superclasses) |
gn-isa2 | maps lemmata to all depth≤2 GermaNet hyperonyms (synonyms, first-, or second-order superclasses) |
gn-sub | alias for gn-asi |
gn-sub1 | alias for gn-asi1 |
gn-sub2 | alias for gn-asi2 |
gn-sup | alias for gn-isa |
gn-sup1 | alias for gn-isa1 |
gn-sup2 | alias for gn-isa2 |
gn-syn | maps lemmata to immediate GermaNet synonyms (synset co-membership) |
gn-syn1 | maps lemmata to depth≤1 GermaNet synonyms (synset co-membership or immediate hyponyms/hyperonyms) |
gn-syn2 | maps lemmata to depth≤2 GermaNet synonyms (synset co-membership, first-, or second-order hyponyms/hyperonyms) |
id | identity expander (no-op) |
infl | alias for morphy |
lc | typically an alias for tolower |
lemma | default lemmatizer using external CAB server (precision-oriented, "best" lemma only, typically the default) |
lemmas | typically an alias for lemmata |
lemmata | alternative lemmatizer using external CAB server (recall-oriented, returns all known lemmata) |
lemmatize | typically an alias for lemma |
morphy | built-in DDC morphy lemmatization & re-inflection (not recommended, prefer Lemma or eqlemma) |
null | identity expander (no-op) |
openthes | alias for ot-asi |
openthesaurus | alias for ot-asi |
ot-asi | maps lemmata to all OpenThesaurus hyponyms of any depth (synonyms or subclasses) |
ot-asi1 | maps lemmata to all depth≤1 OpenThesaurus hyponyms (synonyms or immediate subclasses) |
ot-asi2 | maps lemmata to all depth≤2 OpenThesaurus hyponyms (synonyms, first-, or second-order subclasses) |
ot-isa | maps lemmata to all OpenThesaurus hyperonyms of any depth (synonyms or superclasses) |
ot-isa1 | maps lemmata to all depth≤1 OpenThesaurus hyperonyms (synonyms or immediate superclasses) |
ot-isa2 | maps lemmata to all depth≤2 OpenThesaurus hyperonyms (synonyms, first-, or second-order superclasses) |
ot-sub | alias for ot-asi |
ot-sub1 | alias for ot-asi1 |
ot-sub2 | alias for ot-asi2 |
ot-sup | alias for ot-isa |
ot-sup1 | alias for ot-isa1 |
ot-sup2 | alias for ot-isa2 |
ot-syn | maps lemmata to immediate OpenThesaurus synonyms (synset co-membership) |
ot-syn1 | maps lemmata to depth≤1 OpenThesaurus synonyms (synset co-membership or first-order hyponyms/hyperonyms) |
ot-syn2 | maps lemmata to depth≤2 OpenThesaurus synonyms (synset co-membership, first-, or second-order hyponyms/hyperonyms) |
pho | phonetic equivalent surface forms using an external CAB server with auxiliary vocabulary database |
pos-ud | maps UD PoS-tags to the native corpus tagset; typically an alias for pos-ud2stts, used by CLARIN-FCS |
pos-ud2stts | maps UD PoS-tags to STTS |
rw | rewrite-equivalent surface forms using an external CAB server with auxiliary vocabulary database |
sem | alias for semsim |
sem10 | alias for LEMMA@10|semsim |
sem100 | alias for LEMMA@100|semsim |
sem50 | alias for LEMMA@50|semsim |
semsim | DTA::SemCloud distributional semantic index k-nearest neighbor lemmata, query as (LEMMA@K|semsim) |
tolower | maps input term(s) to all lower-case |
toupper | maps input term(s) to all upper-case |
uc | typically an alias for toupper |
web | typically an alias for WebLemma |
www | typically an alias for WebLemma |
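Many entries in the table above are plain aliases, so it can help to resolve an expander name to the entry that actually carries a description. The following Python sketch does this for the entries marked "alias for"; it is an illustration, not a DDC API. The names `ALIASES` and `resolve` are hypothetical. Entries marked "typically an alias" are deliberately left out because their targets may differ between corpora, and `sem10`, `sem50`, and `sem100` are omitted because they stand for a whole pipeline rather than a single expander.

```python
# Definite aliases ("alias for ...") transcribed from the table above.
ALIASES = {
    "germanet": "gn-asi",
    "gn-sub": "gn-asi", "gn-sub1": "gn-asi1", "gn-sub2": "gn-asi2",
    "gn-sup": "gn-isa", "gn-sup1": "gn-isa1", "gn-sup2": "gn-isa2",
    "infl": "morphy",
    "openthes": "ot-asi", "openthesaurus": "ot-asi",
    "ot-sub": "ot-asi", "ot-sub1": "ot-asi1", "ot-sub2": "ot-asi2",
    "ot-sup": "ot-isa", "ot-sup1": "ot-isa1", "ot-sup2": "ot-isa2",
    "sem": "semsim",
}

def resolve(name: str) -> str:
    """Follow alias links until a non-alias expander name is reached."""
    seen = set()
    while name in ALIASES:
        if name in seen:  # guard against accidental alias cycles
            raise ValueError(f"alias cycle at {name!r}")
        seen.add(name)
        name = ALIASES[name]
    return name

for name in ["germanet", "ot-sup2", "infl", "lemma"]:
    print(name, "->", resolve(name))
```

Names that are not aliases, such as `lemma`, are returned unchanged.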