D*/gei_digital Details

Server Status (raw)

The following status information describes the underlying DDC server process.
key          : value                                         description
name         : server:gei_digital                            symbolic name for the underlying DDC server process (syslog label)
version      : 2.2.8                                         DDC library version for the underlying DDC server process
compat       : 2.2.8                                         server compatibility mode
started      : 2024-04-16 08:01:45+0200                      time of last server re-start
uptime       : 3 days, 20 hours, 44 minutes, and 33 seconds  time since last server re-start
nrequests    : 117                                           total number of client requests processed by the underlying DDC server
nqueries     : 110                                           total number of query requests processed by the underlying DDC server
nerrors      : 7                                             total number of failed client requests (e.g. due to parse errors)
nslow        : 2                                             total number of client requests that were slow and logged as such
qtavg        : 852 ms                                        running average query processing time
nworkers     : 8                                             number of concurrent client worker thread(s)
mem          : 1.48 GB                                       memory resident set size used by the underlying DDC server (total non-swapped physical memory used)
navcachesize : 0                                             current size of internal navigation hint cache
corpora      : 4 logical ~ 4 physical                        number of sub-corpora contributing to this corpus
mmap         : 4 / 4                                         number of memory-mapped physical sub-corpora
hitstrings   : serial                                        evaluation mode for hit-string retrieval (serial or parallel)
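
The status values above can be combined into a few derived figures. The following is a minimal Python sketch, not part of the D* tooling: it copies the values from the table by hand and computes the time at which the snapshot was taken (started plus uptime) along with the share of failed and slow requests. The variable names are illustrative assumptions.

```python
from datetime import datetime, timedelta

# Values copied by hand from the server status table.
started = datetime.strptime("2024-04-16 08:01:45+0200", "%Y-%m-%d %H:%M:%S%z")
uptime = timedelta(days=3, hours=20, minutes=44, seconds=33)
nrequests, nqueries, nerrors, nslow = 117, 110, 7, 2

# The status snapshot was taken at the last re-start time plus the uptime.
snapshot = started + uptime

# Share of all client requests that failed or were logged as slow.
error_rate = nerrors / nrequests
slow_rate = nslow / nrequests

# Client requests that were not query requests.
non_query = nrequests - nqueries

print(snapshot.isoformat())
print(f"errors: {error_rate:.1%}, slow: {slow_rate:.1%}, non-query: {non_query}")
```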

Index Information (raw)


The physical subcorpora supplied the following collection information conforming to D* build system conventions.
gei_digital GEI-Digital (ca. 1700-1920), Georg-Eckert-Institut - Leibniz-Institut für internationale Schulbuchforschung

Basic Information

The following basic information was estimated by aggregation over all physical leaf nodes of the current corpus.
key      : value                     description
indexed  : 2021-01-29 21:17:57+0100  timestamp of youngest physical sub-corpus (*._con mtime)
nfiles   : 5.04 K                    total number of indexed files
nsources : 0                         total number of source files during index compilation
nmasked  : 0                         total number of masked corpus files (indexed but not displayed)
ntokens  : 544 M                     total number of indexed tokens
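
The counts above are abbreviated (5.04 K, 544 M). A small Python sketch for turning them back into approximate integers follows; it assumes that K, M and G denote powers of 1000, which the page does not state explicitly, and `parse_count` is a hypothetical helper name. Because the displayed values are rounded, the results are approximations.

```python
# Assumption: suffixes are decimal multipliers (K = 10^3, M = 10^6, G = 10^9).
SUFFIXES = {"K": 10**3, "M": 10**6, "G": 10**9}

def parse_count(text: str) -> int:
    """Convert an abbreviated count such as '5.04 K' into an integer."""
    parts = text.split()
    value = float(parts[0])
    factor = SUFFIXES[parts[1]] if len(parts) > 1 else 1
    return round(value * factor)

nfiles = parse_count("5.04 K")
ntokens = parse_count("544 M")

# Rough average document length, limited by the rounding of the displayed values.
tokens_per_file = ntokens / nfiles
print(nfiles, ntokens, round(tokens_per_file))
```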

Version Information

The physical subcorpora supplied the following version information conforming to D* build system conventions.
key              : value(s)                          description
build-cab        : de-dta-2020-12-05                 version identifier(s) for D* CAB resources used during corpus build
build-index      : gei_digital-ddc-index-2021-01-29  version identifier(s) for most recent D* DDC index update
build-index.orig : gei_digital-ddc-index-2020-12-17  version identifier(s) for initial D* DDC index build
build-src        : gei_digital-src-xml-2020-12-16    version identifier(s) for D* TEI-XML corpus sources
build-timestamp  : 2021-01-31T21:48:21+0100          timestamp(s) of D* DDC index compilation
curator          : scheel@gei.de                     corpus content curator(s)
maintainer       : jurish@bbaw.de                    corpus infrastructure maintainer(s)

Token Attributes

The following token attribute indices (s_index) are available for use in single-token queries, contextual sort operators and count-key expressions.
$Token $w yes 10.2 M token surface text
$CanonicalToken $v yes 9.34 M DTA::CAB-normalized modern equivalent wordform
$Pos $p yes 255 part-of-speech tag for the source token (typically STTS)
$Lemma $l yes 8.53 M lemma for the source token as returned by CAB+moot+TAGH; typically the default attribute for bareword queries
$Page $page yes 4.84 K page (scan) identifier, not to be confused with DDC-internal page_ counter
$WordSep $ws yes 8 Boolean attribute, '1' (one) if the source token text is immediately preceded by whitespace, otherwise '0' (zero)

Bibliographic Metadata Attributes

The following bibliographic metadata attributes and aliases (s_field) are available for use in metadata filters, metadata sort operators, and count-key expressions.
orig no ★★★★★ 0 bibliographic descriptor for original source document (not properly searchable)
scan no ★★★★★ 0 bibliographic descriptor for scanned source document (not properly searchable)
date no ★★★★★ 0 source document date, reported as YYYY[-MM[-DD]]
page no ★★★★★ 0 page counter integer offset
author yes ★★★★ 2.16 K document author(s)
avail yes ★★★★ 4 constant: availability code matching regex [MO]R[0-9][WS]
availability yes ★★★ 4 constant: human-readable licensing conditions
basename yes ★★★★ 5.04 K basename of indexed source file (document identifier)
bibl yes ★★★★ 5.03 K human-readable (short) bibliographic reference string for display
bildungslevel yes 44 --undocumented--
collection yes ★★★★ 4 constant: symbolic label for the (sub)corpus collection
dokumenttyp yes 34 --undocumented--
editor yes 426 --undocumented--
empty no ★★ 4 constant: empty string constant
flags yes ★★★★ 4 constant: colon- or space-separated list of Boolean flags
geiclass yes 37 --undocumented--
geicode yes 299 --undocumented--
land yes 9 --undocumented--
person yes 326 --undocumented--
place yes 609 --undocumented--
ppn yes 5.04 K --undocumented--
publisher yes 1.2 K --undocumented--
schulform yes 36 --undocumented--
textClass yes ★★★★ 4 constant: colon-separated list of symbolic document genre(s), primary genre first
timestamp yes ★★★★ 30 source document timestamp in UTC ISO-8601 format {YYYY}-{MM}-{DD}T{HH}:{MM}:{SS}Z
title yes ★★★★ 4.98 K document title
unterrichtsfach yes 32 --undocumented--
url yes ★★★ 5.04 K source URL for this document, if available
zeitspanne yes 12 --undocumented--
corpus no 4 constant: deprecated, prefer flags
dtadir no 5.04 K deprecated alias for basename
textClassDWDS no 4 constant: deprecated alias for textClass
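
Several of the attributes above come with a stated value format: avail matches the regex [MO]R[0-9][WS], date is reported as YYYY[-MM[-DD]], and timestamp uses {YYYY}-{MM}-{DD}T{HH}:{MM}:{SS}Z. The Python sketch below shows one way a client might check values against these formats. The function names and the sample values in the usage lines are hypothetical and not taken from the corpus, and the date pattern assumes four-digit years and two-digit months and days.

```python
import re
from datetime import datetime

# Patterns derived from the attribute descriptions.
AVAIL_RE = re.compile(r"[MO]R[0-9][WS]")
DATE_RE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")  # YYYY[-MM[-DD]]

def is_valid_avail(code: str) -> bool:
    """True if the whole string is an availability code."""
    return AVAIL_RE.fullmatch(code) is not None

def is_valid_date(value: str) -> bool:
    """True if the value has the shape YYYY, YYYY-MM, or YYYY-MM-DD."""
    return DATE_RE.fullmatch(value) is not None

def parse_timestamp(value: str) -> datetime:
    """Parse a UTC timestamp of the form YYYY-MM-DDTHH:MM:SSZ."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")

# Hypothetical sample values, for illustration only.
print(is_valid_avail("OR7W"), is_valid_avail("XR7W"))
print(is_valid_date("1850-03"), is_valid_date("1850-3"))
print(parse_timestamp("2021-01-29T20:17:57Z"))
```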

Break Collections

The following break collections (s_break) are available for use in anchor queries and the #WITHIN query operator.
sentence s 25.7 M single sentence or sentence-like unit
paragraph p 1.27 M single paragraph or paragraph-like unit
file 5.04 K single input document (e.g. article, volume)

Operator-Dependent Defaults

Query operators lacking an explicit index specification (s_index or s_break, respectively) will select a default token attribute or break collection depending on the selected query operator as follows:
_ $Lemma default attribute for bareword search terms (qw_bareword, qw_set_infl)
@_ $Token default attribute for exact match search terms (qw_exact, qw_set_exact)
/_/ $Token default attribute for regular expression search terms (qw_regex, qw_prefix, qw_suffix, qw_infix, qw_prefix_set, qw_suffix_set, qw_infix_set)
%_ $Lemma default attribute for lemma search terms (qw_lemma)
. sentence default break collection for anchor queries (qw_anchor)
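
The defaults above can be summarized as a lookup on the shape of the search term. The following Python sketch is a simplified illustration and not DDC's actual query parser: it only inspects the leading and trailing characters of a term, and it assumes that prefix, suffix and infix terms are written with a `*` wildcard. The function name `default_attribute` is hypothetical.

```python
# Default break collection for anchor queries, per the table.
DEFAULT_BREAK = "sentence"

def default_attribute(term: str) -> str:
    """Guess the default token attribute for a term without an explicit index."""
    if term.startswith("@"):
        return "$Token"   # exact match search term
    if term.startswith("%"):
        return "$Lemma"   # lemma search term
    if term.startswith("/"):
        return "$Token"   # regular expression search term
    if term.startswith("*") or term.endswith("*"):
        return "$Token"   # prefix, suffix or infix term (wildcard syntax is an assumption)
    return "$Lemma"       # bareword search term

for term in ["Haus", "@Haus", "/^Haus/", "%Haus", "Haus*"]:
    print(term, "->", default_attribute(term))
```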

Term Expanders

The following term expanders (s_expander) are available for use in term expansion pipelines for bareword and set-valued term queries.
CanonicalToken default expansion chain for the $CanonicalToken attribute, typically an alias for eqlemma
Lemma default expansion chain for the $Lemma attribute, typically an alias for lemma
Lemmas typically an alias for lemmata
Lemmata typically an alias for lemmata
Pos default expansion chain for the $Pos attribute, typically an alias for case
Token default expansion chain for the $Token attribute, typically an alias for eqlemma
Utf8 default expansion chain for the $Utf8 attribute, typically an alias for eqlemma
WebLemma synchronic lemmatizer, typically an alias for lemma
cab union of eqlemma, pho, and rw expanders
case upper-/lower-case variant expander (modulo "McKinsey" et al.)
eql typically an alias for eqlemma
eqlemma finds all lemma-equivalent surface forms using an external CAB server with auxiliary vocabulary database
germanet alias for gn-asi
gn-asi maps lemmata to all GermaNet hyponyms of any depth (synonyms or subclasses)
gn-asi1 maps lemmata to all depth≤1 GermaNet hyponyms (synonyms or immediate subclasses)
gn-asi2 maps lemmata to all depth≤2 GermaNet hyponyms (synonyms, first-, or second-order subclasses)
gn-isa maps lemmata to all GermaNet hyperonyms of any depth (synonyms or superclasses)
gn-isa1 maps lemmata to all depth≤1 GermaNet hyperonyms (synonyms or immediate superclasses)
gn-isa2 maps lemmata to all depth≤2 GermaNet hyperonyms (synonyms, first-, or second-order superclasses)
gn-sub alias for gn-asi
gn-sub1 alias for gn-asi1
gn-sub2 alias for gn-asi2
gn-sup alias for gn-isa
gn-sup1 alias for gn-isa1
gn-sup2 alias for gn-isa2
gn-syn maps lemmata to immediate GermaNet synonyms (synset co-membership)
gn-syn1 maps lemmata to depth≤1 GermaNet synonyms (synset co-membership or immediate hyponyms/hyperonyms)
gn-syn2 maps lemmata to depth≤2 GermaNet synonyms (synset co-membership, first-, or second-order hyponyms/hyperonyms)
id identity expander (no-op)
infl alias for morphy
lc typically an alias for tolower
lemma default lemmatizer using external CAB server (precision-oriented, "best" lemma only, typically the default)
lemmas typically an alias for lemmata
lemmata alternative lemmatizer using external CAB server (recall-oriented, returns all known lemmata)
lemmatize typically an alias for lemma
morphy built-in DDC morphy lemmatization & re-inflection (not recommended, prefer Lemma or eqlemma)
null identity expander (no-op)
openthes alias for ot-asi
openthesaurus alias for ot-asi
ot-asi maps lemmata to all OpenThesaurus hyponyms of any depth (synonyms or subclasses)
ot-asi1 maps lemmata to all depth≤1 OpenThesaurus hyponyms (synonyms or immediate subclasses)
ot-asi2 maps lemmata to all depth≤2 OpenThesaurus hyponyms (synonyms, first-, or second-order subclasses)
ot-isa maps lemmata to all OpenThesaurus hyperonyms of any depth (synonyms or superclasses)
ot-isa1 maps lemmata to all depth≤1 OpenThesaurus hyperonyms (synonyms or immediate superclasses)
ot-isa2 maps lemmata to all depth≤2 OpenThesaurus hyperonyms (synonyms, first-, or second-order superclasses)
ot-sub alias for ot-asi
ot-sub1 alias for ot-asi1
ot-sub2 alias for ot-asi2
ot-sup alias for ot-isa
ot-sup1 alias for ot-isa1
ot-sup2 alias for ot-isa2
ot-syn maps lemmata to immediate OpenThesaurus synonyms (synset co-membership)
ot-syn1 maps lemmata to depth≤1 OpenThesaurus synonyms (synset co-membership or first-order hyponyms/hyperonyms)
ot-syn2 maps lemmata to depth≤2 OpenThesaurus synonyms (synset co-membership, first-, or second-order hyponyms/hyperonyms)
pho phonetic equivalent surface forms using an external CAB server with auxiliary vocabulary database
pos-ud maps UD PoS-tags to the native corpus tagset: typically an alias for pos-ud2stts, used by CLARIN-FCS
pos-ud2stts maps UD PoS-tags to STTS
rw rewrite-equivalent surface forms using an external CAB server with auxiliary vocabulary database
sem alias for semsim
sem10 alias for LEMMA@10|semsim
sem100 alias for LEMMA@100|semsim
sem50 alias for LEMMA@50|semsim
semsim DTA::SemCloud distributional semantic index k-nearest neighbor lemmata, query as (LEMMA@K|semsim)
tolower maps input term(s) to all lower-case
toupper maps input term(s) to all upper-case
uc typically an alias for toupper
web typically an alias for WebLemma
www typically an alias for WebLemma
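
Many entries in the list above are plain aliases for other expanders. The Python sketch below shows how a client could resolve such aliases to their target expander. It covers only the entries marked unconditionally as "alias for" a single expander; the "typically an alias for" entries and the parameterized sem10/sem50/sem100 entries are left out because their targets may differ or carry arguments. The names `ALIASES` and `resolve` are hypothetical.

```python
# Unconditional aliases copied from the expander list.
ALIASES = {
    "germanet": "gn-asi",
    "gn-sub": "gn-asi", "gn-sub1": "gn-asi1", "gn-sub2": "gn-asi2",
    "gn-sup": "gn-isa", "gn-sup1": "gn-isa1", "gn-sup2": "gn-isa2",
    "infl": "morphy",
    "openthes": "ot-asi", "openthesaurus": "ot-asi",
    "ot-sub": "ot-asi", "ot-sub1": "ot-asi1", "ot-sub2": "ot-asi2",
    "ot-sup": "ot-isa", "ot-sup1": "ot-isa1", "ot-sup2": "ot-isa2",
    "sem": "semsim",
}

def resolve(name: str) -> str:
    """Follow alias links until a non-alias expander name is reached."""
    seen = set()
    while name in ALIASES:
        if name in seen:
            raise ValueError(f"alias cycle at {name!r}")
        seen.add(name)
        name = ALIASES[name]
    return name

print(resolve("germanet"), resolve("openthesaurus"), resolve("eqlemma"))
```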
D* OpenSearch API version 0.58
Project: GEI-Digital-2020
Collection: GEI-Digital
Corpus sources provided by the Georg-Eckert-Institut - Leibniz-Institut für internationale Schulbuchforschung.
Corpus processing and infrastructure development by the Zentrum für digitale Lexikographie der deutschen Sprache at the Berlin-Brandenburg Academy of Sciences and Humanities.