Attributes and Methods in LexicalRichness

This addendum documents the lexicalrichness measures exposed as attributes and methods of the LexicalRichness class.

TTR: Type-Token Ratio (Chotlos 1944, Templin 1957)

lexicalrichness.LexicalRichness.ttr()

Type-token ratio (TTR) computed as t/w, where t is the number of unique terms/vocab, and w is the total number of words. (Chotlos 1944, Templin 1957)

Returns

Type-token ratio

Return type

Float
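
As a minimal sketch (not the library's own implementation), the TTR can be computed directly from a list of tokens:

```python
def ttr(words):
    """Type-token ratio: number of unique terms divided by total tokens."""
    return len(set(words)) / len(words)

tokens = "the cat sat on the mat".split()
print(ttr(tokens))  # 5 unique terms over 6 tokens ≈ 0.833
```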


RTTR: Root Type-Token Ratio (Guiraud 1954, 1960)

lexicalrichness.LexicalRichness.rttr()

Root TTR (RTTR) computed as t/sqrt(w), where t is the number of unique terms/vocab, and w is the total number of words. Also known as Guiraud’s R and Guiraud’s index. (Guiraud 1954, 1960)

Returns

Root type-token ratio

Return type

Float


CTTR: Corrected Type-Token Ratio (Carroll 1964)

lexicalrichness.LexicalRichness.cttr()

Corrected TTR (CTTR) computed as t/sqrt(2 * w), where t is the number of unique terms/vocab, and w is the total number of words. (Carroll 1964)

Returns

Corrected type-token ratio

Return type

Float


Herdan: Herdan’s C (Herdan 1960, 1964)

lexicalrichness.LexicalRichness.Herdan()

Computed as log(t)/log(w), where t is the number of unique terms/vocab, and w is the total number of words. Also known as Herdan’s C. (Herdan 1960, 1964)

Returns

Herdan’s C

Return type

Float


Summer: Summer (Summer 1966)

lexicalrichness.LexicalRichness.Summer()

Computed as log(log(t)) / log(log(w)), where t is the number of unique terms/vocab, and w is the total number of words. (Summer 1966)

Returns

Summer

Return type

Float


Dugast: Dugast (Dugast 1978)

lexicalrichness.LexicalRichness.Dugast()

Computed as (log(w) ** 2) / (log(w) - log(t)), where t is the number of unique terms/vocab, and w is the total number of words. (Dugast 1978)

Returns

Dugast

Return type

Float


Maas: Maas (Maas 1972)

lexicalrichness.LexicalRichness.Maas()

Maas’s TTR, computed as (log(w) - log(t)) / (log(w) * log(w)), where t is the number of unique terms/vocab, and w is the total number of words. Unlike the other measures, a lower Maas score indicates higher lexical richness. (Maas 1972)

Returns

Maas

Return type

Float
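
The measures above (RTTR through Maas) differ only in how t and w enter the formula. A minimal sketch, assuming plain math functions rather than the library's own code (the function name and dict keys here are illustrative):

```python
import math

def measures(words):
    """Compute the count-based richness measures from a token list."""
    t, w = len(set(words)), len(words)
    return {
        "rttr": t / math.sqrt(w),                                  # Guiraud's R
        "cttr": t / math.sqrt(2 * w),                              # corrected TTR
        "herdan": math.log(t) / math.log(w),                       # Herdan's C
        "summer": math.log(math.log(t)) / math.log(math.log(w)),   # Summer
        # note: Dugast is undefined when every token is unique (t == w)
        "dugast": math.log(w) ** 2 / (math.log(w) - math.log(t)),  # Dugast
        "maas": (math.log(w) - math.log(t)) / math.log(w) ** 2,    # Maas (lower = richer)
    }
```

Note that Dugast and Maas are reciprocals of one another, which is why a lower Maas score signals higher richness.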


yulek: Yule’s K (Yule 1944, Tweedie and Baayen 1998)


yulei: Yule’s I (Yule 1944, Tweedie and Baayen 1998)


herdan_vm: Herdan’s Vm (Herdan 1955, Tweedie and Baayen 1998)


simpson_d: Simpson’s D (Simpson 1949, Tweedie and Baayen 1998)


msttr: Mean Segmental Type-Token Ratio (Johnson 1944)

lexicalrichness.LexicalRichness.msttr(self, segment_window=100, discard=True)

Mean segmental TTR (MSTTR) computed as average of TTR scores for segments in a text.

Split a text into segments of length segment_window. For each segment, compute the TTR. The MSTTR score is the sum of these scores divided by the number of segments. (Johnson 1944)

See also

segment_generator

Split a list into s segments of size r (segment_size).

Parameters
  • segment_window (int) – Size of each segment (default=100).

  • discard (bool) – If True, discard the remaining segment (e.g. for a text size of 105 and a segment_window of 100, the last 5 tokens will be discarded). Default is True.

Returns

Mean segmental type-token ratio (MSTTR)

Return type

float
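
A minimal sketch of this procedure, assuming fixed, non-overlapping segments (not the library's own code; omits input validation for texts shorter than one segment):

```python
def msttr(words, segment_window=100, discard=True):
    """Mean segmental TTR: average of per-segment TTRs."""
    segments = [words[i:i + segment_window]
                for i in range(0, len(words), segment_window)]
    if discard and len(segments[-1]) < segment_window:
        segments = segments[:-1]      # drop the leftover partial segment
    ttrs = [len(set(seg)) / len(seg) for seg in segments]
    return sum(ttrs) / len(ttrs)
```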


mattr: Moving Average Type-Token Ratio (Covington 2007, Covington and McFall 2010)

lexicalrichness.LexicalRichness.mattr(self, window_size=100)

Moving average TTR (MATTR) computed using the average of TTRs over successive segments of a text.

Estimate TTR for tokens 1 to n, 2 to n+1, 3 to n+2, and so on until the end of the text (where n is window size), then take the average. (Covington 2007, Covington and McFall 2010)

See also

list_sliding_window

Returns a sliding window generator (of size window_size) over a sequence

Parameters

window_size (int) – Size of each sliding window.

Returns

Moving average type-token ratio (MATTR)

Return type

float
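
The moving-average procedure can be sketched as follows (a sketch, not the library's own code; assumes the text is at least window_size tokens long):

```python
def mattr(words, window_size=100):
    """Moving-average TTR: mean TTR over every window of window_size consecutive tokens."""
    ttrs = [len(set(words[i:i + window_size])) / window_size
            for i in range(len(words) - window_size + 1)]
    return sum(ttrs) / len(ttrs)
```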


mtld: Measure of Textual Lexical Diversity (McCarthy 2005, McCarthy and Jarvis 2010)

lexicalrichness.LexicalRichness.mtld(self, threshold=0.72)

Measure of textual lexical diversity, computed as the mean length of sequential words in a text that maintains a minimum threshold TTR score.

Iterates over words until the TTR score falls below a threshold, then increments the factor count by 1 and starts over. McCarthy and Jarvis (2010, p. 385) recommend a factor threshold in the range of [0.660, 0.750]. (McCarthy 2005, McCarthy and Jarvis 2010)

Parameters

threshold (float) – Factor threshold for MTLD. Algorithm skips to a new segment when TTR goes below the threshold (default=0.72).

Returns

Measure of textual lexical diversity (MTLD)

Return type

float
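
A hedged sketch of the algorithm, assuming the standard two-pass (forward and reversed) formulation with a partial factor for the leftover segment; the library's own code may differ in details such as tie-breaking at the threshold:

```python
def mtld_pass(words, threshold=0.72):
    """One directional pass: count 'factors', stretches whose TTR stays above threshold."""
    factors, types, count = 0.0, set(), 0
    for word in words:
        count += 1
        types.add(word)
        if len(types) / count <= threshold:
            factors += 1                  # stretch complete: TTR hit the threshold
            types, count = set(), 0
    if count:                             # partial credit for the leftover stretch
        factors += (1 - len(types) / count) / (1 - threshold)
    return len(words) / factors if factors else float("inf")

def mtld(words, threshold=0.72):
    # McCarthy and Jarvis average a forward pass and a backward pass
    return (mtld_pass(words, threshold) + mtld_pass(words[::-1], threshold)) / 2
```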


hdd: Hypergeometric Distribution Diversity (McCarthy and Jarvis 2007)

lexicalrichness.LexicalRichness.hdd(self, draws=42)

Hypergeometric distribution diversity (HD-D) score.

For each term (t) in the text, compute the probability (p) of getting at least one appearance of t with a random draw of size n < N (text size). The contribution of t to the final HD-D score is p * (1/n); the final HD-D score thus sums p * (1/n) over all terms t. Described in McCarthy and Jarvis 2007, pp. 465-466. (McCarthy and Jarvis 2007)

Parameters

draws (int) – Number of random draws in the hypergeometric distribution (default=42).

Returns

Hypergeometric distribution diversity (HD-D) score

Return type

float
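
The probability of at least one appearance follows from the hypergeometric distribution: for a term with frequency f in a text of N tokens, p = 1 - C(N - f, n) / C(N, n) for a draw of size n. A sketch using only the standard library (not the library's own code; assumes draws does not exceed the text length):

```python
from collections import Counter
from math import comb

def hdd(words, draws=42):
    """HD-D: sum over terms of P(term appears in a random sample of `draws` tokens) / draws."""
    n_tokens = len(words)
    score = 0.0
    for freq in Counter(words).values():
        # P(none of this term's tokens lands in the sample); math.comb
        # returns 0 when the term is too frequent to be missed entirely
        p_missing = comb(n_tokens - freq, draws) / comb(n_tokens, draws)
        score += (1 - p_missing) * (1 / draws)
    return score
```

When draws equals the text length every term is certain to appear, and HD-D reduces to the plain TTR.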


vocd: voc-D (McKee, Malvern, and Richards 2010)

lexicalrichness.LexicalRichness.vocd(self, ntokens=50, within_sample=100, iterations=3, seed=42)

Vocd score of lexical diversity derived from a series of TTR samplings and curve fittings.

Vocd is meant as a measure of lexical diversity robust to varying text lengths. See also hdd. The vocd is computed in 4 steps as follows.

Step 1: Take 100 random samples of 35 words from the text. Compute the mean TTR from the 100 samples.

Step 2: Repeat this procedure for samples of 36 words, 37 words, and so on, up to ntokens (recommended as 50 [default]). For each sample size, compute the mean TTR over the 100 samples. This yields an array of mean TTR values for ntokens = 35, 36, …, and so on until ntokens = 50.

Step 3: Find the best-fitting curve from the empirical function of TTR to word size (ntokens). The value of D that provides the best fit is the vocd score.

Step 4: Repeat steps 1 to 3 for x number (default=3) of times before averaging D, which is the returned value.

See also

ttr_nd

TTR as a function of latent lexical diversity (d) and text length (n).

Parameters
  • ntokens (int) – Maximum number for the token/word size in the random samplings (default=50).

  • within_sample (int) – Number of samples for each token/word size (default=100).

  • iterations (int) – Number of times to repeat steps 1 to 3 before averaging (default=3).

  • seed (int) – Seed for the pseudo-random number generator in random.sample() (default=42).

Returns

voc-D

Return type

float
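
The four steps can be sketched as follows. This is a rough illustration, not the library's implementation: a simple grid search over D (resolution 0.1, capped at 200) stands in for the curve-fitting routine, and the model function ttr_nd follows the formula TTR = (D/n) * (sqrt(1 + 2n/D) - 1):

```python
import random
from math import sqrt

def ttr_nd(d, n):
    """Model TTR as a function of latent lexical diversity d and sample size n."""
    return (d / n) * (sqrt(1 + 2 * n / d) - 1)

def vocd(words, ntokens=50, within_sample=100, iterations=3, seed=42):
    rng = random.Random(seed)
    d_estimates = []
    for _ in range(iterations):
        # Steps 1-2: mean TTR for each sample size from 35 up to ntokens
        sizes, mean_ttrs = [], []
        for n in range(35, ntokens + 1):
            ttrs = [len(set(rng.sample(words, n))) / n
                    for _ in range(within_sample)]
            sizes.append(n)
            mean_ttrs.append(sum(ttrs) / len(ttrs))
        # Step 3: grid search for the D that best fits the empirical TTR curve
        best_d = min((d / 10 for d in range(1, 2001)),
                     key=lambda d: sum((ttr_nd(d, n) - t) ** 2
                                       for n, t in zip(sizes, mean_ttrs)))
        d_estimates.append(best_d)
    # Step 4: average D over the iterations
    return sum(d_estimates) / len(d_estimates)
```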


Helper: lexicalrichness.segment_generator

lexicalrichness.segment_generator(List, segment_size)

Split a list into s segments of size r (segment_size).

Parameters
  • List (list) – List of items to be segmented.

  • segment_size (int) – Size of each segment.

Yields

List – s lists with r (segment_size) items in each list.
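
A minimal sketch of such a generator (not necessarily the library's exact code; the final segment may be shorter than segment_size):

```python
def segment_generator(items, segment_size):
    """Yield consecutive segments of segment_size items each."""
    for i in range(0, len(items), segment_size):
        yield items[i:i + segment_size]

print(list(segment_generator([1, 2, 3, 4, 5], 2)))  # [[1, 2], [3, 4], [5]]
```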


Helper: lexicalrichness.list_sliding_window

lexicalrichness.list_sliding_window(sequence, window_size=2)

Returns a sliding window generator (of size window_size) over a sequence. Taken from https://docs.python.org/release/2.3.5/lib/itertools-example.html

Example:

List = ['a', 'b', 'c', 'd']

window_size = 2

list_sliding_window(List, 2) ->

('a', 'b')

('b', 'c')

('c', 'd')

Parameters
  • sequence (sequence (string, unicode, list, tuple, etc.)) – Sequence to be iterated over. window_size=1 is just a regular iterator.

  • window_size (int) – Size of each window.

Yields

List – Tuples of window_size consecutive items from the sequence.
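
The itertools recipe linked above can be sketched as follows (a sketch, not necessarily the library's exact code):

```python
from itertools import islice

def list_sliding_window(sequence, window_size=2):
    """Yield overlapping tuples of window_size consecutive items."""
    it = iter(sequence)
    window = tuple(islice(it, window_size))
    if len(window) == window_size:
        yield window
    for item in it:                       # slide: drop the oldest, append the newest
        window = window[1:] + (item,)
        yield window

print(list(list_sliding_window("abcd", 2)))  # [('a', 'b'), ('b', 'c'), ('c', 'd')]
```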


Helper: lexicalrichness.frequency_wordfrequency_table