Proto-language

A common language

Typically, the proto-language is not known directly. It is by definition a linguistic reconstruction formulated by applying the comparative method to a group of languages that share similar characteristics. The resulting family tree is a statement of similarity and a hypothesis that the similarity results from descent from a common language.

The comparative method, a process of deduction, begins from a set of characteristics, or characters, found in the attested languages. If the entire set can be accounted for by descent from the proto-language, which must contain the proto-forms of them all, the tree, or phylogeny, is regarded as a complete explanation and, by Occam’s razor, is given credibility. More recently, such a tree has been termed “perfect” and the characters labeled “compatible.”
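The notion of compatible characters can be illustrated with a minimal sketch. It rests on simplifying assumptions that are not part of the text above: each character is binary, it records a single innovation that arose once and was never lost, and it is represented by the set of languages showing that innovation. Under those assumptions, two characters fit on the same tree only if their language sets are nested or disjoint, and a set of characters admits a perfect tree only if every pair does. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def pairwise_compatible(a: set, b: set) -> bool:
    """Two innovation sets fit one tree if they are disjoint or nested."""
    return a.isdisjoint(b) or a <= b or b <= a


def is_perfect(characters: dict) -> bool:
    """True if every pair of characters is compatible (binary characters only)."""
    return all(
        pairwise_compatible(characters[x], characters[y])
        for x, y in combinations(characters, 2)
    )


# Hypothetical toy data: character name -> languages showing the innovation.
characters = {
    "c1": {"A", "B"},
    "c2": {"A", "B", "C"},
    "c3": {"D", "E"},
}
print(is_perfect(characters))  # True

# A character that overlaps c1 without being nested in it breaks perfection.
conflicting = {**characters, "c4": {"B", "C"}}
print(is_perfect(conflicting))  # False
```

Real datasets are rarely binary or free of loss and reversal, so this check is only a first approximation of what is done in practice.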

The hypotheses of highest compatibility

Only the smallest branches are ever found to be perfect. Typically, credibility is given to the hypotheses of highest compatibility. The differences in compatibility must be explained by various applications of the wave model. The level of completeness of the reconstruction varies, depending on how complete the evidence from the descendant languages is and on how the linguists working on it formulate the characters. Not all characters are suitable for the comparative method. For example, lexical items that are loans from a different language do not reflect the phylogeny to be tested and, if used, will detract from the compatibility. Getting the right dataset for the comparative method is a major task in historical linguistics.
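Choosing the hypothesis of highest compatibility can be sketched as a simple count. The sketch assumes, for illustration only, that each candidate tree is a rooted nested tuple of language names, that each character is the set of languages sharing one innovation, and that a character is compatible with a tree when that set is exactly one of the tree's clades. All names and data are hypothetical; the character c3 stands in for a loanword shared through contact rather than descent.

```python
def clades(tree):
    """Return (leaves of this subtree, all clades inside it) for a nested-tuple tree."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
        return leaves, {leaves}
    leaves = frozenset()
    found = set()
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves = leaves | child_leaves
        found |= child_clades
    found.add(leaves)
    return leaves, found


def compatibility(tree, characters):
    """Count the characters whose language set is a clade of the tree."""
    _, groups = clades(tree)
    return sum(
        1 for langs in characters.values()
        if not langs or frozenset(langs) in groups
    )


# Two hypothetical candidate trees over languages A to D.
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

characters = {
    "c1": {"A", "B"},
    "c2": {"C", "D"},
    "c3": {"A", "C"},  # assumed loan, shared by contact
    "c4": {"A"},
}

print(compatibility(tree_1, characters))  # 3
print(compatibility(tree_2, characters))  # 2

# Dropping the assumed loan leaves every remaining character compatible with tree_1.
inherited_only = {k: v for k, v in characters.items() if k != "c3"}
print(compatibility(tree_1, inherited_only) == len(inherited_only))  # True
```

In this toy case the loan is the only character that conflicts with the better tree, which is why screening such items out of the dataset matters.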

Some widely accepted proto-languages are Proto-Indo-European, Proto-Uralic, and Proto-Dravidian. In rare instances, relative to the hundreds of proto-languages that have been proposed, scant direct evidence survives that increases the probability of a proto-language having existed. For example, the limited vocabulary recoverable from the oldest runic inscriptions, written in the Elder Futhark alphabet, lends support to the Proto-Germanic hypothesis.

In a few fortuitous instances, which have been used to verify the method and the model, a literary history exists from as early as a few millennia ago, so the descent can be traced in detail through surviving texts. For example, Latin is the proto-language of the Romance language family, which includes such modern languages as French, Italian, Portuguese, Romanian and Spanish. Likewise, Proto-Norse, the ancestor of the modern Scandinavian languages, is attested, albeit in fragmentary form, in inscriptions written in the Elder Futhark. Although there are no very early Indo-Aryan inscriptions, the Indo-Aryan languages of modern India all go back to Vedic Sanskrit (or dialects very closely related to it), which has been preserved in texts accurately handed down by parallel oral and written traditions for many centuries.

The first person to offer systematic reconstructions of an unattested proto-language was August Schleicher. He did so for Proto-Indo-European in 1861.