Linguistic Attractors: The Cognitive Dynamics of Language Acquisition and Change
John Benjamins; Human Cognitive Processing, vol. 2, 1999. Hb xv, 375 pp. ISBN 90 272 2354 8
Overview:
This interdisciplinary linguistic attractor model portrays language processing as linked sequences of fractal sets, and examines the changing dynamics of such sets for individuals as well as the speech community they comprise. Its motivation stems from human anatomic constraints and several artificial neural network approaches. It uses general computation theory to:
It introduces techniques to isolate and measure attractors, and to interpret their stability and relative content within a system. Important results include the ability to determine the sequence of related sound changes, and to make point-to-point comparisons of different texts using common metrics. Other techniques yield quantifiable ambiguity landscapes that illustrate the forces propelling different languages in different directions.
Sample Results:
The Correlation Function
Because we will make such frequent use of the function, we need to look at the actual expression and understand how it works. The correlation function C(r), for varying distances r, is given by
$$C(r) = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} q\left(r - |X_i - X_j|\right)$$
[The summation is over i and j.] N is the number of data points, r is the radius of a hypersphere (i.e., a set of points equidistant from a center, but set in any number of dimensions, not just 1–a line, 2–a circle, or 3–a standard sphere).
The hypersphere is centered successively at all points X_i, where the index i runs from 1 to N. The term |X_i - X_j| is then the distance between X_i and successive points X_j, where the index j also runs from 1 to N. We use the Heaviside function q, where q(x) = 1 for x > 0 and q(x) = 0 otherwise. Thus, when the distance is less than r, we get a 1, and otherwise we get a 0. The summation counts up all of these 1's, and we then divide by N^2 because that is the total number of one-to-one correlations between points. (Imagine a matrix with the i's from 1 to N down the side, and the j's from 1 to N across the top. There are N^2 boxes in the matrix, and each box would have a distance measurement in it, from each i to each j.) We need this normalization because a correlation is a normed measure, much like a probability. We say that two things are uncorrelated if the correlation is zero, and completely correlated if the correlation is one.
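The counting procedure just described can be sketched in a few lines of Python. This is an illustration only, not code from the book: the function name `correlation_function` and the four sample points are assumptions made for the example. It follows the definition literally, so the pairs with i = j (distance zero) are counted along with all the others.

```python
import math

def correlation_function(points, r):
    """Return C(r): the fraction of the N^2 ordered pairs (i, j)
    whose separation |X_i - X_j| is less than r."""
    n = len(points)
    count = 0
    for xi in points:        # center the hypersphere at each X_i
        for xj in points:    # measure the distance to every X_j, including X_i itself
            # Heaviside q: contributes 1 only when r - distance > 0
            if r - math.dist(xi, xj) > 0:
                count += 1
    return count / (n * n)   # divide by the N^2 boxes in the matrix

# Hypothetical data: three points close together and one far away.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 4.0)]
print(correlation_function(points, 1.5))   # 10 of the 16 pairs qualify: 0.625
print(correlation_function(points, 10.0))  # every pair qualifies: 1.0
```

With a radius of 1.5, the four self-pairs and the six ordered pairs among the three nearby points are counted, while every pair involving the distant point is not. `math.dist` requires Python 3.8 or later.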
We estimate the dimension of the attractor by plotting the logarithm of the function against the logarithm of the radius and taking the slope of the linear portions of the curves. To understand this, consider the distance probe from each point. Assume we want to examine the space in terms of a parameter with measure a. If the space is a simple line, we will look into pieces of the space with size r/a. If the space is two-dimensional, we will be looking at an expanding circle, and the pieces will have a size proportional to (r/a)^2. If it is three-dimensional, we will be looking into expanding spheres, with pieces proportional to (r/a)^3. For points distributed uniformly through the space, we would then expect the correlation measure C(r) to be proportional to r^n for an n-dimensional space. For an attractor embedded in the space, the measure would be smaller, but still proportional to r^d, where d is the fractal dimension of the attractor. Consequently, we take the logarithm of the two quantities to find d. Since generally ln C(r) ≈ d ln(r), the slope will converge to d when we have probed the attractor with enough test parameters. (Any logarithmic base would do; for information-related problems the base of choice is often 2, because we try to resolve all operations to yes-no decisions. Here we use the natural logarithm for convenience.)
Linguistic Attractors, pages 29-30.
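As a rough illustration of the slope estimate described in the excerpt, the Python sketch below fits a least-squares line to ln C(r) against ln(r). It is not taken from the book or the workbook: the function names, the test data (400 evenly spaced points on a straight line embedded in a plane), and the choice of radii are all assumptions made for the example. The fit uses every radius supplied, so picking radii that fall within the linear portion of the curve is left to the user.

```python
import math

def correlation_function(points, r):
    """C(r): fraction of the N^2 ordered pairs closer together than r."""
    n = len(points)
    count = sum(1 for xi in points for xj in points if math.dist(xi, xj) < r)
    return count / (n * n)

def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

def estimate_dimension(points, radii):
    """Slope of ln C(r) versus ln(r) over the supplied radii."""
    log_r = [math.log(r) for r in radii]
    log_c = [math.log(correlation_function(points, r)) for r in radii]
    return fit_slope(log_r, log_c)

# Hypothetical test set: a one-dimensional line sitting in two-dimensional space.
n = 400
line = [(k / n, k / n) for k in range(n)]
# Eight radii spaced evenly in ln(r), from 0.02 to 0.2.
radii = [0.02 * 10 ** (k / 7) for k in range(8)]
d = estimate_dimension(line, radii)
print(round(d, 2))  # expected to come out close to 1
```

Although the points have two coordinates each, the estimated slope stays near 1 rather than 2, which is the sense in which the correlation function measures the dimension of the set itself and not of the space it is embedded in.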
Downloadable Excel Workbook: http://attractorconcepts.com/Correlation_Function.xls
This file allows the computation of the correlation function for up to 500 observations of up to 10 parameters for each. It is broken into two steps: calculation of the distances in the phase space, and then calculation of the correlation measures. One must paste the values from the distance page onto the correlation page because the correlation formulae count all cells in the matrix, and it is necessary to adjust the size of that matrix manually in Excel. This file provides a worked example using data for one of the vowels (/F/) from the Northern Cities Vowel shift discussed below. It shows how a large number of observations work within the workbook.
WARNING: This file is very large and takes several minutes to load on a 56K modem. Without high speed web access, it is probably best to open the file from within Excel.
Correlation Measures and Signatures for Vowel Changes
Data derived from William Labov, Principles of Linguistic Change
This chart shows how to read a graph of the correlation function applied to vowels involved in a sound change. The sound change viewed here is the Northern Cities Vowel Shift, an ongoing change associated with white populations in cities in the northeastern and northern Midwestern United States.

The correlation function measures the content of a fractal set by the slope of the straight portion of the curve. Steep slopes show high information content. Buffer regions show regions of stability around attractors, and thus show elements of a system that are not participating in a change. Kinks in the curve at long range show the limit for correlations among elements in the system. High information at long range generally indicates a change that entered the system a relatively long time ago.

Here, the greatest content belongs to the oldest member of a vowel change in the Northern Cities group in the US: the vowel /F/. The remaining vowels participating in the change can be read from bottom to top on the graph in the order in which they began to participate.
The upglide /iy/ is stable in the Northern Cities shift. This graph shows the signature for a stable element in a system: a buffer indicated by the roughly horizontal portion of the graph to the left and center. This graph also shows the convergence to a stable curve by adding the effects of controlling factors one by one.
Correlation Graphs for Case in Old English
Beowulf was written centuries before Elfric composed his homilies. Case as set down in the documents showed little difference in spelling. However, usage, as depicted on these graphs, shifted considerably.
Both show stable case systems, as indicated by the large buffers on the left. The dative carried the most content for both, as shown by the steepest slope and largest vertical drop on the graphs. However, the accusative carried a much greater information content in Beowulf, despite its easy confusion with the nominative in many situations. By Elfric's time, this ambiguity seems to have led to simplification in the system.
New Word Order Attractor in Old English
Beowulf and Elfric differ considerably in their preference for word order in clauses. These graphs clearly show the contrast between a relatively free variation of verb positions in Beowulf and a strong preference for verb-second main clauses and verb-final subordinate clauses in Elfric.
In these graphs, M stands for main clause, A for an auxiliary clause, X for a clause functioning as the subject of the sentence, Y for a clause functioning as the direct object of the main clause verb, and Z for a clause functioning as the indirect object of the main clause verb. Protases are the "if" clauses in conditional sentences.
The Mood Systems in Old English and Old High German
Old English and Old High German both had mood systems that contained infinitive, indicative, and subjunctive forms. These graphs show very similar signatures for these moods, with the subjunctive carrying the highest information content. The two systems differed, however, in that the verbal conjugation for Old High German was far less ambiguous in signaling subjunctive forms as distinct from the corresponding indicative. Old English compensated, as the graph shows, by using modal auxiliary verbs (might, could, should, would, etc.). The two attractor sets showed nearly identical information content. Interestingly, High German later adopted a very similar modal auxiliary system of its own as the conjugation became more ambiguous.
The Subjunctive Attractor in High German
Like the upglides in the Northern Cities Vowel Shift, mood attractors in High German were stable elements in the modal system. This graph, like the graphs comparing the Old High German and Old English modal systems, shows the signature for a stable element. Also like the graph for /F/ in the Northern Cities Vowel Shift, we can see that the contributions of controlling factors can be isolated by building the convergent curve, adding factors one by one.
Here, the variables are different, reflecting the difference between the two types of system. The reasons for selecting these variables, their valuation, and interpretation of the curves are all explored at much greater length in Linguistic Attractors. The correlation function is also explained in greater detail in the book.
Places to Buy Linguistic Attractors
Amazon.com: http://www.amazon.com
Barnes and Noble: http://www.bn.com
Borders: http://www.borders.com