Transitional Probability

About the function

Transitional probability is a measure of how likely a symbol will appear, given a preceding or succeeding symbol. For a bigram AB, its forward transitional probability is the likelihood of B given A, and its backward transitional probability is the likelihood of A given B [Pelucci2009]. The measurement can be used to predict word or morpheme boundaries in speech (see [Saffran1996a], [Saffran1996b] and [Daland2011]). Two symbols with a low transitional probability are unlikely to co-occur, which would predict that that a word or morpheme boundary is likely to exist between them. Conversely, two symbols with a high transitional probability are likely to co-occur, and are predicted to exist within one word or morpheme. For example, the symbols [n] and [d] may have a high forward transitional probability in a corpus of English, because they appear within words like [saʊnd] ‘sound’ or [ʌndɚ] ‘under’. In the same corpus, the symbols [d] and [f] may have a low forward transitional probability, because the sequence [df] only occurs across word boundaries, such as in [ænd#fɹʌm] ‘and from’.

Note that because corpora in PCT are treated as lists of words in isolation, even if they were created from running transcriptions, transitional probability calculations are always actually within words in PCT.

Method of calculation

In PCT, transitional probability is calculated on the segment level, and it is possible to calculate both forward and backward TP. Given two segments \(a\) and \(b\), occuring in the order \(ab\),

Forward transitional probability:

\(P(b|a) = \frac{P(ab)}{P(a)}\)

Backward transitional probability:

\(P(a|b) = \frac{P(ab)}{P(b)}\)

where \(P(ab)\) is the probability of the bigram \(ab\), and \(P(a)\) and \(P(b)\) are the probabilities of the segments \(a\) and \(b\).

A toy example

Consider the following corpus:

Spelling

Transcription

Frequency

mata

m.ɑ.t.ɑ

2

nama

n.ɑ.m.ɑ

4

ʃi

ʃ.i

6

Using type frequencies, the probability of the bigram ɑm is:

\(P(am) = \frac{n_{am}}{n_{bigrams}} = \frac{1}{7}\)

i.e., the frequency of the bigram am divided by the total number of bigrams in the corpus (assuming only segments count toward bigrams; see more in the section on word boundaries below).

Using token frequencies, the probability is:

\(P(am) = \frac{n_{am}}{n_{bigrams}} = \frac{4}{24}\)

The probability of the individual segments are found by finding the number of bigrams that start with the first segment, when looking for the first, or end with the second segment, when looking for the second. For [m] and [ɑ]:

\(P(a) = \frac{n_{a\_}}{n_{bigrams}} = \frac{2}{7}\) with type frequencies, or \(\frac{6}{24}\) with token frequencies.

\(P(m) = \frac{n_{\_m}}{n_{bigrams}} = \frac{1}{7}\) with type frequencies, or \(\frac{4}{24}\) with token frequencies.

Given the bigram am, the forward TP is:

\(P(m|a) = \frac{P(am)}{P(a)} = \frac{1/7}{2/7} = \frac{1}{2}\) with type frequencies, or \(\frac{4/24}{6/24} = \frac{2}{3}\) with token frequencies.

The backward TP is:

\(P(a|m) = \frac{P(am)}{P(m)} = \frac{1/7}{1/7} = 1\) with type frequencies, or \(\frac{4/24}{4/24} = 1\) with token frequencies.

In this corpus, the segment m will occur after the segment ɑ 50% of the time given type frequencies or 67% of the time given token frequencies. Meanwhile, ɑ is certain to appear before the segment m (if m has any segment before it, i.e., is not word-initial).

For more on this method, see [Anghelescu2016].

Word Boundaries

In PCT, word boundaries can be set to occur once at the end of every word, to occur on both sides of a word, or to be ignored (as they were in the above examples). The first option is the default setting.

Assuming a single boundary at the end of every word:

Spelling

Transcription

Frequency

mata

m.ɑ.t.ɑ.#

2

nama

n.ɑ.m.ɑ.#

4

ʃi

ʃ.i.#

6

The probability of the bigram am is:

\(P(am) = \frac{n_{am}}{n_{bigrams}} = \frac{1}{10}\) with type frequencies, and \(\frac{4}{36}\) with token frequencies.

The probabilities of the individual symbols are:

\(P(a) = \frac{n_{a\_}}{n_{bigrams}} = \frac{4}{10}\) with type frequencies, and \(\frac{12}{36}\) with token frequencies.

\(P(m) = \frac{n_{\_m}}{n_{bigrams}} = \frac{1}{10}\) with type frequencies, and \(\frac{4}{36}\) with token frequencies.

Given single word boundaries, the forward TP of the bigram am is:

\(P(m|a) = \frac{P(am)}{P(a)} = \frac{1/10}{4/10} = \frac{1}{4}\) with type frequencies, or \(\frac{4/36}{12/36} = \frac{1}{3}\) with token frequencies.

The backward TP is:

\(P(a|m) = \frac{P(am)}{P(m)} = \frac{1/10}{1/10} = 1\) with type frequencies, or \(\frac{4/36}{4/36} = 1\) with token frequencies.

Assuming boundaries on each side of a word:

Spelling

Transcription

Frequency

mata

#.m.ɑ.t.ɑ.#

2

nama

#.n.ɑ.m.ɑ.#

4

ʃi

#.ʃ.i.#

6

The probability of the bigram am is:

\(P(am) = \frac{n_{am}}{n_{bigrams}} = \frac{1}{13}\) with type frequencies, and \(\frac{4}{48}\) with token frequencies.

The probabilities of the individual symbols are:

\(P(a) = \frac{n_{a\_}}{n_{bigrams}} = \frac{4}{13}\) with type frequencies, and \(\frac{12}{48}\) with token frequencies.

\(P(m) = \frac{n_{\_m}}{n_{bigrams}} = \frac{2}{13}\) with type frequencies, and \(\frac{6}{48}\) with token frequencies.

Given single word boundaries, the forward TP of the bigram am is:

\(P(m|a) = \frac{P(am)}{P(a)} = \frac{1/13}{4/13} = \frac{1}{4}\) with type frequencies, or \(\frac{4/48}{12/48} = \frac{1}{3}\) with token frequencies.

The backward TP is:

\(P(a|m) = \frac{P(am)}{P(m)} = \frac{1/13}{2/13} = \frac{1}{2}\) with type frequencies, or \(\frac{4/48}{6/48} = \frac{2}{3}\) with token frequencies.

The first example in this section was calculated ignoring word boundaries.

Calculating transitional probability in the GUI

As with most analysis functions, a corpus must first be loaded (see Loading in corpora). Once a corpus is loaded:

1. Getting started: Choose “Analysis” / “Calculate transitional probability…” from the top menu bar.

2. Bigram selection: To select segment pairs, click on the “Add bigram” button to open the bigram_selector dialogue box. Note that the order of the bigram matters for calculating transitional probability.

3. Direction: Transitional probability can be calculated based on the presence of either the first or second segment. The labels “P(B|A)” and “P(A|B)” correspond to the column labels “A” and “B” on the Bigrams table.

4. Word boundary: Select an option for word boundary. The default is to assume that there is only one boundary per word, and that it is in final position (as is assumed in [Goldsmith2012] with respect to Mutual Information calculations). This is based on the assumption that in running text, the final boundary of word 1 will be the initial boundary of word 2, so that there is no need to have two boundaries per word. Select “Keep both word boundaries” to have boundaries on both sides, or “Ignore all word boundaries” to ignore all word boundaries in the calculation.

5. Pronunciation variants: If the corpus contains multiple pronunciation variants for lexical items, select which strategy should be used. For details, see Pronunciation Variants.

6. Tier: Select which tier transitional probability will be calculated from. The default is transcription, but other tiers can be created in order to isolate or group together various phonemes. See Creating new tiers in the corpus for details on creating and using tiers.

7. Type or token frequency: Transitional probability can be calculated using either type or token frequencies, provided that the loaded corpus includes both frequency measures (see Required format of corpus).

8. Minimum frequency: It is possible to set a minimum token frequency for including words in the calculation. This allows easy exclusion of rare words. To include all words in the corpus, regardless of their token frequency, set the minimum frequency to 0, or leave the field blank. Note that if a minimum frequency is set, all words below that frequency will be ignored entirely for the purposes of calculation.

Classes and functions

For further details about the relevant classes and functions in PCT’s source code, please refer to Transitional probability.