MODELLING THE POPULATION

Jurisdiction: Australia

A simple model.............................................................................................. [80A.1700]

Testing the model .......................................................................................... [80A.1720]

Chi-square test .............................................................................................. [80A.1740]

Exact tests..................................................................................................... [80A.1760]

Conclusions from testing............................................................................... [80A.1780]

Subpopulation theory .................................................................................... [80A.1800]

Modelling subpopulations ............................................................................. [80A.1820]

Heterozygote probability ............................................................................... [80A.1840]

Homozygote probability................................................................................. [80A.1860]

Subpopulation theory and linkage between loci ........................................... [80A.1880]

What value of θ to use.................................................................................. [80A.1900]

Non-concordances ........................................................................................ [80A.1920]

Considering drop-out in an assumed single source profile .......................... [80A.1940]

[80A.1700] A simple model

The simplest way to estimate the probability that a person selected at random from the population would have a particular DNA profile is to multiply allele frequencies to obtain locus genotype frequencies and then to multiply these together to estimate the frequency of the whole profile (the product rule). This procedure will be exactly correct only when alleles are statistically independent both within and between loci. That is, the occurrence in an individual of a particular allele makes it no more and no less likely that the individual would possess any other allele at that locus or at any other locus.
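The product rule can be sketched in a few lines of Python. This is a minimal illustration only: the locus names, allele labels and allele frequencies below are hypothetical and are not taken from any database. Under the independence assumption, a homozygous genotype has frequency p × p and a heterozygous genotype has frequency 2 × p × q.

```python
from math import prod

# Hypothetical allele frequencies, for illustration only (not from any database).
ALLELE_FREQS = {
    "LocusA": {"14": 0.10, "15": 0.25, "16": 0.30},
    "LocusB": {"8": 0.20, "9": 0.05},
}


def genotype_frequency(freqs, allele1, allele2):
    """Genotype frequency at one locus, assuming the two alleles are independent."""
    p, q = freqs[allele1], freqs[allele2]
    # Homozygote: p * p. Heterozygote: 2 * p * q (either parental order).
    return p * p if allele1 == allele2 else 2 * p * q


def profile_frequency(profile, allele_freqs=ALLELE_FREQS):
    """Multiply the locus genotype frequencies, assuming independence between loci."""
    return prod(
        genotype_frequency(allele_freqs[locus], a1, a2)
        for locus, (a1, a2) in profile.items()
    )


# Heterozygous 14,15 at LocusA and homozygous 9,9 at LocusB.
profile = {"LocusA": ("14", "15"), "LocusB": ("9", "9")}
print(profile_frequency(profile))
```

The estimate is only as good as the independence assumption; the sketch makes no allowance for any dependence between alleles.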

The random assortment of alleles both within and between loci occurs in a population that is genetically homogeneous. For the population to be homogeneous, all of the following conditions must hold: there are no mutations (which generate new alleles); there is no migration of people (who move alleles in or out of the population); there is no natural selection (favouring the inheritance of some alleles over others); people choose their mates at random; and the population is infinitely large. Such a population is said to display Hardy-Weinberg Equilibrium, because the relative frequencies of the genotypes are the same in each succeeding generation.
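The stability of genotype frequencies from one generation to the next can be illustrated with a small sketch for a single locus with two alleles, labelled A and B here purely for illustration. The starting genotype frequencies are hypothetical, and the sketch assumes the idealised conditions listed above (random mating, no mutation, migration or selection, and an infinitely large population).

```python
def allele_frequency(genotypes):
    """Frequency of allele A: all of AA plus half of AB."""
    return genotypes["AA"] + genotypes["AB"] / 2


def random_mating(genotypes):
    """Genotype frequencies in the next generation under random mating."""
    p = allele_frequency(genotypes)
    q = 1 - p
    return {"AA": p * p, "AB": 2 * p * q, "BB": q * q}


# Hypothetical starting population that is NOT in Hardy-Weinberg proportions.
start = {"AA": 0.5, "AB": 0.2, "BB": 0.3}

gen1 = random_mating(start)  # reaches Hardy-Weinberg proportions in one generation
gen2 = random_mating(gen1)   # proportions then stay the same

print(gen1)
print(gen2)
```

In this idealised setting the allele frequency never changes, and the genotype frequencies stop changing after a single generation of random mating.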

Until the 1990s, it had been assumed by most forensic scientists that this model was valid. Databases from which allele frequencies are obtained are composed of individuals (persons involved in cases examined, laboratory staff, blood bank donors, etc) who are not selected on any genetic basis and, therefore, might be expected to represent a cross-section of the general population. Although it is recognised that the general population is composed of a number of different ethnic groups, there is much information to suggest that, at least in Western cosmopolitan societies, there is sufficient migration and intermarriage between groups to even out most genetic differences between them: Federal Bureau of Investigation (1993); Price (1993).

After Lewontin and Hartl (1991) challenged this assumption, some in the forensic community decided to introduce statistical procedures to test this model.

[80A.1720] Testing the model

If the model did not hold true, there should be two consequences, it was argued: within each locus, the genotypes may not display Hardy-Weinberg proportions, and pairs of loci may display linkage disequilibrium. Departures from Hardy-Weinberg Equilibrium could be examined using statistical tests of significance, such as the Exact test and the Chi-square test. Linkage disequilibrium (a state in which there is statistical dependence between alleles at different loci and the association of these alleles is unrelated to their physical linkage) could also be detected using the Exact test.

Statistical tests are used to compare the observed data with what would be expected to be observed if the Null hypothesis (H0) were true. The Null hypothesis typically states that the model under test describes the population from which the data were drawn. In this case, the model being tested is the Hardy-Weinberg model, which predicts that alleles are statistically independent, so that the population displays Hardy-Weinberg Equilibrium and Linkage Equilibrium. The goodness-of-fit between the observed numbers and the numbers expected if the model were true can be assessed quantitatively by means of a test statistic (eg, X²; see below). If the observed numbers matched the Hardy-Weinberg expectations exactly, the X² value would be zero; sampling variation alone will usually produce a value somewhat greater than zero even when the Null hypothesis is true. The test statistic is compared with a probability distribution, which indicates how likely it would be that the observed data or a more extreme set would occur, if the Null hypothesis were true. This gives a probability (p) value. A small p value casts doubt on H0. By convention, p < 0.05 is regarded as representing a "significant" departure: the difference between observation and theoretical prediction is so large that it would be unlikely to be obtained if the hypothesis were true. In that case, the Null hypothesis is conventionally rejected.

However, several authors (Evett and Buckleton, 1996; Evett and Weir, 1998; Buckleton et al, 2001) have pointed out that there are serious problems with the approach involving statistical testing for disequilibrium:

1 The null hypothesis cannot be true in real human populations, because some of the necessary conditions do not hold: we know that mutations have occurred to generate the various alleles at each locus; migration occurs; no population is infinitely large; and people do not choose their mates at random;
2 The alternative hypothesis is infinitely vague: no one knows how much statistical association there is between alleles nor what the practical consequences of the association are; and
3 A small p value does not mean there are any practical consequences; a high p value never implies independence.

Nevertheless, most laboratories employ some kind of statistical testing as part of the validation of their databases.

[80A.1740] Chi-square test

This procedure is a "Goodness of Fit" test, and provides a simple means of assessing how well a set of observations agrees with the results that would be expected if a particular hypothesis were true. However, it is not regarded as being very powerful as a means of providing evidence that genotypes in a population are not in Hardy-Weinberg proportions.

Typically, the genotype frequencies that were actually observed are compared with the genotype frequencies that would be expected under the Hardy-Weinberg model. By applying the test to a sample (database) of genotypes observed for a genetic typing system, the results can be assessed to see whether the sample is consistent with the null hypothesis that the population is in Hardy-Weinberg equilibrium. The test statistic that must be calculated is designated X²:

$$X^2 = \sum \frac{(\mathrm{Obs} - \mathrm{Exp})^2}{\mathrm{Exp}}$$

where Obs are the observed counts (frequencies) of each genotype, and Exp are the counts (frequencies) that would be expected if the Hardy-Weinberg model described the population. The expression

$$\frac{(\mathrm{Obs} - \mathrm{Exp})^2}{\mathrm{Exp}}$$

must be calculated for each genotype. Σ denotes that the values for all genotypes must then be added together. The greater the difference between the observed and expected frequencies, the larger the value of the test statistic X² and the less consistent the data are with the Hardy-Weinberg model. If the null hypothesis is true, this test statistic will approximately follow a known distribution called the Chi-square distribution.
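As a minimal sketch of the calculation just described, the following Python computes the test statistic for a single locus with two alleles, labelled A and B for illustration. The genotype counts are hypothetical. The sketch assumes both alleles appear in the sample, so that no expected count is zero.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic comparing observed genotype counts with
    Hardy-Weinberg expectations at a two-allele locus."""
    n = n_aa + n_ab + n_bb
    # Allele frequencies estimated from the sample itself.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    observed = [n_aa, n_ab, n_bb]
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    # Sum (Obs - Exp)^2 / Exp over all genotype classes.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical sample of 1,000 people.
print(hwe_chi_square(298, 489, 213))

# Counts that match the expectations exactly give a statistic of zero.
print(hwe_chi_square(25, 50, 25))
```

For a real multi-allele marker the same sum runs over every genotype class; the two-allele case is used here only to keep the example short.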

To decide whether the difference is large enough that the model must be discarded, the value of the test statistic is compared with a theoretical Chi-square distribution. The appropriate Chi-square distribution depends on the number of independent variables (called Degrees of Freedom, df). For a genetic marker system, the degrees of freedom can be calculated as follows:

df = no. of genotype classes - no. of alleles

For example, for the locus D3S1358 in a Victorian Caucasian population sample, there are eight different alleles, giving rise to 36 possible genotype classes and thus 28 degrees of freedom.
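The arithmetic in this example can be checked with a short sketch. It relies on the fact that n alleles give n homozygous classes plus n(n − 1)/2 heterozygous classes, ie n(n + 1)/2 genotype classes in total; the function names are illustrative only.

```python
def genotype_classes(n_alleles):
    """Number of possible genotypes: n homozygotes plus n(n-1)/2 heterozygotes."""
    return n_alleles * (n_alleles + 1) // 2


def degrees_of_freedom(n_alleles):
    """df = number of genotype classes minus number of alleles."""
    return genotype_classes(n_alleles) - n_alleles


print(genotype_classes(8), degrees_of_freedom(8))
```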

Tables of Chi-square distributions are published in standard statistical textbooks and are available from many computer packages such as Excel™. For a given number of degrees of freedom, the larger the test statistic, the smaller the associated probability (p value). According to statistical convention, p values smaller than 0.05 (5%) are regarded as significant. Thus X² values larger than the value given in the Chi-square table for p = 0.05 would be unlikely to be observed if the model accurately described reality. Such a result is described as "significant" in classical hypothesis-testing parlance. In other words, either the null hypothesis is false or a rare event has occurred.
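In place of a printed table, the p value can be computed directly. The sketch below is one possible way of doing so using only the Python standard library; it is not the method of any particular laboratory or package. It uses an exact recurrence for the upper tail of the Chi-square distribution with whole-number degrees of freedom, starting from the known results for 1 and 2 degrees of freedom.

```python
import math


def chi_square_p_value(x, df):
    """Probability of a Chi-square value of x or larger, for integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a = 1.0                          # start from df = 2
        tail = math.exp(-y)
    else:
        a = 0.5                          # start from df = 1
        tail = math.erfc(math.sqrt(y))
    # Each step raises the degrees of freedom by 2 and adds one term:
    # y^a * exp(-y) / Gamma(a + 1), evaluated on the log scale for stability.
    for _ in range((df - 1) // 2):
        tail += math.exp(-y + a * math.log(y) - math.lgamma(a + 1))
        a += 1.0
    return tail


# Tabulated 5% critical values for 1 and 28 degrees of freedom.
print(chi_square_p_value(3.841, 1))
print(chi_square_p_value(41.337, 28))
```

Both calls should return values close to 0.05, matching the standard tables. A statistic above the tabulated value gives a p value below 0.05.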

There may be several reasons for significant differences apart from population genetic issues. One of the most likely is a systematic typing error such that genotypes are not correctly recorded (Devlin, Risch and Roeder, 1990), and the Chi-square test provides a useful check that this has not occurred. Another possible reason for indications of significant differences is that the sample size is too small. The Chi-square test is known to give spurious indications of significance if the expected counts are very small. This difficulty can be overcome by increasing the sample size or by pooling genotype classes with small expected frequencies to raise the expected counts in the combined class.
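Pooling of sparse genotype classes might be sketched as follows. The threshold of 5 for the minimum expected count is a common rule of thumb and is an assumption here, not a figure taken from the text; the counts are hypothetical.

```python
def pool_small_classes(observed, expected, min_expected=5.0):
    """Merge every class whose expected count is below min_expected
    into a single combined class appended at the end."""
    kept_obs, kept_exp = [], []
    pooled_obs = pooled_exp = 0.0
    for o, e in zip(observed, expected):
        if e < min_expected:
            pooled_obs += o
            pooled_exp += e
        else:
            kept_obs.append(o)
            kept_exp.append(e)
    if pooled_exp > 0:
        kept_obs.append(pooled_obs)
        kept_exp.append(pooled_exp)
    return kept_obs, kept_exp


# Hypothetical observed and expected counts for six genotype classes.
observed = [40, 45, 10, 3, 1, 1]
expected = [38.0, 46.0, 11.0, 2.5, 1.5, 1.0]

obs, exp = pool_small_classes(observed, expected)
x2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
print(obs, exp, x2)
```

Two cautions apply to this sketch. The combined class can itself fall below the threshold if few classes are pooled, and the degrees of freedom must be recalculated for the reduced number of classes before a p value is read off.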

[80A.1760] Exact...
