In genetics, heritability is the proportion of phenotypic variation in a population that is attributable to genetic variation among individuals. Variation among individuals may be due to genetic and/or environmental factors. Heritability analyses estimate the relative contributions of differences in genetic and non-genetic factors to the total phenotypic variance in a population.
Consider a statistical model for describing some particular phenotype:
- Phenotype (P) = Genotype (G) + Environment (E).
- Var(P) = Var(G) + Var(E) + 2 Cov(G,E).
- H^2 = \frac{Var(G)}{Var(P)}.
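The decomposition can be checked numerically with a minimal simulation. This is not an estimation method. The sketch assumes G and E are independent normal draws with arbitrarily chosen variances 2 and 3, so the true H^2 is 0.4. It also uses the simulated G directly, which is never observable in real data.

```python
import random

def variance(xs):
    """Population variance (divides by n)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def covariance(xs, ys):
    """Population covariance (divides by n)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def simulate(n, var_g, var_e, seed=1):
    """Draw independent genetic (G) and environmental (E) values and form P = G + E."""
    rng = random.Random(seed)
    g = [rng.gauss(0.0, var_g ** 0.5) for _ in range(n)]
    e = [rng.gauss(0.0, var_e ** 0.5) for _ in range(n)]
    p = [gi + ei for gi, ei in zip(g, e)]
    return g, e, p

g, e, p = simulate(100_000, var_g=2.0, var_e=3.0)
var_p = variance(p)
# For sample moments the identity Var(P) = Var(G) + Var(E) + 2 Cov(G,E) holds exactly.
decomposed = variance(g) + variance(e) + 2 * covariance(g, e)
H2 = variance(g) / var_p  # broad-sense heritability, close to 2 / (2 + 3)
print(round(var_p, 3), round(decomposed, 3), round(H2, 3))
```

In this simulation the sample Cov(G,E) is close to zero because G and E were drawn independently. In real populations that assumption may not hold.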
The parameter H2 is the broad-sense heritability and reflects all possible genetic contributions to a population's phenotypic variance. It includes effects due to allelic variation (additive variance), dominance variation and epistatic (multi-genic) interactions. It also includes maternal and paternal effects, where individuals are directly affected by their parents' phenotype (such as with milk production in mammals).
These additional terms can be included in genetic models. For example, the simplest genetic model involves a single locus with two alleles that affect some quantitative phenotype, shown as + in Figure 1. We can calculate the linear regression of phenotype on the number of B alleles (0, 1, or 2), which is shown as the Linear Effect line. For any genotype, BiBj, the expected phenotype can then be written as the sum of the overall mean, a linear effect, and a dominance deviation:
- P_{ij} = \mu + \alpha_i + \alpha_j + d_{ij} = Population mean + Additive effect (a_{ij} = \alpha_i + \alpha_j) + Dominance deviation (d_{ij}).
The additive genetic variance is the weighted average of the squares of the additive effects:
- Var(A) = f(bb)a^2_{bb} + f(Bb)a^2_{Bb} + f(BB)a^2_{BB},
where f(bb)a_{bb} + f(Bb)a_{Bb} + f(BB)a_{BB} = 0.
There is a similar relationship for variance of dominance deviations:
- Var(D) = f(bb)d^2_{bb} + f(Bb)d^2_{Bb} + f(BB)d^2_{BB},
where f(bb)d_{bb} + f(Bb)d_{Bb} + f(BB)d_{BB} = 0.
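The single-locus decomposition can be sketched in Python. The sketch assumes Hardy-Weinberg genotype frequencies. The genotypic values (-1, 0.5, 1) and the frequency p = 0.3 of allele B are hypothetical. The additive effects come from the frequency-weighted least-squares regression of genotypic value on the number of B alleles, which is the Linear Effect line described above. The dominance deviations are whatever that line leaves unexplained.

```python
def single_locus_components(g_bb, g_Bb, g_BB, p):
    """Split genotypic values at one biallelic locus into additive and dominance parts.

    Assumes Hardy-Weinberg genotype frequencies; p is the frequency of allele B.
    """
    q = 1.0 - p
    freqs = {"bb": q * q, "Bb": 2 * p * q, "BB": p * p}
    values = {"bb": g_bb, "Bb": g_Bb, "BB": g_BB}
    count_b = {"bb": 0, "Bb": 1, "BB": 2}  # number of B alleles

    mu = sum(freqs[k] * values[k] for k in freqs)          # population mean
    mean_x = sum(freqs[k] * count_b[k] for k in freqs)     # equals 2p
    var_x = sum(freqs[k] * (count_b[k] - mean_x) ** 2 for k in freqs)
    cov_xg = sum(freqs[k] * (count_b[k] - mean_x) * (values[k] - mu) for k in freqs)
    slope = cov_xg / var_x                                 # slope of the Linear Effect line

    a = {k: slope * (count_b[k] - mean_x) for k in freqs}  # additive effects a_ij
    d = {k: values[k] - mu - a[k] for k in freqs}          # dominance deviations d_ij
    var_a = sum(freqs[k] * a[k] ** 2 for k in freqs)       # Var(A)
    var_d = sum(freqs[k] * d[k] ** 2 for k in freqs)       # Var(D)
    return mu, a, d, var_a, var_d, freqs

mu, a, d, var_a, var_d, freqs = single_locus_components(-1.0, 0.5, 1.0, p=0.3)
print(round(mu, 4), round(var_a, 4), round(var_d, 4))
```

The results can be checked against the standard closed forms in Falconer & Mackay (1996). Let a be half the difference between the two homozygotes and d the heterozygote's deviation from their midpoint; here a = 1 and d = 0.5. Then Var(A) = 2pq[a + d(q - p)]^2 = 0.6048 and Var(D) = (2pqd)^2 = 0.0441. The frequency-weighted sums of a_ij and d_ij are both zero, as the constraints above require.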
Narrow-sense heritability is defined as
- h^2 = \frac{Var(A)}{Var(P)}
and quantifies only the portion of the phenotypic variation that is additive (allelic) in nature (note upper case H2 for broad sense, lower case h2 for narrow sense). Consider, for example, livestock improvement by artificial selection. Here the narrow-sense heritability of the trait of interest predicts how much the trait mean will change in the next generation. The prediction depends on how far the mean of the selected parents departs from the mean of the population they were chosen from. Conversely, the observed response to selection yields an estimate of the narrow-sense heritability, called the realized heritability.
Estimating heritability
Estimating heritability is not a simple process, since only P can be observed or measured directly. Measuring the genetic and environmental variance requires various sophisticated statistical methods. These methods give better estimates when using data from closely related individuals (such as brothers, sisters, parents and offspring) rather than from more distantly related ones. Standard errors of heritability estimates are generally large unless the dataset is large.
An artificial selection experiment measures two quantities. The first is the strength of selection, S, also called the selection differential: the difference in mean trait between the selected parents of the next generation and the population as a whole. The second is the response to selection, R: the difference in mean trait between the offspring and the whole parental generation. The realized heritability is then the response relative to the strength of selection, h2 = R/S, as in Fig. 3.
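As a small worked illustration, the function below computes h2 = R/S from the three means recorded in a one-generation selection experiment. The means used (50, 58 and 53) are hypothetical.

```python
def realized_heritability(pop_mean, selected_parent_mean, offspring_mean):
    """Realized heritability h2 = R / S from one generation of selection."""
    S = selected_parent_mean - pop_mean  # selection differential
    R = offspring_mean - pop_mean        # response to selection
    if S == 0:
        raise ValueError("selection differential is zero, so h2 is undefined")
    return R / S

# Hypothetical means: population 50, selected parents 58, their offspring 53.
h2 = realized_heritability(50.0, 58.0, 53.0)
print(h2)
```

Here S = 8 and R = 3, so h2 = 3/8 = 0.375. In practice, estimates from a single generation are noisy, so responses accumulated over several generations give a more reliable figure.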
Comparison of close relatives
In the comparison of relatives, we find that in general,
h^2 = \frac{b}{r} = \frac{t}{r}, where r can be thought of as the coefficient of relatedness, b is the coefficient of regression and t the coefficient of correlation.
Heritability may be estimated by comparing parent and offspring traits (as in Fig. 4). When offspring values are regressed against the average trait value of the two parents, the slope of the line (0.57) approximates the heritability of the trait. If only one parent's value is used, heritability is twice the slope. (This is the source of the term "regression": offspring values tend to regress toward the population mean, i.e., the slope is always less than one.)
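A simulation can illustrate why the midparent slope approximates h2 and the single-parent slope approximates h2/2. The sketch assumes a purely additive model with no inbreeding and no shared family environment. It uses hypothetical variances V_A = 0.5 and V_E = 0.5, so the true h2 is 0.5. Each offspring's breeding value is the parental average plus Mendelian sampling noise with variance V_A/2.

```python
import random

def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def simulate_families(n, var_a, var_e, seed=7):
    """One offspring per sire-dam pair under a purely additive model."""
    rng = random.Random(seed)
    sd_a, sd_e = var_a ** 0.5, var_e ** 0.5
    sire, dam, child = [], [], []
    for _ in range(n):
        a_s, a_d = rng.gauss(0, sd_a), rng.gauss(0, sd_a)
        # Mendelian sampling variance is half the additive variance (no inbreeding).
        a_o = 0.5 * (a_s + a_d) + rng.gauss(0, (var_a / 2) ** 0.5)
        sire.append(a_s + rng.gauss(0, sd_e))
        dam.append(a_d + rng.gauss(0, sd_e))
        child.append(a_o + rng.gauss(0, sd_e))
    return sire, dam, child

sire, dam, child = simulate_families(50_000, var_a=0.5, var_e=0.5)
midparent = [(s + d) / 2 for s, d in zip(sire, dam)]
b_mid = slope(midparent, child)  # estimates h2 directly
b_one = slope(sire, child)       # estimates h2 / 2
print(round(b_mid, 3), round(2 * b_one, 3))
```

Both printed values should lie near 0.5. The single-parent estimate is noisier because one parent carries less information about the offspring.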
Full-sib comparison
Full-sib designs compare phenotypic traits of siblings that share a mother and a father with other sibling groups. When there is only additive gene action, the sibling phenotypic correlation is an index of familiality. It equals half the additive genetic variance plus the common environment variance, both expressed as proportions of the phenotypic variance.
Half-sib comparison
Half-sib designs compare phenotypic traits of siblings that share one parent with other sibling groups.
In twin studies, heritability can be estimated from twice the difference in correlation between MZ (identical) and DZ (fraternal) twins, h2 = 2(r(MZ) - r(DZ)). The shared environment, c2, contributes to the similarity between siblings because they are raised in a common environment. It is approximated by the DZ correlation minus half the heritability: c2 = r(DZ) - 1/2 h2. The factor of one half reflects the average proportion of segregating genes that DZ twins share. The unique environmental variance, e2, reflects the degree to which identical twins raised together are dissimilar: e2 = 1 - r(MZ).
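The three twin formulas can be written as one small function. The correlations passed in (r(MZ) = 0.8, r(DZ) = 0.5) are hypothetical and chosen only for illustration.

```python
def twin_components(r_mz, r_dz):
    """Heritability, shared and unique environment from MZ and DZ twin correlations."""
    h2 = 2 * (r_mz - r_dz)  # heritability
    c2 = r_dz - h2 / 2      # shared environment (algebraically 2*r_dz - r_mz)
    e2 = 1 - r_mz           # unique environment
    return h2, c2, e2

h2, c2, e2 = twin_components(0.8, 0.5)  # hypothetical correlations
print(round(h2, 3), round(c2, 3), round(e2, 3))
```

By construction, the three components always sum to one. With these inputs they are 0.6, 0.2 and 0.2.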
The classical twin study has been severely criticized and is used less and less frequently nowadays.
Large, complex pedigrees
Analysis of variance methods of estimation
The second set of methods of estimation of heritability involves ANOVA and estimation of variance components.
Basic model
We use the basic discussion of Kempthorne (1957). Considering only the most basic of genetic models, we can look at the quantitative contribution of a single locus with genotype G_i as
y_i = \mu + g_i + e
where g_i is the effect of genotype G_i and e is the environmental effect.
Consider an experiment with a group of sires and their progeny from random dams. Since the progeny get half of their genes from the father and half from their (random) mother, the progeny equation is
z_i = \mu + \frac{1}{2}g_i + e
Intraclass correlations
Consider the experiment above. We have two groups of progeny we can compare. The first compares the various progeny of an individual sire (called the within-sire group). The variance will include terms for genetic variance (since the progeny did not all get the same genotype) and environmental variance. This is thought of as an error term.
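The second group compares the means of half-sib families with each other (called the among-sire group). In addition to the error term of the within-sire groups, there is an additional term due to differences among the means of the half-sib families. Two half sibs z and z' receive the same contribution from their sire, so their covariance is
- cov(z,z') = cov(\mu + \frac{1}{2}g + e, \mu + \frac{1}{2}g + e') = \frac{1}{4}V_g,
and dividing it by the progeny phenotypic variance \frac{1}{4}V_g + V_e gives the intraclass correlation.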
The ANOVA
In an experiment with n sires and r progeny per sire, we can calculate the following ANOVA, using V_g as the genetic variance and V_e as the environmental variance:
The \frac{1}{4}V_g term is the covariance among half sibs. It is the numerator of their intraclass correlation, t = \frac{\frac{1}{4}V_g}{\frac{1}{4}V_g + V_e}. We can then calculate H^2 = \frac{V_g}{V_g + V_e} = \frac{4t}{1 + 3t}. The expected mean squares are calculated from the relationship of the individuals (progeny within a sire are all half sibs, for example) and an understanding of intraclass correlations.
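The estimation can be sketched with a standard balanced one-way random-effects ANOVA. The sketch simulates the progeny model z = \mu + \frac{1}{2}g + e. The sire count, family size, mean and variances (V_g = V_e = 1, so the true H^2 is 0.5) are all hypothetical. The sire variance component estimates the half-sib covariance \frac{1}{4}V_g, and the within-sire mean square estimates V_e. Their ratio gives t, which is then converted with H^2 = 4t/(1 + 3t).

```python
import random

def simulate_half_sibs(n_sires, r, var_g, var_e, seed=3):
    """Progeny model z = mu + g/2 + e; all progeny of a sire share that sire's g."""
    rng = random.Random(seed)
    mu = 10.0  # hypothetical population mean
    groups = []
    for _ in range(n_sires):
        g = rng.gauss(0, var_g ** 0.5)
        groups.append([mu + 0.5 * g + rng.gauss(0, var_e ** 0.5) for _ in range(r)])
    return groups

def intraclass_correlation(groups):
    """Balanced one-way random-effects ANOVA estimate of the intraclass correlation."""
    n, r = len(groups), len(groups[0])
    grand = sum(sum(grp) for grp in groups) / (n * r)
    means = [sum(grp) / r for grp in groups]
    ms_among = r * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_within = sum((z - m) ** 2 for grp, m in zip(groups, means) for z in grp) / (n * (r - 1))
    var_sire = (ms_among - ms_within) / r  # estimates the half-sib covariance V_g / 4
    return var_sire / (var_sire + ms_within)

def broad_sense_from_t(t):
    """H^2 = V_g / (V_g + V_e) = 4t / (1 + 3t) under the progeny model above."""
    return 4 * t / (1 + 3 * t)

groups = simulate_half_sibs(n_sires=2000, r=20, var_g=1.0, var_e=1.0)
t_hat = intraclass_correlation(groups)
H2_hat = broad_sense_from_t(t_hat)
print(round(t_hat, 3), round(H2_hat, 3))
```

The estimates use the usual expected mean squares of a one-way design: E[MS_within] = V_e and E[MS_among] = V_e + r \cdot \frac{1}{4}V_g. In a small experiment the sire component can come out negative, and this sketch does not guard against that.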
Model with additive and dominance terms
For a model with additive and dominance terms, but not others, the equation for a single locus is
- y_{ij} = \mu + \alpha_i + \alpha_j + d_{ij} + e,
where \alpha_i is the additive effect of the ith allele, \alpha_j is the additive effect of the jth allele, d_{ij} is the dominance deviation for the ijth genotype, and e is the environmental effect.
Experiments can be run with a similar setup to the one given in Table 1. Using different relationship groups, we can evaluate different intraclass correlations. Using V_a as the additive genetic variance and V_d as the dominance deviation variance, intraclass correlations become linear functions of these parameters. In general,
- Intraclass correlation = r V_a + \theta V_d,
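where r and \theta are found as
- r = 2 \times Pr(an allele drawn at random from one relative is identical by descent with an allele drawn at random from the other),
- \theta = Pr(the two relatives' genotypes at the locus are identical by descent, i.e. both alleles are shared identical by descent).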
Some common relationships and their coefficients are given in Table 2.
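Table 2 is not reproduced here. As an illustration only, the sketch below uses standard textbook (r, \theta) values for a few relationships, assuming a non-inbred population. The variance values V_a = 0.4 and V_d = 0.2 are hypothetical. Strictly, r V_a + \theta V_d is a covariance between relatives; dividing it by the phenotypic variance gives the correlation.

```python
# (r, theta) for common relationships, assuming no inbreeding; standard textbook values.
COEFFICIENTS = {
    "parent-offspring": (0.5, 0.0),
    "half sibs": (0.25, 0.0),
    "full sibs": (0.5, 0.25),
    "MZ twins": (1.0, 1.0),
    "first cousins": (0.125, 0.0),
}

def relative_covariance(relationship, v_a, v_d):
    """Covariance between relatives: r * V_a + theta * V_d."""
    r, theta = COEFFICIENTS[relationship]
    return r * v_a + theta * v_d

for name in COEFFICIENTS:
    print(f"{name:17s} {relative_covariance(name, v_a=0.4, v_d=0.2):.3f}")
```

Only full sibs and MZ twins pick up dominance variance, because only they can share both alleles at a locus identical by descent. This is why comparing different kinds of relatives lets V_a and V_d be separated.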
Larger models
When a large, complex pedigree is available for estimating heritability, the most efficient use of the data is in a restricted maximum likelihood (REML) model. The raw data will usually have three or more datapoints for each individual: a code for the sire, a code for the dam and one or several trait values. Different trait values may be for different traits or for different timepoints of measurement. The currently popular methodology relies on a high degree of certainty about the identities of the sire and dam; it is not common to treat sire identity probabilistically. This is not usually a problem, since the methodology is rarely applied to wild populations (although it has been used for several wild ungulate and bird populations), and sires are invariably known with a very high degree of certainty in breeding programmes. Algorithms that account for uncertain paternity also exist.
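REML analyses of such pedigrees typically work through the additive relationship matrix among all animals, which is built from the sire and dam codes. The sketch below uses the classic tabular method to compute that matrix. It is an illustration under stated assumptions, not the code used by ASReml, VCE or WOMBAT, and it does not perform the REML step itself. The pedigree and its animal IDs are hypothetical. Parents must be listed before their offspring, and unknown parents are given as None.

```python
def relationship_matrix(pedigree):
    """Numerator (additive) relationship matrix A by the tabular method.

    pedigree: list of (animal, sire, dam) tuples, parents listed before offspring,
    unknown parents given as None. Returns (ids, A) with A as a nested list.
    """
    ids = [animal for animal, _, _ in pedigree]
    index = {animal: i for i, animal in enumerate(ids)}
    n = len(ids)
    A = [[0.0] * n for _ in range(n)]
    for i, (_, sire, dam) in enumerate(pedigree):
        s, d = index.get(sire), index.get(dam)  # None when the parent is unknown
        # Diagonal: 1 + inbreeding coefficient (half the relationship between the parents).
        A[i][i] = 1.0 + (0.5 * A[s][d] if s is not None and d is not None else 0.0)
        for j in range(i):
            # Off-diagonal: average of j's relationships with i's two parents.
            a_js = A[j][s] if s is not None else 0.0
            a_jd = A[j][d] if d is not None else 0.0
            A[i][j] = A[j][i] = 0.5 * (a_js + a_jd)
    return ids, A

# Hypothetical pedigree: C1 and C2 are full sibs, C3 is their half sib,
# and G1 is the offspring of a full-sib mating.
ped = [
    ("S1", None, None), ("D1", None, None), ("D2", None, None),
    ("C1", "S1", "D1"), ("C2", "S1", "D1"), ("C3", "S1", "D2"),
    ("G1", "C1", "C2"),
]
ids, A = relationship_matrix(ped)
pos = {animal: i for i, animal in enumerate(ids)}
print(A[pos["C1"]][pos["C2"]], A[pos["C1"]][pos["C3"]], A[pos["G1"]][pos["G1"]])
```

The entries equal the r coefficients of the previous section: 0.5 for full sibs and parent-offspring pairs, and 0.25 for half sibs. The inbred animal G1 gets a diagonal of 1.25. Dedicated programs compute the inverse of A directly and sparsely rather than forming the dense matrix as done here.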
The pedigrees can be viewed using programs such as Pedigree Viewer http://www-personal.une.edu.au/~bkinghor/pedigree.htm, and analysed with programs such as ASReml, VCE http://vce.tzv.fal.de/index.pl or WOMBAT http://agbu.une.edu.au/~kmeyer/wombat.html.
Response to Selection
In selective breeding of plants and animals, the expected response to selection can be estimated by the following equation:
R = h^2 S
In this equation, the Response to Selection (R) is defined as the realized difference in mean trait between the parent generation and the next generation. The Selection Differential (S) is defined as the difference in mean trait between the selected parents and the parent generation as a whole.
For example, imagine that a plant breeder is involved in a selective breeding project with the aim of increasing the number of kernels per ear of corn. For the sake of argument, let us assume that the average ear of corn in the parent generation has 100 kernels. Let us also assume that the selected parents produce corn with an average of 120 kernels per ear. If h2 equals 0.5, then the next generation will produce corn with an average of 0.5(120-100) = 10 additional kernels per ear. Therefore, the total number of kernels per ear of corn will equal, on average, 110.
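The corn example can be reproduced with a one-line application of the breeder's equation. The function name is only illustrative.

```python
def predicted_response(h2, pop_mean, selected_mean):
    """Breeder's equation R = h^2 * S; returns (R, predicted mean of the next generation)."""
    S = selected_mean - pop_mean  # selection differential
    R = h2 * S                    # expected response to selection
    return R, pop_mean + R

R, next_mean = predicted_response(h2=0.5, pop_mean=100, selected_mean=120)
print(R, next_mean)  # 10.0 110.0
```

The prediction holds for a single generation. It assumes that h2 stays constant and that the response is driven by additive genetic variance alone.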
- Falconer, D. S. & Mackay, T. F. C. (1996). Introduction to Quantitative Genetics. Fourth edition. Addison Wesley Longman, Harlow, Essex, UK.
- Gillespie, G. H. (1997). Population Genetics: A Concise Guide. Johns Hopkins University Press.
- Joseph, J. (2004). The Gene Illusion: Genetic Research in Psychiatry and Psychology Under the Microscope. New York: Algora. (2003 United Kingdom edition by PCCS Books.) (Chapter 5 contains a critique of the heritability concept.)
- Joseph, J. (2006). Missing Gene: Psychiatry, Heredity, and the Fruitless Search for Genes. New York: Algora.
- Kempthorne, O. (1957). An Introduction to Genetic Statistics. John Wiley. Reprinted 1969 by Iowa State University Press.
- Lynch, M. & Walsh, B. (1997). Genetics and Analysis of Quantitative Traits. Sinauer Associates. ISBN 0-87893-481-2.
- Malécot, G. (1948). Les Mathématiques de l'Hérédité. Masson, Paris.
- Wahlsten, D. (1994). The intelligence of heritability. Canadian Psychology 35, 244-258.