Human Genome Project

Human Genome Project, international scientific effort to map all of the genes on the 23 pairs of human chromosomes and to sequence the 3.1 billion DNA base pairs that make up the chromosomes (see nucleic acid). Begun in 1990 with the goal of enabling scientists to understand the basis of genetic diseases and to gain insight into human evolution, the project was largely completed in 2000, when 85% of the human genome had been decoded, and ended in 2003 with 99% decoded; detailed analyses of all the chromosome pairs were published by 2006. In the process, scientists identified genes for cystic fibrosis, neurofibromatosis, Huntington's disease, and an inherited form of breast cancer. To study genetic similarities among species, the project also decoded the genomes of the bacterium E. coli, a fruit fly, a nematode worm (see phylum Nematoda), and a mouse.

The Human Genome Project involved laboratories in the United States, France, Great Britain, Germany, and Japan. It was financed in the United States by the National Institutes of Health and by the Department of Energy and in Great Britain by the Wellcome Trust of London. A comparable project using new DNA-sequencing machines was begun as a private industry venture in the United States in 1998, with a stated goal of completing the mapping of the genome in three years.

Early in 2001 scientists from both teams jointly announced the “completion” of the mapping of the human genome, indicating that they had identified an estimated 30,000 protein-coding genes (instead of the expected 100,000), constituting just 1% of the total human DNA. Comparison of the two teams' data then indicated that, because of differences in the genes each team had identified, there might in fact be as many as 40,000 human genes. A more refined estimate (2004), based on additional work on the genome, put the number between 20,000 and 25,000 genes; more recently, that range has been narrowed to between about 20,000 and somewhat more than 21,000. Scientists have also identified stretches of the genome that code for RNA that is not used to produce protein; there are more than 25,000 of these RNA-producing, or noncoding, genes.

Work continues on refining the sequencing of the genes on the chromosomes, eliminating the remaining gaps in the genome map, and identifying the extent of variation in the human genome. In 2007 the first genome sequences of individual humans (James D. Watson and J. Craig Venter, who led the public and private human genome sequencing efforts, respectively) were released; Venter's genome was the first full (diploid) individual human genome. The NIH's National Center for Biotechnology Information maintains GenBank, a database of publicly available genetic sequences from the genomes of plants and animals, including some extinct species.

See studies by J. Sulston and G. Ferry (2002) and J. Shreeve (2004).

The Columbia Electronic Encyclopedia, 6th ed. Copyright © 2025, Columbia University Press. All rights reserved.
