DNA |I | |INTRODUCTION | DNA (deoxyribonucleic acid), molecule that acts as the mechanism of biological inheritance in almost all living creatures. DNA is found in nearly all cells and contains the coded instructions that control the workings of the cell. DNA is passed from parents to offspring, and contains the coded instructions that enable the offspring to develop from a single cell into an adult body. DNA is the most important molecule in life, and an understanding of the structure and function of DNA has been the most important development in biology during the past half-century or more. II | |HISTORY | Biologists had known since the late 19th century that the structures called chromosomes carry the hereditary material from parents to offspring. Chromosomes exist in a particular part of the cell—the cell nucleus—and are large enough to be visible at certain times with a light microscope. When visible, chromosomes are long, flexible entities resembling a tiny piece of string. Chromosomes contain two molecular constituents: proteins and DNA. The molecular basis of heredity was likely to reside in one or other of these molecules.
For many years biologists suspected that the chromosomal proteins were more likely to be the hereditary molecules because proteins were known to be highly variable in their molecular sub-structure, whereas DNA seemed to have a rather uniform structure. It was difficult to see how a uniform molecule could give rise to the great variety of life. However, DNA was shown to be the molecule of inheritance by an experiment in 1944. (In fact the 1944 experiment is just the most famous landmark in a series of related experiments by several teams of biologists.) In that experiment a group of American biochemists, O. T. Avery, C. M. MacLeod, and M. McCarty, showed that DNA causes a phenomenon called “transformation” in bacteria. In transformation, the properties of bacterial cells are altered when the bacteria are mixed with other bacteria of a different form. Something is passed between the two kinds of bacteria, causing transformation. Avery, MacLeod, and McCarty purified the various kinds of molecule in the cells (fats, proteins, and sugars as well as nucleic acids) and showed that only DNA causes transformation.
The next important advance was the discovery by the American biochemist James Watson and British biophysicist Francis Crick in 1953 that DNA has the structure of a double helix. DNA is made up of three kinds of sub-molecule: phosphate, deoxyribose (a sugar), and various bases. DNA contains four kinds of base: adenine (A), cytosine (C), guanine (G), and thymine (T). The Austrian biochemist Erwin Chargaff had noticed that in DNA the amount of A equals the amount of T, and the amount of G equals the amount of C. This suggested that each A is bonded to a T, and each G to a C. The two rules, that in DNA A bonds with T and G with C, are called the “base-pairing rules”. The other main kind of evidence used by Watson and Crick was that obtained by X-ray diffraction. X-ray diffraction is a method used to deduce the structure of molecules that are too small to observe directly. In the case of DNA, X-ray diffraction suggested that the molecule was some sort of helix. In Watson and Crick’s model, DNA consists of two strands (or “backbones”) made up of alternating phosphate and deoxyribose molecules; the bases are attached to the deoxyribose and stick out from the two strands.
G bases in one strand are bonded to C bases in the other, and A bases to T bases. The bonds between the bases are hydrogen bonds. In all, the two strands are bonded together on the inside of the molecule by the bases; each strand is twisted into a helical shape, making a double helix. Knowledge of a molecule’s structure does not always help in understanding how the molecule works, but in the case of DNA, Watson and Crick’s discovery was hugely important. The double helix pointed to ways in which DNA could be reproduced and to ways in which DNA could encode information. III | |HOW BIOLOGICAL INFORMATION IS ENCODED IN DNA | DNA consists of two long, complementary chains of nucleotides. A nucleotide contains one deoxyribose-phosphate link in the DNA chain, and one base. The deoxyribose-phosphate part is always the same, but there are four kinds of base and therefore four kinds of nucleotide. The letters A, C, G, and T can stand for either the bases or the nucleotides containing those bases. The DNA can then be imagined as a long series of letters, corresponding to the order of the nucleotides down the chain.
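The base-pairing rules lend themselves to a short illustration. The following is a minimal sketch in Python, not part of the original account: the function names and the sample stretch of letters are made up for illustration. It builds the strand that pairs with a given strand, then checks that, counted over both strands, the amount of A equals the amount of T and the amount of G equals the amount of C, as Chargaff observed.

```python
# Base-pairing rules: A pairs with T, and G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def partner_strand(strand: str) -> str:
    """Return the bases that pair, position by position, with `strand`."""
    return "".join(PAIRS[base] for base in strand)


def base_counts(strand: str) -> dict:
    """Count each base over both strands of the double helix."""
    both = strand + partner_strand(strand)
    return {base: both.count(base) for base in "ACGT"}


strand = "GATACCA"  # an arbitrary example stretch
print(partner_strand(strand))  # CTATGGT
counts = base_counts(strand)
print(counts["A"] == counts["T"], counts["G"] == counts["C"])  # True True
```

The sketch pairs bases position by position and ignores the direction in which each strand runs.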
A stretch of DNA can then be symbolized by something like GATACCA… The DNA in a human cell consists of about 6 billion nucleotide letters arranged in a particular sequence. The full set of DNA in a cell is called the genome. However, the word “genome” can be used in two ways. The 6 billion nucleotides of DNA in a human cell are made up of two equal sets of 3 billion nucleotides, one inherited from that individual’s mother and the other from that individual’s father. The maternal and paternal DNA contain a very similar set of information, and the basic human genome consists of about 3 billion nucleotides.
This is the length of DNA studied in the Human Genome Project. However, the full genome of a cell has two such sets of DNA, and “genome” can refer to either the single, or the doubled-up, DNA set of a cell. Genomes differ between the two main categories of living things, prokaryotes and eukaryotes. Bacteria are the main examples of prokaryotes; almost all multicellular life forms (including mushrooms, flowers, trees, insects, and humans) are eukaryotes. Some single-celled eukaryotes, such as Paramecium and yeasts, also exist. In eukaryotes, most of the DNA is located in a special region of the cell, the nucleus.
However, eukaryotic cells also have DNA outside the nucleus, in structures called organelles. In animal cells, the only organelles to contain DNA are the mitochondria. In plant cells, organellar DNA is found in chloroplasts as well as in mitochondria. Prokaryotes, such as bacteria, lack a distinct nucleus; they also lack organelles that contain DNA. Prokaryotic DNA exists in the cell but not in any special compartment. The genome of eukaryotes, therefore, can be divided into nuclear and organellar DNA. The nuclear DNA makes up the overwhelming majority: the 3 billion nucleotides of the human genome are the nuclear component.
A mitochondrion contains only about 16,000 nucleotides of DNA. The information in the DNA exists in units called genes. To a crude approximation, one gene codes for one protein. Most bodily functions are, at a molecular level, carried out by proteins. Some proteins act to digest food, other proteins act to defend our bodies from infectious disease, and others transport materials around the body. A few thousand kinds of protein are needed in the workings of one cell, and a few tens of thousands of proteins are needed in the workings of a multicellular body such as a human body.
These thousands of kinds of proteins are coded for in the DNA by thousands of genes. A gene is a stretch of nucleotides in the DNA. Not all the DNA consists of genes. Indeed, in human DNA only about 5 per cent or less of the DNA codes for genes. The rest is referred to as non-coding DNA and is of uncertain function. The DNA consists of genes with inter-genic regions of non-coding DNA between the genes. As previously alluded to, it is not exactly correct to say that a gene codes for a protein. Some proteins, such as haemoglobin, are assembled from the products of more than one gene. One haemoglobin molecule is assembled from four units of two kinds, each kind coded for by its own gene in the DNA.
The four units that are assembled to make a haemoglobin molecule are called polypeptides. All proteins are made up of one or more polypeptides. A more accurate definition of a gene is a stretch of DNA that codes for one polypeptide. However, even this definition has exceptions. For a start, some genes code not for proteins but for RNA molecules (such as ribosomal or transfer RNA) that are never translated into protein. Also, some genes can be read in multiple ways, and code for more than one polypeptide (see the description of alternative splicing under “Introns and Exons” below).
These exceptions could be dealt with by defining a gene as a stretch of DNA that has a continuous RNA molecule transcribed from it. The mechanism by which a protein is read off from the DNA is understood in detail. A protein is a chain of amino acids. Living things mainly use 20 different amino acids, and the properties of a protein follow from its particular sequence of these 20 amino acids. In the DNA, three nucleotides code for one amino acid. There are four different kinds of nucleotide in the DNA (A, C, G, and T) making 64 possible sets of three nucleotides (AAA, AAC, ACC… and so on).
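The figure of 64 can be checked by enumeration. The sketch below (Python; the function name is chosen for illustration) lists every possible set of two and of three nucleotides and counts them, showing that pairs give only 16 combinations, fewer than the 20 amino acids, while triplets give 64.

```python
from itertools import product

BASES = "ACGT"


def coding_units(length: int) -> list:
    """List every sequence of `length` nucleotides drawn from the four bases."""
    return ["".join(letters) for letters in product(BASES, repeat=length)]


pairs = coding_units(2)
triplets = coding_units(3)
print(len(pairs), len(triplets))  # 16 64
print(triplets[:3])               # ['AAA', 'AAC', 'AAG']
```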
One set of three nucleotides, coding for one amino acid, is called a codon. In the late 1950s and the 1960s, molecular biologists worked out which codon coded for which amino acid. The result is a table showing which amino acid is coded for by all of the 64 codons. This table is called the genetic code. For instance, an AAA codon in the DNA codes for the amino acid phenylalanine and a GAA codon codes for leucine. The DNA sequence GAAAAA codes for a leucine followed by a phenylalanine in a protein. The genetic code contains more codons (64) than are needed to code for all 20 kinds of amino acid.
The genetic code is sometimes described as showing “degeneracy” or “redundancy” for this reason. The roughly three-fold redundancy follows inevitably from the use of four bases to code for twenty amino acids. If sets of only two bases were used (AA, AT… and so on), there would only be 16 coding units, which is not enough. The next step up, from two to three nucleotides, takes us from 16 to 64 codons. It is impossible for four bases to be arranged into exactly 20 coding units (one per amino acid), at least in a non-overlapping code. (In theory, a sequence such as ACATAA… could be read in an overlapping manner: the initial ACA, for instance, would code for the first amino acid; then CAT could code for the second amino acid; then ATA for the third, and so on. This would be one example of an overlapping code. It was once thought that DNA uses an overlapping code, because one kind of overlapping code was found to contain 20 distinct units. However, in fact DNA uses a non-overlapping code, with each triplet of nucleotides coding for one amino acid, and the code contains redundancy.) |IV | |TRANSCRIPTION AND TRANSLATION |
The information in the DNA is converted to make proteins in two main stages. The first is transcription. A molecule called messenger RNA (mRNA) is synthesized directly on the DNA. The DNA double helix is first unwound, and the two strands are separated. The mRNA molecule is then formed on one of the DNA strands. mRNA is, like DNA, a chain of nucleotides, with the minor difference that it contains the base uracil (U) instead of thymine (T). The nucleotides of mRNA bond to the DNA strand, with the same pairing rules as in the DNA double helix.
If the DNA sequence is ACAGT, a messenger RNA with the sequence UGUCA will be formed from it. The enzyme RNA polymerase catalyses the process of transcription. The mRNA that is transcribed from a gene contains the same information as the DNA strand, but in a form from which a protein can be produced. (The genetic code is conventionally written in terms of mRNA code, not DNA code. It was stated above that the codon AAA in DNA codes for phenylalanine. AAA in DNA-code corresponds to UUU in mRNA-code, and in the standard table for the genetic code UUU is given as one of the codons for phenylalanine [see table].) Once the transcription of an mRNA molecule is complete, the mRNA separates from the gene. After various kinds of post-transcriptional modification (see RNA) it makes its way to another structure in the cell, called a ribosome. Ribosomes are the sites of protein assembly and are made of protein together with a second kind of RNA, called ribosomal RNA (rRNA). The mRNA binds to the ribosome. Now a third kind of RNA, transfer RNA (tRNA), brings the amino acids. A tRNA molecule has an anticodon at one end, which binds to the complementary codon in the mRNA. At the other end, the tRNA carries the amino acid corresponding to that codon.
The various tRNAs therefore bind to the mRNA, and their amino acids are lined up in the appropriate order to make the protein. The amino acids are joined up and detached from their tRNA. A protein has been assembled. The conversion of mRNA information, at a ribosome, into a protein is called translation. In summary, the processes of transcription and translation act to decode the DNA information in a gene; the result is a protein molecule, and it is through proteins that DNA exerts its effects in the cells of living creatures. In living cells, information flows from DNA to RNA to protein.
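The two stages can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not a model of the cell's machinery: the function names are invented, the codon table is a four-entry excerpt of the standard genetic code, and strand direction and start and stop signals are ignored. It reproduces the two examples given in the text: the DNA sequence ACAGT gives the mRNA UGUCA, and the DNA sequence GAAAAA codes for leucine followed by phenylalanine.

```python
# Pairing used in transcription: each DNA base and the mRNA base formed on it.
DNA_TO_MRNA = {"A": "U", "C": "G", "G": "C", "T": "A"}

# A small excerpt of the standard genetic code, written in mRNA codons.
CODON_TABLE = {"UUU": "Phe", "UUC": "Phe", "CUU": "Leu", "CUC": "Leu"}


def transcribe(dna: str) -> str:
    """Form the mRNA that pairs, base by base, with a DNA strand."""
    return "".join(DNA_TO_MRNA[base] for base in dna)


def translate(mrna: str) -> list:
    """Read the mRNA three bases at a time, without overlap."""
    usable = len(mrna) - len(mrna) % 3  # ignore an incomplete final codon
    codons = [mrna[i:i + 3] for i in range(0, usable, 3)]
    return [CODON_TABLE[codon] for codon in codons]


print(transcribe("ACAGT"))              # UGUCA
print(transcribe("GAAAAA"))             # CUUUUU
print(translate(transcribe("GAAAAA")))  # ['Leu', 'Phe']
```

A codon missing from the excerpted table raises a `KeyError`; a fuller sketch would carry all 64 entries.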
Cells contain no mechanism to synthesize mRNA from protein, or (with rare exceptions) DNA from mRNA. Francis Crick introduced the expression “the central dogma” of molecular biology to refer to the one-way information flow. The central dogma provides a molecular explanation for the fact that acquired characters are not inherited. Heredity is Mendelian (see Mendel), not Lamarckian (see Lamarck). Some exceptions are known, in which RNA can be reverse-transcribed to form DNA, but they concern special RNA viruses or parts of the DNA that do not code for genes.
The exceptions do not violate the central dogma in a deep sense, and the central dogma still stands as a good generalization about information flow in living things. |V | |INTRONS AND EXONS | The genes of many forms of life have an additional feature. The genes contain stretches of DNA, called exons, each of which codes for part of a protein. The exons are divided up by stretches of non-coding DNA, called introns. The genes consist of alternating introns and exons, and the full codes for a polypeptide are contained in several scattered exons, rather than a continuous stretch of DNA.
The number of exons varies from gene to gene, but most genes in mice and humans have 12 exons or fewer (though some of our genes have over 60 exons). The division of genes into exons and introns complicates the story of transcription. The whole gene, consisting of all the exons and introns, is initially transcribed to form mRNA. Then, while the mRNA is still in the cell nucleus, the introns in the mRNA are removed in a process called splicing. The introns are said to be “spliced out”. The result is a final mRNA molecule consisting only of the exons, in the correct order to code for the protein.
This final mRNA moves to the ribosomes and is translated. The reason why many genes contain introns, which are then removed, is uncertain, though there are several hypotheses. The presence of introns differs between prokaryotes and eukaryotes. Bacterial genes mainly lack introns. Each bacterial gene simply codes for a polypeptide. Bacterial genes are said to be colinear with their proteins. But in eukaryotes, at least some genes have been found to contain introns whenever they have been looked for. However, introns are rare in the single-celled yeast.
Almost all (96 per cent) of yeast genes consist of a single exon, lacking any introns. In fruit flies, only 17 per cent of genes lack introns. In humans, only 6 per cent of genes lack introns. Biologists are uncertain whether early life forms lacked introns, as bacteria mainly do today, and introns evolved relatively late when complex multicellular life arose. Alternatively, introns may have been present early on, in the common ancestors of modern bacteria and eukaryotes and then been lost in the evolution of bacteria.
These are called the “introns-late” and “introns-early” theories. In life forms whose genes do contain introns, the length of introns ranges widely but the lengths of exons are more uniform. In mice and humans, for example, most exons are 100-300 nucleotides long (coding for 30-100 amino acids); but most introns are in the 100-25,000 nucleotide range. The longest introns are almost 100,000 nucleotides long. All this intron material is spliced out of the mRNA transcript, and the initial mRNA of an average gene is about five times the length of the final mRNA that is translated.
The arrangement of genes into introns and exons enables, in some cases, one gene to be read in more than one way. The process is called alternative splicing. For example, a gene might contain five exons, divided by four introns. One mRNA molecule might be created from exons 1 to 5; a second mRNA molecule from exons 1 to 4 (with exon 5 being spliced out, as well as the four introns); and another mRNA from exons 1, 2, 4, and 5 (with exon 3 being discarded along with the introns). Then three different mRNAs, and so three different proteins, will be read from a single gene.
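The five-exon example can be written out as a small sketch in Python. The exon sequences here are invented purely for illustration, and the function name is an assumption; the point is only that choosing different subsets of exons from one gene yields different final mRNAs.

```python
def splice(exons: dict, keep: set) -> str:
    """Join the kept exons in their original order; introns are never included."""
    return "".join(exons[number] for number in sorted(keep))


# Hypothetical exon sequences for a gene with five exons.
exons = {1: "AUGGCU", 2: "GAUUAC", 3: "CCGAAA", 4: "UGGCAU", 5: "GUUUAA"}

variants = [
    {1, 2, 3, 4, 5},  # all five exons
    {1, 2, 3, 4},     # exon 5 spliced out
    {1, 2, 4, 5},     # exon 3 spliced out
]

mrnas = [splice(exons, keep) for keep in variants]
for mrna in mrnas:
    print(mrna)
print(len(set(mrnas)))  # 3
```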
Several examples of alternative splicing are known. For instance, a gene called slo codes for a protein that contributes to the acoustic sensitivity of little hairs in our inner ears. Our ears are sensitive to a range of pitch because we have many hairs each tuned to be sensitive to a particular sound frequency. It might be thought that distinct genes would code for the proteins that give the different acoustic sensitivities to the different hairs. But slo contains at least 8 sites of alternative splicing, allowing at least 500 different, but related, mRNAs to be read from it.
Alternative splicing is a recent discovery, and biologists do not yet know how important it is in life. It does, however, along with the combinatorial action of genes, show how enormous complexity and variety can arise from relatively few genes. |VI | |NON-CODING DNA | The information to code for proteins is contained in the genes. The part of the DNA that consists of genes is called coding DNA. (Though even genes contain non-coding DNA, in the form of introns.) The genes, at least in eukaryotes and particularly in multicellular eukaryotes, are located within stretches of non-coding DNA.
The non-genic, non-coding DNA may have no function, and it is often called “junk” DNA for this reason. Alternatively, the non-coding DNA (or part of it) may indeed have some function, for instance in regulating gene expression, or ensuring that the genes are appropriately positioned within the DNA, or contributing to architectural features in the large-scale structure of the DNA. The function (if any) of non-coding DNA remains uncertain. Much of the non-coding DNA is repetitive, consisting of repeats of a certain unit sequence. (The repeats are often of closely related, rather than identical, versions of the unit sequence.) In some cases, the unit sequence is relatively long, approximately as long as one gene. These are called LINEs (long interspersed elements). For example, the human genome contains about 500,000 copies of the LINE1 sequence, each of which is about 1000 nucleotides long. Somewhat shorter are the SINEs (short interspersed elements). The commonest SINE in human DNA is a 260-nucleotide sequence called Alu; the human genome contains over a million copies of Alu. LINEs and SINEs are examples of transposable elements. Transposable elements, or transposons, are known informally as “jumping genes”.
They are able to copy themselves into other sites in the DNA. The metaphor of jumping is imperfect because when we jump we move from one place to another. When a transposable element copies itself elsewhere in the DNA, it creates a second copy in addition to the original copy. Transposable elements therefore tend to proliferate through the DNA, and this helps explain the huge number of copies of certain repeat sequences in human DNA. Finally, some repetitive DNA has short unit sequences, of approximately ten nucleotides. At various sites around the DNA the unit sequences are found with various numbers of repeats.
The repeats are arranged side-by-side; side-by-side repeats of any sequence of DNA are called tandem repeats. One well studied such sequence in human DNA has a unit 10-15 nucleotides long. Human DNA has about 1000 copies of the sequence in all, scattered among several sites. At one site, there might be 10 repeats; at another site, there might be 100 repeats. The short sequences that consist of variable numbers of repeats are called “variable number of tandem repeats” (VNTR) sequences. They are also called mini-satellites. VNTRs are not known to have any function in DNA, but they have been put to use by human beings.
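Counting tandem repeats at a site is a simple operation on the letter sequence. The following Python sketch uses an invented 10-nucleotide unit and invented flanking sequences, and the function name is an assumption; it counts how many side-by-side copies of the unit begin at a given position, so that two sites (or two individuals) with different numbers of repeats can be told apart.

```python
def count_tandem_repeats(dna: str, unit: str, start: int) -> int:
    """Count side-by-side copies of `unit` beginning at position `start`."""
    if not unit:
        raise ValueError("unit sequence must not be empty")
    count = 0
    position = start
    while dna.startswith(unit, position):
        count += 1
        position += len(unit)
    return count


UNIT = "AGGTCATGCA"  # hypothetical 10-nucleotide unit sequence
LEFT, RIGHT = "TTTT", "CCCC"  # hypothetical flanking DNA

site_a = LEFT + UNIT * 10 + RIGHT
site_b = LEFT + UNIT * 12 + RIGHT

print(count_tandem_repeats(site_a, UNIT, len(LEFT)))  # 10
print(count_tandem_repeats(site_b, UNIT, len(LEFT)))  # 12
```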
The exact number of tandem repeats at any one site changes rapidly between the generations. The exact pattern of repeats of the short unit sequence is peculiar to one individual, and his or her close relatives. This DNA provides the basis of genetic fingerprinting (see DNA Fingerprinting), in which DNA is used as forensic evidence. Genetic fingerprints have been used to establish guilt, or to establish innocence (including several people on death row), in court, and to establish paternity, or non-paternity, in lesser legal disputes. |VII | |DNA REPLICATION |
The DNA is copied, or replicated, once per cell division. It is copied at every cell division within the lifetime of one organism, and in the reproductive cell line that produces sperm and eggs. A large set of perhaps 50 or so enzymes catalyse the DNA replication. DNA polymerases are the most important of these enzymes. The first step is to unwind the double helix at a certain site (called an origin of replication) and separate the two strands. Two new strands are then formed, one on each of the strands of the parental double helix.
The DNA at the site of replication resembles a fork, where the two strands of the parental DNA are split apart to be copied. It is called a replication fork. DNA replication is semi-conservative: that is, each new double helix contains one strand from the parental copy and a second new strand that was copied from it. Theoretically, it could have been that DNA replication was conservative rather than semi-conservative; this would have meant that after replication one of the DNA molecules would have both strands conserved from the parental DNA, and the other DNA molecule would contain two new strands.
The semi-conservative nature of DNA replication was shown in a classic experiment called the Meselson-Stahl experiment. Two American geneticists, Matthew Meselson and Franklin W. Stahl, labelled DNA with a heavy isotope of nitrogen, 15N, instead of the normal 14N. The labelled DNA could be distinguished from normal DNA. They then allowed bacteria with labelled DNA to reproduce in an environment containing normal 14N. They found that the offspring DNA all contained heavy 15N, but about half as much as in the original labelled parents. Thus half the parental DNA is preserved in each offspring DNA molecule: DNA replication is semi-conservative.
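The logic of the labelling experiment can be followed in a short simulation. In this Python sketch, which is an illustration with invented function names rather than a description of the laboratory procedure, each DNA molecule is a pair of strands labelled by the nitrogen isotope they contain, and every round of replication keeps each parental strand and pairs it with a newly made 14N strand. The second round is the prediction of this simple model, not a result reported in the text above.

```python
def replicate(molecules: list) -> list:
    """Semi-conservative copying: each old strand is paired with a new 14N strand."""
    offspring = []
    for strand_a, strand_b in molecules:
        offspring.append((strand_a, "14N"))
        offspring.append((strand_b, "14N"))
    return offspring


def heavy_fraction(molecules: list) -> float:
    """Fraction of all strands that carry the heavy 15N label."""
    strands = [strand for molecule in molecules for strand in molecule]
    return strands.count("15N") / len(strands)


parents = [("15N", "15N")]          # fully labelled parental DNA
generation_1 = replicate(parents)
generation_2 = replicate(generation_1)

print(all("15N" in molecule for molecule in generation_1))  # True
print(heavy_fraction(generation_1))  # 0.5
print(heavy_fraction(generation_2))  # 0.25
```

Under conservative replication, by contrast, the first generation would contain one fully heavy molecule and one fully light molecule, rather than two half-labelled ones.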
The new strand of DNA is made by placing the complementary nucleotides opposite each nucleotide in the parental strand. If the parental strand reads … CTA … , for example, then a G followed by an A followed by a T will be bonded next to it. Occasional mistakes occur, and an inappropriate nucleotide is inserted. A ‘T’ might be put next to the C, instead of a G. These so-called mismatches are detected by enzymes, called proof-reading enzymes, within the complex of enzymes that travel down the replication fork. The mismatched region of the new strand is then removed, and the DNA is re-copied.
DNA replication is complicated by the fact that the two strands of the double helix run in opposite directions (they are antiparallel). One strand can be thought of as going left to right across the page; but the other strand goes right to left. At a molecular level, this is due to the structure of the deoxyribose in the DNA backbone. Deoxyribose contains five carbon atoms, which are conventionally numbered 1’ to 5’. The phosphate upstream is attached to carbon atom number 3’, the phosphate downstream to carbon atom number 5’. The 3’ bond in one strand of the double helix is opposite the 5’ bond in the other strand.
DNA polymerase creates new strands only in the 5’ → 3’ direction. It is as if it can only copy from left to right, and not from right to left. Copying one of the strands is easy, but how is the other strand copied? The answer is that it is copied in short backwards stretches. The short stretches are called Okazaki fragments, and they are about 1000-2000 nucleotides long. The Okazaki fragments are joined together to make a continuous DNA strand. |VIII | |MUTATIONS | During DNA replication, the proof-reading enzymes fail to detect a small fraction of miscopied nucleotides.
The new, miscopied nucleotide can then be permanently incorporated into the DNA. These changes in the DNA sequence are called mutations. The rate of mutation is low: about 10⁻¹⁰ per nucleotide every time human DNA is copied. But a non-trivial number of mutations occur every human generation. The human genome is about 6.6 × 10⁹ nucleotides long and it is copied about 200 times per generation (between the conception and reproductive maturity of an individual). In all, about 175-200 nucleotides are miscopied in every new human offspring, relative to its parents’ DNA.
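The estimate is a product of three quantities: the mutation rate per nucleotide per copy, the length of the genome, and the number of times the genome is copied per generation. The Python sketch below (the names are chosen for illustration) multiplies the rounded figures quoted above. With these inputs the product comes out at roughly 130, of the same order as the 175-200 quoted; the difference presumably reflects rounding or slightly different inputs behind the quoted range, which is an assumption, since the text does not say.

```python
MUTATION_RATE = 1e-10        # miscopied nucleotides per nucleotide per copy
GENOME_LENGTH = 6.6e9        # nucleotides in the full human genome
COPIES_PER_GENERATION = 200  # rounds of copying between conception and maturity


def expected_mutations(rate: float, length: float, copies: int) -> float:
    """Expected number of miscopied nucleotides accumulated over `copies` rounds."""
    per_copy = rate * length  # expected errors in one round of copying
    return per_copy * copies


per_copy = MUTATION_RATE * GENOME_LENGTH
total = expected_mutations(MUTATION_RATE, GENOME_LENGTH, COPIES_PER_GENERATION)
print(round(per_copy, 2))  # 0.66
print(round(total))        # 132
```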
Moreover, simply miscopied nucleotides are not the only kind of mutation. Sometimes a short stretch of DNA may be copied twice over, or missed out. The reason why genetic fingerprinting is possible is that certain short sequences of repetitive DNA (the VNTR DNA) have exceptionally high mutation rates. The mutations consist of changes in the number of tandem repeats of the unit sequence: a sequence that is repeated 50 times may be copied only 49 times, or 51 times, in the offspring. The chance of a change in the number of repeats is about 1 in 100 between a human parent and offspring for every site where a VNTR sequence is located.
Some mutations do not arise as errors in DNA replication. Certain environmental mutagens, such as X-rays and UV-radiation, cause mutations. The radiation can break DNA strands, or cause chemical changes in neighbouring nucleotides such that they form sideways bonds (dimers). Through much of the 20th century, it was supposed that most mutations had external causes in environmental mutagens, but research had established by late in the century that the majority of mutations are internal copying mistakes rather than being caused by external insults.
Mutations are often harmful for the individual that inherits them. However, they also give rise to genetic variation in the population. This genetic variation is the raw material for evolution by natural selection. Mutation is, for the individual, disadvantageous on average, and mutations probably only occur at the tiny rate they do because the metabolic cost of driving the mutation rate down to zero would be prohibitively high.
The fact that evolution occurs at all can then be seen as a by-product of the thermodynamic difficulty of further reductions in mutation rate. If the mutation rate really were driven down to zero, evolution would come to a stop. |IX | |EVOLUTION OF DNA | Almost all life on Earth uses DNA as its hereditary material. The only exceptions are certain RNA viruses, which include the influenza virus and HIV (the agent of AIDS). This suggests that DNA started to be used early in the evolution of life.
Also, the genetic code is essentially (though not exactly) identical in all life. The genetic code appears to be arbitrary, in the same way that human language is arbitrary: there is nothing about a book that requires it to be called “a book”, and it is called “un livre” in French. Likewise, the triplet UUU does not have to be used to code for phenylalanine. The universality of the genetic code suggests that all modern life is descended from one common ancestor that also used the same genetic code.
The code is evolutionarily hard to change once it has evolved, and Francis Crick has described the exact pattern of the genetic code as a “frozen accident”. Although DNA is ancient in evolution, DNA was probably not the earliest hereditary molecule, near the origin of life. DNA has many ‘advanced’ features that would prevent it from working in very simple life forms. It is a double helix, with the coding information stored inside. In order for the information to be read or replicated, the double helix has to be unwound.
Special enzymes are needed to unwind the DNA, and these enzymes would not have existed early on. Moreover, DNA by itself is biologically inert; DNA cannot directly catalyse any metabolic processes. By contrast, RNA codes information in a single-stranded form that can interact directly with the environment. Some RNA molecules can also catalyse biological processes; these RNA molecules are called ribozymes. For these and other reasons, biologists suspect that DNA-based life was preceded by an “RNA world” in which simpler life forms existed and used RNA as their hereditary molecule.
Still simpler life forms based on some other replicating molecule may have preceded the RNA world, but we have no evidence about the hypothetical pre-RNA stage. Life forms that use RNA have small hereditary molecules, coding for only limited information. RNA viruses typically contain fewer than 10 genes, and are less than 10,000 nucleotides long. RNA is a more mutable, less stable molecule than DNA. DNA-based life forms may have evolved because of the lower mutation rates they have. Bacteria had evolved by 3,500 million years ago, and all modern bacteria use DNA.
Bacterial genomes are about 1-10 million nucleotides long and contain 1,000-5,000 genes. Multicellular life contains still larger genomes. The DNA of a fruit fly has about 300 million nucleotides and 14,000 genes. The DNA of a human has about 6 billion nucleotides and 30,000 genes. Thus, as more complex life forms have evolved, it has been at least in part by expansion in the codes and coding capacity of the DNA. Biologists are currently at the point of learning much more about how the DNA of various life forms codes for their various kinds of bodies.
DNA can now be sequenced rapidly, and the whole genomes of several species have now been sequenced. The sequence of the human genome was completed in 2003. The DNA sequence of a living creature can be used to count how many genes are needed to build that life form, and see how the genes have evolved from other genes in related species of life. The estimate that humans contain 30,000 genes is somewhat lower than the estimates made in the pre-genomic era. Before 2001, biologists thought that humans contain about 60,000 – 100,000 genes.
The full reason for the discrepancy is unknown, but part of the explanation probably lies in mechanisms that allow one gene to be read in many ways (RNA editing, alternative splicing) and in the variety of ways in which genes work together. Humans probably read off more information per gene on average than bacteria do. Biology has now moved into the genomic era, and biologists are using the massive amounts of new DNA sequence data to investigate the evolution and workings of life. Contributed By: Mark Ridley Microsoft ® Encarta ® 2006. © 1993-2005 Microsoft Corporation. All rights reserved.