Get Out The Bases
Well, I’m back in from the lab, with the final post in my artificial nucleotide series. Apologies to my nonexistent readership for taking so long.
First, the adenine issue. You’d think that adding the 2‐amino group would make the A–U interaction significantly stronger, or comparable to guanine. Strangely enough, this isn’t so. According to MacDónaill’s calculations (MacDónaill & Brocklebank, Mol. Phys. 101 (17) 2755–2762), the added amino group makes this interaction about 21% stronger. But the G–C interaction is 125% stronger than the A–U interaction, at 25.94 vs. 11.54 kcal/mol! It turns out that most of this effect is due to the fact that all the hydrogen bonds in the G–C base pair are oriented in the same direction, so that the bond dipoles strengthen one another. Looking further over the list of interaction energies, the strongest of the base pairs with one mismatch (iC‐δ) has a strength slightly below that of the A–U base pair. This is because a base pair with one mismatch will experience additional weakening due to the repulsion of like charges (two hydrogen atoms or two lone pairs of electrons). But we’ve already eliminated base pairs with one mismatch, because of the parity code, so the lack of the 2‐amino group on adenine is perfectly acceptable. There’s some reason to believe that adenine could arise more simply than 2‐aminoadenine in a primitive chemical environment, and there may just never have been sufficient evolutionary pressure to compel the development and incorporation of 2‐aminoadenine into nucleic acids.
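For anyone who wants to check the percentages, here is a minimal sketch of the arithmetic. The two energies are the ones quoted above; the figure for the 2‐aminoadenine pair is my own back‐calculation from the quoted 21%, not a number taken from the paper, and the variable names are just illustrative.

```python
# Interaction energies quoted in the post, in kcal/mol
gc_energy = 25.94
au_energy = 11.54

# How much stronger G-C is than A-U, as a percentage
gc_gain_percent = (gc_energy / au_energy - 1) * 100

# Assumption: apply the quoted ~21% gain to A-U to estimate the
# 2-aminoadenine-U pair (derived here, not read from the paper)
amino_a_u_energy = au_energy * 1.21

print(f"G-C is about {gc_gain_percent:.0f}% stronger than A-U")
print(f"Estimated 2-aminoadenine-U energy: {amino_a_u_energy:.2f} kcal/mol")
```

Even with the extra hydrogen bond, the estimate stays far closer to A–U than to G–C.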
Having discussed into the ground the principles behind base pairing as it exists in nature, what are people doing to improve on it in the lab? One of the leaders in the field is Peter Schultz. A brilliant if somewhat eccentric researcher, he’s best known for the development of techniques for incorporating artificial amino acids into proteins in living systems. He's concentrated on nucleotides that pair with themselves, pairing either by hydrophobic interactions or by sharing a chelated metal ion. He’s had some success with this, but the big problem he’s had is that DNA polymerases stall after incorporating these base pairs. The reason is pretty obvious, in retrospect; it’s due to the error‐correcting properties of the polymerase. There’s a great deal of selective pressure on polymerases to detect errors in replication and either stop replicating and allow an exonuclease to remove the erroneous base, or remove it themselves. Differences between the positioning of these unnatural and natural base pairs in the minor groove cause the unnatural base pairs to lack key interactions that would signal proper pairing to the polymerase (Matsuda et al., J. Am. Chem. Soc. 125 6134–6139 (2003)). Schultz is working on evolving polymerases that can accept these bases, but that’s not a trivial task, and it’s not clear what effect that will have on overall replication fidelity.
So why are people like Schultz working on this problem? You can probably guess from the description of his primary research interest, artificial amino acids. The ultimate goal of this sort of work would be to extend the genetic code from end to end, with artificial nucleotides encoding artificial amino acids. This would be a real boon to researchers, in a lot of ways. Having a wide variety of amino acids lets you make mutations that can really probe protein‐protein and protein‐DNA interactions at the atomic level. What might be even more interesting would be to grow bacteria or other organisms with an expanded genetic code over an extended time period, and let mutations occur. Would more efficient proteins evolve when provided with a greater complement of amino acids?
It’s an exciting vision, but there are many problems to be solved along the way. Not only do you need new base pairs compatible with DNA and RNA polymerases, they also need to be compatible with other nucleotide‐related enzymes like topoisomerases, helicases, ligases, and so on. But probably the most difficult task is developing a tRNA synthetase for each new amino acid. This is the cornerstone of Schultz’s technique; develop, by directed evolution, a mutant synthetase that doesn’t accept any other amino acids and charges an amber tRNA. This is not a trivial task, and those of his papers which I’ve read generally seem to use an archaeal tRNA synthetase that starts out incompatible with the bacterial tRNA system. Trying to develop a whole raft of tRNA synthetases that are highly selective for their artificial amino acid in the presence of many similar artificial amino acids could be a very tall order. But if it’s a hard problem, it’s also a very rewarding one, and I expect plenty of interest in it in the years to come.
A Thin Alphabet Soup
The question of why biological nucleic acids have only two base pairs has been the subject of speculation for some time. In the purine–pyrimidine base‐pairing system, there are three sites for hydrogen bonds, each of which may have two possible orientations, depending on which site is the hydrogen donor and which the hydrogen acceptor. This gives a total of eight (2³) arrangements of hydrogen donors and acceptors; since the complement of each arrangement must appear on the other side of the base pair, this makes four hydrogen‐bonding arrangements. But the three hydrogen‐bonding groups may be mounted on either a pyrimidine or a purine, so the total number of possible base pairs is eight.
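The counting argument can be checked by brute force. This is only a sketch: labeling a donor as 1 and an acceptor as 0 is an arbitrary choice of mine, and the names are illustrative.

```python
from itertools import product

# 1 = hydrogen donor, 0 = hydrogen acceptor (arbitrary labeling for this sketch)
patterns = list(product((0, 1), repeat=3))  # all donor/acceptor arrangements

def complement(pattern):
    """The arrangement that must sit on the opposite base to form all three bonds."""
    return tuple(1 - bit for bit in pattern)

# Each arrangement pairs with its complement, so the eight arrangements
# collapse into four distinct hydrogen-bonding pairings.
pairings = {frozenset((p, complement(p))) for p in patterns}

# Each pairing can be mounted two ways round: either member may sit on the
# purine. Listing (purine pattern, pyrimidine pattern) covers both.
base_pairs = [(p, complement(p)) for p in patterns]

print(len(patterns), len(pairings), len(base_pairs))  # 8 4 8
```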
In practice, to get the two base pairs where each nucleotide has only hydrogen donors or only hydrogen acceptors, you have to cheat and use compounds that are neither pyrimidines nor purines; because nitrogen has a valence of three, you always wind up with a proton sticking out somewhere. For instance, changing the 3‐position nitrogen of uracil to oxygen would result in hydrogen acceptors at all three hydrogen‐bonding positions on the molecule. Unfortunately, an oxygen atom flanked by two keto groups forms a compound known as an (acid) anhydride, because it’s equivalent to the fusion of two carboxylic acids with loss of water. When you dissolve such a compound in water, the equilibrium almost invariably winds up massively favoring the two acids. The ring of this hypothetical compound (designated “Σ”) would be broken by hydrolysis in an aqueous environment and be useless for base‐pairing. This leaves six possible base pairs.
Benner and his colleagues (Switzer et al., J. Am. Chem. Soc. 111 8322–8323) showed that one of these base pairs, isoguanine (iG) and isocytosine (iC), could be incorporated into newly synthesized DNA or RNA by natural polymerases. Benner and colleagues went on (Piccirilli et al., Nature 343 33–37 (1990)) to show that the base pair between xanthine (X) and 2,6‐diaminopyrimidine (κ) could also be processed by natural RNA and DNA polymerases, albeit with a rather high (14%) frequency of X–A mispairing in the case of RNA polymerase. So why doesn’t nature take advantage of these extra base pairs to provide a longer genetic code and more amino acids?
An attractive explanation for this problem was put forth by the evolutionary biologist Eörs Szathmáry (Szathmáry, E. Proc. Biol. Sci. 245 (1313) 91–99 (1991)). He proposed that in an “RNA world” where RNA both encoded genetic information and acted as a catalyst (a well‐accepted model for the origins of life), the catalytic power of the RNA molecules and their replication fidelity would drive the selection of a genetic alphabet. The greater the number of nucleotides incorporated into RNA, the greater its ability to perform chemical reactions; however, the number of metabolic pathways required to produce nucleotides would also increase, and the fidelity of replication would decrease because of the increased possibility of mismatches. Szathmáry’s calculations, with some back‐of‐the‐envelope estimates for the energies of base‐pairing between nucleotides, suggested that a four‐letter genetic alphabet is, in fact, optimal for an RNA world.
More recently, Dónal MacDónaill observed that the genetic alphabet may be a parity code (MacDónaill, D. Chem. Comm. 2002 2062–2063). The parity of a sequence of bits is whether the number of ones in it is even or odd. If we consider the orientation of each hydrogen bond as a bit, either 1 or 0, and the identity of the base as purine or pyrimidine also as a bit, each nucleotide may be represented by a four‐bit sequence. Consider a matched base pair of a purine and a pyrimidine, arbitrarily choosing 101,0 and 010,1. Each contains two ones, so both have even parity. By switching the assignments of 0 and 1 to purines and pyrimidines or to the orientations of hydrogen bonds, we might make them have odd parity. However, they will always have the same parity, as will any other matched base pair.
Now consider the case of a mispaired pyrimidine and a purine which are mismatched at only one of the three hydrogen bonding sites: 101,0 and 110,1. The first has even parity and the second odd parity, so the base pair has a mixed parity. This is true for mismatches of one hydrogen bond in general. Two mismatches imply bases of the same parity, and three, a complete mismatch, indicates mixed parity again.
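The pattern is easy to verify with a few lines of code. This is a sketch using the same arbitrary encoding as the examples above (three hydrogen‐bond bits followed by a purine/pyrimidine bit); the helper names are mine.

```python
def parity(word):
    """Return 0 if the word holds an even number of ones, 1 if odd."""
    return sum(word) % 2

def mismatched_sites(purine, pyrimidine):
    """Count hydrogen-bond sites (first three bits) that are equal, not complementary."""
    return sum(a == b for a, b in zip(purine[:3], pyrimidine[:3]))

def flip_sites(word, n):
    """Flip the first n hydrogen-bond bits, leaving the purine/pyrimidine bit alone."""
    return tuple(1 - bit if i < n else bit for i, bit in enumerate(word))

purine = (1, 0, 1, 0)   # 101,0 from the text
matched = (0, 1, 0, 1)  # 010,1, its matched pyrimidine

same_parity = {}
for n in range(4):
    pyrimidine = flip_sites(matched, n)  # n = 1 gives 110,1, the example above
    count = mismatched_sites(purine, pyrimidine)
    same_parity[count] = parity(purine) == parity(pyrimidine)
    print(count, "mismatch(es): same parity?", same_parity[count])
```

Zero and two mismatches come out with the same parity, while one and three come out mixed.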
MacDónaill observed that not only may the hydrogen‐bonding of nucleotides be thought of as a parity code, but all of the natural nucleotides have the same parity. This means that in any purine‐pyrimidine mispairing within the natural alphabet, only one hydrogen bond can form; there will be two mismatched sites. Mispairings with only one mismatched site, which may be close enough in energy to the matched pair to risk their incorporation into DNA, do not exist in this system. This parity limitation helps keep the fidelity of replication high.
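The same claim can be checked for a whole alphabet rather than a single pair. The sketch below does not encode the actual natural nucleotides. It just takes every even‐parity four‐bit word under the arbitrary encoding used above (last bit 0 for purine, 1 for pyrimidine, which is my assumption) and counts the mismatched sites for every purine–pyrimidine combination.

```python
from itertools import product

def parity(word):
    """Return 0 if the word holds an even number of ones, 1 if odd."""
    return sum(word) % 2

def mismatched_sites(purine, pyrimidine):
    """Count hydrogen-bond sites (first three bits) that are equal, not complementary."""
    return sum(a == b for a, b in zip(purine[:3], pyrimidine[:3]))

# Every four-bit word: three hydrogen-bond bits plus a purine/pyrimidine bit
words = list(product((0, 1), repeat=4))

# Restrict the alphabet to a single parity class (even, chosen arbitrarily)
even_words = [w for w in words if parity(w) == 0]
purines = [w for w in even_words if w[3] == 0]
pyrimidines = [w for w in even_words if w[3] == 1]

# Mismatch counts seen across all purine-pyrimidine pairings in the alphabet
mismatch_counts = {mismatched_sites(pu, py) for pu in purines for py in pyrimidines}
print(sorted(mismatch_counts))  # [0, 2]
```

A single‐mismatch pairing never shows up: every pairing is either fully matched or off at two sites.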
So from eight base pairs, two are eliminated on the grounds of fundamental chemical instability, and three more to keep a parity code for replication fidelity. This leaves three: A–T/U, C–G, and iC–iG. So why can’t we add the iC–iG base pair to our living, experimental systems? It’s been shown to be replicable, and the parity is correct. The answer is another chemical phenomenon, tautomerism.
Here are some useful illustrations of tautomerism in the nucleotides. Adjacent keto and amino groups may, through proton transfer, become hydroxyl and imino groups, which reverses the polarity for hydrogen bonding of both. (Keto groups and imino groups are hydrogen acceptors, amino and hydroxyl groups are hydrogen donors.) For the natural set of nucleotides, tautomerism isn’t a problem; the equilibrium between tautomers heavily favors the keto over the enol forms. (As an amusing historical note, Watson and Crick, misled by errant textbooks, tried modeling DNA with the wrong tautomers before being corrected by the crystallographer Jerry Donohue.) Unfortunately, iC and iG are not so tractable. Researchers have shown (Roberts et al. J. Am. Chem. Soc. 119 (20) 4640–4649 (1997)) that the contributions of alternative tautomers are significant for this base pair, and can lead to relatively stable iG–U and iG–C mismatches. Beating nature isn’t so easy, after all.
I’ve drawn this post out pretty far as it is, so I’ll leave the discussion of laboratory attempts to make artificial base pairs—and some potential applications—for a later post. I’ll also cover the issue of why substituting adenine for 2‐aminoadenine and eliminating a hydrogen bond doesn’t make as big a difference as might be expected.
Did You Say “Nuclear Tides”?
Derek Lowe points to a nice review speculating on exobiochemistry. One of the authors is Steven Benner, who’s a well‐established researcher in the field of “alternative” biochemistry. He and his associates have published many papers on the effects of modifying different aspects of nucleic acid structure, among other things. This is how I became acquainted with his work, as I’ve been interested in artificial nucleotides for a while. Even if you’re not completely familiar with what a nucleotide is, it’s a fun review to read; you don’t find many scientific publications that cite “Star Trek” episodes and Robert A. Heinlein. Humor aside, it’s a nice sketch of the current state of speculation on extraterrestrial biochemistry.
Since this brought Benner to mind again, I thought I’d create some inaugural content for this blog by discussing artificial nucleotides, and why anyone would care about them. The rest of this post is background on nucleic acids for the non-technical audience; those of you who already know your purines from your pyrimidines will probably want to skip to the next post.
I assume that even general audience members know that DNA and RNA are polymers consisting of nucleotides attached to a sugar‐phosphate backbone, and that two strands of complementary nucleic acid can form hydrogen bonds and entwine themselves in a double helix. Since I’m setting up the background for a discussion of artificial nucleotides, we’ll be taking a close look at the hydrogen bonding interactions. But first, we’ll start with a little nomenclature.
A nucleobase (or simply “base”) is the heterocyclic compound that’s on the inside of the helix forming the hydrogen bonding interactions. Once the base forms a bond with a sugar molecule (deoxyribose in DNA, ribose in RNA), it becomes a nucleoside. (Think “S” for sugar.) Attaching a single phosphate to the nucleoside produces a nucleoside monophosphate, alias a nucleotide, and a string of those is a (poly)nucleic acid. (I’ll use “nucleic acid” only to refer to the polymers, which is more or less normal practice. “Nucleotide” is often used loosely for the base itself, and I’ll do the same.) The four bases found in DNA are adenine, pairing with thymine (alias 5‐methyluracil), and guanine, pairing with cytosine. In RNA, thymine is replaced by uracil, which is thymine less a methyl group and forms a base pair with adenine in the same fashion.
This is an image of the four DNA nucleotides as base pairs: guanine, cytosine, adenine, and thymine. If you’ve never taken organic chemistry, this may look rather confusing. Carbon atoms are implicit at each vertex of the drawing, and some have implicit hydrogen atoms attached; each carbon atom must have four bonds, so if only three bonds converge at a vertex, one hydrogen atom is bonded to that carbon as well. (Double bonds, of course, count double.) Hydrogen bonds are shown as dotted lines. The lines sticking off next to each of the letters “G”, “C”, “A”, and “T” are the bonds to the sugar‐phosphate backbone, which is not shown, and the little numbers designate each atom in the nucleotide rings for purposes of nomenclature. If a hydroxyl (OH) group were to be attached, say, at the 8‐numbered carbon of guanine, we could call the resulting compound 8‐hydroxyguanine. You may also notice a certain similarity between G and A and between C and T. G and A are both based on a fused five‐ and six‐membered ring system, with nitrogens at the 1, 3, 7, and 9 positions; this is called a purine. C and T are based on a simpler six‐membered ring system with nitrogens at the 1 and 3 positions, called a pyrimidine.
“Hydrogen bonds” are shown as dotted lines because they’re not really in the same class as ordinary chemical bonds (including those shown here connecting hydrogen, with a solid line, to nitrogen or oxygen; don’t be confused by the name). They’re a sort of electrostatic attraction between certain electronegative atoms such as oxygen, nitrogen, and fluorine and the nucleus (a proton) of a hydrogen atom. While they’re considerably weaker than ordinary chemical bonds, they’re rather stronger, on an individual basis, than other intermolecular forces. So to form one, you need one of those heteroatoms (organic chemist’s jargon for “not carbon or hydrogen”) on one side, and a hydrogen atom on the other; further, the hydrogen atom needs to be bonded to a heteroatom, which will hold the hydrogen’s electron more tightly and make the hydrogen atom more positive.
It’s the patterns of hydrogen bonding that give nucleic acids their important information‐storage capabilities. If you try to jam an A or G in across from a G, it can’t be done without considerable energetic cost and distorting the helix. T is sized like C, but it can only form one of the three hydrogen bonds. So if a new strand is being synthesized, a thymidine mononucleotide will “rattle” around in that position and will probably be ejected before being attached to the growing backbone. Similar considerations apply for the other possible mispairings. Note also that the A–T base pair only has two hydrogen bonds, rather than the three in G–C. If you stuck an amino (NH2) group on the 2‐position of A (making 2‐aminoadenine), you could form a hydrogen bond with the oxo group at the 2‐position of T, but that doesn’t happen. Why? More on that in the next post.
Initial Post
Stand by for content shortly. Thanks for your patience.