Degenerate PCR
Degenerate PCR is in most respects identical to ordinary PCR, with one major difference: instead of specific primers with a single defined sequence, you use mixed primers. If you do not know the exact sequence of the gene you want to amplify, you insert "wobbles" in the primers at every position where more than one base is possible. For instance, if all you have is a protein motif, you can back-translate it to the corresponding nucleotide motif (Protein --> Sequence).
Example of a degenerate PCR primer designed after a protein motif:
Trp Asp Thr Ala Gly Gln Glu
5' TGG GAY ACN GCN GGN CAR GA 3'
where Y = C or T, R = G or A, and N = G, A, T or C. This gives a mix of 256 different oligonucleotides. The more wobbles you introduce in the primer, the more degenerate it gets. (The degeneracy is produced during DNA synthesis. You do not need to order 256 different primers to get a 256-fold mix; that would be a lot of paperwork, and expensive!)
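The back-translation of the example motif can be sketched in a few lines of Python. This is a minimal illustration, not a primer design tool: the lookup table `DEGENERATE_CODON` covers only the seven amino acids of the example, the function names are hypothetical, and dropping the last base (so that the primer does not end in a wobble) is an assumption read off the example primer.

```python
from itertools import product

# IUPAC nucleotide codes used in the example primer.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "N": "ACGT",
}

# Degenerate codons for the amino acids of the example motif only
# (hypothetical, deliberately incomplete table).
DEGENERATE_CODON = {
    "Trp": "TGG", "Asp": "GAY", "Thr": "ACN", "Ala": "GCN",
    "Gly": "GGN", "Gln": "CAR", "Glu": "GAR",
}

def back_translate(motif, drop_last=1):
    """Join the degenerate codons and drop the last `drop_last` bases."""
    primer = "".join(DEGENERATE_CODON[aa] for aa in motif)
    return primer[:len(primer) - drop_last]

def expand(primer):
    """List every specific oligonucleotide contained in the degenerate mix."""
    return ["".join(bases) for bases in product(*(IUPAC[s] for s in primer))]

motif = ["Trp", "Asp", "Thr", "Ala", "Gly", "Gln", "Glu"]
primer = back_translate(motif)
oligos = expand(primer)
print(primer, len(oligos))  # prints: TGGGAYACNGCNGGNCARGA 256
```

Enumerating the mix is only practical for modest degeneracy; for large mixes it is enough to multiply the number of choices at each position.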
Why use degenerate PCR?
Degenerate PCR has proven to be a very powerful tool for finding "new" genes or gene families. Most genes come in families that share structural similarities. By aligning the sequences of a number of related proteins you can determine which parts are conserved and which are variable. Based on this information you can use conserved protein motifs as starting points for designing degenerate PCR primers. Degenerate PCR applies to a number of scientific settings:
- You have isolated a protein and managed to sequence some amino acids from it. You want to find the corresponding gene! Why not try degenerate PCR?
- You have found a human gene and want to clone the homologous gene from e.g. mouse or Drosophila. Of course, you can try low-stringency hybridizations, but how many false positives do you have to sequence before you find the correct one?
- You have found an interesting gene in yeast or C. elegans and want to find the human homolog (if it exists). Why not try degenerate PCR?
- Phylogenetic and evolutionary studies of genes: e.g. you can find specific orthologous genes from a number of related species and compare them. This type of information can reveal potential active sites, regulatory regions and much more.
- Studies of gene families, e.g. "How many members of the Rab family exist in green algae?" or "Do they differ from those in higher plants?"
These are just a few examples of possible applications of degenerate PCR.
Technical comments
Requirements: What kind of sequence information do you need to get started?
- Two blocks of conserved amino acid or DNA sequence. The primers should be at least 20 nucleotides long.
- The protein motif does not have to be 100% conserved. Sometimes a partially conserved protein motif is sufficient. Examples of commonly found substitutions are Glu --> Asp and Arg --> Lys. The degenerate codon GAN covers both Glu and Asp. Similarly, where you know there should be a basic amino acid (Arg or Lys), you can use the codon MGN (M = C or A). MGN covers the Arg codons and only partially matches the Lys codon AAR: if the residue is Lys, there will be a G/T mismatch at the second base. This is normally no problem as long as the mismatch lies in the middle or the 5' part of the primer. (Remember your biochemistry: the enol form of thymidine can pair with guanine.)
- The N-terminal part of a protein (obtained from protein sequencing) often gives enough sequence information for degenerate PCR. If the N-terminal sequence is 20-30 amino acids long, it is often possible to make two degenerate primers and amplify a 50-90 bp cDNA fragment, which you can use as a probe to screen a cDNA library. Alternatively, you can make two degenerate primers and try a 3' RACE to amplify the rest of the cDNA. Hint: The easiest way is normally to amplify a fragment of the N-terminal region, sequence this fragment and then make specific primers for 3' RACE.
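To check which amino acids a degenerate codon such as GAN or MGN actually covers, you can expand it and translate each member with the standard genetic code. The sketch below is one way to do that; the function name `encoded_amino_acids` is hypothetical, and amino acids are given in one-letter code (D = Asp, E = Glu, R = Arg, K = Lys, S = Ser).

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC codes needed for the codons discussed in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "M": "AC", "R": "AG", "N": "ACGT"}

def encoded_amino_acids(degenerate_codon):
    """Return the set of amino acids encoded by a degenerate codon."""
    codons = ("".join(b) for b in product(*(IUPAC[s] for s in degenerate_codon)))
    return {CODON_TABLE[c] for c in codons}

gan = encoded_amino_acids("GAN")  # Asp and Glu
mgn = encoded_amino_acids("MGN")  # Arg (CGN, AGR) plus Ser (AGY)
aar = encoded_amino_acids("AAR")  # Lys
print(sorted(gan), sorted(mgn), sorted(aar))
```

The output confirms that MGN contains no Lys codon, so a Lys residue is reached only through the tolerated mismatch at the second base. It also shows that MGN includes the two Ser codons AGT and AGC as a side effect.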
cDNA or genomic DNA?
- In general, cDNA works best as a template because of its lower complexity (in eukaryotes only a small percentage of the genomic DNA encodes proteins).
- Also, the size of the PCR fragment is "predictable", because there are no introns.
- If you are uncertain as to whether your gene is expressed in the tissue/developmental stage you have chosen, genomic DNA can be used as a starting material. The drawback is that you often have to sift through a lot of junk DNA before you find your gene.
How degenerate can PCR primers be and still function?
- 1000- to 10,000-fold degeneracy is not uncommon.
- The degeneracy of the primers can be kept down by substituting four-base wobbles with inosines, e.g. GGI instead of GGN.
- Example motif: CVGG(M/L)NRRP (found in p53 proteins).
- Without inosines: 131072 mix: 5' TGY GTN GGN GGN MTN AAY MGN MGN CC 3'
- With inosines: 512 mix: 5' TGY GTI GGI GGI MTI AAY MGN MGN CC 3'
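The two mix sizes above follow from multiplying the number of possible bases at each position. A minimal sketch, assuming that inosine (I) counts as a single synthesized base and using a hypothetical helper name `degeneracy`:

```python
from math import prod

# Number of alternative bases per symbol; inosine is one fixed base.
CHOICES = {"A": 1, "C": 1, "G": 1, "T": 1, "I": 1,
           "R": 2, "Y": 2, "M": 2, "N": 4}

def degeneracy(primer):
    """Multiply the number of choices at every position of the primer."""
    return prod(CHOICES[s] for s in primer.replace(" ", ""))

without_inosine = "TGY GTN GGN GGN MTN AAY MGN MGN CC"
with_inosine = "TGY GTI GGI GGI MTI AAY MGN MGN CC"

print(degeneracy(without_inosine))  # prints: 131072
print(degeneracy(with_inosine))     # prints: 512
```

Replacing four N positions with inosine removes a factor of 4 x 4 x 4 x 4 = 256, which is the difference between the two mixes.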
Choosing PCR conditions
- Try standard conditions with slightly lower annealing temperature, 35-50 cycles.
- If standard conditions fail, run the first four cycles at 5-10 °C lower than recommended, i.e. 42-46 °C. (PCR primers with multiple mismatches will be extended, and hopefully some of them stick to your gene.)
- If the primers are very degenerate (512 mixes or more), competitive inhibition can lead to problems. (Primers bind the correct template but are not extended by the polymerase because of unstable 3' ends.) This means that the first PCR cycles are very inefficient, and you sometimes have to run 50 cycles to see even a faint band of your gene.
- Remember not to use a DNA polymerase with 3' --> 5' exonuclease activity (the PCR primers will be degraded). Taq polymerase works OK.
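When picking an annealing temperature for a degenerate mix, it can help to know how far apart the melting temperatures of the individual oligonucleotides are. The sketch below estimates that range with the Wallace rule (about 2 °C per A or T and 4 °C per G or C). This rule is a rough approximation for short oligonucleotides and is an assumption added here, not part of the protocol above; the function name `wallace_tm_range` is hypothetical.

```python
# IUPAC codes for the degenerate positions used on this page.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "M": "AC", "N": "ACGT"}

def wallace_tm_range(primer):
    """Return (lowest, highest) Wallace-rule Tm in degrees C across the mix."""
    low = high = 0
    for symbol in primer.replace(" ", ""):
        bases = IUPAC[symbol]
        has_at = any(b in "AT" for b in bases)
        has_gc = any(b in "GC" for b in bases)
        low += 2 if has_at else 4    # most AT-rich choice at this position
        high += 4 if has_gc else 2   # most GC-rich choice at this position
    return low, high

print(wallace_tm_range("TGG GAY ACN GCN GGN CAR GA"))  # prints: (60, 70)
```

A wide range is one reason a lowered annealing temperature in the first cycles can help: the most AT-rich members of the mix anneal less stably than the GC-rich ones.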
What types of genes are "easy" to find by degenerate PCR?
Many proteins have structural similarities with other proteins and often share a common evolutionary origin. Proteins with ancient conserved motifs (ACMs) are in general "easy" to find. More than 500 families of proteins with ACMs are known! (Some of these families are huge: the Ser/Thr/Tyr kinases in humans number around 1000 genes.) By 2002 the complete sequences of 8 eukaryotic genomes were known: human, Drosophila melanogaster (fruit fly), Anopheles gambiae (mosquito), C. elegans (nematode), S. cerevisiae (baker's yeast), Schizosaccharomyces pombe (fission yeast), Arabidopsis thaliana (plant) and Plasmodium falciparum (protist), with Giardia lamblia expected to follow pretty soon. In addition, tens of bacterial genomes are completed. These genomes provide a wealth of information regarding the evolution of various gene families and can be used as a starting point for finding genes in even more obscure organisms.
Start by making a protein alignment of your protein of interest. Include as many proteins as you can find. If the protein is not well conserved, try to find regions that have some conserved amino acids, and if you know the sequence from a closely related organism, use this as a guiding sequence. Sometimes you can gamble on a sequence and get lucky.
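Once you have an alignment, candidate primer sites are runs of columns in which every sequence carries the same residue. The following sketch scans for such runs. The three aligned sequences are invented toy data (only the WDTAGQE motif is taken from the primer example earlier on this page), the function name is hypothetical, and the minimum length of 7 residues is an assumption chosen because 7 codons give a primer of about 20 nucleotides.

```python
def conserved_blocks(alignment, min_length=7):
    """Return (start, motif) for each run of identical, gap-free columns."""
    blocks, start = [], None
    width = len(alignment[0])
    # Iterate one column past the end so a run touching the end is closed.
    for col in range(width + 1):
        conserved = (
            col < width
            and alignment[0][col] != "-"
            and all(seq[col] == alignment[0][col] for seq in alignment)
        )
        if conserved and start is None:
            start = col
        elif not conserved and start is not None:
            if col - start >= min_length:
                blocks.append((start, alignment[0][start:col]))
            start = None
    return blocks

# Invented toy alignment, equal-length rows, "-" would mark a gap.
toy_alignment = [
    "MKLVWDTAGQERFAS",
    "MRIVWDTAGQEKYHS",
    "MKAIWDTAGQEEFRS",
]
print(conserved_blocks(toy_alignment))  # prints: [(4, 'WDTAGQE')]
```

Real alignments rarely contain long perfectly conserved runs, so in practice you would relax the criterion to tolerate substitutions such as Glu/Asp or Arg/Lys.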
Implications
By using degenerate PCR you can find most genes from yeast and animals irrespective of organism (cow, frog, snail, beetle, worm or fungus). Problems may arise if you try to catch fast-evolving genes. Otherwise, you are pretty sure to find what you are looking for. The case may be a bit harder if you look for genes in protists, such as the cryptomonads, where many genes have undergone massive genetic drift and have changed a lot compared to other eukaryotes. Apart from that, limitations are in general relatively few.
Other limitations
- The conserved motif you design the PCR primers after may consist mainly of Ser, Arg and Leu. These amino acids have the most codons (six each) and therefore give the most wobbles, which can result in primers so degenerate that they amplify virtually anything, normally just a lot of junk.
- There is a limit to the size of the region you can amplify. As a rule of thumb, avoid PCR products larger than 1000 bp.
- If the DNA of the organism you are working with has a very high GC content, you might end up amplifying a lot of incorrect fragments. This is also the case if the DNA has a very low GC content and your primers are too short (a high AT content gives a low melting point for your primers).
- If the gene you are looking for does not exist in the organism you have chosen, you are out of luck. There are a few examples of "dinosaur genes", genes which have disappeared in certain lineages during evolution (for instance Rac genes in S. cerevisiae).
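The point about Ser, Arg and Leu in the list above can be verified by counting codons per amino acid in the standard genetic code. A small sketch, with amino acids in one-letter code (L = Leu, S = Ser, R = Arg, W = Trp, M = Met) and variable names chosen here for illustration:

```python
from collections import Counter

# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Count codons per amino acid, ignoring the three stop codons ("*").
codon_counts = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

six_codon = sorted(aa for aa, n in codon_counts.items() if n == 6)
one_codon = sorted(aa for aa, n in codon_counts.items() if n == 1)
print(six_codon)  # prints: ['L', 'R', 'S']
print(one_codon)  # prints: ['M', 'W']
```

Motifs rich in Trp and Met are therefore the cheapest to cover, while each Leu, Arg or Ser multiplies the size of the mix the most.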