Degenerate PCR, a short guide.
What is degenerate PCR?
- Degenerate PCR is in most respects identical to ordinary PCR, with one major difference: instead of specific PCR primers with a single defined sequence, you use mixed PCR primers. That is, if you do not know the exact sequence of the gene you want to amplify, you insert "wobbles" in the PCR primers at positions where there is more than one possibility. For instance, if all you have is a protein motif, you can back-translate it to the corresponding nucleotide motif (protein --> DNA sequence).
- Example of a degenerate PCR primer designed after a protein motif:
Trp Asp Thr Ala Gly Gln Glu
5' TGG GAY ACN GCN GGN CAR GA 3'
where Y = C or T, R = G or A, and N = G, A, T or C. This gives a mix of 256 different oligonucleotides.
- The more wobbles you introduce in the PCR primer, the more degenerate it gets. (The degeneracy is produced during DNA synthesis; you do not need to order 256 different primers to get a 256-fold mix, which would mean a lot of paperwork and expense.)
- Degenerate nucleotide codes: R=AG, Y=CT, M=AC, K=GT, W=AT, S=CG, B=CGT, D=AGT, H=ACT, V=ACG, N=ACGT.
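The 256-fold figure for the example primer can be checked by multiplying the number of bases allowed at each position. The sketch below is only an illustration: the function names `degeneracy` and `expand` are made up for this example, and the table is simply the list of degenerate nucleotide codes given above.

```python
from itertools import product

# Degenerate nucleotide codes as listed in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "W": "AT", "S": "CG",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Return the number of distinct oligonucleotides in the primer mix."""
    total = 1
    for base in primer.replace(" ", "").upper():
        total *= len(IUPAC[base])
    return total


def expand(primer):
    """Yield every specific sequence contained in a degenerate primer."""
    pools = [IUPAC[base] for base in primer.replace(" ", "").upper()]
    for combo in product(*pools):
        yield "".join(combo)


primer = "TGG GAY ACN GCN GGN CAR GA"
print(degeneracy(primer))              # 256
print(len(set(expand(primer))))        # 256
```

Expanding the mix is only practical for modest degeneracies; for very degenerate primers the count alone is usually what you want.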
Why use degenerate PCR?
- Degenerate PCR has proven to be a very powerful tool for finding "new" genes or gene families. Most genes come in families that share structural similarities. By aligning the protein sequences of a number of related proteins, you can see which parts of the protein are conserved and which are variable. From this information you can pick conserved protein motifs to use as a starting point for designing degenerate PCR primers.
- Degenerate PCR can be used to "solve" a number of problems:
  - You have isolated a protein and have managed to sequence some amino acids from it. You want to find the corresponding gene. (Why not try degenerate PCR!)
  - You have found a human gene and want to clone the homologous gene from, e.g., mouse or Drosophila. (Of course you can try low-stringency hybridizations, but how many false positives do you have to sequence before you find the correct one?)
  - You have found an interesting gene in yeast / C. elegans and want to find the human homologue (if it exists). (Why not try degenerate PCR!)
  - Phylogenetic and evolutionary studies of genes. For example, you can find specific orthologous genes from a number of related species and compare them. (This type of information can reveal potential active sites, regulatory regions and much more.)
  - Studies of gene families. (For example, how many members of the Rab family exist in green algae, and do they differ from those in higher plants?)
- These are just a few examples of the kinds of problems to which you can apply this technique.
Technical comments.
Requirements. (What kind of sequence information do you need to get started?)
- Two blocks of conserved amino acids / DNA sequence. The primers should be at least 20 nucleotides long.
- The protein motif does not have to be 100% conserved; sometimes a partially conserved motif is sufficient. Examples of commonly found substitutions are Glu <--> Asp and Arg <--> Lys. The degenerate "codon" GAN covers both Glu and Asp. Similarly, where you know there should be a basic amino acid (Arg or Lys), the codon MGN (M = C or A) covers all Arg codons and partially covers the Lys codon AAR: if the residue is Lys, you will have a G/T mismatch at the second base. This is normally no problem as long as the mismatch lies in the middle or the 5' part of the primer. (Remember your biochemistry: the enol form of thymidine can base pair with guanine.)
- The N-terminal part of a protein (obtained from protein sequencing) often gives enough sequence information for degenerate PCR. If the N-terminal sequence is 20-30 amino acids long, it is often possible to make two degenerate primers and PCR up a 50-90 bp cDNA fragment, which you can use as a probe to screen a cDNA library. Alternatively, you can make two degenerate primers and try a 3' RACE to amplify the rest of the cDNA. Hint: the easiest way is normally to PCR up a fragment of the N-terminal region, sequence this fragment, and then make specific primers for 3' RACE.
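To see which amino acids a degenerate "codon" such as GAN or MGN actually covers, you can expand it against the standard genetic code. The following is a minimal sketch: the helper name `amino_acids_covered` is hypothetical, and the codon table is the standard nuclear code written out in T, C, A, G order (organisms with a different genetic code would need a different table).

```python
from itertools import product

# Degenerate nucleotide codes as listed earlier in the guide.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "W": "AT", "S": "CG",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def amino_acids_covered(degenerate_codon):
    """Return the set of one-letter amino acids encoded by a degenerate codon."""
    pools = [IUPAC[base] for base in degenerate_codon.upper()]
    return {CODON_TABLE["".join(c)] for c in product(*pools)}


print(sorted(amino_acids_covered("GAN")))  # ['D', 'E']  (Asp and Glu)
print(sorted(amino_acids_covered("MGN")))  # ['R', 'S']  (Arg, plus Ser from AGY)
```

Note that Lys (K) is not in the MGN set, which matches the text: a Lys codon AAR is matched by MGN only with a mismatch at the second base.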
cDNA or genomic DNA?
- In general, cDNA works best as template DNA. Its complexity is lower (in eukaryotes only a small percentage of the genomic DNA encodes proteins).
- The size of the PCR fragment is "predictable" when you use cDNA, because there are no introns.
- Genomic DNA can be used as starting material if you are uncertain whether your gene is expressed in the tissue or developmental stage you have chosen. The drawback is that you often have to sift through a lot of junk DNA before you find your gene. Another potential problem with genomic DNA occurs if your PCR primers are based on protein / cDNA alignments and one of them spans an exon-intron junction.
How degenerate can PCR primers be and still function?
- 1,000- to 10,000-fold degeneracy is not uncommon.
- The degeneracy of the primers can be kept "down" by substituting four-base wobbles with inosines, since each inosine position then counts as a single species rather than four. (Example: GGI instead of GGN.)
- Example: motif CVGG(M/L)NRRP (found in p53 proteins).
- Without inosines, a 131,072-fold mix:
  5' TGY GTN GGN GGN MTN AAY MGN MGN CC 3'
- With inosines, a 512-fold mix:
  5' TGY GTI GGI GGI MTI AAY MGN MGN CC 3'
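The two figures for the p53 primers follow from the same position-by-position product, with inosine counted as one species. A small sketch, assuming exactly that treatment of "I" (the function name `degeneracy` is illustrative only):

```python
from math import prod

# Number of bases allowed by each code; inosine ("I") is counted as a
# single species, which is the assumption behind the 512-fold figure.
CHOICES = {
    "A": 1, "C": 1, "G": 1, "T": 1, "I": 1,
    "R": 2, "Y": 2, "M": 2, "K": 2, "W": 2, "S": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def degeneracy(primer):
    """Multiply the number of allowed bases over all primer positions."""
    return prod(CHOICES[base] for base in primer.replace(" ", "").upper())


without_inosine = degeneracy("TGY GTN GGN GGN MTN AAY MGN MGN CC")
with_inosine = degeneracy("TGY GTI GGI GGI MTI AAY MGN MGN CC")

print(without_inosine)                  # 131072
print(with_inosine)                     # 512
print(without_inosine // with_inosine)  # 256
```

Each of the four N-to-I substitutions removes a factor of four, hence the 256-fold reduction.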
How to choose the PCR conditions.
- Try "standard conditions" with a slightly lower annealing temperature, 35-50 cycles.
- If "standard conditions" give no product, run the first 4 PCR cycles at an annealing temperature 5-10 °C lower than "recommended", i.e. 42-46 °C. (PCR primers with multiple mismatches will then be extended, and hopefully some will stick to your gene.)
- If the primers are very degenerate (512-fold mixes or more), competitive inhibition can cause problems: primers bind the correct template but are not extended by the polymerase because their 3' ends are unstable. The first PCR cycles are then very inefficient, and you sometimes have to run 50 PCR cycles just to see a faint band of your gene.
- Remember not to use a DNA polymerase with 3'-->5' exonuclease activity (the PCR primers will be degraded). Taq polymerase works OK.
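A degenerate primer is a mix of oligonucleotides with different melting temperatures, so it can help to know the spread before choosing an annealing temperature. The guide does not give a formula; the sketch below swaps in the rough Wallace rule (2 °C per A/T, 4 °C per G/C), which is an assumption here and only a crude estimate for short oligonucleotides. The function name `wallace_tm_range` is made up for this example.

```python
# Degenerate nucleotide codes as listed earlier in the guide.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "W": "AT", "S": "CG",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def wallace_tm_range(primer):
    """Return (lowest, highest) Wallace-rule Tm in °C across the primer mix.

    Assumption: Tm = 2 * (A + T) + 4 * (G + C). At a degenerate position
    the lowest estimate takes an A/T if one is allowed, and the highest
    estimate takes a G/C if one is allowed.
    """
    low = high = 0
    for base in primer.replace(" ", "").upper():
        scores = [2 if b in "AT" else 4 for b in IUPAC[base]]
        low += min(scores)
        high += max(scores)
    return low, high


low, high = wallace_tm_range("TGG GAY ACN GCN GGN CAR GA")
print(f"Estimated Tm range: {low}-{high} °C")
```

The width of the range is a reminder that no single annealing temperature is optimal for every oligonucleotide in the mix, which is one reason a low-temperature start is worth trying.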
What types of genes are "easy" to find by degenerate PCR?
- Many proteins have structural similarities with other proteins and often share a common evolutionary origin.
[Figure: Anopheles gambiae, S. cerevisiae, Giardia lamblia]
- Proteins with ancient conserved motifs (ACMs) are in general "easy" to find. More than 500 families of proteins with ACMs are known! (Some of these families are huge: the Ser/Thr/Tyr kinases in human number around 1000 genes.) As of this year, 2002, the complete sequences of 8 eukaryotic genomes are known (human, fruit fly, mosquito, nematode, two yeasts, a plant and a protist), with more to follow pretty soon. In addition, tens of bacterial genomes are completed. These genomes provide a wealth of information about the evolution of various gene families and can be used as a starting point for finding genes in even more obscure organisms.
- Start by making a protein alignment of your protein of interest. Include as many proteins as you can find. If the protein is not well conserved, try to find regions that have some conserved amino acids, and if you know the sequence from a closely related organism, use this as a "guide sequence". Sometimes you can gamble on a sequence and get lucky.
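Once you have an alignment, picking out candidate motifs amounts to scanning for runs of conserved columns. The following is a minimal sketch under stated assumptions: the three aligned sequences are invented for illustration (they are not real proteins), the function name `conserved_blocks` is hypothetical, and only strictly identical, gap-free columns are counted, which is stricter than the partial conservation the guide allows.

```python
def conserved_blocks(aligned, min_len=7):
    """Return (start, motif) for each run of identical, gap-free columns.

    `aligned` is a list of equal-length aligned protein sequences.
    `min_len` is in amino acids; 7 residues correspond to roughly
    20 nucleotides of primer.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    blocks = []
    start = None
    # Iterate one position past the end so an open run is always closed.
    for i in range(length + 1):
        column = {seq[i] for seq in aligned} if i < length else set()
        conserved = len(column) == 1 and "-" not in column
        if conserved and start is None:
            start = i
        elif not conserved and start is not None:
            if i - start >= min_len:
                blocks.append((start, aligned[0][start:i]))
            start = None
    return blocks


# Hypothetical toy alignment built around the Trp-Asp-Thr-Ala-Gly-Gln-Glu motif.
toy_alignment = [
    "MKLVWDTAGQERFAS",
    "MRIVWDTAGQEKYHS",
    "MKVLWDTAGQEAFRS",
]
print(conserved_blocks(toy_alignment))  # [(4, 'WDTAGQE')]
```

For real alignments you would probably relax the identity requirement to tolerate substitutions such as Glu/Asp or Arg/Lys.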
Implications:
- By using degenerate PCR you can find most genes from yeast and animals irrespective of organism (cow, frog, snail, beetle, worm or fungus). Problems may arise if you try to catch fast-evolving genes; otherwise you are pretty sure to find what you are looking for. The case may be a bit harder if you look for genes in protists, such as the cryptomonads, where many genes have undergone massive genetic drift and have changed a lot compared with other eukaryotes. Apart from that, limitations are in general relatively few.
Other limitations:
- The conserved amino acids you try to design the PCR primers after are mainly Ser, Arg and Leu. (These are the amino acids that give the most wobbles.) This can sometimes give primers so degenerate that they amplify anything, normally just a lot of garbage.
- The region you try to amplify is too big. As a rule of thumb, avoid PCR products larger than 1000 bp.
- The organism you try to amplify the fragment from has a very high GC content. This often causes trouble, and you end up amplifying a lot of incorrect fragments.
- The organism you try to amplify the fragment from has a very low GC content and you have designed your primers too short (high TA content gives a low melting point for your primers).
- The gene you are looking for does not exist in the organism you have chosen. There are a few examples of "dinosaur genes", genes that have disappeared in certain lineages during evolution. (For instance, Rac genes in S. cerevisiae.)
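The remark that Ser, Arg and Leu give the most wobbles can be checked by counting codons per amino acid in the standard genetic code. The sketch below does that and compares two motifs; the name `coding_sequences` is hypothetical, the motif "SRLSRLS" is invented purely to illustrate a worst case, and the count is the number of distinct coding sequences for a motif, which is a lower bound on the degeneracy of a single primer covering all of them.

```python
from collections import Counter
from itertools import product
from math import prod

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# Number of codons per amino acid, stop codons ("*") excluded.
CODONS_PER_AA = Counter(aa for aa in CODON_TABLE.values() if aa != "*")


def coding_sequences(motif):
    """Return how many distinct nucleotide sequences encode a protein motif."""
    return prod(CODONS_PER_AA[aa] for aa in motif)


six_codon = sorted(aa for aa, n in CODONS_PER_AA.items() if n == 6)
print(six_codon)                    # ['L', 'R', 'S']
print(coding_sequences("WDTAGQE"))  # 512
print(coding_sequences("SRLSRLS"))  # 279936
```

Motifs rich in Trp and Met (one codon each) or in two-codon residues such as Asp, Glu, Gln and Lys are therefore much better primer sites than motifs of the same length built from Ser, Arg and Leu.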