Primer time: a rollercoaster ride through metabarcoding PCR primer design

If you want a short synopsis on metabarcoding (and dietary metabarcoding more specifically), plough on, but if you’re here for some tips and tricks, scroll down to the PCR primer design subheading.

Now more than ever, PCR has become a household name akin to Harry Potter or Gandalf the Grey, and in most cases no less mysterious nor magical. Whilst the distillation of molecular magic that is PCR has been invaluable (despite controversy) for screening your swabs, spit and other bodily fluids for COVID-19, it is also undeniably applicable to ecology. But first, what is PCR?

Polymerase chain reaction (PCR) is essentially the natural process that creates copies of (i.e. amplifies) DNA, but domesticated for science. The process basically involves repeatedly getting an enzyme called DNA Polymerase to make copies of specific bits of DNA. To do this, we need to start the process by mixing really specific nucleotide sequences (the building blocks that make up DNA) in with the DNA so that they stick to the bits that they are designed to match. These are called PCR primers and are usually designed to match two conserved (i.e. evolutionarily stable/similar between all of the things you’re interested in) sections of DNA, with a variable bit in between (which can then be used to identify the species/gene/etc. that you’re studying).

The whole process basically involves changing the temperature of the DNA repeatedly as if it’s in one of those faulty showers which is always scalding or freezing. With each cycle of temperatures (usually holding briefly at around 95 °C, something like 50 °C and then 72 °C) more and more copies are made, making it easier to detect these specific bits of DNA. Cool, right? If you’re more of a visual person, this video is great for showing the intricacies in detail (side note: it sorely lacks musical accompaniment, but Darude’s Sandstorm perfectly lines up with the action if started simultaneously).

The PCR process. Rinse (not literally) and repeat (literally) from there. Figure created with BioRender.com.

So, what use is this to ecology?

PCR can be a fantastic diagnostic tool for checking a sample for a specific species, determining the sex of an individual organism or checking out a particular gene that you’re interested in (among countless other applications). The identification of species from short variable sections of DNA is referred to as ‘DNA barcoding’ – much like scanning a barcode in the shops, variations in the thickness of the lines (or, in this case, the string of different nucleotide bases – the building blocks of DNA) can tell you what you’re looking at.

My main use of PCR, however, is to look at what an animal has eaten by copying and checking the DNA left over in its gut (or faeces, but we’ll come back to that). To do this, you can individually check the sloppy gut contents for a load of different species with an array of specific PCR primers, or you can design PCR primers that copy the DNA of a wide range of organisms. This process is referred to as DNA metabarcoding – essentially barcoding of loads of things at the same time.

Using metabarcoding, you can similarly screen DNA found in water, soil, air, or just about anything else to build a picture of the community that lives there (or moves through), much like a really esoteric and complicated puzzle; however, just like that second-hand puzzle you once bought from a charity shop to do with your puzzle-fanatical friend on a Sunday afternoon, you will always find pieces missing (or sometimes pieces from other puzzles turning up in the box). It’s important to use PCR primers that work on all of the things that you’re looking for, but also that don’t get bogged down by all of the things you’re not so interested in.

Mash up a spider abdomen, slip some chemicals in with this soupy mess, do a PCR, throw it through a sequencer and you’ve got data (note: this is an extraordinarily simplified description). Figure created with BioRender.com.

Coming back to dinner time, metabarcoding is a really powerful tool in the quest to work out what something has eaten. Traditional dietary analyses often involved rooting through faeces (or stomachs) for lumpy bits that you could identify, but this misses soft, easily-digestible things and working out which beetle was eaten from a small fragment of wing case is a daunting task. Metabarcoding means you can detect the smallest digested fragments of prey from these faeces, increasing the detectability of those prey (or plants, fungi, etc.) manifold. It is not, however, without its problems.

Let’s take a very specific example: spiders. I was tasked with metabarcoding the gut contents of spiders at the start of my PhD – a request that was initially overwhelming. Spiders present many complications to the dietary analysis workflow. Firstly, they are fluid-feeding predators, thus traditional hard-parts analysis is useless. Secondly, they supposedly have very plastic metabolisms and can spend an unreasonably long time digesting their food, meaning their faeces is almost void of useable prey DNA in some cases. On top of all of that, their digestive system is ridiculous, with their guts going through their head, legs and basically every other body part, making dissection of their gut from the rest of their body almost impossible.

Whichever spider first decided that it would be a good idea to have guts in its head, legs and basically every other bit of its body should have been locked away by the ancestors of disgruntled molecular ecologists everywhere. Figure created with BioRender.com.

These problems culminate in a fact true of many invertebrates: arguably the most effective way of isolating this gut DNA is by mashing up a whole section of the spider (and according to Krehenwinkel et al. 2017, specifically the abdomen, also known as the opisthosoma – the squidgy bit at the back). This, however, presents a profound problem when the diet of your predator contains many closely-related prey: your PCR primers will probably mostly amplify the DNA of your predator. A lot of it.

Since the prey DNA is partially digested and present in small quantities, the far fresher DNA of the spider is more abundant, more intact and, depending on the primers used, will exponentially outcompete the prey DNA in a PCR. This would be a great time to mention PCR bias. In a mixed community of the DNA of multiple taxa, all with different DNA sequences, PCR primers might be better at attaching to the DNA of some species depending on how closely the sequences match.
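To put some very rough numbers on that swamping effect, here is a minimal sketch which assumes every PCR cycle multiplies the copy number by (1 + efficiency). The starting copy numbers and efficiencies are made up purely for illustration, and real reactions plateau rather than growing forever.

```python
def amplify(start_copies: float, efficiency: float, cycles: int) -> float:
    """Copies after `cycles`, assuming each cycle multiplies copies by (1 + efficiency)."""
    return start_copies * (1 + efficiency) ** cycles


# Hypothetical mixture: lots of intact predator DNA, a little degraded prey DNA.
predator_start, prey_start = 10_000, 100
prey_share_before = prey_start / (predator_start + prey_start)

# Assumed efficiencies: the primers match the predator slightly better than the prey.
predator = amplify(predator_start, efficiency=0.95, cycles=30)
prey = amplify(prey_start, efficiency=0.80, cycles=30)
prey_share_after = prey / (predator + prey)

print(f"prey share before PCR: {prey_share_before:.3%}")
print(f"prey share after PCR:  {prey_share_after:.3%}")
```

Even a modest difference in per-cycle efficiency compounds over 30 cycles, which is why a small bias in primer matching can matter so much.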

A PCR primer doesn’t need to perfectly match (we’ll come back to that later), but the degree to which it does will affect how efficient the amplification is. This difference is referred to as PCR bias and, whilst it sounds negative, we can actually use it to help us. The bias is also affected by the type of mismatch. As mentioned before, DNA is (mostly) made up of four different nucleotide “bases”: adenine (“A”), thymine (“T”), guanine (“G”) and cytosine (“C”). Much like the characters of most popular sitcoms, everyone knows that C will always end up with G, and A with T, and any attempts at pairing them differently result in awkward filler episodes and undesirable subplots which everyone inevitably pretends to forget.

These nucleotides belong to two structural categories: the pyrimidines (C and T) and the purines (A and G), with the bases naturally forming two purine-pyrimidine partnerships: G with C, and A with T. Their ability to join together is ultimately due to the number of hydrogen bonds they form (G-C = three; A-T = two). Given the greater number of hydrogen bonds between G and C, these join more strongly than A and T. Because of these specific pairings, their different degree of bonding and different chemical structures, certain mismatches between primer and DNA can be worse than others, with purine-purine (e.g. A with G) mismatches being most debilitating, pyrimidine-pyrimidine (e.g. C with T) being pretty bad, and purine-pyrimidine (e.g. C with A) being the least problematic (aside from actual matches, of course). These relationships are of critical importance in the primer design process.
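If it helps to see the categories spelled out, this small sketch labels what kind of pairing you get when a primer base sits opposite a template base. The function name is just for illustration, and it only classifies the pairing; it makes no attempt to score how much a given mismatch hurts amplification.

```python
PURINES = {"A", "G"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def pairing_type(primer_base: str, template_base: str) -> str:
    """Classify a primer base against the template base sitting opposite it."""
    if COMPLEMENT[primer_base] == template_base:
        return "match"
    kinds = sorted(
        "purine" if base in PURINES else "pyrimidine"
        for base in (primer_base, template_base)
    )
    return "-".join(kinds) + " mismatch"


for pair in [("G", "C"), ("A", "G"), ("C", "T"), ("C", "A")]:
    print(pair, "->", pairing_type(*pair))
```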

There are so many flavours of mismatch. Figure created with BioRender.com.

This is why PCR primer choice and design is so important for dietary analysis. BACK TO THE SPIDER DIET PRIMER CHOICE PROBLEM. We’re thus left with several choices to overcome the problem of spider gut content analysis, all of which are differently affected by bias:

  1. We could amplify all of the invertebrate DNA present (or the most complete range that we can; general amplification), but this will include a vast proportion of predator DNA.
  2. We could use a special PCR additive called blocking primers, which are like normal PCR primers but attach to the DNA of our predator and stop the actual PCR primers from attaching and working (blocking probes), but these can also exclude many other taxa and can increase the biases involved in PCR.
  3. We could design special primers that avoid amplifying the predator whilst still amplifying a wide range of prey (exclusive amplification), but these can have other effects on bias, usually excluding prey taxonomically close to the predator, similarly to blocking probes.

The solution to the problem depends on who you ask, but the choice you make affects the data which comes out the other side, the complexity of the protocol you follow, and the amount of data you need to generate to get an accurate picture of the diet that you’re looking at. This isn’t such a problem in some situations, but the effects of bias are no less important.

The PCR primers you choose and how you use them can fundamentally change the data you end up with! Figure created with BioRender.com.

So, let’s talk about the pointers and pitfalls of primer design.

PCR Primer design

The main considerations I want to draw attention to are:

  • Check existing primers
  • Think carefully about your target gene(s)
  • Test in silico and in vitro
  • Think about the length of your primers (20-25 bp is usually ideal) and the region you’re copying
  • Aim for a GC content of 40-60 %
  • Pay particular attention to the 3′ end, especially with Gs and Cs
  • Avoid repeats of one or two bases more than four times
  • Ideally keep the melting temperature around the 60 °C mark
  • Try to keep the melting temperatures within 5 °C between the forward and reverse
  • Check for self-dimerisation or dimerisation between the forward and reverse
  • Consider degenerate bases (but not too many)
  • Throw some exotic mock communities in when it comes to sequencing

Don’t redesign the wheel

The first rule of primer design is don’t think about primer design until you’ve confirmed that your primer doesn’t already exist. There are so many primers out there already, all with different purposes and applications. Have a look through available primers and consider testing a few (even if it’s only computational testing; i.e. ‘in silico’ testing). If there is a primer pair already designed for something similar to what you’re doing, but it doesn’t do exactly what you want, maybe try modifying it. Change the sequence a little to better match (or mismatch) the things you want them to – it often saves a great deal of work!

The ULTIMATE marker/gene/locus

This is a question that you would get an almost limitless number of utterly disparate answers to if you were to ask everybody. Is there a wrong answer? In some contexts, yes (don’t try amplifying chloroplast DNA from animals). Is there a right answer? Almost definitely not. If you want to ruin a metabarcoding dinner party (is that a thing?), start making strong comments about your metabarcoding marker (i.e. gene region used for amplification) of choice. Of course, the DNA present will depend somewhat on which taxa you’re studying. Whether it’s plants, fungi, animals, bacteria, or something else entirely, you have a distinct set of options to consider.

Focusing on animals (as the option I know best), the standard gene used for barcoding since 2003 is the cytochrome c oxidase subunit I (COI) gene. This is a mitochondrial gene involved in the electron transport chain which has fairly ideal mutation rates for species discrimination (i.e. different species have bits that are different enough to tell them apart). Importantly, genes that are found multiple times in each cell, such as those in mitochondria (like COI) and other organelles, have a greater chance of detection by sheer probability. The COI gene has those lovely bits of sequence that are similar between a wide range of species and, between them, bits that are very different. It even has well-established public reference databases (databases like Genbank which contain sequences for species that can be used to work out what your sequencing results mean) of a massive range of species.

Sorted then! Let’s do that! But, wait… What if there was a better gene? Surely we would know, right? Well, there probably is one (depending on what group(s) you want to distinguish between). Most of our sequencing has been laser-focused on this one gene, so perhaps we just haven’t seen how great some of these other loci are for metabarcoding. Notably, other genes like 16S are increasingly being used for invertebrate metabarcoding, with many more on the rise. As these genes are used more and more, increasingly comprehensive reference databases are built, creating this self-fulfilling prophecy of usability and making it easier to qualify how useful these primers will be for your desired purpose.

As whole-genome assemblies become increasingly commonplace for a wide range of species, perhaps we can begin to identify more suitable loci for metabarcoding, but until then we’re largely stuck with a small range of (admittedly pretty good) options. It’s definitely worth thinking about your target genes, having a look at the reference data available and making an informed decision around your primers before committing.

Testing your primers (and often patience)

When it comes to testing, there are really two stages: in silico (i.e., computational; literally ‘in silicon’) and in vitro (i.e., in the lab; literally ‘in glass’). In silico testing involves computationally estimating the likelihood of primers attaching to DNA – effectively simulating PCR. Now, it’s important to remember that this is a gross oversimplification of PCR, ignoring everything from contaminants and inhibitors to enzymes and temperatures, but it’s a valuable tool nonetheless. There are many software packages designed to do this, from ecoPCR to Primer3, but my personal first choice is PrimerMiner (having played around with a few of them). PrimerMiner has a whole host of excellent features, but also does some cool things like accounting for the increased effect of adjacent mismatches (i.e., when the primer doesn’t match with two nucleotides of the target DNA that are next to one another, which decreases the likelihood of the amplification working), which other software often neglects.

Using the PrimerMiner R package, you can batch download massive databases of sequence data to help simplify the process of finding ideal primer sites (or adapting existing ones). In this way, PrimerMiner can give you a simplified visual representation of entire orders of data, ensuring your primers are designed on solid foundations. Below, you’ll see the primers I designed to amplify spider prey but not spiders, in which there is an A-G mismatch between spiders and many of the common prey I was interested in, right at the 3′ end of the reverse primer. A visual tool like this makes it infinitely easier to explore vast quantities of data in no time at all.

Some beautiful PrimerMiner plots of the COI gene, specifically the primer sites for my spider exclusion primers TelperionF and LaurelinR.

Using the same data as that used for the above visualisation, you can also test your primers in these “simulated PCRs” to predict what kind of biases you might expect when it comes to your sequencing data. This can really make or break a primer pair, so this in silico testing is a fantastic way of knowing whether it’s worth giving the primers a go in the lab before investing too much time, money or PCR-induced tears. The process basically works on pattern-matching: if the primer matches the DNA enough, then it passes; if not, it fails. Of course, the reality is slightly less binary and bias against something doesn’t necessarily mean it won’t amplify at all. Using these analyses, you can make sound comparisons between different primer pairs before taking a full swing at the lab work. Importantly though, all in silico analyses will over-simplify the PCR process by neglecting the nuances of small molecule interactions, inhibitors and the many complexities of PCR. This is why it’s critical to also test in the lab.
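To make the pattern-matching idea concrete, here is a deliberately crude sketch of a pass/fail check. This is not PrimerMiner’s scoring (which weights mismatches by position, type and adjacency); the thresholds, function names and sequences are all hypothetical. Each binding site is written 5′ to 3′ in the same sense as the primer, so a perfect site is identical to the primer.

```python
def count_mismatches(primer: str, site: str) -> int:
    """Number of positions at which the primer and its binding site differ."""
    if len(primer) != len(site):
        raise ValueError("primer and site must be the same length")
    return sum(p != s for p, s in zip(primer, site))


def passes_in_silico(primer: str, site: str,
                     max_mismatches: int = 2, protected_3prime: int = 3) -> bool:
    """Pass if there are few mismatches overall and none in the last few 3' bases."""
    total = count_mismatches(primer, site)
    tail = count_mismatches(primer[-protected_3prime:], site[-protected_3prime:])
    return total <= max_mismatches and tail == 0


primer = "ACGTTGCAAGGCTTAC"  # made-up primer
sites = {
    "taxon_A": "ACGTTGCAAGGCTTAC",  # perfect match
    "taxon_B": "TCGTTGCAAGGCTTAC",  # one mismatch at the forgiving 5' end
    "taxon_C": "ACGTTGCAAGGCTTAG",  # one mismatch at the very 3' end
}
for taxon, site in sites.items():
    print(taxon, "passes" if passes_in_silico(primer, site) else "fails")
```

A binary check like this is only a first filter; as noted above, a predicted "fail" does not necessarily mean the taxon won’t amplify at all in the lab.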

This radar plot shows how likely PrimerMiner thinks it is that each order will be amplified by each primer pair (the further to the edge the point goes, the more likely). This visually represents the expected taxonomic bias of each primer pair.

Once you’re happy with the in silico testing, you can have your primers made up (maybe read through the rest of this to check that they’re in tip top shape before you do) and start testing them for real. This in vitro testing usually involves running PCRs with DNA extracts taken from species that you’re interested in amplifying the DNA of (or not). Sometimes, just before fully delving into this, it’s also a good time to tie in some temperature gradient PCRs (explained in full below, in the “it’s getting hot in here, so disassociate all your primers” section) so that you can not only see which of these species are amplified by your primers, but also how that changes at different temperatures. That’s it, really. Run a gel or however you see if your PCRs have worked. If they have, great! If they haven’t, troubleshoot the PCR and, worst case scenario, rethink the primers.

How long’s a piece of barcode?

Depending on the target gene (and more specifically, where the conserved sites lie in that gene), the taxonomic group you’re interested in, the kind of samples you’re working with and the sequencing platform that you’re using, the amplicon size (i.e. the length of gene that you’re copying/amplifying) can differ entirely. The standard COI barcoding amplicon, briefly mentioned above, is about 700 nucleotides long (including the primers themselves), but this isn’t necessarily ideal for many high-throughput sequencing platforms like Illumina, which are often constrained by ~250 bp read lengths. The forward and reverse reads need to overlap which, even when overlapping minimally, doesn’t give you much more than 350-400 nucleotides once you account for the sample-identifying tags and the sequencing adapters (the bits that the sequencer recognises and begins sequencing from). Nanopore sequencing has extended this, but most metabarcoding is still being sequenced on Illumina sequencers. Also, when collecting DNA from the environment or, even more so, from guts or faeces, that DNA will be degraded, limiting the fragment sizes of target DNA in your samples.

For these reasons, metabarcoding typically sticks with the 200-350 (or sometimes up to 400) base pair range. You can go shorter, but the ability to accurately identify the organisms from which the DNA came becomes increasingly difficult as you work with shorter sequences (much like if we were to refer exclusively to people by the first three letters of their name – what a nightmare that could be)! With these short fragments, even highly degraded DNA can be copied over and over, and moderately reliable data acquired. Of course, depending on the sequencing method and the kind of sample, you can push this even higher, but it’s important to think carefully about these things.
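The read-length arithmetic can be sketched as below. Every number here is an assumption chosen for illustration (a 50-nucleotide overlap, 8-nucleotide in-line tags and 25-nucleotide primers), and the sketch ignores adapters and quality trimming, so treat the output as a ballpark rather than a limit for your platform.

```python
def max_barcode_length(read_length: int, min_overlap: int,
                       tag_length: int, primer_length: int) -> int:
    """Longest region between the primers that two overlapping reads can span."""
    # Longest fragment the forward and reverse reads can cover and still overlap.
    merged = 2 * read_length - min_overlap
    # Each read spends its first bases on a sample tag and then the primer itself.
    return merged - 2 * (tag_length + primer_length)


print(max_barcode_length(read_length=250, min_overlap=50,
                         tag_length=8, primer_length=25))
```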

Gee, I see the importance of Gs and Cs

As discussed above, “G” and “C” bases match, and in fact more strongly than their “A” and “T” counterparts due to their sharing of three, rather than two, hydrogen bonds. This makes their inclusion in primer sequences incredibly important since it defines how strongly the primers clamp onto the DNA. You want a nice strong bonding of DNA and primer, but just like a good cheese, you don’t want it to be too strong or else you won’t even get started. If 40-60 % of your nucleotides are Gs and Cs, you’re doing great!
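Checking GC content is simple enough to do by hand, but a few lines of code save the counting. This is a minimal sketch for primers without degenerate bases; the primer sequence is made up.

```python
def gc_content(primer: str) -> float:
    """Percentage of bases in the primer that are G or C."""
    primer = primer.upper()
    return 100 * sum(base in "GC" for base in primer) / len(primer)


primer = "ACGTTGCAAGGCTTACGATC"  # hypothetical 20-mer
gc = gc_content(primer)
print(f"GC content: {gc:.1f} %", "(within 40-60 %)" if 40 <= gc <= 60 else "(outside 40-60 %)")
```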

Of prime importance: the 3′ end

The 3′ (spoken ‘three-prime’) end of your primer is undoubtedly the most important bit (at least for annealing, i.e., attaching, to the DNA). It’s the bit from which new DNA is synthesised, so if it isn’t attached properly, this will stop the PCR going ahead at all. This can be a great thing if you’re looking to avoid amplifying certain species, just like in the PrimerMiner picture above, but otherwise it can be very problematic. We briefly went over the different types of mismatch above, which factor in quite profoundly here – depending on the severity of the mismatch, it’s more or less likely to cause problems. Following on from our overview of the importance of GC content too, it’s a great idea to have one or two Gs or Cs in the three most 3′ bases of your primer to ensure that this end clamps on nice and tight to kickstart the whole PCR process.
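The "one or two Gs or Cs in the last three bases" guideline translates directly into a quick check. A minimal sketch, with hypothetical example sequences and the window and limits taken straight from the guideline above:

```python
def has_gc_clamp(primer: str, window: int = 3, min_gc: int = 1, max_gc: int = 2) -> bool:
    """True if the last `window` bases (the 3' end) contain min_gc..max_gc Gs or Cs."""
    tail = primer.upper()[-window:]
    gc_count = sum(base in "GC" for base in tail)
    return min_gc <= gc_count <= max_gc


for primer in ["ACGTACGTAG", "ACGTACGAAT", "ACGTACGGCC"]:
    print(primer, "OK" if has_gc_clamp(primer) else "check the 3' end")
```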

Repeats Repeats Repeats Repeats

Repeats of one or two nucleotides four or more times can cause issues with mispriming (i.e., the primer attaching to the target DNA incorrectly) due to the primer sequence matching that one part of the target DNA in multiple places. This can kill the PCR process before it even starts, or cause all sorts of wacky problems, so it’s an important one to avoid, and an incredibly simple one to spot, and an incredibly simple one to spot, and an incredibly simple one to spot (sorry, I’ll stop).
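Spotting these runs by eye is easy, but a regular expression does it without squinting. This sketch flags any single base or pair of bases repeated four or more times in a row; it assumes a primer made only of A, C, G and T, and the sequences are invented.

```python
import re

# A one- or two-base unit, followed by at least three more copies of itself.
REPEAT = re.compile(r"([ACGT]{1,2})\1{3,}")


def find_repeats(primer: str) -> list[str]:
    """Return every run of a 1-2 base unit repeated four or more times."""
    return [match.group(0) for match in REPEAT.finditer(primer.upper())]


for primer in ["GCATATATATGC", "GGAAAATC", "ACGTACGT"]:
    print(primer, find_repeats(primer) or "no repeats found")
```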

It’s getting hot in here, so disassociate all your primers

When you order primers, you’re usually told a melting temperature (Tm) for that specific primer (if not, you can find it via useful online resources like this one from ThermoFisher or this one from IDT, the latter even accounting for the effect of different reagent concentrations). The melting temperature is basically the temperature at which the primer should disassociate from the target DNA. It’s determined by the GC content, the length and various other things, but it’s of particular importance during the “annealing” part of the PCR process, the bit when the primer attaches to the DNA. If the primer is too warm, it simply won’t attach. If it’s too cold, it gets a lot less fussy about what it attaches to and will start annealing to non-target taxa (or even non-target genes). This is why the annealing temperature is such a critical aspect of PCR, particularly for metabarcoding.

As a rule of thumb, primers will often work well with an annealing temperature about five degrees below their melting temperature, but (as with all such rules) it’s not always that simple. One important factor is that you have two (or more) PCR primers in your reaction, and their melting temperatures will usually be a little different. It’s good practice to try to keep the melting temperatures of your primers within five degrees of one another to make this easier. Ultimately though, different PCR machines, reagents and all sorts can have an effect on this. The best way to be sure that you’ve got the right annealing temperature for your setup is by running temperature gradient PCRs. This is where you do the same PCR at a few different temperatures and see how the results compare. Inevitably, some things will not amplify at higher temperatures, but, as above, you don’t want to go too low or else you begin to lose specificity.
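For a quick back-of-the-envelope check, the sketch below uses the Wallace rule (2 °C per A or T, 4 °C per G or C). That is a swapped-in approximation meant for short oligos, not what the vendor calculators mentioned above use (they rely on more detailed models and reagent concentrations), so expect the numbers to differ. Taking the lower of the two melting temperatures minus five degrees as a starting point is my assumption for illustration; the gradient PCR is what actually settles it.

```python
def wallace_tm(primer: str) -> int:
    """Very rough melting temperature (°C): 2 per A/T plus 4 per G/C."""
    primer = primer.upper()
    at = sum(base in "AT" for base in primer)
    gc = sum(base in "GC" for base in primer)
    return 2 * at + 4 * gc


def similar_tm(forward: str, reverse: str, max_diff: int = 5) -> bool:
    """True if the two primers' estimated Tm values are within max_diff degrees."""
    return abs(wallace_tm(forward) - wallace_tm(reverse)) <= max_diff


def starting_annealing_temp(forward: str, reverse: str, offset: int = 5) -> int:
    """A first guess for a gradient PCR: the lower Tm minus `offset` degrees."""
    return min(wallace_tm(forward), wallace_tm(reverse)) - offset


fwd, rev = "ACGTTGCAAGGCTTACGATC", "TGCATCGGATACCTGAAGCT"  # hypothetical pair
print(wallace_tm(fwd), wallace_tm(rev), similar_tm(fwd, rev),
      starting_annealing_temp(fwd, rev))
```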

Dimers are forever

Much like unsupervised children with PVA glue, primers often have a propensity to stick to one another or themselves, ailing the whole PCR process. This frustrating process is known as “dimerisation” and is the principle underlying “primer dimer” – that blob of luminous frustration all too familiar to regular users of gel electrophoresis who turn on the UV light only to find their PCR product consists of a concentration of small ~25 base pair fragments. This occurs because the primers have some affinity to one another (or themselves) and thus compete with the DNA, instead binding with themselves when the right DNA is hard to come by, converting your complex reaction of cutting-edge science into an unnecessarily expensive and replicated two-piece puzzle. These dimerisation events are conceptually easy to predict (being based on nucleotide matching), but no one enjoys meticulously checking their primers against one another and themselves, particularly if you have built-in sample tags, meaning you have to check tens of primers against one another.

Luckily, there are online tools, such as this one from ThermoFisher or this one from IDT which handle the complexities. You can quickly and automatically check your primers and adjust them in advance, saving you the pain of finding out through that sorrowful realisation at the end of a gel electrophoresis.
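To show the kind of nucleotide matching those tools are built on, here is a naive sketch that asks one question only: could the last few 3′ bases of one primer base-pair perfectly with any stretch of another primer (or itself)? The window of four bases is an arbitrary assumption, degenerate bases are not handled, and real tools also weigh binding strength, so this is a toy rather than a replacement for them.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """The sequence that would pair with `seq`, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def three_prime_can_anneal(primer: str, other: str, k: int = 4) -> bool:
    """True if the last k bases of `primer` are perfectly complementary to part of `other`."""
    return reverse_complement(primer[-k:]) in other.upper()


def dimer_flags(forward: str, reverse: str, k: int = 4) -> dict[str, bool]:
    """Check self-dimers and cross-dimers at the 3' ends of a primer pair."""
    return {
        "forward self": three_prime_can_anneal(forward, forward, k),
        "reverse self": three_prime_can_anneal(reverse, reverse, k),
        "forward 3' on reverse": three_prime_can_anneal(forward, reverse, k),
        "reverse 3' on forward": three_prime_can_anneal(reverse, forward, k),
    }


print(dimer_flags("GGGGAACG", "TTCGTTAA"))  # made-up primers
```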

Generate degeneracies

I briefly mentioned the problem (or benefit) of mismatches above. WELL, THEY’RE BACK. There are bound to be differences in the sequences of your target species and, in many cases, these will unavoidably pop up in the primer sites. As you just learned moments ago (unless you’re the kind of chaotic person who reads this out of order, just like I’m the kind of chaotic person who wrote this out of order, in which case, welcome), the 5′ end of the primer is a pretty forgiving place and you can get away with having a few mismatches there, but ideally not too many. The 3′ end, however, will not play the PCR game if it doesn’t like what you give it to play with. In both cases, this is where degeneracies become useful.

Degenerate bases are not bases at all, but represent the possibility of different bases. This will be less confusing in a second. There are 11 different “degenerate bases” which each represent a different combination of bases. These can be two-, three- or four-fold degeneracies, the number referring to how many different nucleotides they represent (there’s a pretty picture below showing which bases each represents). Let’s take “Y”, for example, which is the two-fold degenerate base that represents “C” and “T”. If we include a “Y” in our primer sequence, that means that half of our primers will have a “C” where that “Y” is, and the other half will have a “T”, so if the DNA of a target species has an “A” there, only the half of the primers with a “T” might amplify it. This is great because it means our primers can amplify different species with different primer site sequences, but it also effectively dilutes the primers since only half of them work for each sequence.
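One way to see the "dilution" is to write out every actual primer that a degenerate sequence stands for. This sketch expands a primer using the standard IUPAC codes; the example primer is made up.

```python
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def expand(primer: str) -> list[str]:
    """List every non-degenerate primer that a degenerate primer represents."""
    options = (IUPAC[base] for base in primer.upper())
    return ["".join(combo) for combo in product(*options)]


variants = expand("ACYTGR")  # hypothetical primer with two two-fold degeneracies
print(len(variants), variants)
```

Each variant makes up an equal share of the primer mix if the bases are synthesised in equal proportions, so four variants means only a quarter of the primers perfectly match any one target sequence.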

Given the need to match the DNA of a broad range of species in metabarcoding, degeneracies can be incredibly important, but too many can increase the chance of your primers dimerising (like we went through above). Also, if we dilute our primers too much by including loads of degeneracies, we can end up making the primer non-specific (i.e. it binds to other bits of DNA, or dimerises) or so dilute that it doesn’t do anything. It is thus best practice to limit the number of degeneracies. Najafabadi et al. came up with a great system for this, which involves calculating a “degeneracy value”, effectively a number related to how “degenerate” the primers are. This is calculated by multiplying the degeneracies together (where two-fold = 2, three-fold = 3, four-fold = 4) and the idea is to keep the value below 128 (this means you could theoretically have six two-fold degeneracies, or three two-fold and two three-fold degeneracies, etc.). After that point, your primers will be theoretically debilitatingly diluted by one another, so problems may arise.
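The degeneracy value described above is just a product, so it is easy to compute. A minimal sketch, with the limit of 128 taken from the text and a made-up primer containing three two-fold and two three-fold degeneracies:

```python
from math import prod

# How many bases each IUPAC letter stands for.
FOLD = {
    **dict.fromkeys("ACGT", 1),
    **dict.fromkeys("RYSWKM", 2),
    **dict.fromkeys("BDHV", 3),
    "N": 4,
}


def degeneracy_value(primer: str) -> int:
    """Multiply together the fold-degeneracy of every position in the primer."""
    return prod(FOLD[base] for base in primer.upper())


primer = "ACRTTYGGSAABCCDTA"  # hypothetical: R, Y, S (two-fold) and B, D (three-fold)
value = degeneracy_value(primer)
print(value, "OK" if value < 128 else "too degenerate")
```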

But, as Jedi Master Yoda said in 1980, “there is another”. Many services that produce synthetic oligonucleotides (i.e., the places you buy your custom primers from) will allow you to add inosine (denoted by an “I” in IUPAC code, the lettering system used for the other bases/degeneracies). Inosine is the ideal dinner guest of the nucleotide community: it isn’t picky and will take what it’s given. Inosine will theoretically match with all of the bases, so it means you can have an effect similar to including an “N” without creating this dilution problem, but it’s not without its drawbacks. Firstly, inosine is typically an expensive add-on, hiking the prices of your primers considerably. Secondly, inosine isn’t without its own biases, unlike simply using an “N”. An important consideration nonetheless in the battle for optimal PCR primers.

These are the degenerate bases, each letter containing which of the four nucleotides they represent. The 3-fold degeneracy names are based on the letter in the alphabet after the one nucleotide which is missing (e.g., B = anything but A), N is any (‘N’y – do you get it? hahahahujhfheijviwejfoiewjfeioq) and the two-folds are puRine, pYrimidine, Strong, Weak, Keto and aMino. Figure created with BioRender.com.

Mock communities

At the end of it all, when your testing has been meticulously completed, you’ve checked all of the above, and you’ve crossed your thymines and dotted your inosines, it’s time to sequence. When you’re amplifying DNA in mixed samples containing the DNA of many species, as excruciatingly discussed above, you will undoubtedly encounter bias. A great way of learning how this bias might affect your data outcomes is through including mock communities in your sequencing runs.

The principle of a mock community is simple: it is a mixture of DNA of known species (either mixtures of their extracted DNA, or mixtures of the species from which DNA is then extracted). If you know the proportions of DNA of these species going in, you can empirically determine the extent of bias by looking at the sequencing read numbers that come out the other side. These mock communities can also be really helpful for data clean-up (but this is another blog post altogether). Importantly, it’s a good idea to use species that you know will be amplified by your primers (unless you’re trying to exclude amplification of something – it’s good to include that) and, ideally, that won’t be likely to appear in your study sites, or else they may appear through cross-contamination or tag-jumping (again, a whole other conversation) in your dietary/eDNA samples. The best way of getting around this is by using species exotic to your sample site but taxonomically relevant; for example, if you work on British beetles that eat flies and ants, you can find flies and ants from tropical locales for your mock communities. You can normally find kind donors of such samples in other molecular ecologists who work in different places, but are ideally based in the same country as you (otherwise the import/export paperwork can be a pain), or through the pet trade.
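As a sketch of how you might turn a mock community into a bias estimate, the function below compares each taxon’s share of the sequencing reads with its share of the DNA that went in. A ratio above 1 suggests over-representation and below 1 under-representation. The taxa and numbers are invented, and this simple ratio is only one of several ways people summarise bias.

```python
def amplification_bias(input_amounts: dict[str, float],
                       read_counts: dict[str, int]) -> dict[str, float]:
    """Ratio of observed read proportion to expected input proportion per taxon."""
    total_input = sum(input_amounts.values())
    total_reads = sum(read_counts.values())
    return {
        taxon: (read_counts.get(taxon, 0) / total_reads) / (amount / total_input)
        for taxon, amount in input_amounts.items()
    }


# Hypothetical mock community with equal DNA input per taxon.
mock_input = {"tropical_fly": 1.0, "tropical_ant": 1.0}
mock_reads = {"tropical_fly": 75, "tropical_ant": 25}
print(amplification_bias(mock_input, mock_reads))
```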

Double (or triple) your chances

Given that each primer pair is subject to inevitable biases, why settle for just one? Money is a reasonable answer, of course. Doubling (or even tripling) the number of PCRs and the depth of sequencing that you need is an expensive decision to make! That said, using multiple PCR primer pairs can be useful for any number of reasons. Are you interested in two taxonomically disparate groups of organisms (e.g., plants and animals, or even mites and aphids)? You could use a primer pair for each if you’re only interested in these two groups, rather than trying to find something that does both (and probably a range of other things you’re not interested in). Do you have a problem with predator DNA swamping your dietary data output, but you’re still interested in trophic interactions between closely-related species? Why not have one primer pair that excludes the predator group (thus achieving greater sequencing depth) and another that includes similar species but loses a lot of data to the predator (achieving some breadth)? Some researchers even argue that we should all be using multiple primer pairs in our metabarcoding work (see here, here and here). This reduces the effect of the bias of just one primer pair by balancing it out with another, but it also reduces the accessibility of this growing field of science, so it’s a little problematic.

I hope that helps!

I think it’s important to note that I’m not an all-knowing sage. I have learned the above through frustrating ambles through the literature and excruciating trial-and-error. I’ve learned along the way and have applied various aspects of this to everything from spiders, through nematodes, to aphids, ultimately relying on it for my PhD and subsequent postdoctoral research, but my knowledge is far from comprehensive. I hope that someone can learn from this post, but above that I hope that they can build on it and exceed my relatively shallow knowledge base.

Efficient primer design and testing must follow a strict framework and, ultimately, all known primers fail to amplify some taxa, so compromises must always be made. Hopefully this very surface-level guide will help someone at some point, but I would fully encourage wider reading. But for now, go forth and multiply (the DNA of a broad range of species in a single reaction).

Update (December 2022): I’ve had a few people recently tell me that they’re still finding this useful which is amazing to hear! I’m so glad it’s been of some help. We published a paper earlier this year on choosing PCR primers for predator-prey dietary analysis which more formally describes many of the concepts mentioned here, but also extends these with some novel strategies for reducing the ‘predator problem’, so do check that out if you want more:

The predator problem and PCR primers in molecular dietary analysis: Swamped or silenced; depth or breadth?

Massive thanks to Vasco Elbrecht, Andrew Richardson, Andreas K and Paddy Hooper for suggesting a few edits!

Some references related to all of this:

Alberdi, A., Aizpurua, O., Gilbert, M. T. P., & Bohmann, K. (2017). Scrutinizing key steps for reliable metabarcoding of environmental samples. Methods in Ecology and Evolution, 9(1), 1–14. https://doi.org/10.1111/2041-210X.12849

Ammann, L. et al. (2020). Insights into aphid prey consumption by ladybirds: optimising field sampling methods and primer design for high throughput sequencing. PLoS ONE, 15, e0235054. https://doi.org/10.1371/journal.pone.0235054

Bajwa, A. A. et al. (2019). Assessment of nematodes in Punjab Urial (Ovis vignei punjabiensis) population in Kalabagh Game Reserve: development of a DNA barcode approach. European Journal of Wildlife Research, 65, 63. https://doi.org/10.1007/s10344-019-1298-y

Brandon-Mong, G.-J., Gan, H., Sing, K., Lee, P., Lim, P., & Wilson, J. (2015). DNA metabarcoding of insects and allies: an evaluation of primers and pipelines. Bulletin of Entomological Research, 105(6), 717–727. https://doi.org/10.1017/S0007485315000681

Braukmann, T. W. A., Ivanova, N. V., Prosser, S. W. J., Elbrecht, V., Steinke, D., Ratnasingham, S., de Waard, J. R., Sones, J. E., Zakharov, E. V., & Hebert, P. D. N. (2019). Metabarcoding a diverse arthropod mock community. Molecular Ecology Resources, 19(3), 711–727. https://doi.org/10.1111/1755-0998.13008

Bru, D., Martin-Laurent, F., & Philippot, L. (2008). Quantification of the detrimental effect of a single primer-template mismatch by real-time PCR using the 16S rRNA gene as an example. Applied and Environmental Microbiology, 74(5), 1660–1663. https://doi.org/10.1128/AEM.02403-07

Cuff, J. P. et al. (2021). Money spider dietary choice in pre- and post-harvest cereal crops using metabarcoding. Ecological Entomology, 46, 249–261. https://doi.org/10.1111/een.12957

da Silva, L. P., Mata, V. A., Lopes, P. B., et al. (2019). Advancing the integration of multi-marker metabarcoding data in dietary analysis of trophic generalists. Molecular Ecology Resources, 19, 1420–1432. https://doi.org/10.1111/1755-0998.13060

Deagle, B. E., Jarman, S. N., Coissac, E., Pompanon, F., & Taberlet, P. (2014). DNA metabarcoding and the cytochrome c oxidase subunit I marker: not a perfect match. Biology Letters, 10(9), 20140562. https://doi.org/10.1098/rsbl.2014.0562

Elbrecht, V., Braukmann, T. W. A., Ivanova, N. V, Prosser, S. W. J., Hajibabaei, M., Wright, M., Zakharov, E. V, Hebert, P. D. N., & Steinke, D. (2019). Validation of COI metabarcoding primers for terrestrial arthropods. PeerJ, 7, e7745. https://doi.org/10.7717/peerj.7745

Elbrecht, V., & Leese, F. (2016). PrimerMiner: an R package for development and in silico validation of DNA metabarcoding primers. Methods in Ecology and Evolution, 8(5), 622–626. https://doi.org/10.1111/2041-210X.12687

Ficetola, G. F., Coissac, E., Zundel, S., Riaz, T., Shehzad, W., Bessière, J., Taberlet, P., & Pompanon, F. (2010). An in silico approach for the evaluation of DNA barcodes. BMC Genomics, 11(1), 434. https://doi.org/10.1186/1471-2164-11-434

Folmer, O., Black, M., Hoeh, W., Lutz, R., & Vrijenhoek, R. (1994). DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates. Molecular Marine Biology and Biotechnology, 3(5), 294–299.

Krehenwinkel, H., Kennedy, S., Pekár, S., & Gillespie, R. G. (2017). A cost-efficient and simple protocol to enrich prey DNA from extractions of predatory arthropods for large-scale gut content analysis by Illumina sequencing. Methods in Ecology and Evolution, 8, 126–134. https://doi.org/10.1111/2041-210X.12647

Kwok, S., Kellogg, D. E., Mckinney, N., Spasic, D., Goda, L., Levenson, C., & Sninsky, J. J. (1990). Effects of primer-template mismatches on the polymerase chain reaction: human immunodeficiency virus type 1 model studies. Nucleic Acids Research, 18(4), 999–1005. https://doi.org/10.1093/nar/18.4.999

Lafage, D., Elbrecht, V., Cuff, J. P., Steinke, D., Hambäck, P. A., & Erlandsson, A. (2019). A new primer for metabarcoding of spider gut contents. Environmental DNA, 2(2), 234–243. https://doi.org/10.1002/edn3.62

MacDonald, A. J., & Sarre, S. D. (2016). A framework for developing and validating taxon-specific primers for specimen identification from environmental DNA. Molecular Ecology Resources, 17(4), 1–13. https://doi.org/10.1111/1755-0998.12618

Macías-Hernández, N., Athey, K., Tonzo, V., Wangensteen, O. S., Arnedo, M., & Harwood, J. D. (2018). Molecular gut content analysis of different spider body parts. PLoS ONE, 13(5), 1–16. https://doi.org/10.1371/journal.pone.0196589

Martin, F. H., Castro, M. M., Aboul-Ela, F., & Tinoco, I. (1985). Base pairing involving deoxyinosine: implications for probe design. Nucleic Acids Research, 13(24), 8927–8938. https://doi.org/10.1093/nar/13.24.8927

Najafabadi, H. S., Torabi, N., & Chamankhah, M. (2008). Designing multiple degenerate primers via consecutive pairwise alignments. BMC Bioinformatics, 9(1), 55. https://doi.org/10.1186/1471-2105-9-55

Piñol, J., San Andrés, V., Clare, E. L., Mir, G., & Symondson, W. O. C. (2014). A pragmatic approach to the analysis of diets of generalist predators: the use of next-generation sequencing with no blocking probes. Molecular Ecology Resources, 14(1), 18–26. https://doi.org/10.1111/1755-0998.12156

Piñol, J., Mir, G., Gomez-Polo, P., & Agusti, N. (2015). Universal and blocking primer mismatches limit the use of high-throughput DNA sequencing for the quantitative metabarcoding of arthropods. Molecular Ecology Resources, 15, 819–830. https://doi.org/10.1111/1755-0998.12355

Piñol, J., Senar, M. A., & Symondson, W. O. C. (2018). The choice of universal primers and the characteristics of the species mixture determines when DNA metabarcoding can be quantitative. Molecular Ecology, 28(2), 407–419. https://doi.org/10.1111/mec.14776

Pompanon, F., Deagle, B. E., Symondson, W. O. C., Brown, D. S., Jarman, S. N., & Taberlet, P. (2012). Who is eating what: diet assessment using next generation sequencing. Molecular Ecology, 21(8), 1931–1950. https://doi.org/10.1111/j.1365-294X.2011.05403.x

Stadhouders, R., Pas, S. D., Anber, J., Voermans, J., Mes, T. H. M., & Schutten, M. (2010). The effect of primer-template mismatches on the detection and quantification of nucleic acids using the 5′ nuclease assay. Journal of Molecular Diagnostics, 12(1), 109–117. https://doi.org/10.2353/jmoldx.2010.090035

Taberlet, P., Bonin, A., Zinger, L., & Coissac, E. (2018). Environmental DNA. Oxford University Press.

Tercel, M. P. T. G., Symondson, W. O. C., & Cuff, J. P. (2021). The problem of omnivory: a synthesis on omnivory and DNA metabarcoding. Molecular Ecology. https://doi.org/10.1111/mec.15903

Wright, E. S., Yilmaz, L. S., Ram, S., Gasser, J. M., Harrington, G. W., & Noguera, D. R. (2014). Exploiting extension bias in polymerase chain reaction to improve primer specificity in ensembles of nearly identical DNA templates. Environmental Microbiology, 16(5), 1354–1365. https://doi.org/10.1111/1462-2920.12259