Question

Amplicon sequencing: how to handle adapters and primers?

Asked 7.8 years ago by lamteva.vera

Dear all, I'm analyzing data from a TruSeq Custom Amplicon (TSCA) 1.5 panel.

I've read a lot about handling the sequences linked to the original region of interest and have ended up confused, so any help would be highly appreciated. Maybe I'm missing something really basic here. For example, I'm not sure which sequences make up a read besides the insert (= the targeted region) itself and the flanking target-specific primers.

Illumina says that "Adapter trimming is not required for TruSeq Targeted RNA Expression, TruSeq Custom Amplicon, and TruSeq Cancer Panel when using Illumina analysis pipelines". Is that really the case? Elsewhere, they claim that "Each probe contains unique, target-specific sequence as well as a universal adapter sequence that is used in a subsequent amplification reaction". The aforementioned target-specific sequence is the target-specific part of an upstream or downstream locus-specific oligo (ULSO/DLSO). In yet another document, they give only the index sequences for TSCA and no universal adapter sequence.

Please have a look at the attached image.

[Figure: MultiQC plot for adapter content]

This is an example MultiQC adapter content plot. FastQC identifies the contamination as the Illumina Universal Adapter (based on finding the AGATCGGAAGAG sequence, as far as I understand from FastQC's documentation). Using zcat LK_S2_L001_R2_001.fastq.gz | grep AGATCGGAAGAG, I noticed that this sequence is part of a longer pattern: zcat LK_S2_L001_R2_001.fastq.gz | grep -c AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAAGG returned 8863, while zcat LK_S2_L001_R2_001.fastq.gz | grep -c AGATCGGAAGAG returned 11079.
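
grep -c counts matching lines across the whole file, headers and quality strings included. Below is a minimal Python sketch of the same count restricted to the sequence line of each record. It assumes standard, unwrapped 4-line FASTQ records. The demo reads are made up; only the two motifs come from the question.

```python
import gzip
import io

CORE = "AGATCGGAAGAG"  # motif FastQC reports as the Illumina Universal Adapter
LONG = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAAGG"  # longer pattern seen with grep


def open_fastq(path):
    """Open a plain or gzip-compressed FASTQ file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def count_reads_with_motifs(fastq_lines, motifs):
    """Count reads whose sequence line contains each motif.

    Assumes unwrapped 4-line FASTQ records, so only the second line of each
    record is searched; headers and quality strings are ignored.
    """
    counts = {motif: 0 for motif in motifs}
    for index, line in enumerate(fastq_lines):
        if index % 4 != 1:  # skip header, '+' separator and quality lines
            continue
        sequence = line.strip()
        for motif in motifs:
            if motif in sequence:
                counts[motif] += 1
    return counts


# Made-up reads standing in for LK_S2_L001_R2_001.fastq.gz.
DEMO_RECORDS = [
    ("r1", "ACGTACGTAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAAGG"),  # full pattern
    ("r2", "TTTTAGATCGGAAGAGCACACGTC"),  # core motif only
    ("r3", "ACGTACGTACGT"),  # no adapter
]
DEMO_FASTQ = "".join(
    f"@{name}\n{seq}\n+\n{'I' * len(seq)}\n" for name, seq in DEMO_RECORDS
)
DEMO_COUNTS = count_reads_with_motifs(io.StringIO(DEMO_FASTQ), [CORE, LONG])
# DEMO_COUNTS == {CORE: 2, LONG: 1}
```

For the real data, pass open_fastq("LK_S2_L001_R2_001.fastq.gz") instead of the in-memory demo.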

How should I handle this? They say that "if you use BWA-MEM, the trailing (5’) bases of a read that do not match the reference are soft-clipped, which covers those cases in which an adapter does occur". Yet Heng Li says: "...Bwa-mem will just soft clip them [adapter sequences]... However, it is still recommended to trim adapter sequences. After all, adapters are not part of the samples you are sequencing. They might affect variant calling in corner cases".
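
Adapter read-through sits at the 3' end of the read. When BWA-MEM soft-clips it, the clipped bases appear as an S operation in the CIGAR string (column 6 of a SAM record). Below is a rough sketch for measuring that clip. It assumes the CIGAR and the reverse-strand flag (FLAG bit 0x10) have already been read from the SAM/BAM file. The function names are hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clip_lengths(cigar):
    """Return (leading, trailing) soft-clip lengths of a SAM CIGAR string.

    Hard clips (H) can only sit outside soft clips, so they are dropped
    before looking at the first and last operations.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return 0, 0
    leading = ops[0][0] if ops[0][1] == "S" else 0
    trailing = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return leading, trailing


def three_prime_soft_clip(cigar, is_reverse):
    """Soft-clipped bases at the read's own 3' end.

    CIGARs are written in reference orientation, so for a read aligned to
    the reverse strand (FLAG & 0x10) its 3' end is the leading operation.
    """
    leading, trailing = soft_clip_lengths(cigar)
    return leading if is_reverse else trailing
```

Soft-clipped bases stay in the SEQ field of the record and are only excluded from the alignment.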

Also, I've read that you should get rid of the ULSO and DLSO sequences. What is the typical approach here? I know that BAMClipper can do this after alignment.

To sum up, for TruSeq Custom Amplicon 1.5 I'm trying to figure out:
- what sequences to trim,
- when to trim them,
- which tool to use.

How do you clean TruSeq Custom Amplicon data, in terms of getting rid of extraneous sequences? I'm stuck. Please be merciful, and sorry for the chaotic question!

Tags: amplicon, adapter, trimming

Written 7.8 years ago by lamteva.vera; updated 7.8 years ago by Robert Sicko

You wrote: Illumina says that "Adapter trimming is not required..."

That's yet another reason not to trust what Illumina says. They make good sequencing machines (though recently each iteration has been worse), but their software is terrible. More generally, given that their latest sequencers no longer report quality scores correctly, I'm not entirely sure they have anyone in charge of their products who understands their users.

Mapping will always be more accurate if adapter sequences are removed first.

Reply by Brian Bushnell, 7.8 years ago
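
Below is a minimal sketch of 3' adapter trimming before mapping, using exact matching only. Real trimmers such as cutadapt or BBDuk also tolerate sequencing errors, so treat this purely as an illustration. The ADAPTER constant is an assumption: it is the longer pattern found with grep in the question's read 2 file, and read 1 would need its own adapter sequence.

```python
# Assumption: longer pattern found with grep in the question (read 2 file).
ADAPTER = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAAGG"


def trim_3prime_adapter(seq, adapter=ADAPTER, min_overlap=3):
    """Cut a read at the first position where the adapter starts.

    Exact matching only: the rest of the read must either begin with the
    whole adapter or be a prefix of it (adapter running off the read end).
    Overlaps shorter than `min_overlap` bases are ignored.
    """
    for start in range(len(seq) - min_overlap + 1):
        tail = seq[start:]
        if tail.startswith(adapter) or adapter.startswith(tail):
            return seq[:start]
    return seq
```

The minimum overlap avoids trimming a read just because it happens to end in one or two adapter-like bases.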

Yes, you should trim primer sequences. The primer bases come from the synthetic oligos rather than from your samples, so they are not true "observations" of your individuals, and keeping them may introduce artefacts in rare cases.

I haven't tried it myself, but I think https://github.com/tommyau/bamclipper does the job.

Reply by WouterDeCoster, 7.8 years ago
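
Below is a minimal FASTQ-level sketch of 5' primer trimming. It assumes each read starts with the target-specific part of a known primer and allows only substitutions, not indels. The primer names and sequences are hypothetical placeholders for a panel's ULSO/DLSO list. BAMClipper, linked above, takes the other route and clips after alignment.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def trim_5prime_primer(seq, primers, max_mismatches=1):
    """Strip the first listed primer that matches the start of a read.

    `primers` maps a primer name to its sequence as it appears at the 5'
    end of the read. Returns (primer_name, trimmed_seq), or (None, seq)
    if no primer matches with at most `max_mismatches` substitutions.
    """
    for name, primer in primers.items():
        prefix = seq[: len(primer)]
        if len(prefix) == len(primer) and hamming(prefix, primer) <= max_mismatches:
            return name, seq[len(primer):]
    return None, seq


# Hypothetical primers; a real panel would supply its own ULSO/DLSO list.
PRIMERS = {
    "amp1_ULSO": "GATTACAGATTACA",
    "amp2_ULSO": "CCGGTTAACCGGTT",
}
```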

If your amplicon panel has no overlapping amplicons, whether to remove primers depends on whether you expect, and can tolerate missing, "rare cases" such as SNVs or INDELs near the gene-specific primers. That in turn depends on the genes of interest and on the panel design, in terms of variant types and locations. In our case, mutations could be of any type and anywhere: we came across a clinical case with a BRCA1 deletion close to a gene-specific primer. That INDEL could be missed only if the gene-specific primer sequence was perfectly removed from the FASTQ before running BWA-MEM (https://www.nature.com/articles/s41598-017-01703-6/figures/2).

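If there are overlapping amplicons in the panel, you should consider removing the gene-specific primers for the sake of accurate sequencing-depth calculation and variant calling. This applies regardless of whether they are removed at the FASTQ level, at the BAM level or within the variant-calling algorithm. With overlapping amplicons, the primers of one amplicon lie inside the insert of its neighboring amplicon, so their primer-derived bases would otherwise be counted there.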

Reply by Tommy Au, 7.8 years ago
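
This toy calculation shows how primers distort depth when amplicons overlap. It assumes each read pair spans its whole amplicon, and the coordinates and read counts are invented for illustration.

```python
from collections import Counter


def amplicon_depth(amplicons, include_primers):
    """Per-position fragment depth, one count per read pair (not per read).

    Each amplicon is (start, end, left_primer_len, right_primer_len, pairs)
    in 0-based half-open reference coordinates that include the primers;
    `pairs` is the number of read pairs assumed to span that amplicon.
    """
    depth = Counter()
    for start, end, left, right, pairs in amplicons:
        lo, hi = (start, end) if include_primers else (start + left, end - right)
        for position in range(lo, hi):
            depth[position] += pairs
    return depth


# Two invented overlapping amplicons: B's left primer (60-80) sits inside
# A's insert (20-80), and A's right primer (80-100) inside B's insert (80-140).
AMPLICONS = [
    (0, 100, 20, 20, 50),   # amplicon A
    (60, 160, 20, 20, 30),  # amplicon B
]
RAW = amplicon_depth(AMPLICONS, include_primers=True)
INSERT_ONLY = amplicon_depth(AMPLICONS, include_primers=False)
```

Position 70 lies in amplicon A's insert and under amplicon B's left primer. With primers counted, it gets 80 fragments, and 30 of them are primer sequence rather than sample DNA.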

Yes, the amplicons are overlapping, so I'll try to remove the primers with BAMClipper. Thank you.

Reply by lamteva.vera, 7.8 years ago
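
BAMClipper works on aligned reads rather than on FASTQ. Below is a simplified sketch of that idea; it is not BAMClipper's code. The aligned bases of a read that fall inside an upstream primer become a soft clip, and the start position moves past the primer. The sketch handles only the left end of a forward read and uses 0-based coordinates. The function name and interface are hypothetical.

```python
import re

CIGAR_RE = re.compile(r"(?:\d+[MIDS=X])+")
CIGAR_OP = re.compile(r"(\d+)([MIDS=X])")


def clip_left_primer(pos, cigar, primer_end):
    """Soft-clip the aligned bases of a forward read left of `primer_end`.

    `pos` is the 0-based reference start of the alignment and `primer_end`
    the 0-based, exclusive end of the upstream primer. Returns the new
    (pos, cigar), or None if nothing aligned remains after clipping.
    Only the CIGAR operations M, I, D, S, = and X are supported.
    """
    if not CIGAR_RE.fullmatch(cigar):
        raise ValueError(f"unsupported CIGAR: {cigar}")
    ops = [[int(n), op] for n, op in CIGAR_OP.findall(cigar)]
    clipped = ops.pop(0)[0] if ops[0][1] == "S" else 0  # keep existing clip
    ref = pos
    while ops and ref < primer_end:
        n, op = ops[0]
        if op in ("M", "=", "X"):  # consumes read and reference
            take = min(n, primer_end - ref)
            clipped += take
            ref += take
            if take == n:
                ops.pop(0)
            else:
                ops[0][0] = n - take
        elif op == "I":  # read-only bases inside the primer
            clipped += n
            ops.pop(0)
        elif op == "D":  # reference-only bases inside the primer
            ref += n
            ops.pop(0)
        else:  # trailing soft clip: no aligned bases left
            break
    # An alignment should not start with an indel, so absorb any left over.
    while ops and ops[0][1] in ("I", "D"):
        n, op = ops.pop(0)
        if op == "I":
            clipped += n
        else:
            ref += n
    if not ops or ops[0][1] == "S":
        return None
    body = "".join(f"{n}{op}" for n, op in ops)
    return ref, (f"{clipped}S" if clipped else "") + body
```

Aligning first keeps the primer bases available as an anchor for the aligner. This is the argument for clipping at the BAM level made earlier in the thread.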

If the ULSO and DLSO are upstream and downstream of the targeted sequence and I'm going to restrict variant calling with an interval list, then do I need to trim the primers?

Reply by lamteva.vera, 7.8 years ago

Restricting with an interval list will still cause you to miss some real variants that you could detect if you trimmed the ULSO/DLSO. See my IGV screenshots in this post.

Reply by Robert Sicko, 7.8 years ago
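
Here is one way an interval list falls short with overlapping amplicons: the target intervals (the inserts) still contain the neighboring amplicons' primers. Below is a small sketch that lists such primers. It assumes amplicon coordinates are known in 0-based half-open form. The amplicon names and positions are invented.

```python
def primers_inside_other_inserts(amplicons):
    """List primers that overlap the insert of a different amplicon.

    `amplicons` maps a name to (start, end, left_len, right_len) in 0-based
    half-open reference coordinates that include both primers. Returns
    sorted (primer_label, other_amplicon) pairs.
    """
    primers = []
    inserts = {}
    for name, (start, end, left, right) in amplicons.items():
        primers.append((f"{name}:left", name, start, start + left))
        primers.append((f"{name}:right", name, end - right, end))
        inserts[name] = (start + left, end - right)
    hits = []
    for label, owner, p_start, p_end in primers:
        for other, (i_start, i_end) in inserts.items():
            # Half-open intervals overlap if each starts before the other ends.
            if other != owner and p_start < i_end and i_start < p_end:
                hits.append((label, other))
    return sorted(hits)


# Invented coordinates for two overlapping amplicons.
PANEL = {"A": (0, 100, 20, 20), "B": (60, 160, 20, 20)}
```

Any primer this reports lies inside a target interval, so restricting calls to the targets does not exclude its primer-derived bases.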

Answer (2017-09-29)

If you're using MiSeq Reporter (MSR), this is all taken care of automatically. MSR groups reads by ULSO/DLSO, then aligns them, and finally soft-clips the ULSO/DLSO. As you correctly stated, BAMClipper can do this after alignment with BWA-MEM.
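
Below is a rough sketch of what grouping read pairs by ULSO/DLSO could look like. Each pair is assigned to the amplicon whose probe prefixes start read 1 and read 2. It assumes exact prefix matches, with probe sequences given in the orientation they appear in each read. This is an illustration, not how MSR is implemented, and the probe names and sequences are hypothetical.

```python
from collections import defaultdict


def group_by_probe_pair(read_pairs, probes, prefix_len=15):
    """Group read pairs by the amplicon whose probes start read 1 and read 2.

    `probes` maps an amplicon name to (ulso, dlso): the sequences expected
    at the start of read 1 and read 2. Only the first `prefix_len` bases
    are compared, exactly. Unmatched pairs are collected under None.
    """
    lookup = {
        (ulso[:prefix_len], dlso[:prefix_len]): name
        for name, (ulso, dlso) in probes.items()
    }
    groups = defaultdict(list)
    for read1, read2 in read_pairs:
        key = (read1[:prefix_len], read2[:prefix_len])
        groups[lookup.get(key)].append((read1, read2))
    return dict(groups)


# Hypothetical probe sequences, in the orientation they appear in each read.
PROBES = {
    "amp1": ("GATTACAGATTACAGG", "CCTTGGAACCTTGGAA"),
    "amp2": ("TTTTCCCCGGGGAAAATT", "ACGTACGTACGTACGTAC"),
}
PAIRS = [
    ("GATTACAGATTACAGGTTTT", "CCTTGGAACCTTGGAACCCC"),
    ("GATTACAGATTACAGGTTTT", "ACGTACGTACGTACGTACGG"),  # ULSO amp1, DLSO amp2
]
GROUPS = group_by_probe_pair(PAIRS, PROBES)
```

Pairs collected under None, such as the mixed amp1/amp2 pair in the demo, are candidates for closer inspection.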

IMO, trimming adapter sequences from the raw FASTQ files can't hurt, but it may not be necessary and may not change your results. You'd really need to compare the variant calls from pipelines that do and do not trim adapters. I think you'll see more of a difference in calls from ULSO/DLSO trimming/soft-clipping.
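
Below is a minimal sketch of such a comparison, keyed on (CHROM, POS, REF, ALT). It assumes both call sets are plain-text VCFs from the same caller and that variants are already normalized (for example left-aligned). Otherwise, the same indel written two ways would count as a difference. The records in the example are invented.

```python
def variant_keys(vcf_lines):
    """Collect (CHROM, POS, REF, ALT) keys, one per ALT allele, from VCF lines."""
    keys = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):
            keys.add((chrom, int(pos), ref, alt))
    return keys


def compare_calls(vcf_a, vcf_b):
    """Return (only_in_a, only_in_b, shared) sets of variant keys."""
    a, b = variant_keys(vcf_a), variant_keys(vcf_b)
    return a - b, b - a, a & b


# Invented minimal VCF records for two pipelines.
TRIMMED = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "chr17\t100\t.\tA\tG\n",
    "chr17\t200\t.\tCT\tC\n",
]
UNTRIMMED = [
    "chr17\t100\t.\tA\tG\n",
    "chr17\t300\t.\tG\tT,C\n",
]
```

With real files, open file handles can be passed directly, since both functions only iterate over lines.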

Possible pipelines that take different approaches to TSCA data:
- bcbio feature request for TSCA handling
- UNDRROVER
- AmpliVar