Question: Amplicon sequencing: how to handle adapters and primers?
lamteva.vera (Ukraine, Kyiv) wrote, 23 months ago:

Dear all, I'm analyzing data from TruSeq Custom Amplicon 1.5 panel.

I've read a lot about handling the sequences attached to the original region of interest and ended up confused, so any help would be highly appreciated. Maybe I'm missing something really basic here. For example, I'm not sure which sequences make up a read aside from the insert (=targeted region) itself and the flanking target-specific primers.

Illumina says that "Adapter trimming is not required for TruSeq Targeted RNA Expression, TruSeq Custom Amplicon, and TruSeq Cancer Panel when using Illumina analysis pipelines". Is that the case? Here they claim that "Each probe contains unique, target-specific sequence as well as a universal adapter sequence that is used in a subsequent amplification reaction". The aforementioned target-specific sequence is an upstream or downstream locus-specific oligo (ULSO/DLSO). Here they give only the index sequences for TSCA, with no universal adapter sequence.

Please have a look at the image attached. [Image: MultiQC plot for adapter content] This is an example MultiQC plot for adapter content; FastQC claims that it is the Illumina universal adapter (based on finding the AGATCGGAAGAG sequence, as far as I understood from FastQC's documentation). I ran zcat LK_S2_L001_R2_001.fastq.gz | grep AGATCGGAAGAG and noticed that this sequence is part of a longer pattern. zcat LK_S2_L001_R2_001.fastq.gz | grep -c AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAAGG returned 8863 (while zcat LK_S2_L001_R2_001.fastq.gz | grep -c AGATCGGAAGAG returned 11079).
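The same count can be sketched in Python so that only the sequence line of each record is searched, rather than every line of the file. This is a minimal sketch that assumes standard four-line FASTQ records; the function names are made up for illustration.

```python
import gzip


def count_adapter_reads(fastq_lines, adapter):
    """Count reads whose sequence line contains `adapter`.

    Assumes strict 4-line FASTQ records (header, sequence, '+', qualities).
    """
    count = 0
    for i, line in enumerate(fastq_lines):
        # Line 2 of every 4-line record (index 1, 5, 9, ...) is the sequence.
        if i % 4 == 1 and adapter in line:
            count += 1
    return count


def count_adapter_reads_in_file(path, adapter):
    """Same count for a gzip-compressed FASTQ file."""
    with gzip.open(path, "rt") as handle:
        return count_adapter_reads(handle, adapter)
```

For the file in the question this would be called as count_adapter_reads_in_file("LK_S2_L001_R2_001.fastq.gz", "AGATCGGAAGAG"). Only exact matches are counted, so adapters containing sequencing errors or truncated at the read end are not included.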

How should I handle it? They say that "if you use BWA-MEM, the trailing (5’) bases of a read that do not match the reference are soft-clipped, which covers those cases in which an adapter does occur". Yet Heng Li says: "...Bwa-mem will just soft clip them [adapter sequences]... However, it is still recommended to trim adapter sequences. After all, adapters are not part of the samples you are sequencing. They might affect variant calling in corner cases".
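To make the trimming side of this choice concrete, here is a toy sketch of 3' adapter trimming on a single read. It only handles exact matches (a full adapter anywhere in the read, or a partial adapter at the very end), and the function name and the min_overlap default are assumptions for illustration. Dedicated trimmers also tolerate mismatches and adjust the quality string, so this is not a substitute for them.

```python
def trim_3prime_adapter(seq, adapter, min_overlap=3):
    """Remove a 3' adapter from `seq` using exact matching only.

    Toy illustration: no mismatches or indels are tolerated, and the
    quality string is not handled.
    """
    # Case 1: the whole adapter occurs in the read; cut from its start.
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx]

    # Case 2: the read ends inside the adapter, so only an adapter
    # prefix of length k is present at the 3' end. Try longest first.
    max_len = min(len(seq), len(adapter) - 1)
    for k in range(max_len, min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]

    # No adapter found: return the read unchanged.
    return seq


if __name__ == "__main__":
    adapter = "AGATCGGAAGAG"
    for read in ("ACGTACGTAGATCGGAAGAGCGTC", "ACGTACGTAGATCG", "ACGTACGT"):
        print(read, "->", trim_3prime_adapter(read, adapter))
```

A small min_overlap removes more adapter remnants but also clips a few genuine bases by chance, which is the usual trade-off when choosing this kind of threshold.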

Also, I've read that you should get rid of ULSO and DLSO sequences. What is the typical approach here? I know that BAMclipper can do this after alignment.

To sum up, I'm trying to figure out (for TruSeq Custom Amplicon 1.5):
- what sequences to trim,
- when to trim them,
- which tool to use.

How do you clean TruSeq Custom Amplicon data in terms of getting rid of extraneous sequences? I'm stuck. Please be merciful and sorry for the chaotic question!


Illumina says that "Adapter trimming is not required..."

Yet another reason to not trust what Illumina says. They create good sequencing machines (though recently, each iteration has been worse) but their software is terrible, and in general, given that their latest sequencers no longer report quality scores correctly, I'm not entirely sure they have anyone in charge of their products that understands their users.

Mapping will always be more accurate if adapter sequences are removed first.

written 23 months ago by Brian Bushnell

Yes, you should trim primer sequences. They are not true "observations" of your individuals, and leaving them in may introduce artefacts in rare cases.

I haven't tried it myself, but I think https://github.com/tommyau/bamclipper does the job.

written 23 months ago by WouterDeCoster

If your amplicon panel does not have any overlapping amplicons, whether to remove primers depends on whether you expect, and can tolerate, missing "rare cases", e.g. an SNV or INDEL near the gene-specific primers. It depends on the genes of interest and the panel design, in the context of variant types and locations. In our case, mutations could be anywhere and of any type: we came across a clinical case of a BRCA1 deletion close to the gene-specific primer. That INDEL would be missed only if the gene-specific primer sequence were perfectly removed from the FASTQ before BWA-MEM (https://www.nature.com/articles/s41598-017-01703-6/figures/2).

If there are overlapping amplicons in the panel, you should consider removing the gene-specific primers for the sake of accurate sequencing-depth calculation and variant calling, regardless of whether they are removed at the FASTQ level, the BAM level, or the variant-calling algorithm level.
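A small arithmetic sketch of why overlapping amplicons matter: at a position that lies in the insert of one amplicon but in the primer of its neighbour, reads from the neighbour carry the synthetic primer base rather than the sample's base. All read counts below are invented purely for illustration.

```python
def variant_allele_fraction(alt_reads, ref_reads):
    """Fraction of reads supporting the alternate allele."""
    total = alt_reads + ref_reads
    return alt_reads / total if total else 0.0


# Hypothetical counts at one position covered by two overlapping amplicons.
# Amplicon B: the position is in the insert; a true heterozygous variant.
insert_alt, insert_ref = 50, 50
# Amplicon A: the position is inside its primer, so every read shows the
# primer (reference) base regardless of the sample's genotype.
primer_ref = 100

primers_kept = variant_allele_fraction(insert_alt, insert_ref + primer_ref)
primers_clipped = variant_allele_fraction(insert_alt, insert_ref)

print(f"allele fraction, primers kept:    {primers_kept:.2f}")
print(f"allele fraction, primers clipped: {primers_clipped:.2f}")
```

With these made-up numbers the apparent allele fraction halves when the primer-derived bases are counted, and the depth at the position doubles, which is the distortion described above.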

modified 22 months ago • written 22 months ago by Tommy Au

Yes, the amplicons are overlapping so I'll try to remove primers with BAMClipper. Thank you.

[Image: amplicons]

modified 22 months ago • written 22 months ago by lamteva.vera

If ULSO and DLSO are upstream and downstream of the targeted sequence and I'm going to restrict variant calling using an intervals list, then do I need to trim the primers?

written 23 months ago by lamteva.vera

Restricting with an interval list will still cause you to miss some real variants that you could detect if you trimmed ULSO/DLSO. See my IGV screenshots in this post.

written 23 months ago by Robert Sicko

Robert Sicko (United States) wrote, 23 months ago:

If you're using MiSeq Reporter (MSR), this is all taken care of automatically. MSR groups reads by ULSO/DLSO, then aligns and finally soft-clips ULSO/DLSO. You correctly stated that BAMClipper can do this after alignment with BWA-MEM.
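For readers unfamiliar with what soft-clipping a primer after alignment means, here is a rough sketch of the idea on a single forward-strand alignment. It is not the code used by MSR or BAMClipper. The function names are hypothetical, positions are 0-based, and the sketch assumes the CIGAR contains no hard clips or padding.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def softclip_left(pos, cigar, primer_end):
    """Soft-clip read bases aligned before `primer_end`.

    pos        -- 0-based leftmost reference position of the alignment
    primer_end -- 0-based reference position just past the primer
    Returns (new_pos, new_cigar). Assumes no H or P operations.
    """
    if pos >= primer_end:
        return pos, cigar  # read does not overlap the primer

    ops = parse_cigar(cigar)
    ref, clipped, rest = pos, 0, []
    for idx, (n, op) in enumerate(ops):
        if ref >= primer_end and op in "M=X":
            rest = ops[idx:]  # first aligned block past the primer
            break
        if op in "M=X":  # consumes both read and reference
            take = min(n, primer_end - ref)
            ref += take
            clipped += take
            if take < n:  # block straddles the primer boundary
                rest = [(n - take, op)] + ops[idx + 1:]
                break
        elif op in "IS":  # consumes read only: absorb into the clip
            clipped += n
        elif op in "DN":  # consumes reference only: skip ahead
            ref += n

    new_cigar = f"{clipped}S" + "".join(f"{n}{op}" for n, op in rest)
    return ref, new_cigar


if __name__ == "__main__":
    # A 50 bp read starting at position 100 with a primer ending at 120.
    print(softclip_left(100, "50M", 120))  # (120, '20S30M')
```

The read sequence itself is kept, so the clipped bases remain in the BAM record but no longer count towards depth or variant evidence. The primer at the other end of the amplicon would be handled by the mirror-image operation on the right-hand side of the CIGAR.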

IMO trimming adapter sequences of the raw FASTQ files can't hurt, but it may not be necessary or change your results. You'd really need to compare variant calls from a pipeline that does and does not trim adapters. I think you'll see more difference in calls from ULSO/DLSO trimming/soft-clipping.
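One simple way to do that comparison is to reduce each call set to (chromosome, position, ref, alt) tuples and look at the set differences. This is only a sketch; the function name is made up and the example variants are placeholders, not real calls.

```python
def compare_call_sets(calls_a, calls_b):
    """Compare two collections of (chrom, pos, ref, alt) variant tuples."""
    a, b = set(calls_a), set(calls_b)
    return {
        "shared": a & b,
        "only_a": a - b,
        "only_b": b - a,
    }


if __name__ == "__main__":
    # Placeholder variants, invented for illustration only.
    trimmed = [("chrN", 100, "A", "G"), ("chrN", 250, "CT", "C")]
    untrimmed = [("chrN", 100, "A", "G")]

    result = compare_call_sets(trimmed, untrimmed)
    for label, variants in result.items():
        print(label, sorted(variants))
```

Calls that appear in only one set are the ones worth inspecting in IGV, in particular those close to ULSO/DLSO positions.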

Possible pipelines that take different approaches to TSCA data: the bcbio feature request for TSCA handling, UNDR ROVER, and AmpliVar.
