Dear all, I'm analyzing data from a TruSeq Custom Amplicon 1.5 panel.
I've read a lot about handling the sequences attached to the original region of interest and ended up confused, so any help would be highly appreciated. Maybe I'm missing something really basic here. For example, I'm not sure which sequences make up a read apart from the insert (= targeted region) itself and the flanking target-specific primers.
Illumina says that "Adapter trimming is not required for TruSeq Targeted RNA Expression, TruSeq Custom Amplicon, and TruSeq Cancer Panel when using Illumina analysis pipelines". Is that really the case? Here they claim that "Each probe contains unique, target-specific sequence as well as a universal adapter sequence that is used in a subsequent amplification reaction". The aforementioned target-specific sequence is an upstream or downstream locus-specific oligo (ULSO/DLSO). Here they give only the index sequences for TSCA (TruSeq Custom Amplicon), with no universal adapter sequence.
Please have a look at the attached image (MultiQC plot for adapter content).
This is an example MultiQC plot for adapter content. FastQC reports it as the Illumina universal adapter (based on finding the sequence AGATCGGAAGAG, as far as I understand from FastQC's documentation). I ran
zcat LK_S2_L001_R2_001.fastq.gz | grep AGATCGGAAGAG
and noticed that this sequence is part of a longer pattern. The command
zcat LK_S2_L001_R2_001.fastq.gz | grep -c AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAAGG
returned 8863, while
zcat LK_S2_L001_R2_001.fastq.gz | grep -c AGATCGGAAGAG
returned 11079.
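The grep counts above say how many lines contain the adapter, but not where in the read it starts. A minimal Python sketch of that check follows, assuming a standard four-line FASTQ layout and exact matching only. The function names are made up for illustration and are not part of FastQC or any Illumina tool.

```python
import gzip
from collections import Counter

ADAPTER_SEED = "AGATCGGAAGAG"  # the 12-mer quoted above


def fastq_sequences(handle):
    """Yield only the sequence line (2nd of every 4 lines) of a FASTQ stream."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip()


def adapter_start_positions(sequences, adapter=ADAPTER_SEED):
    """Count, per 0-based start position, the reads that contain the adapter."""
    positions = Counter()
    for seq in sequences:
        pos = seq.find(adapter)  # exact match only; no mismatches tolerated
        if pos != -1:
            positions[pos] += 1
    return positions


def summarize(path, adapter=ADAPTER_SEED):
    """Return (number of reads with adapter, Counter of start positions)."""
    with gzip.open(path, "rt") as handle:
        positions = adapter_start_positions(fastq_sequences(handle), adapter)
    return sum(positions.values()), positions
```

Calling `summarize("LK_S2_L001_R2_001.fastq.gz")` would give the total and the position histogram. If the starts cluster at a few positions, that would be consistent with read-through past the end of short amplicons, though this sketch alone cannot prove it.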
How should I handle this? They say that "if you use BWA-MEM, the trailing (5’) bases of a read that do not match the reference are soft-clipped, which covers those cases in which an adapter does occur". Yet Heng Li says: "...Bwa-mem will just soft clip them [adapter sequences]... However, it is still recommended to trim adapter sequences. After all, adapters are not part of the samples you are sequencing. They might affect variant calling in corner cases".
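To make concrete what trimming before alignment would do, as opposed to relying on soft-clipping, here is a simplified sketch of 3' adapter removal. It assumes exact matching and a hypothetical `min_overlap` parameter. Real trimmers such as cutadapt or Trimmomatic also tolerate mismatches and handle read pairs, so this only illustrates the idea and does not replace them.

```python
def trim_3prime_adapter(seq, qual, adapter, min_overlap=3):
    """Cut seq/qual at the first exact adapter match, or at a partial
    adapter prefix that overlaps the 3' end of the read."""
    pos = seq.find(adapter)
    if pos == -1:
        # No full match: look for the longest adapter prefix at the read end.
        longest = min(len(adapter) - 1, len(seq))
        for k in range(longest, min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                pos = len(seq) - k
                break
    if pos == -1:
        return seq, qual  # nothing adapter-like found; leave the read as is
    return seq[:pos], qual[:pos]
```

For example, `trim_3prime_adapter("ACGTACGTAGATCGGAAGAGTTT", "I" * 23, "AGATCGGAAGAG")` keeps only the first eight bases and their qualities.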
Also, I've read that you should get rid of the ULSO and DLSO sequences. What is the typical approach here? I know that BAMclipper can do this after alignment.
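For comparison with the post-alignment approach, a pre-alignment version of primer removal could look like the sketch below. It assumes that each read starts with one of the locus-specific oligos (this should be checked against the panel manifest) and that the oligo sequences are known. The primer list and function name are hypothetical.

```python
def clip_5prime_primer(seq, qual, primers):
    """Remove a leading locus-specific oligo if the read starts with one.

    Longer primers are tried first so that a primer which is a prefix of
    another does not cause a too-short clip.
    """
    for primer in sorted(primers, key=len, reverse=True):
        if seq.startswith(primer):  # exact match only
            return seq[len(primer):], qual[len(primer):]
    return seq, qual


# Hypothetical oligo sequences, for illustration only.
example_primers = ["ACGTAC", "GGGTTT"]
clipped_seq, clipped_qual = clip_5prime_primer("ACGTACTTTT", "IIIIIIIIII", example_primers)
```

Clipping by alignment coordinates after mapping instead leaves the primer bases in place during alignment, and that is the main design difference from this sketch.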
To sum up, I'm trying to figure out (for TruSeq Custom Amplicon 1.5):
- which sequences to trim,
- when to trim them,
- which tool to use.
How do you clean TruSeq Custom Amplicon data, i.e. how do you get rid of the extraneous sequences? I'm stuck. Please be merciful, and sorry for the chaotic question!