How to know which adapter sequences to trim for RNA-seq?
10 weeks ago

Hello,

I'm very new to RNA-seq analysis and am currently stuck on the trimming step. The libraries I am trying to analyze were built using the NEXTFLEX Rapid Directional RNA-seq kit with their Unique Dual Index Barcodes (https://perkinelmer-appliedgenomics.com/wp-content/uploads/2022/02/NOVA-51292X-NEXTFLEX-RNA-Seq-2-0-UDI-Barcodes-V22-02-new.pdf). They were submitted for 2x111 paired-end sequencing.

I'm trying to use cutadapt to do paired-end trimming but am struggling to understand what exactly I need to trim here. Each barcode has a unique 8 bp index that corresponds to the P5/P7 regions. Is this what I am supposed to trim off? So in cutadapt: -a XXXXXXXX (P5 index) -A XXXXXXXX (P7 index)

Or do I have to trim off the entire UDI barcode (which seems rather long, since my reads are only 111 bp), like this: -a AATGATACGGCGACCACCGAGATCTACACXXXXXXXXACACTCTTTCCCTACACGACGCTCTTCCGATCT -A GATCGGAAGAGCACACGTCTGAACTCCAGTCACXXXXXXXXATCTCGTATGCCGTCTTCTGCTTG


The other thing that confuses me is that when I check the quality of the FASTQ files with FastQC, it always reports an overrepresented TruSeq adapter sequence, even though this adapter was not used during library prep.

If anyone has experience trimming with these barcodes or has any insights, that would be awesome!

fastq RNA-seq cutadapt • 518 views
10 weeks ago
ATpoint 73k

If FastQC doesn't report any adapter contamination, then most likely there is none and you don't need any trimming.

The AATGATACGGCG... primer is part of the TruSeq adapter. Illumina libraries, even when built with custom kits, still need certain sequences to work with the Illumina chemistry, so that is just TruSeq, regardless of the name.

If you need to trim anything, it is likely the standard TruSeq/Universal Adapter sequence from Illumina.
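
As a rough sketch of what that could look like for paired-end data with cutadapt: the two adapter strings below are the standard Illumina TruSeq read-through sequences, while the file names and the minimum length of 20 are placeholders made up for illustration, not values from this thread. The reverse-complement check only illustrates that the read 2 adapter is the tail of the P5-side oligo quoted in the question, seen from the opposite strand.

```python
import shlex

# Standard Illumina TruSeq adapter sequences as they appear at the 3' end of reads.
TRUSEQ_R1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
TRUSEQ_R2 = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"

# Tail of the P5-side oligo quoted in the question (the part after the 8 bp index).
P5_TAIL = "ACACTCTTTCCCTACACGACGCTCTTCCGATCT"


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def cutadapt_command(r1, r2, out1, out2, min_len=20):
    """Build a paired-end cutadapt call; min_len=20 is an assumed default."""
    return [
        "cutadapt",
        "-a", TRUSEQ_R1,      # 3' adapter expected in read 1
        "-A", TRUSEQ_R2,      # 3' adapter expected in read 2
        "-m", str(min_len),   # drop pairs that end up shorter than this
        "-o", out1,
        "-p", out2,
        r1, r2,
    ]


# Read 2 runs into the P5 side from the other strand, hence the reverse complement.
print(reverse_complement(P5_TAIL) == TRUSEQ_R2)  # True

# Hypothetical file names; print the command instead of running it.
cmd = cutadapt_command("sample_R1.fastq.gz", "sample_R2.fastq.gz",
                       "trimmed_R1.fastq.gz", "trimmed_R2.fastq.gz")
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would execute it, assuming cutadapt is installed. Note that the 8 bp index itself is not part of either adapter string, because it sits further downstream than the portion that normally reads through.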


Do you happen to know if the middle part of this UDI barcode (GATCGGAAGAGCACACGTCTGAACTCCAGTCACGTCTACATATCTCG) is also part of the TruSeq adapter? This is what shows up as overrepresented when checking with FastQC. Is trimming this sequence from read 1 and read 2 sufficient?
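
One way to check this by hand is to line the overrepresented sequence up against the adapter template quoted in the question, where X marks the 8 bp index. The helper below is a small illustrative sketch (the function name is made up); it only tests whether the sequence matches the start of the template and pulls out the bases that fall on the X positions.

```python
# Adapter template as quoted in the question; X marks the 8 bp index.
P7_TEMPLATE = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACXXXXXXXXATCTCGTATGCCGTCTTCTGCTTG"

# Overrepresented sequence as quoted in the reply above.
OVERREPRESENTED = "GATCGGAAGAGCACACGTCTGAACTCCAGTCACGTCTACATATCTCG"


def match_template(seq, template):
    """Return the bases at the X positions if seq matches the start of
    template, otherwise None."""
    if len(seq) > len(template):
        return None
    index_bases = []
    for base, expected in zip(seq, template):
        if expected == "X":
            index_bases.append(base)   # wildcard position: record the index base
        elif base != expected:
            return None                # mismatch in the fixed adapter part
    return "".join(index_bases)


print(match_template(OVERREPRESENTED, P7_TEMPLATE))  # GTCTACAT
```

The sequence matches the template exactly: 33 bases of fixed adapter, the 8 bp index GTCTACAT, and the first 6 bases of the remaining fixed part. So it is the same adapter that was already quoted in the question, carrying one sample's index.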

10 weeks ago
Ming Tommy Tang ★ 2.9k

You could use fastp, which detects adapters by itself: https://github.com/OpenGene/fastp#adapters
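
A minimal sketch of a paired-end fastp call, with hypothetical file names. As far as I understand the fastp README, paired-end adapter trimming relies on read overlap by default, and `--detect_adapter_for_pe` additionally turns on sequence-based adapter detection; treat that reading as an assumption and check it against the linked documentation.

```python
import shlex


def fastp_command(r1, r2, out1, out2):
    """Build a paired-end fastp call with adapter auto-detection enabled."""
    return [
        "fastp",
        "-i", r1,                    # read 1 input
        "-I", r2,                    # read 2 input
        "-o", out1,                  # read 1 output
        "-O", out2,                  # read 2 output
        "--detect_adapter_for_pe",   # enable adapter detection for paired-end data
    ]


# Hypothetical file names; print the command instead of running it.
cmd = fastp_command("sample_R1.fastq.gz", "sample_R2.fastq.gz",
                    "trimmed_R1.fastq.gz", "trimmed_R2.fastq.gz")
print(shlex.join(cmd))
```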
