Question: trimming PE RNA Reads
12 days ago by
rahmati.razieh830 wrote:

Dear all,

I have a problem trimming my RNA-seq reads. Although all reads have the same length (101 nucleotides), the FastQC quality check shows that some reads contain adapter sequence (the adapter content module is red). I trimmed those reads with Trimmomatic using the ILLUMINACLIP step, which removed the adapter. My problem is with some other reads, for which I cannot tell whether they contain adapter or not: the adapter content module in FastQC is yellow, and in the adapter content graph the line is completely flat up to the 12th nucleotide, while something seems to go wrong from the 12th or 13th nucleotide onward. I then trimmed the reads according to the Lexogen mRNA-Seq Library Prep Kit V2 protocol, which says: "The first nine nucleotides need to be removed from Read 1 (starter side), while on the stopper side it is only six nucleotides (Read 2)." But FastQC shows the same result as before, so the problem still exists and I am completely confused. Would you please help me trim these reads?

Plot

rna-seq • 223 views
ADD COMMENTlink modified 9 days ago • written 12 days ago by rahmati.razieh830

I think I should explain my problem in more detail, since it may not have been clear enough. I received some RNA-seq reads prepared with the SENSE mRNA-Seq Library Prep Kit V2. After running FastQC on them, I found that some read files showed red adapter content and others yellow. What the two groups have in common is that the FastQC adapter content graph is flat up to the 13th nucleotide and rises after it, more steeply for the reads with red adapter content. For the reads with red adapter content, I removed the Illumina universal adapter with Trimmomatic using the ILLUMINACLIP step. FastQC then showed that the adapter was removed, and the adapter content module turned green. For the reads with yellow adapter content, I cannot tell whether they contain adapter or not. I applied the same Trimmomatic adapter removal to these reads. The adapter was removed for some of them, but for others the problem persists and FastQC still shows yellow adapter content. So I am completely confused about what the problem is and why these graphs still rise. I need your help to find the cause.

My other question is about the trimming instructions in the SENSE mRNA-Seq Library Prep Kit V2 protocol. I tried to trim the reads again, this time following the protocol's instruction to remove 9 nucleotides from the R1 reads and 6 nucleotides from the R2 reads. So I removed 9 nucleotides from the 5' side of R1 and 6 nucleotides from the 3' side of R2. But this did not help, and the problem in the adapter content graph persists. Would you please tell me how I can solve this problem and how I should trim these reads?

ADD REPLYlink written 10 days ago by rahmati.razieh830

Can you post images (or full FastQC report) for a representative sample so we can see what is going on?

Also keep in mind that FastQC is not indicating "absolute failures". The FastQC author had to set limits for the various test parameters (and they are set for normal genomic sequencing), so having a test "fail" (red X) does not automatically mean that your data is bad. You need to consider the context of your experiment when looking at FastQC results.

ADD REPLYlink written 10 days ago by genomax37k

this is the adapter content graph in fastqc

ADD REPLYlink modified 9 days ago • written 9 days ago by rahmati.razieh830

That shows that your data still contains some Illumina universal adapter sequence and needs to be trimmed. Try bbduk.sh from the BBMap suite. The adapters.fa file is included in the resources directory of the BBMap suite; use the correct path to it in the command below.

bbduk.sh -Xmx1g in1=read1.fq.gz in2=read2.fq.gz out1=clean1.fq.gz out2=clean2.fq.gz ref=adapters.fa ktrim=r k=23 mink=11 hdist=1 tpe tbo
ADD REPLYlink written 9 days ago by genomax37k

Thanks for your comment.

ADD REPLYlink written 8 days ago by rahmati.razieh830
12 days ago by
genomax37k
United States
genomax37k wrote:

You appear to be confusing the trimming of Illumina adapters with the specific trimming that Lexogen recommends. The instructions from Lexogen appear to remove specific adapters/nucleotides that their kit adds to the fragments. You must not have trimmed your reads correctly with Trimmomatic to remove the Illumina adapter. Based on that graph, your reads will have a range of lengths after proper trimming (i.e. they won't all remain 101 bp).
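
To illustrate why properly adapter-trimmed reads end up with different lengths, here is a minimal Python sketch of 3' adapter clipping. It is not how Trimmomatic works internally; it uses exact matching only, and the function name, the `min_overlap` parameter and the 13-nt adapter prefix are assumptions made for the example (check the adapter file shipped with your trimmer).

```python
# Start of the Illumina universal adapter as commonly listed for TruSeq
# read-through (assumption: verify against your own adapter file).
ADAPTER = "AGATCGGAAGAGC"


def trim_3prime_adapter(read, adapter=ADAPTER, min_overlap=5):
    """Clip a read at a full adapter match, or at a partial adapter
    prefix hanging off the 3' end. Exact matching only (no mismatches)."""
    # Case 1: the whole adapter occurs somewhere in the read.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends with the first few bases of the adapter.
    longest = min(len(adapter) - 1, len(read))
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:len(read) - overlap]
    # Case 3: no adapter found, the read keeps its full length.
    return read


if __name__ == "__main__":
    reads = [
        "ACGT" * 5 + ADAPTER + "TTTT",   # short insert, full adapter read-through
        "ACGTACGTAC" + ADAPTER[:6],      # adapter only partially sequenced
        "ACGTACGTACGTACGT",              # insert longer than the read, no adapter
    ]
    for r in reads:
        print(len(r), "->", len(trim_3prime_adapter(r)))
```

Each read is cut where its own insert ends, so the output lengths differ from read to read even though the input lengths were identical.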

ADD COMMENTlink modified 12 days ago • written 12 days ago by genomax37k
12 days ago by
michael.ante1.8k
Austria/Vienna
michael.ante1.8k wrote:

Hi rahmati.razieh83,

Lexogen's mRNA-Seq protocol uses random primers. AFAIK, they suggest removing the random-primed bases if you want to use TopHat2 or a similarly sensitive aligner. If you use STAR or BBMap, you can keep the complete read, but you should increase the allowed number of mismatches.
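
For reference, the fixed-length trimming described in the kit protocol could look roughly like the Python sketch below. It assumes that both counts (nine for Read 1, six for Read 2) are taken from the start of the respective read, which is how I read the quoted instruction; the function names and the (sequence, quality) tuple layout are hypothetical and only for illustration.

```python
def head_crop(seq, qual, n):
    """Drop the first n bases together with their quality scores."""
    return seq[n:], qual[n:]


def trim_pair(r1, r2, r1_head=9, r2_head=6):
    """Trim a read pair given as (sequence, quality) tuples.

    Assumption: both counts are removed from the 5' start of each read.
    """
    return head_crop(*r1, r1_head), head_crop(*r2, r2_head)


if __name__ == "__main__":
    r1 = ("A" * 101, "I" * 101)
    r2 = ("C" * 101, "I" * 101)
    (s1, q1), (s2, q2) = trim_pair(r1, r2)
    print(len(s1), len(s2))
```

This step is independent of adapter removal: it shortens every read by the same amount at the start and does nothing about adapter sequence at the 3' end.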

FastQC needs a certain sequence length to identify adapter sequences, apparently 12 or 13 nt. Regardless of how you trim the start of the reads, the detection of adapter sequences will start at this position. Note that the adapter content graph also ends before the actual cycle number is reached.
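
A rough Python sketch of why a k-mer based adapter plot cannot cover every cycle: it assumes a fixed 12-nt probe and a cumulative count per position, neither of which I have checked against FastQC's actual implementation, and the function name is made up for the example.

```python
def cumulative_adapter_content(reads, probe, read_len):
    """Percentage of reads in which `probe` has been seen at or before
    each start position. Only positions where the probe still fits into
    the read are reported, so the curve has read_len - k + 1 points."""
    k = len(probe)
    n_pos = read_len - k + 1            # number of start positions that fit the probe
    counts = [0] * n_pos
    for read in reads:
        start = read.find(probe)        # first occurrence, -1 if absent
        if 0 <= start < n_pos:
            for p in range(start, n_pos):
                counts[p] += 1          # cumulative: seen from here onward
    total = len(reads)
    return [100.0 * c / total if total else 0.0 for c in counts]


if __name__ == "__main__":
    probe = "AGATCGGAAGAG"              # 12-nt probe (assumption)
    reads = ["ACGTACGT" + probe, "A" * 20]
    curve = cumulative_adapter_content(reads, probe, read_len=20)
    print(len(curve), curve)
```

With 20-nt reads and a 12-nt probe the curve has only 9 points, which is the same effect as the graph stopping short of the last cycle.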

I hope this helped a bit.

Cheers,

Michael

ADD COMMENTlink written 12 days ago by michael.ante1.8k
10 days ago by
chen1.1k
OpenGene
chen1.1k wrote:

You can use fastp to preprocess your Illumina sequencing data (whether RNA-seq or DNA-seq, PE or SE). It can trim adapters automatically for both PE and SE data, which means that you don't have to provide the adapter sequences.

Besides trimming adapters, this tool also performs quality filtering and other operations to improve your data quality, and most of the features are automated. All you have to do is install fastp and run:

fastp -i in.R1.fq.gz -I in.R2.fq.gz -o out.R1.fq.gz -O out.R2.fq.gz

This tool is very fast (written in C++, with multithreading support). You can get it from: https://github.com/OpenGene/fastp

ADD COMMENTlink written 10 days ago by chen1.1k

It can trim adapters automatically

Where is fastp getting the adapter sequences from? OP is using a kit that has specific instructions about removing additional bases from front/end of reads.

ADD REPLYlink written 10 days ago by genomax37k

For paired-end data, adapters are removed by finding the insert length: any bases sequenced beyond the insert length are adapter sequence.
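
The idea can be sketched in Python as follows. This is a simplified illustration with exact matching, not the code used in fastp; the function names and the `min_insert` threshold are assumptions for the example.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def infer_insert_length(r1, r2, min_insert=20):
    """Return the insert length if it is shorter than the reads, else None.

    When the insert is shorter than the read length, the first `insert`
    bases of R1 equal the reverse complement of the first `insert` bases
    of R2; everything after that position is adapter read-through.
    """
    max_len = min(len(r1), len(r2))
    for insert in range(max_len - 1, min_insert - 1, -1):
        if r1[:insert] == revcomp(r2[:insert]):
            return insert
    return None


def trim_pair_by_overlap(r1, r2, min_insert=20):
    """Clip both reads at the inferred insert length, if one is found."""
    insert = infer_insert_length(r1, r2, min_insert)
    if insert is None:
        return r1, r2
    return r1[:insert], r2[:insert]


if __name__ == "__main__":
    insert = "ACGTTGCAAGGCTTAACCGGATCGATTACG"   # 30-nt fragment
    r1 = insert + "AGATCGGAAG"                  # read-through into adapter
    r2 = revcomp(insert) + "AGATCGGAAG"
    print(trim_pair_by_overlap(r1, r2))
```

A real implementation has to tolerate sequencing errors in the overlap, which this sketch does not attempt.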

For single end data, the adapter can be specified in the command line, or detected automatically if not specified. I developed an algorithm to detect adapter sequence by doing a simple assembly for the high frequency last 10 bp. See my code: https://github.com/OpenGene/fastp/blob/master/src/evaluator.cpp (string Evaluator::evaluateRead1Adapter() ). The detected adapter may be a bit shorter than the real one, but it's enough to trim most adapters.

ADD REPLYlink modified 10 days ago • written 10 days ago by chen1.1k