Question: order of trimming for RNA-seq QC
13 months ago by maria2019 wrote:

I have human bulk RNA-seq paired-end reads (R1, R2), and FastQC reports multiple overrepresented sequences that are not adapters. The per base sequence content module also shows a warning. I used BLAT to check the overrepresented sequences, and all of them come from either chrUn_GL000220v1 or chr14, except for the sequence GGGGGG... from R2.

a) I need to trim the last 5 bases from both R1 and R2. I have read that the first 12 bases are fine and do not need to be trimmed for RNA-seq analysis (correct me if I am wrong). b) I also need to trim the overrepresented sequences, since they are contamination, except for the GGGG... sequence, which did not align to the human genome.

Below is the link to the reports:

What should the order of trimming be? Should I A) trim everything in one run, B) trim 1. the ends, then 2. the overrepresented sequences, or C) trim 1. the overrepresented sequences, then 2. the ends? I have tried all three, and each gives different results.

A) cutadapt -u -5 -U -5 --pair-filter any --minimum-length 10 -a (overreps) -A (overreps) -o tr_R1.fastq -p tr_R2.fastq R1.fastq R2.fastq

B) 1. cutadapt -u -5 -U -5 --pair-filter any --minimum-length 10 -a (overreps) -A (overreps) -o tr_ends_R1.fastq -p tr_ends_R2.fastq R1.fastq R2.fastq
   2. cutadapt -a (overreps) -A (overreps) -o tr_R1.fastq -p tr_R2.fastq tr_ends_R1.fastq tr_ends_R2.fastq

C) 1. cutadapt -a (different overreps) -A (different overreps) -o tr_overreps_R1.fastq -p tr_overreps_R2.fastq R1.fastq R2.fastq
   2. cutadapt -u -5 -U -5 --pair-filter any --minimum-length 10 -o tr_R1.fastq -p tr_R2.fastq tr_overreps_R1.fastq tr_overreps_R2.fastq
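
To illustrate why the order can change the output, here is a toy Python sketch. It is not how cutadapt works internally (cutadapt also allows partial and error-tolerant matches); the helper names `cut_tail` and `trim_3prime`, the read, and the `TTAGGC` sequence are all made up for illustration. It only shows that fixed-length end trimming and sequence-based 3' trimming do not commute in general.

```python
def cut_tail(seq, n):
    """Remove n bases from the 3' end of a read (toy stand-in for end trimming)."""
    return seq[:-n] if n > 0 else seq


def trim_3prime(seq, adapter):
    """Remove the first exact occurrence of `adapter` and everything after it.

    Exact matching only: a simplification, assumed here for illustration.
    """
    i = seq.find(adapter)
    return seq if i == -1 else seq[:i]


# Hypothetical read: 12-base insert + 6-base unwanted sequence + 3-base tail
adapter = "TTAGGC"
read = "ACGTACGTACGT" + adapter + "AAA"

# Order B: ends first, then the unwanted sequence.
# Cutting 5 bases destroys the full match, so nothing more is removed.
ends_first = trim_3prime(cut_tail(read, 5), adapter)

# Order C: unwanted sequence first, then ends.
# The match is removed, and 5 bases of the insert are then cut as well.
overrep_first = cut_tail(trim_3prime(read, adapter), 5)

print(ends_first)     # ACGTACGTACGTTTAG
print(overrep_first)  # ACGTACG
```

In this toy case, order C removes 5 real insert bases on top of the matched sequence, while order B leaves part of the unwanted sequence behind.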

Tags: fastqc, rna-seq, qc, cutadapt

a) Correct, do not trim the initial 10-15 bases.
b) Do not do anything to over-represented sequences if they are not adapters. Check whether they are rRNA; otherwise you may end up throwing away good data.
c) The poly-G stretches are likely the "no signal = G" artifact of 2-color chemistry. You can remove those stretches.
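
As a rough sketch of what removing a poly-G tail means, the Python function below strips a trailing run of G from a read and shortens the quality string to match. The function name `trim_poly_g` and the `min_run=10` cutoff are assumptions chosen for illustration, not a recommended setting. In practice a dedicated option such as cutadapt's `--nextseq-trim` is probably the better route.

```python
def trim_poly_g(seq, qual, min_run=10):
    """Strip a trailing run of G if it is at least `min_run` bases long.

    Returns the (possibly shortened) sequence and its matching quality string.
    `min_run` is an illustrative threshold, not a validated default.
    """
    stripped = seq.rstrip("G")
    run = len(seq) - len(stripped)
    if run >= min_run:
        return stripped, qual[:len(stripped)]
    return seq, qual


# A read with a 12-base poly-G tail is shortened to its first 8 bases
seq, qual = trim_poly_g("ACGTTGCA" + "G" * 12, "I" * 20)
print(seq, qual)  # ACGTTGCA IIIIIIII

# A short run of G at the end is left alone
print(trim_poly_g("ACGTGG", "IIIIII"))  # ('ACGTGG', 'IIIIII')
```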

See these informative blog posts:

modified 13 months ago • written 13 months ago by genomax

Thank you very much for your response. The reads are not rRNA, but they do come from a human chromosome. Are they not considered contamination then?

written 13 months ago by maria2019

If they align to the correct genome, then they are not contamination. Some genes may be highly expressed, and sequences from them may show up as "over-represented".
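
To make the point concrete, here is a minimal Python sketch of how a sequence ends up flagged as over-represented: it is simply a sequence that makes up more than some fraction of all reads, regardless of where it comes from. The function name `overrepresented` and the toy reads are hypothetical, and the `threshold` default of 0.001 (0.1% of reads) is an assumption that should be checked against the FastQC documentation.

```python
from collections import Counter


def overrepresented(reads, threshold=0.001):
    """Return {sequence: fraction} for sequences above `threshold` of all reads.

    `reads` is a list of read sequences held in memory (illustration only).
    """
    total = len(reads)
    counts = Counter(reads)
    return {s: c / total for s, c in counts.items() if c / total > threshold}


# Toy data: 10 reads, with one sequence from a hypothetical highly expressed gene
reads = ["AAAA"] * 3 + ["CCCC"] + ["GGGG"] * 6
flagged = overrepresented(reads, threshold=0.25)
print(sorted(flagged))  # ['AAAA', 'GGGG']
```

A highly expressed transcript passes such a frequency cutoff just as easily as a contaminant does, which is why alignment to the expected genome is the more informative check.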

written 13 months ago by genomax