What Is A Typical Quality Cutoff When Trimming RNA-seq Data?
12.1 years ago
camelbbs ▴ 710

Hi, I want to ask a question about trimming RNA-seq data.

I ran FastQC to check sequence quality, but I don't know what the usual cutoff for trimming is. I see that the quality at the read end (positions 100-101) is lower than 20.

FastQC shows three score regions: 0-20, 20-28, and 28-40. Is there a general cutoff for this? Is it OK to trim to 95 bp?

Also, my paired-end reads are not the same length: the left end is 101 bp and the right end is 121 bp. Is that OK for analysis? Do I need to trim the 121 bp reads to 95 bp, or do something else?

Thanks a lot!!

rna-seq • 7.5k views
12.1 years ago
Arun 2.4k

Usually, in Casava >= 1.8 format, I look in each read for a consecutive run of bases at the 3' end with quality "#" (Phred score 2 in the Phred+33 encoding). This is similar to the "B" in older formats. For example:

@HWI-ST863:108:C0F9BACXX:8:1101:1235:1979
CTTACTAAAGACAATGGTGTTCTTCTCATCTTCGATGAAGTCATGACTGGATTTCGTCTAGCCTATGGTGGAGCCCAAGAATACTTTGGAATCACGC
+1:N:0:ATCACG
CCCFFFEFHHHHHJJJJFEIIJJJGHIJJHIJJJIJJIEHIIIJIIJIJJGIIIJHIGH######################################

I trim to obtain:

@HWI-ST863:108:C0F9BACXX:8:1101:1235:1979
CTTACTAAAGACAATGGTGTTCTTCTCATCTTCGATGAAGTCATGACTGGATTTCGTCT
+1:N:0:ATCACG
CCCFFFEFHHHHHJJJJFEIIJJJGHIJJHIJJJIJJIEHIIIJIIJIJJGIIIJHIGH
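
For anyone who wants to try this, here is a minimal Python sketch of that trimming step. The function name `trim_trailing_hash` is made up for illustration, and it assumes the read is already split into its sequence and quality strings; it is not the script I actually use.

```python
def trim_trailing_hash(seq, qual, marker="#"):
    """Drop the trailing run of `marker` quality characters from one read."""
    # Only the run at the 3' end is removed; a "#" inside the read is kept.
    keep = len(qual.rstrip(marker))
    return seq[:keep], qual[:keep]


# The read from the example above (its quality string ends in 38 "#").
seq = ("CTTACTAAAGACAATGGTGTTCTTCTCATCTTCGATGAAGTCATGACTGGATTTCGTCT"
       "AGCCTATGGTGGAGCCCAAGAATACTTTGGAATCACGC")
qual = "CCCFFFEFHHHHHJJJJFEIIJJJGHIJJHIJJJIJJIEHIIIJIIJIJJGIIIJHIGH" + "#" * 38

trimmed_seq, trimmed_qual = trim_trailing_hash(seq, qual)
print(len(trimmed_seq), len(trimmed_qual))  # 59 59
```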

This has resulted in an average quality >= 30 for all my FASTQ files. By average I mean not the per-read average but the average quality at each base position over all reads. If the trimmed read is longer than a threshold (60 for 100-base reads in my case), I keep it; otherwise I discard it. That is just to give an idea. This can be done separately for each mate of a pair. If one mate is discarded during quality trimming, I keep the other as a single-end read in a separate file. After quality trimming, I balance the pairs so that the two mates of each pair sit at the same position in the two FASTQ files. However, I don't understand how or why your paired-end reads are of different lengths.
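
The length filter and the pair/singleton bookkeeping could be sketched roughly as below. This is only an illustration: `filter_pairs` and the `(name, seq, qual)` tuple layout are hypothetical, and it assumes the two input files list mates in the same order. Unlike the procedure described above, it processes both mates together, so the surviving pairs stay in sync and no separate balancing pass is needed.

```python
MIN_LEN = 60  # threshold mentioned above for 100-base reads


def trim_trailing_hash(seq, qual):
    """Drop the trailing run of "#" qualities from one read."""
    keep = len(qual.rstrip("#"))
    return seq[:keep], qual[:keep]


def filter_pairs(pairs, min_len=MIN_LEN):
    """pairs: iterable of ((name1, seq1, qual1), (name2, seq2, qual2)).

    Returns (kept_pairs, singles): pairs where both mates pass the length
    filter, and mates whose partner was discarded.
    """
    kept_pairs, singles = [], []
    for (n1, s1, q1), (n2, s2, q2) in pairs:
        s1, q1 = trim_trailing_hash(s1, q1)
        s2, q2 = trim_trailing_hash(s2, q2)
        ok1 = len(s1) > min_len  # keep only reads longer than the threshold
        ok2 = len(s2) > min_len
        if ok1 and ok2:
            kept_pairs.append(((n1, s1, q1), (n2, s2, q2)))
        elif ok1:
            singles.append((n1, s1, q1))
        elif ok2:
            singles.append((n2, s2, q2))
        # if neither mate passes, the whole pair is dropped
    return kept_pairs, singles
```

`kept_pairs` would then be written to the two paired FASTQ files and `singles` to the separate single-end file.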


Note that trimming "#" from Illumina reads is not the same as trimming for average quality >= 30.
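
A toy illustration of the difference, assuming Phred+33 quality strings (the function name `mean_quality_per_position` is hypothetical): the reads below contain no "#" at all, so trimming "#" would leave them untouched, yet the per-position mean at the last two positions is well below 30.

```python
def mean_quality_per_position(quals, offset=33):
    """Mean Phred score at each position over all reads (Phred+33 assumed)."""
    longest = max(len(q) for q in quals)
    means = []
    for pos in range(longest):
        # Shorter reads simply do not contribute to later positions.
        scores = [ord(q[pos]) - offset for q in quals if len(q) > pos]
        means.append(sum(scores) / len(scores))
    return means


# "I" is Q40 and ":" is Q25 in Phred+33; none of these reads contains "#".
quals = ["IIII::", "IIII::", "IIII"]
print(mean_quality_per_position(quals))  # [40.0, 40.0, 40.0, 40.0, 25.0, 25.0]
```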


True, I have modified my reply. Thank you Sean.


Thanks! My paired-end reads really are of different lengths: 101 and 121. Does that affect the analysis?

12.1 years ago

Trimming every read to a fixed length is probably going to be less than optimal, since you will likely cut off high-quality bases in some reads and leave low-quality bases in others. I'd suggest using one of the many available trimming tools (cutadapt, FASTX-Toolkit, sickle, Trimmomatic) to do the quality trimming based on each read's individual base qualities.
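
To give a feel for what per-read quality trimming means, here is a rough Python sketch of a running-sum 3' trim of the kind the cutadapt documentation describes. It is an illustration only, not the code of any of these tools; the function name, the default cutoff of 20, and the Phred+33 offset are assumptions.

```python
def quality_trim_3prime(seq, qual, cutoff=20, offset=33):
    """Trim the 3' end of one read using a running sum of (score - cutoff)."""
    scores = [ord(c) - offset for c in qual]
    running = 0
    best = 0
    cut = len(scores)  # default: keep the whole read
    # Walk from the 3' end and cut where the running sum is lowest.
    for i in range(len(scores) - 1, -1, -1):
        running += scores[i] - cutoff
        if running > 0:  # good bases now outweigh the bad tail; stop
            break
        if running < best:
            best = running
            cut = i
    return seq[:cut], qual[:cut]


# Toy read: four good bases followed by a low-quality tail, cutoff 10.
phred = [42, 40, 26, 27, 8, 7, 11, 4, 2, 3]
qual = "".join(chr(q + 33) for q in phred)
print(quality_trim_3prime("ACGTACGTAC", qual, cutoff=10)[0])  # ACGT
```

Each read ends up with its own length, which is why this usually works better than cutting everything to a fixed number of bases.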
