Do Illumina Paired-End Reads From RNA-Seq Need To Be Groomed?
11.3 years ago
Preaslio ▴ 10

I have Illumina HiSeq 2000 paired-end reads from RNA sequencing (CASAVA version 1.8). I have been told that the FASTQ quality encoding is in Sanger format. The reads have been quality checked and screened.

Now my question is: do they need any kind of grooming before mapping them to a reference genome? I'm thinking of the FASTQ Groomer in Galaxy. Or is it fine to upload them as fastqsanger and map them straight away using TopHat?

paired • 3.6k views
11.3 years ago

This depends. I would guess that many people take RNA-seq FASTQ files directly from CASAVA and feed them into downstream analysis (e.g., TopHat/Cufflinks). Some "grooming" happens automatically, in the sense that garbage reads are less likely to align and the aligner may use base quality when deciding what counts as an acceptable alignment. Some people add a duplicate removal step. You can also choose your own arbitrary cutoff for the average Phred score of the reads you want to keep for downstream analysis.

You should probably search this forum before asking this question. It has been addressed here (almost a duplicate question) and here. The issue of duplicate removal has also been addressed somewhat in the forum (search for duplicates), and a useful discussion on that topic can be found here and here (especially follow the link to SEQanswers).
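To make the average Phred cutoff idea concrete, here is a minimal sketch in plain Python. It assumes Sanger (Phred+33) encoding, four-line FASTQ records, and two mate files listing reads in the same order. The function names and the cutoff of 20 are hypothetical choices for illustration, not a recommendation and not part of any particular tool. A pair is kept only if both mates pass, so the two files stay in sync.

```python
from io import StringIO


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ handle."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def mean_phred(qual, offset=33):
    """Average Phred score of one quality string (offset 33 = Sanger encoding)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_pairs(records1, records2, min_mean=20, offset=33):
    """Keep a read pair only if BOTH mates reach the mean-quality cutoff."""
    for r1, r2 in zip(records1, records2):
        if (r1[2] and r2[2]
                and mean_phred(r1[2], offset) >= min_mean
                and mean_phred(r2[2], offset) >= min_mean):
            yield r1, r2


# Tiny made-up example: pairB has a low-quality second mate and is dropped.
mates1 = StringIO("@pairA/1\nACGT\n+\nIIII\n@pairB/1\nACGT\n+\nIIII\n")
mates2 = StringIO("@pairA/2\nTTGA\n+\nIIII\n@pairB/2\nTTGA\n+\n####\n")

kept = [r1[0] for r1, _ in filter_pairs(read_fastq(mates1), read_fastq(mates2))]
print(kept)
```

Filtering each file on its own would leave the mates out of step, which is why the sketch drops or keeps both reads of a pair together.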

Obi Griffith: Thank you for your answer. I searched beforehand but couldn't find any similar questions. A follow-up question: I tried grooming a few of my files with the input type set to Illumina 1.3-1.7, and flagstat on the resulting BAM files reports approximately 90% properly paired reads, whereas if I map my reads directly I get about 80% properly paired reads. Can anyone explain why this is? And is it wrong to groom them this way before mapping?
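Since the reads are reported to be Sanger-encoded, one thing worth checking before grooming is which offset the quality strings actually use. The sketch below is a rough heuristic for Illumina data only: characters below ASCII 59 cannot occur in the older +64 encodings, and characters above 'J' (ASCII 74) are not expected from CASAVA 1.8 Phred+33 output. The function names and the example quality string are made up for illustration, and this does not reproduce what Galaxy's groomer does internally.

```python
def decode(qual, offset):
    """Convert a FASTQ quality string to numeric scores for a given ASCII offset."""
    return [ord(c) - offset for c in qual]


def guess_offset(quality_strings):
    """Heuristically guess the ASCII offset (33 or 64); None if ambiguous."""
    codes = [ord(c) for q in quality_strings for c in q]
    lo, hi = min(codes), max(codes)
    if lo < 59:
        return 33  # too low for Solexa+64 or Illumina 1.3-1.7 (Phred+64)
    if hi > 74:
        return 64  # above 'J', not expected from Phred+33 Illumina output
    return None    # range fits both encodings; inspect more reads


# Hypothetical quality string of the kind a Phred+33 file might contain.
qual = "#5?DFHIJ"

print(guess_offset([qual]))   # which offset the characters are consistent with
print(decode(qual, 33))       # scores when read as Sanger / Phred+33
print(decode(qual, 64))       # scores when misread as Illumina 1.3-1.7 / Phred+64
```

Reading a Phred+33 string with offset 64 shifts every score down by 31, so several of the resulting values are negative and not valid Phred scores. That is a sign the declared input type does not match the file.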

