Question

Do Illumina Paired-End Reads From RNA-Seq Need To Be Groomed?

13.4 years ago

Preaslio ▴ 10

I have Illumina HiSeq 2000 paired-end reads from RNA sequencing (CASAVA version 1.8). I have been told that the FASTQ quality encoding is in Sanger format. The reads have been quality checked and screened.

Now my question is: do they need any kind of grooming before mapping them to a reference genome? I'm thinking of the FASTQ Groomer in Galaxy. Or is it fine to upload them as fastqsanger and map them straight away using TopHat?
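For anyone who wants to sanity-check the stated encoding before deciding whether grooming is needed, one rough approach is to look at the lowest quality character in the file. The sketch below is only a heuristic and is not how Galaxy's groomer works internally; `quality_lines` and `guess_phred_offset` are hypothetical helper names, and the code assumes plain four-line FASTQ records.

```python
from itertools import islice


def quality_lines(fastq_lines):
    """Yield the quality string (4th line) of each four-line FASTQ record."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 3:
            yield line.rstrip("\n")


def guess_phred_offset(fastq_lines, max_reads=10000):
    """Guess the quality offset from the lowest ASCII code seen.

    Heuristic only: characters below ';' (ASCII 59) cannot occur in the
    older offset-64 encodings, so they imply Sanger / Phred+33. If no such
    character is seen, offset 64 is assumed, which can be wrong for a
    Phred+33 file that contains only high-quality bases.
    """
    sampled = islice(quality_lines(fastq_lines), max_reads)
    lowest = min((ord(ch) for q in sampled for ch in q), default=None)
    if lowest is None:
        return None  # no quality characters found
    return 33 if lowest < 59 else 64
```

Passing an open file handle (for example one opened on a hypothetical `reads_1.fastq`) works, since the function only iterates over lines. A result of 33 is consistent with uploading the file as fastqsanger.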

paired • 4.5k views

updated 11.8 years ago by Biostar 20 • written 13.4 years ago by Preaslio ▴ 10

Ram · Answer 1 · 2012-02-28

It depends. I would guess that many people take RNA-seq FASTQ files directly from CASAVA and feed them into downstream analysis (e.g., TopHat/Cufflinks). Some "grooming" happens automatically, in the sense that garbage reads are less likely to align and the aligner may use base quality when deciding what counts as an acceptable alignment. Some people add a duplicate removal step. You can also choose your own arbitrary cutoff for the average Phred score of the reads you want to keep for downstream analysis.

You should probably search this forum before asking this question. It has been addressed here (almost a duplicate question) and here. Duplicate removal has also been discussed on the forum (search for "duplicates"), and a useful discussion of that topic can be found here and here (especially follow the link to SEQanswers).
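The average-Phred cutoff mentioned in this answer could be sketched roughly as follows for paired-end data. This is an illustration under assumptions, not a recommended pipeline: it assumes Sanger (Phred+33) encoding, four-line FASTQ records, and mates listed in the same order in both files. The names `read_fastq`, `mean_phred`, and `filter_pairs` are hypothetical, the cutoff of 20 is arbitrary, and dropping a pair when either mate fails is a design choice made here to keep the two files in sync.

```python
def read_fastq(lines):
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    it = iter(lines)
    # zip over the same iterator consumes four lines per record;
    # a truncated final record is silently ignored.
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.rstrip("\n"), seq.rstrip("\n"), qual.rstrip("\n")


def mean_phred(quality, offset=33):
    """Average Phred score of one quality string (offset 33 = Sanger)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_pairs(mate1_lines, mate2_lines, min_mean_q=20, offset=33):
    """Yield (mate1, mate2) records where both mates meet the cutoff."""
    pairs = zip(read_fastq(mate1_lines), read_fastq(mate2_lines))
    for r1, r2 in pairs:
        if all(mean_phred(r[2], offset) >= min_mean_q for r in (r1, r2)):
            yield r1, r2
```

Whether such a filter helps at all is a judgment call; as noted above, low-quality reads are already less likely to align.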