Question: Do Illumina Paired-End Reads from RNA-Seq Need to Be Groomed?
8.9 years ago by
Preaslio10 wrote:

I have Illumina HiSeq 2000 paired-end reads from RNA sequencing (CASAVA version 1.8). I have been told that the FASTQ quality encoding is Sanger format. The reads have been quality-checked and screened.

Now my question is: do they need any kind of grooming before mapping them to a reference genome? I'm thinking of the FASTQ Groomer in Galaxy. Or is it fine to upload them as fastqsanger and map them straight away with TopHat?
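A quick way to double-check the stated encoding before deciding whether to groom is to look at the range of quality characters. The sketch below is only a heuristic under stated assumptions: the function names are made up for illustration, the file is assumed to use plain four-line FASTQ records, and the upper bound of `J` (Q41 in Phred+33) is an assumption about typical CASAVA 1.8 output rather than a hard limit of the Sanger format.

```python
from itertools import islice


def quality_lines(path):
    """Yield the quality line of each record, assuming 4-line FASTQ records."""
    with open(path) as handle:
        for line in islice(handle, 3, None, 4):
            yield line.rstrip("\n")


def guess_phred_offset(quality_strings):
    """Guess the quality offset (33 or 64) from the observed character range.

    Returns None when the characters seen are compatible with both encodings.
    """
    codes = [ord(ch) for q in quality_strings for ch in q]
    if not codes:
        return None
    if min(codes) < 59:
        # Characters below ';' (ASCII 59) cannot occur in the +64 encodings.
        return 33
    if max(codes) > 74:
        # Above 'J' (ASCII 74) is unusual for Phred+33 Illumina output (assumption).
        return 64
    return None
```

For example, `guess_phred_offset(islice(quality_lines("reads_1.fastq"), 10000))` would inspect the first 10,000 records of a hypothetical file `reads_1.fastq`. A result of 33 is consistent with uploading the data as fastqsanger.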

paired • 3.1k views
modified 7.3 years ago by Biostar ♦♦ 20 • written 8.9 years ago by Preaslio10
8.9 years ago by
Obi Griffith19k
Washington University, St Louis, USA
Obi Griffith19k wrote:

This depends. I would guess that many people take RNA-seq FASTQ files directly from CASAVA and feed them into downstream analysis (e.g., TopHat/Cufflinks). Some "grooming" happens automatically, in the sense that garbage reads are less likely to align and the aligner may use base quality when deciding what counts as an acceptable alignment. Some people add a step to remove duplicate reads. You can also choose your own arbitrary cutoff for the average Phred score of the reads you want to keep for downstream analysis.

You should probably search this forum before asking this question. It has been addressed here (almost a duplicate question) and here. The issue of duplicate removal has also been addressed somewhat in the forum (search for duplicates), and a useful discussion on that topic can be found here and here (especially follow the link to seqanswers).
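The average-Phred cutoff mentioned above could be sketched roughly as follows. This is a minimal illustration, not a replacement for a dedicated trimming tool: the function names, the record layout (header, sequence, separator, quality) and the cutoff of 20 are all assumptions chosen for the example. A pair is dropped as a whole if either mate fails, so that the two files stay in the same order for the aligner.

```python
def mean_phred(quality, offset=33):
    """Mean Phred score of one quality string (offset 33 = Sanger encoding)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_pairs(pairs, min_mean=20.0, offset=33):
    """Keep only pairs in which both mates reach the mean-quality cutoff.

    `pairs` is an iterable of (mate1, mate2), where each mate is a
    (header, sequence, separator, quality) tuple.
    """
    for mate1, mate2 in pairs:
        if (mean_phred(mate1[3], offset) >= min_mean
                and mean_phred(mate2[3], offset) >= min_mean):
            yield mate1, mate2


# Tiny made-up example: the second pair is dropped because mate 2 is poor.
good = ("@r1", "ACGT", "+", "IIII")
poor = ("@r2", "ACGT", "+", "####")
kept = list(filter_pairs([(good, good), (good, poor)]))
print(len(kept))
```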

modified 16 months ago by _r_am32k • written 8.9 years ago by Obi Griffith19k

Obi Griffith: Thank you for your answer. I searched beforehand but couldn't find any similar questions. A follow-up question: I tried grooming a few of my files with the input type set to Illumina 1.3-1.7, and flagstat on the resulting BAM files reports about 90% properly paired reads, whereas mapping the reads directly gives about 80%. Can anyone explain why this is? And is it wrong to groom them this way before mapping?
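One thing worth checking here is what the qualities look like after such a conversion. Illumina 1.3-1.7 input is read with an offset of 64, while Sanger-encoded data use an offset of 33, so reading Sanger qualities as Illumina 1.3-1.7 shifts every score down by 31. The small sketch below only shows that arithmetic on a made-up quality string. How the groomer treats the resulting out-of-range values is not covered, and this alone is not claimed to explain the difference in properly paired reads.

```python
def decode(quality, offset):
    """Convert a quality string to a list of Phred scores for a given offset."""
    return [ord(ch) - offset for ch in quality]


sanger_quality = "II5#"  # made-up example string in Sanger (Phred+33) encoding

as_sanger = decode(sanger_quality, 33)    # scores as intended
as_illumina = decode(sanger_quality, 64)  # same characters read as Phred+64

print(as_sanger)    # [40, 40, 20, 2]
print(as_illumina)  # [9, 9, -11, -29]
```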

written 8.9 years ago by Preaslio10