Question: Poor Quality RNA-Seq Data
5.7 years ago, kcnq1ot1 (0) wrote:

We performed RNA-Seq on 32 breast tumor RNA samples; for the most part, the RIN values were more than adequate for RNA-Seq. Strand-specific libraries were prepared and sequenced as paired-end 100 bp reads. Our bioinformatics core gave us the QC report and told us that the data quality is not very good, based on the relatively low percentage of mapped reads (<80%) and a significant drop-off in read quality after 60 bp. All 32 samples looked similar, so I doubt the problem is the quality of the RNA samples. Does this result suggest that there was some systematic problem with the library preparation and/or sequencing?

fastq tophat • 2.3k views
modified 5.7 years ago by alpha2zee (100) • written 5.7 years ago by kcnq1ot1 (0)

It could have been an issue during transfer to the facility, library creation, or sequencing. Lacking further details, it's tough to know which (even with more details, it can be difficult to know for sure). BTW, when you say that the RIN values were good, what sort of values are we talking about?

written 5.7 years ago by Devon Ryan (91k)

Thank you for your help. The RIN values ranged from 7.2 to 9.5. Both the mean and median RIN values were 8.5.

written 5.7 years ago by kcnq1ot1 (0)
5.7 years ago, Charles Warden (7.2k), Duarte, CA, wrote:

In general, I would say >80% alignment is ideal (and realistic to achieve, in most cases), but <80% alignment isn't necessarily horrible. Less than 50% or 20% is another story.

The significant drop-off isn't good, but I can typically get decent gene-level counts with single-end 40-bp reads. So, you'll exceed that if you just trim off the last 40 bp of each 100-bp read, which leaves 60 bp. I would also recommend recalculating the alignment percentage after trimming the reads. I would expect the alignment percentage to increase when focusing only on the high-quality portion of the reads.
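To make the fixed-length trimming concrete, here is a minimal sketch in Python. It assumes uncompressed-style FASTQ text with exactly four lines per record; the function name `trim_fastq_records` and the `keep` parameter are made up for illustration and do not come from any particular tool.

```python
from typing import Iterable, Iterator, List


def trim_fastq_records(lines: Iterable[str], keep: int = 60) -> Iterator[str]:
    """Yield FASTQ lines with each read truncated to its first `keep` bases.

    Assumption: every record is exactly four lines
    (header, sequence, '+' separator, quality string).
    """
    record: List[str] = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, seq, plus, qual = record
            yield header
            yield seq[:keep]   # keep only the leading bases
            yield plus
            yield qual[:keep]  # quality string must stay the same length as seq
            record = []
```

Since this only shortens reads and never drops any, running it separately on the R1 and R2 files should keep the mates in the same order. The trimmed files would then be re-aligned so the mapped percentage can be compared before and after.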

How many reads do you have per sample? You should have at least 10 million for differential expression (single-end counts are OK in most cases). 20-40 million paired-end reads is good for splicing analysis.

I haven't worked with strand-specific data before. Perhaps that is a factor? Otherwise, I think there is still potential for the data to be usable.

written 5.7 years ago by Charles Warden (7.2k)

Thanks for your help.

Regarding the number of reads: the mean and median mapped reads per sample were 67 million (range 40-90 million).

written 5.7 years ago by kcnq1ot1 (0)

Yeah, you have plenty of reads. I really think you can do something with this data.

written 5.7 years ago by Charles Warden (7.2k)
5.7 years ago, alpha2zee (100) wrote:

You should also consider the expertise of those who mapped the read data. Is it possible that the mapping wasn't done properly? For instance, would removing poor-quality trailing bases, contaminating adapter sequences, etc. from the reads have improved the mapping?
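As a rough illustration of removing poor-quality trailing bases, here is a minimal Python sketch. It assumes Phred+33 quality encoding and a cutoff of Q20, both of which are assumptions rather than details from this thread; the function name `trim_trailing_low_quality` is hypothetical. It does not handle adapter removal, for which a dedicated trimming tool would be the usual choice.

```python
from typing import Tuple


def trim_trailing_low_quality(
    seq: str, qual: str, min_q: int = 20, offset: int = 33
) -> Tuple[str, str]:
    """Strip bases from the 3' end while their Phred score is below `min_q`.

    Assumption: qualities are ASCII-encoded with the given offset (Phred+33).
    """
    end = len(qual)
    # Walk backwards from the 3' end until a base meets the threshold.
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]
```

With paired-end data, a read that ends up very short after trimming would normally be discarded together with its mate so the two files stay in sync; that bookkeeping is left out here.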

written 5.7 years ago by alpha2zee (100)