Question

Poor Quality RNA-Seq Data

10.3 years ago

kcnq1ot1 • 0

We've done RNA-Seq on 32 breast tumor RNAs; for the most part the RIN values were more than adequate for RNA-Seq. Strand-specific libraries were made and sequenced as paired-end 100 bp reads. Our Bioinformatics core gave us the QC report for the RNA-Seq analysis. They told us that the data quality is not very good, based on the relatively low percentage of mapped reads (<80%) and the significant drop-off in read quality after 60 bp. All 32 samples looked similar, so I doubt the problem is the quality of the RNA samples. Does this result suggest that there was some systematic problem with the library preparation and/or sequencing?

fastq tophat • 4.1k views

ADD COMMENT • link updated 10.3 years ago by alpha2zee ▴ 120 • written 10.3 years ago by kcnq1ot1 • 0

It could have been an issue during transfer to the facility, library creation, or sequencing. Lacking further details, it's tough to know which (even with more details, it can be difficult to know for sure). BTW, when you say that the RIN values were good, what sort of values are we talking about?

ADD REPLY • link 10.3 years ago by Devon Ryan 104k

Thank you for your help. The RIN values ranged from 7.2 - 9.5. Both the mean and median RIN values were 8.5.

ADD REPLY • link 10.3 years ago by kcnq1ot1 • 0

score 1 · Answer 1 · 2013-12-27

10.3 years ago

Charles Warden 8.2k

In general, I would say >80% alignment is ideal (and realistic to achieve, in most cases), but <80% alignment isn't necessarily horrible. Less than 50% or 20% is another story.

The significant drop-off isn't good, but I can typically get decent gene-level counts with single-end 40-bp reads. So, you'll exceed that if you just trim off the last 40 bp, which leaves 60 bp per read. I would also recommend recalculating the alignment percentage after trimming the reads - I would expect the alignment percentage to increase when focusing only on the high-quality portion of the reads.
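As a rough illustration of the trimming idea, here is a minimal Python sketch that hard-trims every read to its first 60 bases and computes a mapped percentage from two counts. The function names (`parse_fastq`, `hard_trim`, `percent_mapped`) and the `keep=60` default are made up for this example and are not part of any particular tool; it also assumes plain 4-line FASTQ records. In practice a dedicated trimmer would normally be used for this step.

```python
from typing import Iterable, Iterator, Tuple

# (header, sequence, quality) for one read; a hypothetical helper type.
FastqRecord = Tuple[str, str, str]


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (header, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip stray blank lines between records
        seq = next(it, "").rstrip("\n")
        next(it, "")  # the '+' separator line is ignored
        qual = next(it, "").rstrip("\n")
        yield header, seq, qual


def hard_trim(record: FastqRecord, keep: int = 60) -> FastqRecord:
    """Keep only the first `keep` bases and their quality characters."""
    header, seq, qual = record
    return header, seq[:keep], qual[:keep]


def percent_mapped(mapped: int, total: int) -> float:
    """Alignment percentage; returns 0.0 when there are no reads."""
    return 100.0 * mapped / total if total else 0.0
```

Because hard trimming shortens reads without discarding any, the two mate files of a paired-end run stay in sync as long as both are trimmed the same way. The mapped and total counts for `percent_mapped` would come from the aligner's own summary after re-mapping the trimmed reads.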

How many reads do you have per sample? You should have at least 10 million for differential expression (single-end count is OK in most cases). 20-40 million paired end is good for splicing analysis.

I haven't worked with strand-specific data before. Perhaps that is a factor? Otherwise, I think there is still potential for the data to be usable.

ADD COMMENT • link 10.3 years ago by Charles Warden 8.2k

Thanks for your help.

Regarding the number of reads: the mean and median mapped reads per sample were 67 million (range 40-90 million).

ADD REPLY • link 10.3 years ago by kcnq1ot1 • 0

Yeah, you have plenty of reads. I really think you can do something with this data.

ADD REPLY • link 10.3 years ago by Charles Warden 8.2k

score 1 · Answer 2 · 2013-12-28

10.3 years ago

alpha2zee ▴ 120

You should also consider the expertise of those who mapped the read data. Is it possible that the mapping wasn't done properly? For instance, would removing poor-quality trailing bases, contaminating adapter sequences, etc. from the reads have improved the mapping?
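To make the "removal of poor quality trailing bases" idea concrete, here is a small Python sketch. It assumes Phred+33 quality encoding (older Illumina pipelines used Phred+64, so the `offset` may need changing) and an arbitrary threshold of Q20; the function name `trim_low_quality_tail` is made up for this example. It simply strips bases from the 3' end until it reaches one at or above the threshold, which is a simpler rule than the running-sum trimming that dedicated tools implement, and it does not handle adapter removal at all.

```python
from typing import Tuple


def trim_low_quality_tail(
    seq: str, qual: str, min_q: int = 20, offset: int = 33
) -> Tuple[str, str]:
    """Strip 3' bases whose Phred score is below `min_q`.

    `offset` is the ASCII offset of the quality encoding
    (33 is assumed here; this is not detected from the data).
    """
    end = len(qual)
    # Walk back from the 3' end while the last kept base is below threshold.
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]
```

For paired-end data, quality trimming can leave one mate much shorter than the other (or empty), so both files need to be filtered together to keep the pairs matched before re-mapping.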

ADD COMMENT • link 10.3 years ago by alpha2zee ▴ 120