Poor Quality RNA-Seq Data
10.3 years ago
kcnq1ot1 • 0

We've done RNA-Seq on 32 breast tumor RNAs; for the most part, the RIN values were more than adequate for RNA-Seq. Strand-specific libraries were made and sequenced as paired-end 100 bp reads. Our Bioinformatics core gave us the QC report for the RNA-Seq analysis. They told us that the data quality is not very good, based on the relatively low percentage of mapped reads (<80%) and the significant drop-off in read quality after 60 bp. All 32 samples looked similar, so I doubt the problem is the quality of the RNA samples. Does this result suggest that there was some systematic problem with the library preparation and/or sequencing?

fastq tophat • 4.1k views

It could have been an issue during transfer to the facility, library creation, or sequencing. Lacking further details, it's tough to know which (even with more details, it can be difficult to know for sure). BTW, when you say that the RIN values were good, what sort of values are we talking about?


Thank you for your help. The RIN values ranged from 7.2 - 9.5. Both the mean and median RIN values were 8.5.

10.3 years ago

In general, I would say >80% alignment is ideal (and realistic to achieve, in most cases), but <80% alignment isn't necessarily horrible. Less than 50% or 20% is another story.

The significant drop-off isn't good, but I can typically get decent gene-level counts with single-end 40-bp reads. So you'll exceed that if you just trim off the last 40 bp, which leaves 60-bp reads. I would also recommend recalculating the alignment percentage after trimming the reads; I would expect it to increase when focusing only on the high-quality portion of the reads.
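
To make the trimming step concrete, here is a minimal sketch in plain Python that keeps only the first 60 bases of each read. The function name `hard_trim_fastq` and the `keep` parameter are made up for illustration, and it assumes standard 4-line FASTQ records (no wrapped sequence lines). In practice a dedicated trimming tool would be the usual choice; this only shows the idea.

```python
from typing import Iterable, Iterator


def hard_trim_fastq(lines: Iterable[str], keep: int = 60) -> Iterator[str]:
    """Yield FASTQ lines with sequence and quality cut to the first `keep` bases.

    Assumes 4-line records: header, sequence, '+', quality.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        # Lines 2 and 4 of each record hold the sequence and its qualities;
        # both must be cut to the same length to keep the record valid.
        if i % 4 in (1, 3):
            line = line[:keep]
        yield line


if __name__ == "__main__":
    record = ["@read1", "A" * 100, "+", "I" * 100]
    for out_line in hard_trim_fastq(record, keep=60):
        print(out_line)
```

For paired-end data, the same trim would be applied to both mate files so that the reads stay in sync, and the trimmed files would then be realigned to get the new alignment percentage.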

How many reads do you have per sample? You should have at least 10 million for differential expression (a single-end count is OK in most cases). 20-40 million paired-end reads is good for splicing analysis.

I haven't worked with strand-specific data before. Perhaps that is a factor? Otherwise, I think there is still potential for the data to be usable.


Thanks for your help.

Regarding the number of reads: the mean and median mapped reads per sample were 67 million (range 40-90 million).


Yeah, you have plenty of reads. I really think you can do something with this data.

10.3 years ago
alpha2zee ▴ 120

You should also consider the expertise of those who mapped the read data. Is it possible that the mapping wasn't done properly? For instance, would removing poor-quality trailing bases, contaminating adapter sequences, etc. from the reads have improved the mapping?
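
As a rough illustration of those two clean-up steps, the sketch below trims low-quality bases from the 3' end of a read and clips an adapter by exact match. The function names, the Phred cutoff of 20, the Phred+33 quality encoding, and the adapter string in the example are all assumptions for illustration. Real adapter trimmers also handle mismatches and partial adapters, which this does not.

```python
from typing import Tuple


def trim_trailing_low_quality(
    seq: str, qual: str, min_q: int = 20, offset: int = 33
) -> Tuple[str, str]:
    """Drop bases from the 3' end while their Phred score is below `min_q`.

    Assumes Phred+33 encoded quality strings (an assumption, check your data).
    """
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def clip_adapter(seq: str, qual: str, adapter: str) -> Tuple[str, str]:
    """Cut the read at the first exact occurrence of `adapter`, if any."""
    idx = seq.find(adapter)
    if idx == -1:
        return seq, qual
    return seq[:idx], qual[:idx]


if __name__ == "__main__":
    # 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
    print(trim_trailing_low_quality("ACGTACGT", "IIIII###"))
    # "GGAA" is a placeholder adapter, not a real kit sequence.
    print(clip_adapter("ACGTTTGGAAAA", "I" * 12, "GGAA"))
```

If reads cleaned this way map at a noticeably higher rate, that would point toward the preprocessing rather than the libraries themselves.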
