Question

Problem with FastQC quality control for miRNA-seq data

8.8 years ago

silas008 ▴ 170

I'm trying to filter my miRNA-seq data. The reads are 18 to 30 nucleotides long. FastQC flags problems with Per Base Sequence Content and Per Sequence GC Content. The base quality is OK, so why does it flag the other modules? Is miRNA expected to have the same GC content and per-base sequence content as mRNA-seq or DNA-seq?

Thank You

RNA-Seq miRNA Fastqc • 3.4k views

ADD COMMENT • link updated 17 months ago by Ram 43k • written 8.8 years ago by silas008 ▴ 170

Please post the full report, and describe your intended experiment.

ADD REPLY • link 8.8 years ago by Brian Bushnell 20k

Ram · Answer 1 · 2015-07-18

Hi!

Well, I don't know if it is really a problem. Normally in miRNA data a handful of miRNAs are extremely highly expressed, and then there is the rest. That means the same few sequences show up many, many times. I guess you see something weird in the duplication plot as well?

FastQC was designed for DNA-seq, where you get random sequences that are roughly equally represented across the genome. That is not true for RNA-seq, and it is even worse for sRNA-seq because the complexity is lower. In these cases, expression determines how often each sequence is represented. For RNA-seq, at least you get different fragments from each gene. For sRNA it is always the same fragment, since the molecules are so small.
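To make that concrete, here is a minimal Python sketch of why one dominant sequence skews the per-base content. The sequences and read counts are made-up toy values, not real miRNAs, and `per_base_content` is a hypothetical helper, not how FastQC itself computes the module.

```python
from collections import Counter


def per_base_content(reads):
    """For each position, return the fraction of A/C/G/T among reads covering it."""
    max_len = max(len(r) for r in reads)
    content = []
    for pos in range(max_len):
        # Only reads long enough to reach this position contribute.
        counts = Counter(r[pos] for r in reads if len(r) > pos)
        total = sum(counts.values())
        content.append({base: counts.get(base, 0) / total for base in "ACGT"})
    return content


# Toy library (assumed values): one dominant sequence plus two rare ones.
dominant = "TGAGGTAGTAGGTTGTATAGTT"
rare_1 = "ACGTACGTACGTACGTACGT"
rare_2 = "CATGCATGCATGCATGCATGCA"
reads = [dominant] * 900 + [rare_1] * 50 + [rare_2] * 50

content = per_base_content(reads)
for pos, fractions in enumerate(content[:5], start=1):
    print(pos, {base: round(frac, 2) for base, frac in fractions.items()})
```

At every position the composition is close to whatever base the dominant sequence has there, so the per-base plot looks nothing like the flat lines you would expect from a random genomic library.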

I would plot the cumulative expression after you map to miRNAs and quantify, to see the complexity and the number of miRNAs detected, for instance. If that looks OK, then I think you shouldn't worry.
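A rough sketch of that check in Python, assuming you already have a table of read counts per miRNA from whatever quantification tool you use. The miRNA names and counts below are invented for illustration, and `cumulative_expression` and `n_detected` are hypothetical helpers.

```python
def cumulative_expression(counts):
    """Rank miRNAs by read count and return (name, cumulative fraction of reads)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads assigned to any miRNA")
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    running = 0
    curve = []
    for name, n in ordered:
        running += n
        curve.append((name, running / total))
    return curve


def n_detected(counts, min_reads=1):
    """Number of miRNAs with at least min_reads reads (threshold is an assumption)."""
    return sum(1 for n in counts.values() if n >= min_reads)


# Toy counts (assumed values), as you might get after mapping and quantification.
counts = {"miR-a": 8000, "miR-b": 1000, "miR-c": 600,
          "miR-d": 300, "miR-e": 100, "miR-f": 0}

curve = cumulative_expression(counts)
for rank, (name, frac) in enumerate(curve, start=1):
    print(rank, name, f"{frac:.2f}")
print("detected:", n_detected(counts))
```

If the curve climbs to nearly 1 after the first few miRNAs, the library is dominated by a small number of sequences, which is normal for miRNA data. Plotting rank against cumulative fraction gives the figure described above.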