RSeQC confusing output - Half fails to determine, half is reverse strand
22 months ago
compuTE ▴ 140

Hello,

I am checking the strandedness of a sample using infer_experiment.py and I got a result that I find pretty confusing.

infer_experiment.py -i sample_Aligned.sortedByCoord.out.bam -r gencode.v36.annotation.gene.bed


Results in:

This is PairEnd Data
Fraction of reads failed to determine: 0.4848
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0122
Fraction of reads explained by "1+-,1-+,2++,2--": 0.5030


So half fails to be determined and half seems to be strand-specific.
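
To put numbers on that reading, here is a minimal Python sketch that parses the report above and rescales the two "explained by" fractions so they sum to 1 among the reads that could be assigned. The function names are made up for illustration and are not part of RSeQC; the parser assumes the report lines look exactly like the ones pasted above.

```python
import re

# Matches report lines such as:
#   Fraction of reads failed to determine: 0.4848
#   Fraction of reads explained by "1+-,1-+,2++,2--": 0.5030
_LINE = re.compile(
    r'^Fraction of reads (?:failed to determine|explained by "([^"]+)"): ([0-9.]+)$'
)

def parse_infer_experiment(text):
    """Return a dict of fractions keyed by strand pattern, plus 'failed'."""
    fractions = {}
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match:
            key = match.group(1) if match.group(1) else "failed"
            fractions[key] = float(match.group(2))
    return fractions

def share_among_determined(fractions):
    """Rescale the 'explained by' fractions, ignoring undetermined reads."""
    determined = {k: v for k, v in fractions.items() if k != "failed"}
    total = sum(determined.values())
    if total == 0:
        raise ValueError("no reads could be assigned to either pattern")
    return {k: v / total for k, v in determined.items()}

report = '''This is PairEnd Data
Fraction of reads failed to determine: 0.4848
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0122
Fraction of reads explained by "1+-,1-+,2++,2--": 0.5030
'''

shares = share_among_determined(parse_infer_experiment(report))
for pattern, share in shares.items():
    print(f"{pattern}: {share:.3f}")
```

With these numbers, the "1+-,1-+,2++,2--" pattern accounts for roughly 0.976 of the reads that could be assigned at all, so the strand signal itself is clean; the open question is only the undetermined half.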

I know where this sample comes from, so I know that it's a strand-specific library. My question is why do I have half of the reads undetermined?

Any ideas?

RNA-Seq rseqc pair-end
Hi, I have similar problems. Did you figure out why this happened? Thanks for your help.

I think it may be related to the number of mapped reads. What was the percentage of uniquely mapped reads with STAR?

Also, infer_experiment.py by default samples only a subset of your reads. If I'm not mistaken, the default is 200,000 reads.
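
If you want to rule out the sampling, a sketch like the following builds the same command with a larger sample. It assumes the `-s` (sample size) and `-q` (minimum mapping quality) options of infer_experiment.py, with defaults of 200000 and 30 as far as I remember from its help text; please check `infer_experiment.py --help` on your installation. The helper name is made up for illustration.

```python
import shlex

def infer_experiment_cmd(bam, bed, sample_size=200000, min_mapq=30):
    """Build the argument list for infer_experiment.py (hypothetical helper)."""
    return [
        "infer_experiment.py",
        "-i", bam,               # alignment file
        "-r", bed,               # gene model in BED format
        "-s", str(sample_size),  # number of reads to sample (assumed flag)
        "-q", str(min_mapq),     # minimum mapping quality (assumed flag)
    ]

cmd = infer_experiment_cmd(
    "sample_Aligned.sortedByCoord.out.bam",
    "gencode.v36.annotation.gene.bed",
    sample_size=2000000,
)
print(shlex.join(cmd))

# To actually run it (requires RSeQC on your PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

If the fractions barely change with ten times as many reads, sampling is probably not the explanation.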

9 weeks ago
Xin • 0

It's 20 months late, but today I encountered the same problem. I now know that it was caused by the annotation files: I used the new version of the annotation when aligning my reads with STAR, but the old version (provided via their web link) when running RSeQC. When I downloaded a new BED file from the UCSC Table Browser (https://genome.ucsc.edu/cgi-bin/hgTables) and used it in RSeQC, all problems were fixed.
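
One cheap sanity check before rerunning is to compare the sequence names in the BED file with those in the BAM header. This is only a sketch, and it catches just one symptom of a mismatched annotation (for example "chr1" versus "1"); it cannot tell you whether two files are the same annotation version. The function names are made up for illustration, and the header text is assumed to come from `samtools view -H`.

```python
def bed_contigs(bed_path):
    """Collect the sequence names used in the first column of a BED file."""
    names = set()
    with open(bed_path) as handle:
        for line in handle:
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue  # skip blank lines, comments and UCSC header lines
            names.add(line.split(None, 1)[0])
    return names

def sam_header_contigs(header_text):
    """Collect the SN: names from the @SQ lines of a SAM/BAM header."""
    names = set()
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names

def compare_contigs(bed_names, bam_names):
    """Report which sequence names are shared and which exist on one side only."""
    return {
        "shared": sorted(bed_names & bam_names),
        "bed_only": sorted(bed_names - bam_names),
        "bam_only": sorted(bam_names - bed_names),
    }

# Example usage (requires samtools on your PATH):
# import subprocess
# header = subprocess.run(
#     ["samtools", "view", "-H", "sample_Aligned.sortedByCoord.out.bam"],
#     capture_output=True, text=True, check=True,
# ).stdout
# print(compare_contigs(bed_contigs("gencode.v36.annotation.gene.bed"),
#                       sam_header_contigs(header)))
```

If "shared" is empty or "bed_only" is long, the BED file and the alignment do not describe the same reference, and the strandedness numbers should not be trusted.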

The annotation that you use must be consistent across the whole analysis. You cannot suddenly introduce a different GTF file just to run infer_experiment.py. The annotation used to create the BAM file and the one used for the strandedness check must be the same (infer_experiment.py takes it in BED format).