Hi all,
So I am using RSeQC infer_experiment.py to check the strand-specificity of a public RNA-seq experiment. In the corresponding paper, the authors explicitly state that they used the TruSeq™ Stranded Total RNA kit protocol. But this is what the output of RSeQC infer_experiment.py looks like:
This is PairEnd Data
Fraction of reads failed to determine: 0.0787
Fraction of reads explained by "1++,1--,2+-,2-+": 0.4686
Fraction of reads explained by "1+-,1-+,2++,2--": 0.4527
Finshed Checking Strand Specificty
So this does not look strand-specific... Another run of the same experiment looks a bit better:
This is PairEnd Data
Fraction of reads failed to determine: 0.0946
Fraction of reads explained by "1++,1--,2+-,2-+": 0.6052
Fraction of reads explained by "1+-,1-+,2++,2--": 0.3002
Finshed Checking Strand Specificty
Usually I get something like this:
This is PairEnd Data
Fraction of reads failed to determine: 0.0500
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0287
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9213
Finshed Checking Strand Specificty
So I suppose the stranded protocol did not really work for the data set in question... However, my question now is: is there a guideline for these fractions? What fraction is still acceptable for the data to count as strand-specific? Is there a cutoff? For "1+-,1-+,2++,2--" I sometimes get a fraction of 0.7 or 0.8 - is this still ok?
Thanks for your input!
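To compare runs like the three above on the same footing, here is a minimal Python sketch that parses the infer_experiment.py text output and rescales the two "explained by" fractions so that they sum to 1 (ignoring the undetermined reads). The function names and the 0.8 cutoff are assumptions for illustration only; the cutoff is not an established threshold and the thread does not settle on one.

```python
import re

def parse_infer_experiment(text):
    """Pull the three fractions out of infer_experiment.py text output."""
    fractions = {}
    for line in text.splitlines():
        m = re.search(r"Fraction of reads (.+?):\s*([0-9.]+)\s*$", line)
        if not m:
            continue
        label, value = m.group(1), float(m.group(2))
        if label.startswith("failed"):
            fractions["undetermined"] = value
        elif "1++,1--,2+-,2-+" in label:
            fractions["read1_sense"] = value       # read 1 on the gene's strand
        elif "1+-,1-+,2++,2--" in label:
            fractions["read1_antisense"] = value   # read 1 on the opposite strand
    return fractions

def classify(fractions, cutoff=0.8):
    """Label a library using a hypothetical cutoff (an assumption, not a standard)."""
    sense = fractions["read1_sense"]
    antisense = fractions["read1_antisense"]
    determined = sense + antisense
    if determined == 0:
        return "undetermined"
    # Rescale so the two orientations sum to 1 before applying the cutoff.
    if sense / determined >= cutoff:
        return "read1_sense"
    if antisense / determined >= cutoff:
        return "read1_antisense"
    return "unclear"

usual = """This is PairEnd Data
Fraction of reads failed to determine: 0.0500
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0287
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9213
"""
print(classify(parse_infer_experiment(usual)))
```

With this illustrative cutoff, the "usual" output above is labelled read1_antisense, while the two runs of the public data set (about 0.51 and 0.67 after rescaling) come out as unclear.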
Thanks for the answer.
Did you get something like this before?
Not sure what to make of this...
Interesting... what organism is this? Did you notice any other problems with this data set, e.g. after running FastQC?
QoRTs also has a script that checks strand-specificity, would be interesting to see what it spits out for those data sets.
No, I get something similar from an Encode bam file I downloaded.
https://www.encodeproject.org/experiments/ENCSR274JRR/ ENCFF978EVS.bam
thanks for the link to that tool! I'll try it out :)
oh, and it's human data
Could an insert size of 0 be an issue for the tool? This is the only commonality I found between getting "weird" output and the different public datasets, although I don't know about the Encode one - I couldn't find an insert size for this one.
That's a first, I presume there's a chromosome naming issue or something.
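A quick way to test for a naming mismatch is to compare the chromosome names in the BED file with the reference names in the BAM header. The sketch below is an assumption-laden illustration: `bed_chroms` and `naming_report` are hypothetical helpers, and the BAM names are assumed to have been collected separately (for example from the @SQ lines printed by `samtools view -H`).

```python
def bed_chroms(bed_text):
    """Collect the chromosome names (first column) from BED-formatted text."""
    chroms = set()
    for line in bed_text.splitlines():
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments and UCSC header lines
        chroms.add(line.split()[0])
    return chroms

def strip_chr(name):
    """Drop a leading 'chr' so that 'chr1' and '1' compare equal."""
    return name[3:] if name.startswith("chr") else name

def naming_report(bed_names, bam_names):
    """Report shared names and whether only the 'chr' prefix differs."""
    bed_names, bam_names = set(bed_names), set(bam_names)
    shared = bed_names & bam_names
    shared_without_prefix = ({strip_chr(n) for n in bed_names}
                             & {strip_chr(n) for n in bam_names})
    return {
        "shared": sorted(shared),
        "bed_only": sorted(bed_names - bam_names),
        # True when nothing matches as-is but names match once 'chr' is ignored
        "prefix_mismatch": not shared and bool(shared_without_prefix),
    }

bed = "chr1\t11868\t14409\tgeneA\t0\t+\nchr2\t38813\t46870\tgeneB\t0\t-\n"
print(naming_report(bed_chroms(bed), ["1", "2", "MT"]))
```

If `prefix_mismatch` is true, the BED file and the BAM file use different naming styles and no reads can be assigned to any gene model.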
So I did
got this
Maybe the BAM file is aligned against hg19? Have a look at the lengths of a couple chromosomes.
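One way to check this is to read the chromosome lengths from the BAM header (the @SQ lines, e.g. as printed by `samtools view -H`) and compare the length of chromosome 1 with the known values for the two assemblies. This is only a sketch: `sq_lengths` and `guess_assembly` are hypothetical helper names, the header text is assumed to have been captured beforehand, and only chromosome 1 is checked.

```python
# Length of chromosome 1 in the two human assemblies discussed here.
KNOWN_CHR1_LENGTHS = {
    248956422: "GRCh38",
    249250621: "GRCh37/hg19",
}

def sq_lengths(header_text):
    """Map reference name -> length from the @SQ lines of a SAM/BAM header."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Each field after the record type looks like TAG:value
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if "SN" in fields and "LN" in fields:
            lengths[fields["SN"]] = int(fields["LN"])
    return lengths

def guess_assembly(lengths):
    """Guess the assembly from the chr1 length; 'unknown' if it matches neither."""
    for name in ("chr1", "1"):
        if name in lengths:
            return KNOWN_CHR1_LENGTHS.get(lengths[name], "unknown")
    return "unknown"

header = "@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:chr1\tLN:248956422\n@SQ\tSN:chr2\tLN:242193529\n"
print(guess_assembly(sq_lengths(header)))
```

If the guess disagrees with the assembly the BED file was built for, the gene coordinates will not line up with the alignments even when the chromosome names match.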
Well, I selected GRCh38 ... It's a different version though GRCh38 V24.
Hm, I don't think so - I checked this beforehand and edited the bed file...
and