Question

Is the RNA-seq library indeed strand-specific?

2

Entering edit mode

7.4 years ago

JJ ▴ 760

Hi all,

So I am using RSeQC infer_experiment.py to check the strand-specificity of a public RNA-seq experiment. In the corresponding paper, the authors explicitly state that they have used the TruSeq™ Stranded Total RNA kit protocol. But this is what the output of RSeQC infer_experiment.py looks like:

This is PairEnd Data
Fraction of reads failed to determine: 0.0787
Fraction of reads explained by "1++,1--,2+-,2-+": 0.4686
Fraction of reads explained by "1+-,1-+,2++,2--": 0.4527
Finshed Checking Strand Specificty

So not strand-specific ... Another Run of the same experiment looks a bit better

This is PairEnd Data
Fraction of reads failed to determine: 0.0946
Fraction of reads explained by "1++,1--,2+-,2-+": 0.6052
Fraction of reads explained by "1+-,1-+,2++,2--": 0.3002
Finshed Checking Strand Specificty

Usually I get something like this:

This is PairEnd Data
Fraction of reads failed to determine: 0.0500
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0287
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9213
Finshed Checking Strand Specificty

So I suppose the stranded protocol did not really work for the data set in question... However, my question now is: Is there a guide for the fractions? what fraction is still ok for the data to be strand-specific? Is there a cutoff? I sometimes get for Fraction of reads explained by "1+-,1-+,2++,2--": 0.7 / 0.8 - is this still ok?

Thanks for your input!

RNA-Seq • 2.7k views

ADD COMMENT • link updated 7.4 years ago by Devon Ryan 105k • written 7.4 years ago by JJ ▴ 760

score 2 · Answer 1 · 2018-02-06

2

Entering edit mode

7.4 years ago

Devon Ryan 105k

There's no strict guide for this, it's all just eye-balling the numbers. But as you correctly determined, either they didn't actually use a strand-specific protocol or they have a LOT of anti-sense transcription. In my opinion, if the ratio is worse than 0.9/0.1 and you don't expect a lot of anti-sense transcription then there are likely other major issues with the dataset. At that point you have to consider whether it makes sense to proceed at all with the analysis.

ADD COMMENT • link 7.4 years ago by Devon Ryan 105k

0

Entering edit mode

Thanks for the answer.

Did you get something like this before?

This is PairEnd Data
Fraction of reads failed to determine: 0.9938
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0003
Fraction of reads explained by "1+-,1-+,2++,2--": 0.0059
Finshed Checking Strand Specificty

Not sure what to make of this...

ADD REPLY • link 7.4 years ago by JJ ▴ 760

1

Entering edit mode

interesting...what organism is this? did you notice any other problems with this data set, e.g. after running FastQC?

QoRTs also has a script that checks strand-specificity, would be interesting to see what it spits out for those data sets.

ADD REPLY • link 7.4 years ago by Friederike 9.0k

0

Entering edit mode

No, I get something similar from an Encode bam file I downloaded.

Fraction of reads failed to determine: 0.7454
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0039
Fraction of reads explained by "1+-,1-+,2++,2--": 0.2506

https://www.encodeproject.org/experiments/ENCSR274JRR/ ENCFF978EVS.bam

thanks for the link to that tool! I'll try it out :)

oh, and it's human data

ADD REPLY • link 7.4 years ago by JJ ▴ 760

0

Entering edit mode

Could an insert size of 0 be an issue for the tool? This is the only communality I found between getting "weird" output and the different public datasets, although I don't know about the Encode one - I couldn't find an insert size for this one.

ADD REPLY • link 7.4 years ago by JJ ▴ 760

0

Entering edit mode

That's a first, I presume there's chromosome naming issue or something.

ADD REPLY • link 7.4 years ago by Devon Ryan 105k

0

Entering edit mode

So I did

infer_experiment.py -r Homo_sapiens.GRCh38.79.bed -I ENCFF978EVS.bam

got this

Reading reference gene model ../../Homo_sapiens.GRCh38.79.bed ... Done
Loading SAM/BAM file ...  Total 200000 usable reads were sampled


This is PairEnd Data
Fraction of reads failed to determine: 0.7454
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0039
Fraction of reads explained by "1+-,1-+,2++,2--": 0.2506

ADD REPLY • link 7.4 years ago by JJ ▴ 760

0

Entering edit mode

Maybe the BAM file is aligned against hg19? Have a look at the lengths of a couple chromosomes.

ADD REPLY • link 7.4 years ago by Devon Ryan 105k

0

Entering edit mode

Well, I selected GRCh38 ... It's a different version though GRCh38 V24.

ADD REPLY • link 7.4 years ago by JJ ▴ 760

0

Entering edit mode

hm, I don't think so - I check this beforehand and edited the bed file...

samtools idxstats ENCFF978EVS.bam | cut -f 1 | head
chr1
chr2
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chr10

and

cat Homo_sapiens.GRCh38.79.bed | cut -f 1 | uniq
chr1
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chr2
chr20
chr21
chr22
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chrCHR_HG126_PATCH
chrCHR_HG1362_PATCH
chrCHR_HG142_HG150_NOVEL_TEST
chrCHR_HG151_NOVEL_TEST
chrCHR_HG1832_PATCH
......

ADD REPLY • link 7.4 years ago by JJ ▴ 760