Question: Is the RNA-seq library indeed strand-specific?
2.3 years ago, JJ wrote:

Hi all,

I am using RSeQC's infer_experiment.py to check the strand specificity of a public RNA-seq experiment. In the corresponding paper, the authors explicitly state that they used the TruSeq™ Stranded Total RNA kit. But this is what the output of infer_experiment.py looks like:

This is PairEnd Data
Fraction of reads failed to determine: 0.0787
Fraction of reads explained by "1++,1--,2+-,2-+": 0.4686
Fraction of reads explained by "1+-,1-+,2++,2--": 0.4527
Finshed Checking Strand Specificty

So this run does not look strand-specific. Another run of the same experiment looks a bit better:

This is PairEnd Data
Fraction of reads failed to determine: 0.0946
Fraction of reads explained by "1++,1--,2+-,2-+": 0.6052
Fraction of reads explained by "1+-,1-+,2++,2--": 0.3002
Finshed Checking Strand Specificty

Usually I get something like this:

This is PairEnd Data
Fraction of reads failed to determine: 0.0500
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0287
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9213
Finshed Checking Strand Specificty

So I suppose the stranded protocol did not really work for the data set in question. However, my question now is: is there a guideline for these fractions? What fraction is still acceptable for the data to count as strand-specific? Is there a cutoff? I sometimes get 0.7 or 0.8 for the fraction of reads explained by "1+-,1-+,2++,2--". Is this still OK?

Thanks for your input!

rna-seq • 784 views
2.3 years ago, Devon Ryan (Freiburg, Germany) wrote:

There's no strict guide for this; it's all just eyeballing the numbers. But as you correctly determined, either they didn't actually use a strand-specific protocol or they have a LOT of anti-sense transcription. In my opinion, if the ratio is worse than 0.9/0.1 and you don't expect a lot of anti-sense transcription, then there are likely other major issues with the dataset. At that point you have to consider whether it makes sense to proceed with the analysis at all.
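
To make this rule of thumb concrete, here is a minimal Python sketch that parses the three fraction lines quoted in this thread and applies the 0.9/0.1 ratio. It is only an illustration, not part of RSeQC: the function names, the normalisation of the two "explained" fractions by their sum, and the `max_failed=0.5` guard are all assumptions made for this example.

```python
import re

# Patterns for the fraction lines printed by infer_experiment.py,
# as quoted in this thread.
FAILED = re.compile(r"failed to determine:\s*([0-9.]+)")
EXPLAINED = re.compile(r'explained by "([^"]+)":\s*([0-9.]+)')


def parse_fractions(report):
    """Return (failed_fraction, {read_pattern: fraction}) from the report text."""
    failed_match = FAILED.search(report)
    explained = {m.group(1): float(m.group(2)) for m in EXPLAINED.finditer(report)}
    if failed_match is None or len(explained) != 2:
        raise ValueError("report does not look like paired-end infer_experiment.py output")
    return float(failed_match.group(1)), explained


def classify(report, min_dominant=0.9, max_failed=0.5):
    """Apply the eyeballing rule: the dominant orientation should reach ~0.9.

    min_dominant follows the 0.9/0.1 ratio suggested in the answer;
    max_failed is an arbitrary guard (assumption) for reports where most
    reads could not be assigned at all.
    """
    failed, explained = parse_fractions(report)
    total = sum(explained.values())
    if failed > max_failed or total == 0:
        return "undetermined"
    dominant = max(explained.values()) / total
    return "stranded" if dominant >= min_dominant else "unclear or unstranded"


USUAL_RUN = """This is PairEnd Data
Fraction of reads failed to determine: 0.0500
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0287
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9213
"""

FIRST_RUN = """This is PairEnd Data
Fraction of reads failed to determine: 0.0787
Fraction of reads explained by "1++,1--,2+-,2-+": 0.4686
Fraction of reads explained by "1+-,1-+,2++,2--": 0.4527
"""

print(classify(USUAL_RUN))  # stranded
print(classify(FIRST_RUN))  # unclear or unstranded
```

A result of "unclear or unstranded" still needs a judgment call about expected anti-sense transcription; the threshold is not a formal cutoff.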


Thanks for the answer.

Did you get something like this before?

This is PairEnd Data
Fraction of reads failed to determine: 0.9938
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0003
Fraction of reads explained by "1+-,1-+,2++,2--": 0.0059
Finshed Checking Strand Specificty

Not sure what to make of this...

written 2.3 years ago by JJ

Interesting. What organism is this? Did you notice any other problems with this data set, e.g. after running FastQC?

QoRTs also has a script that checks strand-specificity; it would be interesting to see what it reports for those data sets.

written 2.3 years ago by Friederike

No, I get something similar from an ENCODE BAM file I downloaded.

Fraction of reads failed to determine: 0.7454
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0039
Fraction of reads explained by "1+-,1-+,2++,2--": 0.2506

https://www.encodeproject.org/experiments/ENCSR274JRR/ ENCFF978EVS.bam

Thanks for the link to that tool! I'll try it out :)

Oh, and it's human data.

written 2.3 years ago by JJ (modified 2.3 years ago)

Could an insert size of 0 be an issue for the tool? This is the only commonality I found between getting "weird" output and the different public datasets, although I don't know about the ENCODE one: I couldn't find an insert size for it.

written 2.3 years ago by JJ

That's a first. I presume there's a chromosome naming issue or something.

written 2.3 years ago by Devon Ryan

So I ran

infer_experiment.py -r Homo_sapiens.GRCh38.79.bed -i ENCFF978EVS.bam

and got this:

Reading reference gene model ../../Homo_sapiens.GRCh38.79.bed ... Done
Loading SAM/BAM file ...  Total 200000 usable reads were sampled


This is PairEnd Data
Fraction of reads failed to determine: 0.7454
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0039
Fraction of reads explained by "1+-,1-+,2++,2--": 0.2506
written 2.3 years ago by JJ (modified 2.3 years ago)

Maybe the BAM file is aligned against hg19? Have a look at the lengths of a couple chromosomes.
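
One way to do this check is to compare the chr1 length reported by `samtools idxstats` with the known chr1 lengths of the two assemblies (249250621 for hg19/GRCh37, 248956422 for hg38/GRCh38). The Python sketch below is an illustration only: the function names are made up for this example, and the mapped/unmapped counts in the sample text are invented placeholders, not values from the ENCODE file.

```python
# Known chr1 lengths for the two human assemblies discussed here.
CHR1_LENGTHS = {
    249250621: "hg19/GRCh37",
    248956422: "hg38/GRCh38",
}


def parse_idxstats(text):
    """Map reference name -> length from `samtools idxstats` output.

    Each line has four tab-separated columns: name, length, mapped reads,
    unmapped reads. The final "*" line (unplaced reads) is skipped.
    """
    lengths = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 2 or fields[0] == "*":
            continue
        lengths[fields[0]] = int(fields[1])
    return lengths


def guess_assembly(lengths):
    """Guess the assembly from the chr1 length; 'unknown' if no match."""
    for name in ("chr1", "1"):
        if name in lengths:
            return CHR1_LENGTHS.get(lengths[name], "unknown")
    return "unknown"


# Sample text in idxstats layout; the read counts are placeholders.
sample = "chr1\t248956422\t1000\t0\nchr2\t242193529\t900\t0\n*\t0\t0\t50\n"
print(guess_assembly(parse_idxstats(sample)))  # hg38/GRCh38
```

In practice the text would come from running `samtools idxstats` on the BAM file and passing its output to `parse_idxstats`.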

written 2.3 years ago by Devon Ryan

Well, I selected GRCh38. It's a different version, though: GRCh38 V24.

written 2.3 years ago by JJ

Hm, I don't think so. I checked this beforehand and edited the BED file:

samtools idxstats ENCFF978EVS.bam | cut -f 1 | head
chr1
chr2
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chr10

and

cat Homo_sapiens.GRCh38.79.bed | cut -f 1 | uniq
chr1
chr10
chr11
chr12
chr13
chr14
chr15
chr16
chr17
chr18
chr19
chr2
chr20
chr21
chr22
chr3
chr4
chr5
chr6
chr7
chr8
chr9
chrCHR_HG126_PATCH
chrCHR_HG1362_PATCH
chrCHR_HG142_HG150_NOVEL_TEST
chrCHR_HG151_NOVEL_TEST
chrCHR_HG1832_PATCH
......
written 2.3 years ago by JJ (modified 2.3 years ago)
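
The two listings above can also be compared programmatically instead of by eye. The following Python sketch is a hedged illustration: `compare_names` is a hypothetical helper, and the inputs are only the names visible in the truncated listings, so the "bed_only" result here reflects the truncation (the BAM listing stops at `head`) rather than a real mismatch.

```python
def compare_names(bam_names, bed_names):
    """Report which reference names are shared and which appear on one side only."""
    bam, bed = set(bam_names), set(bed_names)
    return {
        "shared": sorted(bam & bed),
        "bam_only": sorted(bam - bed),
        "bed_only": sorted(bed - bam),
    }


# Names from the first ten lines of the idxstats output shown above.
bam_names = ["chr%d" % i for i in range(1, 11)]

# Names from the BED listing shown above (truncated in the original post).
bed_names = ["chr%d" % i for i in range(1, 23)] + [
    "chrCHR_HG126_PATCH",
    "chrCHR_HG1362_PATCH",
    "chrCHR_HG142_HG150_NOVEL_TEST",
    "chrCHR_HG151_NOVEL_TEST",
    "chrCHR_HG1832_PATCH",
]

result = compare_names(bam_names, bed_names)
print(len(result["shared"]))  # 10
print(result["bam_only"])     # []
```

An empty "bam_only" list for the primary chromosomes suggests the naming itself is consistent, which would point to a different cause for the high "failed to determine" fraction.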