I am trying to run StringTie on some ENCODE RNA-seq datasets, but I am unsure whether the data are stranded or not.
For instance, for the dataset ENCSR000BYS the ENCODE web page states:
They are stranded PE76 Illumina GAIIx RNA-Seq libraries from rRNA-depleted Poly-A+ RNA > 200 nucleotides in size.
However, when I run infer_experiment.py on the BAM files I get the following result, which to my knowledge indicates an unstranded library:
infer_experiment.py -i ENCFF309XGT.sortedByCoord.bam -r gencode.v31.primary_assembly.annotation_transcripts.bed -s 500000
Output:
Loading SAM/BAM file ... Total 500000 usable reads were sampled
This is PairEnd Data
Fraction of reads failed to determine: 0.0461
Fraction of reads explained by "1++,1--,2+-,2-+": 0.5633
Fraction of reads explained by "1+-,1-+,2++,2--": 0.3906
Any help is appreciated!
Where is the BED file coming from? You could try GUESSmyLT; the result may be clearer.
Thanks, the BED file is just extracted from the GENCODE GTF. I will give GUESSmyLT a try.
The "Specific protocol for library ENCLB555AYX" section contains a complete wet-lab protocol, and it indeed seems to be unstranded. I mean, ENCODE is quite old, so that is not really a surprise.
The issue is that in that section they reference the paper that describes strand-specific sequencing. Thank you for the reply, I would assume it is unstranded.
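For anyone landing here with similar numbers, the reasoning above can be written down as a small rule of thumb. This is only a sketch: the function names and the 0.8 cutoff are my own assumptions, not part of RSeQC or StringTie. It parses the text printed by infer_experiment.py, compares the two strand-configuration fractions among the reads that could be assigned, and suggests a StringTie strand option (`--fr` for fr-secondstrand, `--rf` for fr-firststrand, nothing for unstranded).

```python
import re

# Text as printed by infer_experiment.py in the question above.
REPORT = '''This is PairEnd Data
Fraction of reads failed to determine: 0.0461
Fraction of reads explained by "1++,1--,2+-,2-+": 0.5633
Fraction of reads explained by "1+-,1-+,2++,2--": 0.3906
'''

# Regexes for the three fractions in a paired-end report.
PATTERNS = {
    "failed": r"failed to determine:\s*([0-9.]+)",
    "forward": r'"1\+\+,1--,2\+-,2-\+":\s*([0-9.]+)',  # read 1 on the gene strand
    "reverse": r'"1\+-,1-\+,2\+\+,2--":\s*([0-9.]+)',  # read 1 on the opposite strand
}

# StringTie strand options; unstranded data needs no option.
STRINGTIE_FLAG = {"forward": "--fr", "reverse": "--rf", "unstranded": ""}


def parse_fractions(report):
    """Pull the three fractions out of the infer_experiment.py text output."""
    fractions = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, report)
        if match is None:
            raise ValueError(f"could not find the '{name}' fraction in the report")
        fractions[name] = float(match.group(1))
    return fractions


def classify(fractions, threshold=0.8):
    """Call strandedness from the share of forward reads among assigned reads.

    The 0.8 threshold is an assumed cutoff for illustration, not an official one.
    """
    assigned = fractions["forward"] + fractions["reverse"]
    if assigned == 0:
        return "undetermined"
    forward_share = fractions["forward"] / assigned
    if forward_share >= threshold:
        return "forward"
    if forward_share <= 1 - threshold:
        return "reverse"
    return "unstranded"


if __name__ == "__main__":
    fractions = parse_fractions(REPORT)
    call = classify(fractions)
    print(call, repr(STRINGTIE_FLAG.get(call, "")))
```

With the fractions from the question, the forward share among assigned reads is about 0.59, so under the assumed cutoff the library is called unstranded and StringTie would be run without `--fr` or `--rf`. A split this far from 50/50 is still worth a second look with another tool before settling on it.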