Strandness of ENCODE RNA-seq data
2.4 years ago
husensofteng

I am trying to run StringTie on some ENCODE RNA-seq datasets, but I am unsure whether the data are stranded or not.

For instance, regarding the dataset ENCSR000BYS, the ENCODE web page states:

They are stranded PE76 Illumina GAIIx RNA-Seq libraries from rRNA-depleted Poly-A+ RNA > 200 nucleotides in size.

However, when I run infer_experiment.py on the BAM files, I get the following result, which as far as I understand indicates an unstranded library:

infer_experiment.py -i ENCFF309XGT.sortedByCoord.bam -r gencode.v31.primary_assembly.annotation_transcripts.bed -s 500000


Output:

This is PairEnd Data

Fraction of reads failed to determine: 0.0461

Fraction of reads explained by "1++,1--,2+-,2-+": 0.5633

Fraction of reads explained by "1+-,1-+,2++,2--": 0.3906
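One way to read this output: for paired-end data, "1++,1--,2+-,2-+" means read 1 lies on the transcript strand (fr-secondstrand) and "1+-,1-+,2++,2--" means read 1 lies on the opposite strand (fr-firststrand, e.g. dUTP). A stranded library puts most determinable reads in one of the two groups, while an unstranded one splits them roughly 50/50. The sketch below parses the report and classifies it; the 0.8 and 0.6 cutoffs are illustrative assumptions, not RSeQC defaults.

```python
import re

# Matches the two orientation lines infer_experiment.py prints for paired-end data.
FRACTION_RE = re.compile(r'Fraction of reads explained by "([^"]+)":\s*([0-9.]+)')

SECOND_STRAND = "1++,1--,2+-,2-+"  # read 1 on the transcript strand
FIRST_STRAND = "1+-,1-+,2++,2--"   # read 1 opposite the transcript strand (e.g. dUTP)


def classify_strandness(report, stranded_min=0.8, unstranded_max=0.6):
    """Classify a paired-end infer_experiment.py report.

    Thresholds are assumptions chosen for illustration only.
    """
    fractions = {code: float(value) for code, value in FRACTION_RE.findall(report)}
    second = fractions.get(SECOND_STRAND, 0.0)
    first = fractions.get(FIRST_STRAND, 0.0)
    determined = first + second
    if determined == 0:
        return "undetermined"
    # Share of determinable reads supporting the dominant orientation.
    dominant_share = max(first, second) / determined
    if dominant_share >= stranded_min:
        return "fr-secondstrand" if second > first else "fr-firststrand"
    if dominant_share <= unstranded_max:
        return "unstranded"
    return "ambiguous"


report = """This is PairEnd Data
Fraction of reads failed to determine: 0.0461
Fraction of reads explained by "1++,1--,2+-,2-+": 0.5633
Fraction of reads explained by "1+-,1-+,2++,2--": 0.3906
"""
print(classify_strandness(report))  # unstranded
```

With the numbers above, the dominant orientation accounts for about 59% of the determinable reads, far from the roughly 90% or more typically seen for a strand-specific protocol.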

Any help is appreciated!

Tags: RNA-Seq, ENCODE, Assembly

Where is the BED file coming from? You could also try GUESSmyLT; the result may be clearer.


Thanks. The BED file was extracted from the GENCODE GTF. I will give GUESSmyLT a try.


The "Specific protocol" section for library ENCLB555AYX contains a complete wet-lab protocol, and the library does indeed seem to be unstranded. ENCODE is quite old, so that is not really a surprise.


The confusing part is that this section cites a paper describing strand-specific sequencing. Thanks for the reply; I will treat the data as unstranded.
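Once the library type is settled, it determines which strand option StringTie gets: `--rf` for fr-firststrand libraries, `--fr` for fr-secondstrand libraries, and neither for unstranded data. The sketch below builds the argument list; the file names in the usage line are hypothetical placeholders, and the label-to-flag mapping is a summary of the StringTie manual rather than part of this thread.

```python
def stringtie_command(bam, annotation_gtf, out_gtf, library_type):
    """Build a StringTie argument list for a given library type (sketch).

    library_type uses TopHat/Cufflinks-style labels; assumed mapping:
    fr-firststrand -> --rf, fr-secondstrand -> --fr, unstranded -> no flag.
    """
    strand_flags = {
        "fr-firststrand": ["--rf"],
        "fr-secondstrand": ["--fr"],
        "unstranded": [],
    }
    if library_type not in strand_flags:
        raise ValueError(f"unknown library type: {library_type!r}")
    return ["stringtie", bam, "-G", annotation_gtf, "-o", out_gtf, *strand_flags[library_type]]


# Hypothetical file names; for this dataset no strand flag is added.
cmd = stringtie_command("ENCFF309XGT.sortedByCoord.bam", "annotation.gtf", "ENCFF309XGT.gtf", "unstranded")
print(" ".join(cmd))
```

The list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with file names.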