Strandness of ENCODE RNA-seq data
3.5 years ago
husensofteng ▴ 410

I am trying to run StringTie on some ENCODE RNA-seq datasets, but I am unsure whether the data are stranded.

For instance, for dataset ENCSR000BYS, the ENCODE web page states:

They are stranded PE76 Illumina GAIIx RNA-Seq libraries from rRNA-depleted Poly-A+ RNA > 200 nucleotides in size.

However, when I run infer_experiment.py on the BAM files I get the following result, which, to my knowledge, indicates an unstranded library:

infer_experiment.py -i ENCFF309XGT.sortedByCoord.bam -r gencode.v31.primary_assembly.annotation_transcripts.bed -s 500000

Output:

Loading SAM/BAM file ... Total 500000 usable reads were sampled

This is PairEnd Data

Fraction of reads failed to determine: 0.0461

Fraction of reads explained by "1++,1--,2+-,2-+": 0.5633

Fraction of reads explained by "1+-,1-+,2++,2--": 0.3906

Any help is appreciated!
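
For readers comparing their own numbers, here is a minimal Python sketch of how the two fractions reported above might be turned into a call. It is only an illustration: the function name `classify_strandedness`, the labels it returns, and the 0.8 cutoff are assumptions made for this example, not part of RSeQC or any ENCODE guideline.

```python
def classify_strandedness(frac_same, frac_opposite, threshold=0.8):
    """Guess the library type from infer_experiment.py fractions.

    frac_same:     fraction explained by "1++,1--,2+-,2-+"
                   (read 1 on the same strand as the transcript)
    frac_opposite: fraction explained by "1+-,1-+,2++,2--"
                   (read 1 on the opposite strand)
    threshold:     ASSUMED cutoff; choose one that suits your data.
    """
    determined = frac_same + frac_opposite
    if determined <= 0:
        raise ValueError("no reads were assigned to either orientation")
    # Share of the strand-assignable reads that follow the "same" pattern.
    same_share = frac_same / determined
    if same_share >= threshold:
        return "forward"      # hypothetical label: read 1 matches the transcript strand
    if same_share <= 1 - threshold:
        return "reverse"      # hypothetical label: read 1 is antisense (dUTP-like)
    return "unstranded"       # neither orientation clearly dominates


# Fractions taken from the output in this post.
print(classify_strandedness(0.5633, 0.3906))
```

With the values from this post, about 59% of the assignable reads follow the first pattern, which is far from the near-total skew expected of a stranded library.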

RNA-Seq ENCODE Assembly • 1.1k views

Where does the BED file come from? You could try GUESSmyLT; the result may be clearer.


Thanks, the BED file was simply extracted from the GENCODE GTF. I will give GUESSmyLT a try.


The "Specific protocol for library ENCLB555AYX" section contains a complete wet-lab protocol, and it does indeed seem to be unstranded. ENCODE is quite old, so that is not really a surprise.


The issue is that, in that section, they reference the paper that describes strand-specific sequencing. Thank you for the reply; I will treat the data as unstranded.
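
As a rough sketch of what that choice means for the StringTie run: StringTie treats a library as unstranded unless it is given `--rf` (fr-firststrand) or `--fr` (fr-secondstrand). The helper below only assembles a command line. The helper name, the library-type labels, and the output file name `ENCFF309XGT.gtf` are hypothetical and chosen for this illustration.

```python
def stringtie_strand_args(library_type):
    """Return the StringTie strand option for an ASSUMED library-type label."""
    flags = {
        "unstranded": [],      # StringTie's default: no strand flag
        "reverse": ["--rf"],   # fr-firststrand (read 1 antisense to the transcript)
        "forward": ["--fr"],   # fr-secondstrand (read 1 on the transcript strand)
    }
    if library_type not in flags:
        raise ValueError(f"unknown library type: {library_type!r}")
    return flags[library_type]


# Build (but do not run) a command for the BAM file from this thread.
# The output path is a made-up example name.
cmd = ["stringtie", "ENCFF309XGT.sortedByCoord.bam", "-o", "ENCFF309XGT.gtf"]
cmd += stringtie_strand_args("unstranded")
print(" ".join(cmd))
```

Treating the data as unstranded therefore amounts to running StringTie without either strand flag.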
