Question: Strandedness of ENCODE RNA-seq data
Asked 5 weeks ago by husensofteng (Sweden):

I am trying to run StringTie on some ENCODE RNA-seq datasets, but I am unsure whether the data are stranded.

For instance, for dataset ENCSR000BYS, the ENCODE web page states:

They are stranded PE76 Illumina GAIIx RNA-Seq libraries from rRNA-depleted Poly-A+ RNA > 200 nucleotides in size.

However, when I run infer_experiment.py on the BAM files I get the following result, which to my knowledge indicates an unstranded library:

infer_experiment.py -i ENCFF309XGT.sortedByCoord.bam -r gencode.v31.primary_assembly.annotation_transcripts.bed -s 500000

Output:

Loading SAM/BAM file ... Total 500000 usable reads were sampled

This is PairEnd Data

Fraction of reads failed to determine: 0.0461

Fraction of reads explained by "1++,1--,2+-,2-+": 0.5633

Fraction of reads explained by "1+-,1-+,2++,2--": 0.3906
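As a rough cross-check, the two fractions above can be compared programmatically. The sketch below is only an illustration: the function names are hypothetical, and the 0.8 cutoff is an assumed rule of thumb, not a value defined by RSeQC. It normalises the two orientation fractions by the reads that could be assigned and calls the library stranded only if one orientation clearly dominates.

```python
import re


def parse_fractions(text):
    """Extract (failed, read1_sense, read1_antisense) from infer_experiment.py output."""
    failed = re.search(r"failed to determine:\s*([\d.]+)", text)
    sense = re.search(r'"1\+\+,1--,2\+-,2-\+":\s*([\d.]+)', text)
    antisense = re.search(r'"1\+-,1-\+,2\+\+,2--":\s*([\d.]+)', text)
    return (
        float(failed.group(1)),
        float(sense.group(1)),
        float(antisense.group(1)),
    )


def classify(sense, antisense, threshold=0.8):
    """Classify the library; `threshold` is an assumed cutoff, not an RSeQC default."""
    determined = sense + antisense
    if determined == 0:
        return "undetermined"
    if sense / determined >= threshold:
        return "stranded (read 1 sense)"
    if antisense / determined >= threshold:
        return "stranded (read 1 antisense)"
    return "unstranded"


# Output copied from the infer_experiment.py run shown above.
report = '''
This is PairEnd Data
Fraction of reads failed to determine: 0.0461
Fraction of reads explained by "1++,1--,2+-,2-+": 0.5633
Fraction of reads explained by "1+-,1-+,2++,2--": 0.3906
'''

failed, sense, antisense = parse_fractions(report)
print(classify(sense, antisense))  # prints "unstranded"
```

With these numbers the split is roughly 59% to 41% of the assignable reads: not an even 50/50, but neither orientation dominates, so under the assumed cutoff the library is treated as unstranded.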

Any help is appreciated!

Tags: rna-seq, encode, assembly

Where does the BED file come from? You could try GUESSmyLT; the result may be clearer.

written 5 weeks ago by Juke34

Thanks, the BED file is just extracted from the GENCODE GTF. I will give GUESSmyLT a try.

written 5 weeks ago by husensofteng

The "Specific protocol for library ENCLB555AYX" section contains a complete wet-lab protocol, and the library indeed seems to be unstranded. ENCODE is quite old, so that is not really a surprise.

written 5 weeks ago by ATpoint

The issue is that, in that section, they reference the paper that describes strand-specific sequencing. Thank you for the reply; I will treat the library as unstranded.

written 5 weeks ago by husensofteng
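For the StringTie run mentioned in the question, treating the library as unstranded simply means leaving out the strand flags. A minimal sketch of that choice follows; the helper name `stringtie_command` and the output file name are hypothetical, while `--rf` (fr-firststrand) and `--fr` (fr-secondstrand) are the StringTie options for stranded libraries.

```python
def stringtie_command(bam, gtf_out, library_type, guide_gtf=None):
    """Build a StringTie argument list for the given library type (hypothetical helper)."""
    # StringTie takes --rf for fr-firststrand and --fr for fr-secondstrand;
    # an unstranded library needs no strand flag at all.
    strand_flags = {
        "fr-firststrand": ["--rf"],
        "fr-secondstrand": ["--fr"],
        "unstranded": [],
    }
    cmd = ["stringtie", bam, "-o", gtf_out]
    if guide_gtf is not None:
        cmd += ["-G", guide_gtf]  # optional reference annotation
    cmd += strand_flags[library_type]
    return cmd


# BAM name taken from the question; the output name is made up for illustration.
cmd = stringtie_command(
    "ENCFF309XGT.sortedByCoord.bam",
    "ENCFF309XGT.stringtie.gtf",
    "unstranded",
)
print(" ".join(cmd))
# prints: stringtie ENCFF309XGT.sortedByCoord.bam -o ENCFF309XGT.stringtie.gtf
```

If a later check showed that reads in the "1+-,1-+,2++,2--" orientation dominate, that would correspond to fr-firststrand and the `--rf` flag; dominance of "1++,1--,2+-,2-+" would correspond to fr-secondstrand and `--fr`.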