Question

Infer strandedness of my single-end RNA-Seq data

4.0 years ago

ATCG ▴ 380

Hi, I'm using infer_experiment.py to infer the strandedness of my single-end RNA-Seq data.

For the alignment with HISAT2 I used this genome: ftp://ftp.ensembl.org/pub/release99/fasta/drosophila_melanogaster/dna/Drosophila_melanogaster.BDGP6.28.dna.toplevel.fa.gz

To use infer_experiment.py, do I need to convert this genome .fa file to BED?

Or should I convert the GTF file that corresponds to the .fa?

ftp://ftp.ensembl.org/pub/release-99/gtf/drosophila_melanogaster/Drosophila_melanogaster.BDGP6.28.99.chr.gtf.gz

I tried using this BED file from the RSeQC SourceForge page:

https://sourceforge.net/projects/rseqc/files/BED/fly_D.melanogaster/dm6_Ensembl.bed.gz

but I get the following error:

infer_experiment.py --input-file=WT_1.bam --refgene=dm6_Ensembl.bed --sample-size=200000 --mapq=30

Reading reference gene model dm6_Ensembl.bed ... Done
Loading SAM/BAM file ... Finished
Total 0 usable reads were sampled
Unknown Data type

I have read other posts on this topic, but I still don't know how to solve this error.

RNA-Seq HISAT2 strandedness • 1.1k views

updated 4.0 years ago by Kristoffer Vitting-Seerup ★ 4.0k • written 4.0 years ago by ATCG ▴ 380

score 1 · Answer 1 · 2020-05-01

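I do not know about infer_experiment.py, but since your question title is broad I'll mention another approach. You can quantify known genes using a strand-specific tool, then invert the strandedness of the known annotation and quantify again. For each gene, compare what fraction of its reads is assigned with the standard annotation versus the inverted one. If most reads land on the standard annotation, the library is stranded in that orientation; if most land on the inverted one, it is stranded in the opposite orientation; a roughly even split suggests an unstranded library.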
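
The comparison step described above could be sketched roughly as follows. This is only an illustration, not part of any quantification tool: the function names, the `min_reads` filter, the 0.8 threshold, and the "forward"/"reverse" labels are all assumptions made for this example, and the per-gene counts are expected to come from whichever strand-specific quantifier you ran twice (once with the standard annotation, once with the strands inverted).

```python
import statistics


def strand_fractions(standard_counts, inverted_counts, min_reads=10):
    """Return, per gene, the fraction of reads assigned with the standard annotation.

    standard_counts / inverted_counts: dicts mapping gene ID -> read count from
    the two quantification runs. Genes with fewer than `min_reads` reads in
    total are skipped (assumed cutoff) because their fractions are too noisy.
    """
    fractions = {}
    for gene, std in standard_counts.items():
        inv = inverted_counts.get(gene, 0)
        total = std + inv
        if total >= min_reads:
            fractions[gene] = std / total
    return fractions


def call_strandedness(fractions, threshold=0.8):
    """Summarise the per-gene fractions into a single call.

    The threshold is an assumed, adjustable cutoff, not an established standard.
    """
    if not fractions:
        return "undetermined"
    median = statistics.median(fractions.values())
    if median >= threshold:
        return "forward"      # reads mostly match the annotated strand
    if median <= 1 - threshold:
        return "reverse"      # reads mostly match the inverted annotation
    return "unstranded"


if __name__ == "__main__":
    # Made-up counts, purely to show how the functions are called.
    standard = {"geneA": 95, "geneB": 180, "geneC": 3}
    inverted = {"geneA": 5, "geneB": 20, "geneC": 2}
    print(call_strandedness(strand_fractions(standard, inverted)))
```

Using the median across genes rather than pooled totals keeps a few very highly expressed genes from dominating the call; looking at the full distribution of per-gene fractions is also worthwhile before trusting a single label.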