Question: junction_annotation.py: How many 'novel' splice junctions/splice events are reasonably expected from human RNA?
RNAseqer (110) wrote, 9 months ago:

Hello all,

I was wondering what percentage of 'novel' splice junctions/splice events is reasonable to expect for human RNA-seq data analysed with junction_annotation.py. I am new to RNA-seq and am running some published human RNA-seq data through my pipeline to familiarize myself with the programs and protocols. When I ran this splice junction analysis, I got what was to me an eyebrow-raising estimate of novel splice junctions/events:

Splicing junctions:
- Complete novel = 62%
- Partial novel = 5%
- Annotated = 34%

Splicing events:
- Complete novel = 17%
- Partial novel = 1%
- Known = 81%

Should I be worried about that 62% complete novel splice junction estimate?

If you are interested, here is what I've done:

I am using 104 bp paired-end reads from fragments averaging 250 bp (the distribution of inner distances has a standard deviation of 50).

From a GTF file Homo_sapiens.GRCh38.95.gtf.gz I created a bed file using the following command line:

$ awk '{if($3 != "gene") print $0}' homo_sapiens_grch38.95_chameleon_cleaned.gtf | grep -v "^#" | gtfToGenePred /dev/stdin /dev/stdout | genePredToBed stdin Homo_sapiens.GRCh38.95.bed

My BAM file was generated from the HISAT2 SAM output using the command line:

samtools view -bS testoutput3.sam | samtools sort -o testoutput3.bam

Using the program junction_annotation.py with the following command line:

$ junction_annotation.py -r Homo_sapiens.GRCh38.95.bed -i testoutput3.bam -o out

I got the following output:

Reading reference bed file:  Homo_sapiens.GRCh38.95.bed  ...  Done
Load BAM file ...  Done
total = 14081359

===================================================================
Total splicing  Events: 14081359
Known Splicing Events:  11341230
Partial Novel Splicing Events:  99514
Novel Splicing Events:  2348855

Total splicing  Junctions:  441831
Known Splicing Junctions:   148196
Partial Novel Splicing Junctions:   21482
Novel Splicing Junctions:   272153

===================================================================
null device 
          1 
null device 
          1
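
For anyone who wants to check my arithmetic, here is a minimal sketch of how the percentages quoted at the top follow from these counts. The variable and function names are my own, purely for illustration. One thing I noticed while doing this: the three event categories add up to 13789599, slightly less than the reported total of 14081359, so I divide by the reported totals here.

```python
# Counts copied from the junction_annotation.py output above.
# Names below are illustrative only, not part of RSeQC.
TOTAL_EVENTS = 14081359
TOTAL_JUNCTIONS = 441831

events = {"known": 11341230, "partial_novel": 99514, "novel": 2348855}
junctions = {"known": 148196, "partial_novel": 21482, "novel": 272153}


def percentages(counts, total):
    """Return each category as a percentage of the reported total."""
    return {name: 100 * n / total for name, n in counts.items()}


event_pct = percentages(events, TOTAL_EVENTS)
junction_pct = percentages(junctions, TOTAL_JUNCTIONS)

for label, pct in (("events", event_pct), ("junctions", junction_pct)):
    for name, value in pct.items():
        print(f"{label:10s} {name:14s} {value:5.1f}%")
```

Rounded to whole percentages, these agree with the figures I gave above.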

Many thanks for any advice/input/help you can give!


Double-check the version that you are using. Note the release notes:

RSeQC v2.6.1
Fix bug in “junction_annotation.py” in that it would report some “novel splice junctions” that don’t exist in the BAM files. This happened when reads were clipped and spliced mapped simultaneously.

[source: http://rseqc.sourceforge.net/]
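
If you want a rough idea of how many of your alignments fall into the situation described in that release note, you could count reads whose CIGAR string contains both a clip and a splice gap. The following is only a sketch under my own assumptions: it treats both S (soft clip) and H (hard clip) as "clipped", it assumes you have already extracted the CIGAR strings (column 6 of `samtools view` output), and the function name is hypothetical. It is not how RSeQC itself classifies reads.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def is_clipped_and_spliced(cigar):
    """True if the alignment has both a clip (S or H) and a splice gap (N)."""
    ops = {op for _, op in CIGAR_OP.findall(cigar)}
    return "N" in ops and bool(ops & {"S", "H"})


# Made-up CIGAR strings, for illustration only.
examples = ["104M", "60M1200N44M", "10S94M", "5S50M800N49M"]
flagged = [c for c in examples if is_clipped_and_spliced(c)]
print(flagged)
```

A large share of flagged reads among your spliced alignments would make the version check more important, though it does not by itself tell you which junctions are spurious.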

written 9 months ago by Kevin Blighe (51k)