junction_annotation.py: How many 'novel' splice junctions/splice events are reasonably expected from human RNA?
2.2 years ago
RNAseqer ▴ 150

Hello all,

I was wondering what a reasonable percentage of 'novel' splice junctions/splice events is for human RNA-seq data when using the program junction_annotation.py. I am new to RNA-seq and am running some published human RNA-seq data through my pipeline to familiarize myself with the programs and protocols. When I performed this splice junction analysis, I got what was, to me, an eyebrow-raising estimate of novel splice junctions/events:

Splicing junctions:
- Complete novel = 62%
- Partial novel = 5%
- Annotated = 34%

Splicing events:
- Complete novel = 17%
- Partial novel = 1%
- Known = 81%

Should I be worried about that 62% complete novel splice junction estimate?

If you are interested, here is what I've done:

I am using 104 bp paired-end reads from fragments averaging 250 bp (the distribution of inner distances has a standard deviation of 50).

From the GTF file Homo_sapiens.GRCh38.95.gtf.gz, I created a BED file using the following command line:

$ awk '{if($3 != "gene") print $0}' homo_sapiens_grch38.95_chameleon_cleaned.gtf | grep -v "^#" | gtfToGenePred /dev/stdin /dev/stdout | genePredToBed stdin Homo_sapiens.GRCh38.95.bed

My BAM file was generated from the HISAT2 .sam output using the command line:

$ samtools view -bS testoutput3.sam | samtools sort -o testoutput3.bam

I then ran junction_annotation.py with the following command line:

$ junction_annotation.py -r Homo_sapiens.GRCh38.95.bed -i testoutput3.bam -o out

I got the following output:

Reading reference bed file:  Homo_sapiens.GRCh38.95.bed  ...  Done
Load BAM file ...  Done
total = 14081359

Total splicing  Events: 14081359
Known Splicing Events:  11341230
Partial Novel Splicing Events:  99514
Novel Splicing Events:  2348855

Total splicing  Junctions:  441831
Known Splicing Junctions:   148196
Partial Novel Splicing Junctions:   21482
Novel Splicing Junctions:   272153
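
As a sanity check, the percentages quoted earlier can be recomputed from these counts. This is only a minimal sketch: the `pct` helper and the variable names are illustrative and not part of RSeQC.

```shell
# Counts copied from the junction_annotation.py output above.
total_events=14081359
known_events=11341230
partial_events=99514
novel_events=2348855

total_junctions=441831
known_junctions=148196
partial_junctions=21482
novel_junctions=272153

# pct PART TOTAL: print PART as a percentage of TOTAL with one decimal place.
# (Hypothetical helper name, not an RSeQC command.)
pct() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f", 100 * part / total }'
}

echo "Junctions: novel $(pct "$novel_junctions" "$total_junctions")%," \
     "partial novel $(pct "$partial_junctions" "$total_junctions")%," \
     "known $(pct "$known_junctions" "$total_junctions")%"

echo "Events:    novel $(pct "$novel_events" "$total_events")%," \
     "partial novel $(pct "$partial_events" "$total_events")%," \
     "known $(pct "$known_events" "$total_events")%"

# Events counted in the total but not assigned to any of the three categories.
unclassified_events=$((total_events - known_events - partial_events - novel_events))
echo "Events not in any category: $unclassified_events"
```

This gives 61.6% / 4.9% / 33.5% for junctions and 16.7% / 0.7% / 80.5% for events, in line with the rounded figures above. The three event categories add up to 13,789,599, which is 291,760 fewer than the reported total, while the junction categories add up exactly to 441,831.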

Many thanks for any advice/input/help you can give!

Double-check the version that you are using. Note the release notes:

RSeQC v2.6.1
Fix bug in “junction_annotation.py” in that it would report some “novel splice junctions” that don’t exist in the BAM files. This happened when reads were clipped and spliced mapped simultaneously.

[source: http://rseqc.sourceforge.net/]
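
To act on this, two things are worth checking: whether the installed RSeQC predates 2.6.1, and how many alignments are both soft-clipped and spliced, which is the situation the release note describes. A rough sketch follows. `version_lt` and `count_clipped_spliced` are hypothetical helper names, only soft clips (`S` in the CIGAR) are considered, and the commented usage assumes that `junction_annotation.py --version` prints the version number as its last word, so verify that on your install.

```shell
# version_lt A B: succeed (exit 0) if dotted version A is strictly older than B.
# Hypothetical helper, not part of RSeQC; compares numeric fields left to right.
version_lt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0   # missing fields count as 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi < yi) exit 0
            if (xi > yi) exit 1
        }
        exit 1                               # versions are equal
    }'
}

# count_clipped_spliced: read SAM records on stdin and count alignments whose
# CIGAR string (column 6) contains both a soft clip (S) and a splice gap (N).
count_clipped_spliced() {
    awk -F '\t' '!/^@/ && $6 ~ /S/ && $6 ~ /N/ { n++ } END { print n + 0 }'
}

# Example usage (needs RSeQC and samtools on PATH, so it is left commented out):
#   installed=$(junction_annotation.py --version 2>&1 | awk '{ print $NF }')
#   version_lt "$installed" 2.6.1 && echo "RSeQC $installed predates the fix"
#   samtools view testoutput3.bam | count_clipped_spliced
```

If the installed version is older than 2.6.1 and the count of clipped-and-spliced alignments is large, upgrading RSeQC and rerunning junction_annotation.py is a cheap first test before reading anything into the novel-junction percentage.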


