Question

Splice junction sites and number of split reads from STAR-aligned BAM

5.8 years ago

filippo.martignano ▴ 20

Hi everyone!

I'm working with some RNA-seq BAMs aligned with STAR. I want to detect splice sites (both known and unknown) along with coverage information, i.e. how many reads are spliced at a given position.

Basically, I'm looking for a tool that does exactly what the "sashimi plot" does in IGV, but without a graphical interface, so that I can process a large amount of data.

Any suggestions?

Thanks!

RNA-Seq sashimi plot splice junctions • 3.3k views

Posted 5.8 years ago by filippo.martignano ▴ 20; last updated 4.9 years ago by phaedragius

Too many. While it's old, this should get you started. Good luck!

Best Approach To Predict Novel And Alternative Splicing Events From Rna-Seq Data

Reply by Eric Lim ★ 2.1k, 5.8 years ago

Thank you very much, Eric! I already knew about that thread, but it's a bit of a "too much information" situation for me. Let me explain better: I'm looking for something as "raw" as possible (as I said, basically just the raw number of split reads at a given position). All the tools I've checked out so far are designed to infer isoform expression or to estimate possible isoform structures, which means they apply filtering criteria that I don't want, since I'm looking for raw data. I could check out every tool suggested in the other thread, but it would take ages to find one suited to my uncommon purpose. That's why I'm asking here: can anyone suggest software that does exactly this, without any "fancy" statistical filtering step?

Thanks.

Reply by filippo.martignano ▴ 20, 5.8 years ago

rMATS (http://rnaseq-mats.sourceforge.net/rmats4.0.1/) is particularly popular among Biostars members. I recently played around with SGSeq (https://bioconductor.org/packages/3.7/bioc/vignettes/SGSeq/inst/doc/SGSeq.html) and thought it was pretty good, but it was a bit slow on well-covered sequencing data. There are at least a dozen more tools developed specifically to address this. At Stoke, we ended up developing an internal tool ourselves in order to be as comprehensive as possible.

Reply by Eric Lim ★ 2.1k, 5.8 years ago

score 5 · Answer 1 · 2018-06-20

Hi, I can recommend the R package "spliceSites" (I'm not the developer).

It transforms BAM files into one big gap table: each row represents a gap in the read alignments that is supported by at least one read. Among other things, it gives the coordinates of the splice sites and the number of reads that have the respective gap in their alignment. After you annotate the gap table, it also tells you whether the splice donor or acceptor is annotated in the GTF file you used for the mapping and, if not, the distance from that splice site to the nearest corresponding annotated splice site.
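
To make the idea of a gap table concrete, here is a minimal sketch in Python. It is not spliceSites and not its algorithm, just an illustration under stated assumptions: it reads uncompressed SAM text (for example the output of `samtools view`) instead of BAM, treats every `N` operation in a CIGAR string as one gap, skips unmapped and secondary alignments, and counts alignment records, so the two mates of a pair are counted separately. The function names `junctions_from_cigar`, `gap_table` and `distance_to_nearest` are hypothetical.

```python
import re
from bisect import bisect_left
from collections import Counter

# One CIGAR element: a length followed by an operation letter (SAM spec ops).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def junctions_from_cigar(pos, cigar):
    """Return (first_intron_base, last_intron_base) for every N in the CIGAR.

    pos is the 1-based leftmost mapping position (SAM column 4); the returned
    coordinates are 1-based and inclusive.
    """
    junctions = []
    ref = pos
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "N":
            junctions.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return junctions


def gap_table(sam_lines):
    """Count alignments per (chrom, intron_start, intron_end), no filtering."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, chrom, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        # Skip unmapped (0x4) and secondary (0x100) alignments so a multi-mapping
        # read contributes only its primary alignment.
        if flag & 0x4 or flag & 0x100 or cigar == "*":
            continue
        for start, end in junctions_from_cigar(pos, cigar):
            counts[(chrom, start, end)] += 1
    return counts


def distance_to_nearest(site, annotated_sites):
    """Distance from a splice site to the closest annotated one (0 = annotated).

    annotated_sites must be a sorted list of positions, e.g. intron boundaries
    taken from the GTF; returns None if the list is empty.
    """
    if not annotated_sites:
        return None
    i = bisect_left(annotated_sites, site)
    neighbours = annotated_sites[max(i - 1, 0):i + 1]
    return min(abs(site - a) for a in neighbours)
```

Comparing each intron's start and end positions against sorted annotated positions with `distance_to_nearest` roughly mimics the annotated/distance information described above, although spliceSites' own annotation logic may differ. Whether the left or right intron boundary is the donor depends on the strand of the gene.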

I'm studying splicing, and after data preparation I basically work only with these gap tables, since they describe the read coverage of splice sites and the usage of unannotated ones quite nicely.

You can also use multiple BAM files to create one gap table, which makes it possible to compare the usage of a splice site across samples. The read coverage of splice sites is given both as total reads and as rpmg, i.e. the read coverage normalized to the total read count of one BAM file or one group of BAM files.
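
A rough Python sketch of what merging several samples into one table involves. It assumes each sample has already been reduced to a mapping from (chrom, start, end) to a raw read count, and that the total read count per BAM file (or BAM group) is known. The per-million scale is an assumption about what rpmg means; spliceSites' exact definition may differ. `merge_gap_tables` is a hypothetical name, and the counts in the example are made up.

```python
from collections import Counter


def merge_gap_tables(sample_counts, total_reads):
    """Merge per-sample junction counts into one table.

    sample_counts: {sample: {(chrom, start, end): raw_reads}}
    total_reads:   {sample: total read count of that BAM file or BAM group}
    Returns {(chrom, start, end): {sample: (raw_reads, reads_per_million)}};
    junctions absent from a sample get a raw count of 0.
    """
    all_junctions = set()
    for counts in sample_counts.values():
        all_junctions.update(counts)

    table = {}
    for jct in sorted(all_junctions):
        row = {}
        for sample, counts in sample_counts.items():
            raw = counts.get(jct, 0)
            row[sample] = (raw, raw * 1e6 / total_reads[sample])
        table[jct] = row
    return table


# Example with made-up numbers for two samples.
a = Counter({("chr1", 150, 1149): 2})
b = Counter({("chr1", 150, 1149): 5, ("chr2", 10, 500): 1})
merged = merge_gap_tables({"a": a, "b": b}, {"a": 2_000_000, "b": 1_000_000})
print(merged[("chr1", 150, 1149)])  # {'a': (2, 1.0), 'b': (5, 5.0)}
```

Keeping the raw count next to the normalized value means nothing is filtered away, which matches the "raw output" the question asks for.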

I should also mention the 2-pass mode of the STAR aligner, in case you don't know about it already, since it took me a while to learn about it. The STAR developer Alexander Dobin apparently recommends the 2-pass run for analyses concerning splice site usage, especially analyses of unannotated splice site usage.

So, if you are using R, I would definitely give the package "spliceSites" a try, in particular the function readTabledBamGaps, since it gives you a very raw output describing splice site usage in your data.

score 1 · Answer 2 · 2018-06-20

5.8 years ago

trausch ★ 1.9k

Alfred should work:

alfred count_jct -g Homo_sapiens.GRCh37.75.gtf.gz input.star.bam

score 0 · Answer 3 · 2019-06-10

Check out our new tool, SCANVIS, which generates static sashimi plots for multiple samples, as you want. A preprint is available on bioRxiv and the software is on GitHub. Stay tuned for Bioconductor, where the software will soon be available, and for the journal Bioinformatics, where a revamped, cleaner version of our paper (thanks to the awesome reviewers) will appear soon.