Question

Run ASTALAVISTA for a large number of files

4.5 years ago

waqaskhokhar999 ▴ 160

I have mapped around 700 A. thaliana accessions using STAR and assembled them using StringTie. Now I want to estimate the alternative splicing events in each file, so I used ASTALAVISTA, which gave me an output image illustrating the different AS events and an output GTF file.

Is it possible to do this with some sort of script that takes each input file, processes it with ASTALAVISTA, and saves the output image and GTF file?

I have tried to process the files using the standalone version of ASTALAVISTA, but the number of AS events varies.

Command used and output:

astalavista -t asta -i 763 -d 0
[INFO] Astalavista v4.0 (Flux Library: 1.30)

# started   Wed Oct 09 09:45:47 BST 2019
# CHR_SEQ   null
# EDGE_CONFIDENCE   127
# EVENTS    [ASI]
# EVENTS_ATR    []
# EVENTS_DIMENSION  0
# EVENTS_FILE   null
# INTRON_CONFIDENCE 127
# IN_FILE   763
# PAR_FILE  null
# TMP_DIR   /tmp
    Checking GTF *[WARN] Unsorted in line 12 - cannot perform gene clustering: 1 - 763.4.1 @ 6788 after 763.3.1 @ 11672
    sorting GTF file  OK (00:00:06)
[WARN] Overwriting output file /media/waqas/waqas_third/AS_727/Assembly_first_new/central_asia_new/763_sorted_astalavista.gtf.gz.
    Iterating Annotation ********** done. (00:00:03)
[INFO] took 9 sec.
[INFO] found 5877 events.
AStalavista.

Second command:

astalavista -t asta -i 763 -e [ASE,ASI]
[INFO] Astalavista v4.0 (Flux Library: 1.30)

# started   Wed Oct 09 09:44:48 BST 2019
# CHR_SEQ   null
# EDGE_CONFIDENCE   127
# EVENTS    [ASE, ASI]
# EVENTS_ATR    []
# EVENTS_DIMENSION  2
# EVENTS_FILE   null
# INTRON_CONFIDENCE 127
# IN_FILE   763
# PAR_FILE  null
# TMP_DIR   /tmp
    Checking GTF *[WARN] Unsorted in line 12 - cannot perform gene clustering: 1 - 763.4.1 @ 6788 after 763.3.1 @ 11672
    sorting GTF file  OK (00:00:06)
[WARN] Overwriting output file /media/waqas/waqas_third/AS_727/Assembly_first_new/central_asia_new/763_sorted_astalavista.gtf.gz.
    Iterating Annotation ********** done. (00:00:03)
[INFO] took 9 sec.
[INFO] found 16528 events.
AStalavista.
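
A batch run over many assemblies could be sketched along the following lines. This is only a minimal sketch under stated assumptions: it reuses just the flags that appear in the commands above (-t asta, -i, -e), the directory name, the `*.gtf` pattern, and the `.astalavista.log` suffix are hypothetical, and the helper names `build_command` and `run_batch` are made up for illustration. The log above only shows a `_sorted_astalavista.gtf.gz` output being written, so the image is not handled here.

```python
import subprocess
from pathlib import Path


def build_command(gtf, events=None):
    """Build one astalavista call, using only flags seen in the logs above."""
    cmd = ["astalavista", "-t", "asta", "-i", str(gtf)]
    if events:
        # -e is given a bracketed, comma-separated list, e.g. [ASE,ASI]
        cmd += ["-e", "[" + ",".join(events) + "]"]
    return cmd


def run_batch(input_dir, pattern="*.gtf", events=None, runner=subprocess.run):
    """Run the same command on every matching file and save each log.

    `pattern` and the log file suffix are assumptions; adjust to your naming.
    Returns a dict mapping file name to the process return code.
    """
    results = {}
    for gtf in sorted(Path(input_dir).glob(pattern)):
        proc = runner(
            build_command(gtf, events),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep [INFO]/[WARN] lines in one log
            text=True,
        )
        # Store the console output next to the input file
        log_path = gtf.with_name(gtf.name + ".astalavista.log")
        log_path.write_text(proc.stdout)
        results[gtf.name] = proc.returncode
    return results
```

It would be called as, for example, `run_batch("assemblies", events=["ASE", "ASI"])`. Using the identical flag set for every file matters, since the two runs above differ in their options and report different event totals.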

RNA-Seq alternative splicing • 2.2k views


Why would you want to analyse the distribution in each file? To look for differences between groups?

ADD REPLY • link 4.5 years ago by Kristoffer Vitting-Seerup ★ 4.0k

These are geographically distributed accessions, so I am interested in the number of alternative splicing events in each accession, to see how the pattern of alternative splicing varies among geographically spread accessions.
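
Since each run ends with a line such as "[INFO] found 5877 events.", the per-accession totals could be pulled out of saved logs roughly like this. It is a sketch only: the two log snippets are copied from the runs in the question, while the dictionary labels and the helper name `events_found` are made up for illustration, and how the logs get saved is left open.

```python
import re

# Matches the summary line printed at the end of a run
EVENTS_RE = re.compile(r"\[INFO\] found (\d+) events\.")


def events_found(log_text):
    """Return the event count reported in a log, or None if absent."""
    match = EVENTS_RE.search(log_text)
    return int(match.group(1)) if match else None


# Log tails taken from the two runs shown in the question
logs = {
    "763 (-d 0)": "[INFO] took 9 sec. [INFO] found 5877 events. AStalavista.",
    "763 (-e [ASE,ASI])": "[INFO] took 9 sec.\n[INFO] found 16528 events.\nAStalavista.",
}

for label, text in logs.items():
    print(label, events_found(text))
```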

ADD REPLY • link 4.5 years ago by waqaskhokhar999 ▴ 160

score 1 · Answer 1 · 2019-10-15

I cannot answer the question about ASTALAVISTA, but I will try to answer the question about comparing alternative splicing across the geographically spread accessions (which I interpret as grouped data) with other tools, as I think that might be a better solution to your problem.

The reasons I would not use ASTALAVISTA (or other tools that describe splicing in a single sample via the GTF/BAM file) are:

The majority of events between groups will be unchanged (either because they are not expressed or because the expression does not change between conditions). This means that the actual changes will be masked by the non-changes.
A global analysis like that will not let you look into what is actually changing, and it will also miss events masked by opposite changes (e.g. one gene losing an intron retention while another gene gains one).
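
A toy illustration of the masking effect, with gene names and intron-retention (IR) calls invented purely for this example (not real data): the global totals are identical in both groups even though two genes change.

```python
# Hypothetical IR calls per gene in two groups (True = intron retained).
# These values are made up for illustration only.
group_a = {"geneA": True, "geneB": False, "geneC": True}
group_b = {"geneA": False, "geneB": True, "geneC": True}

# Global view: count IR events per group
total_a = sum(group_a.values())
total_b = sum(group_b.values())

# Gene-level view: which genes actually differ between the groups
changed = sorted(g for g in group_a if group_a[g] != group_b[g])

print(total_a, total_b)  # 2 2
print(changed)           # ['geneA', 'geneB']
```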

What I propose instead is that you use the quantification from each group to run a statistical analysis that identifies which events change. This will let you look into exactly which genes are changing, but naturally you can also just sum them to get the genome-wide overview. Such an analysis of splicing can be done at three different levels: exon, splice event, or transcript level. You can read more about each type of analysis as well as multiple suggested bioinformatic tools here.

In particular I want to highlight the transcript-level analysis (for full disclosure, I am the author of the tool described next). The advantage of transcript-level analysis is that biological interpretation becomes a lot easier, since you can predict functional consequences (e.g. protein domain gain/loss) as well as analyse specific splicing patterns. A tool which can do this with the data you already have (StringTie quantifications) is the R package IsoformSwitchAnalyzeR; this section of the vignette illustrates the output of the IsoformSwitchAnalyzeR workflow. In addition to identifying isoform switches with functional consequences in individual genes, we recently published an extension which performs a genome-wide assessment of changes in splicing and consequences to identify systematic changes between groups (e.g. more frequent use of intron retention), and which does not suffer from any of the limitations described above. You can find that article here, and online you can find examples of the genome-wide analysis of consequences and splicing.

score 0 · Answer 2 · 2022-11-17

17 months ago

lei • 0

Excuse me, could I see what your output file looks like?
