Question: Run ASTALAVISTA for a large number of files
12 days ago, waqaskhokhar999 wrote:

I have mapped around 700 A. thaliana accessions using STAR and assembled them using StringTie. Now I want to estimate the alternative splicing (AS) events in each file, so I used ASTALAVISTA. It gave me an output image illustrating the different AS events and an output GTF file.

Is it possible to do this with some sort of script that takes each input file, processes it with ASTALAVISTA, and saves the output image and GTF file?

I have tried to process the files using the standalone version of ASTALAVISTA, but the number of AS events varies.

First command and output:

astalavista -t asta -i 763 -d 0
[INFO] Astalavista v4.0 (Flux Library: 1.30)

# started   Wed Oct 09 09:45:47 BST 2019
# CHR_SEQ   null
# EDGE_CONFIDENCE   127
# EVENTS    [ASI]
# EVENTS_ATR    []
# EVENTS_DIMENSION  0
# EVENTS_FILE   null
# INTRON_CONFIDENCE 127
# IN_FILE   763
# PAR_FILE  null
# TMP_DIR   /tmp
    Checking GTF *[WARN] Unsorted in line 12 - cannot perform gene clustering: 1 - 763.4.1 @ 6788 after 763.3.1 @ 11672
    sorting GTF file  OK (00:00:06)
[WARN] Overwriting output file /media/waqas/waqas_third/AS_727/Assembly_first_new/central_asia_new/763_sorted_astalavista.gtf.gz.
    Iterating Annotation ********** done. (00:00:03)
[INFO] took 9 sec.
[INFO] found 5877 events.
AStalavista.

Second command:

astalavista -t asta -i 763 -e [ASE,ASI]
[INFO] Astalavista v4.0 (Flux Library: 1.30)

# started   Wed Oct 09 09:44:48 BST 2019
# CHR_SEQ   null
# EDGE_CONFIDENCE   127
# EVENTS    [ASE, ASI]
# EVENTS_ATR    []
# EVENTS_DIMENSION  2
# EVENTS_FILE   null
# INTRON_CONFIDENCE 127
# IN_FILE   763
# PAR_FILE  null
# TMP_DIR   /tmp
    Checking GTF *[WARN] Unsorted in line 12 - cannot perform gene clustering: 1 - 763.4.1 @ 6788 after 763.3.1 @ 11672
    sorting GTF file  OK (00:00:06)
[WARN] Overwriting output file /media/waqas/waqas_third/AS_727/Assembly_first_new/central_asia_new/763_sorted_astalavista.gtf.gz.
    Iterating Annotation ********** done. (00:00:03)
[INFO] took 9 sec.
[INFO] found 16528 events.
AStalavista.
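The batch run asked about above could be sketched roughly as follows. This is a minimal sketch, not a tested pipeline: it uses only the flags that appear in the commands above (`-t asta`, `-i`, `-e`), and it assumes the log is printed to stdout or stderr in the format shown. The function names, the `*.gtf` file pattern and the directory layout are hypothetical, and the sketch does not try to reproduce the output image. It also compares the parameter headers of two logs; in the two runs above, `EVENTS` and `EVENTS_DIMENSION` differ, which may be why the event counts differ.

```python
import re
import subprocess
import sys
from pathlib import Path

# Summary line as printed in the logs above, e.g. "[INFO] found 5877 events."
EVENT_COUNT_RE = re.compile(r"\[INFO\] found (\d+) events\.")
# Parameter header lines as printed in the logs above, e.g. "# EVENTS_DIMENSION  2"
PARAM_RE = re.compile(r"^#[ \t]+([A-Z_]+)[ \t]+(.*?)[ \t]*$", re.MULTILINE)


def build_command(gtf_path, events=None):
    """Build the astalavista call using only the flags shown in the post."""
    cmd = ["astalavista", "-t", "asta", "-i", str(gtf_path)]
    if events:
        cmd += ["-e", "[" + ",".join(events) + "]"]
    return cmd


def parse_event_count(log_text):
    """Return the number of events reported in a log, or None if not found."""
    match = EVENT_COUNT_RE.search(log_text)
    return int(match.group(1)) if match else None


def parse_parameters(log_text):
    """Collect the '# KEY value' header lines of a log into a dict."""
    return dict(PARAM_RE.findall(log_text))


def diff_parameters(log_a, log_b):
    """Return the parameters whose values differ between two logs."""
    a, b = parse_parameters(log_a), parse_parameters(log_b)
    return {k: (a.get(k), b.get(k)) for k in sorted(set(a) | set(b)) if a.get(k) != b.get(k)}


def run_all(gtf_dir, events=None, pattern="*.gtf"):
    """Run astalavista on every matching file; the file pattern is an assumption."""
    counts = {}
    for gtf in sorted(Path(gtf_dir).glob(pattern)):
        proc = subprocess.run(build_command(gtf, events), capture_output=True, text=True)
        # Assumption: the log may go to either stream, so both are searched.
        counts[gtf.name] = parse_event_count(proc.stdout + proc.stderr)
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python run_asta.py /path/to/gtf_dir
    for name, count in run_all(sys.argv[1]).items():
        print(f"{name}\t{count}")
```

Using the same `events` argument for every file keeps the per-accession counts comparable with one another.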


modified 6 days ago • written 12 days ago by waqaskhokhar999

Why would you want to analyse the distribution in each file? To look for differences between groups?

written 11 days ago by kristoffer.vittingseerup

These are geographically distributed accessions, so I am interested in the number of alternative splicing events in each accession, to see how the pattern of alternative splicing varies among the geographically spread accessions.

written 11 days ago by waqaskhokhar999
6 days ago, kristoffer.vittingseerup (European Union) wrote:

I cannot answer the question about ASTALAVISTA - but I will try to answer the question about comparing alternative splicing across the geographically spread accessions (which I interpret as grouped data) using other tools, as I think that might be a better solution for your problem.

The reasons I would not use ASTALAVISTA (or other tools that describe splicing in a single sample via the GTF / BAM file) are:

  1. The majority of events will be unchanged between groups (either because they are not expressed or because the expression does not change between conditions). This means that the actual changes will be masked by the non-changes.
  2. A global analysis like that will not let you look into what is actually changing and will also ignore events masked by opposite changes (e.g. one gene losing an intron retention while another gene gains an intron retention).
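The second point can be illustrated with a toy example. The gene names and counts below are made up purely for illustration; the sketch only shows that two groups can have identical genome-wide totals of intron retention (IR) events while individual genes change in opposite directions.

```python
# Hypothetical IR event counts per gene in two groups (illustrative values only).
group_a = {"gene1": 1, "gene2": 0, "gene3": 2}
group_b = {"gene1": 0, "gene2": 1, "gene3": 2}


def global_total(counts):
    """Genome-wide number of events, as a per-sample summary would report it."""
    return sum(counts.values())


def per_gene_changes(a, b):
    """Per-gene difference (b minus a), keeping only genes that change."""
    genes = sorted(set(a) | set(b))
    return {g: b.get(g, 0) - a.get(g, 0) for g in genes if b.get(g, 0) != a.get(g, 0)}


print(global_total(group_a), global_total(group_b))  # 3 3
print(per_gene_changes(group_a, group_b))            # {'gene1': -1, 'gene2': 1}
```

The totals are equal, so a comparison of global counts sees no difference, while the per-gene view shows one gene losing and another gaining an event.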

What I propose instead is that you use the quantification from each group to make a statistical analysis that identifies the events that change. This will let you look into exactly which genes are changing - but naturally you can also just sum them to get the genome-wide overview. Such an analysis of splicing can be done at 3 different levels: exon, splice event or transcript level. You can read more about each type of analysis, as well as multiple suggested bioinformatic tools, here.

In particular, I want to highlight the transcript level analysis (for full disclosure, I am the author of the tool described next). The advantage of the transcript level analysis is that the biological interpretation becomes a lot easier, since you can predict functional consequences (e.g. protein domain gain/loss) as well as analyse specific splicing patterns. A tool which can do this with the data you already have (StringTie quantifications) is the R package IsoformSwitchAnalyzeR - this section of the vignette illustrates the output of the IsoformSwitchAnalyzeR workflow. In addition to identifying isoform switches with functional consequences in individual genes, we recently published an extension which performs a genome-wide assessment of changes in splicing and consequences to identify systematic changes between groups (e.g. more frequent use of intron retention), and which does not suffer from any of the limitations described above - you can find that article here, and online you can find examples of the genome-wide analysis of consequences and splicing.

modified 6 days ago • written 6 days ago by kristoffer.vittingseerup

@Kristoffer Many thanks for this information. I have processed around 700 A. thaliana accessions (without biological replicates) and am interested in the patterns of alternative splicing among them; that is why I was using ASTALAVISTA to estimate the AS events in each accession. Besides this, the accessions were divided into 11 groups based on genetic similarity > 60. Can I use the same grouping to perform the differential AS analysis among these groups?

Or what could be the best way to perform AS analysis among these groups?

written 1 day ago by waqaskhokhar999