Run ASTALAVISTA for large number of files
2.3 years ago

I have mapped around 700 A. thaliana accessions using STAR and assembled them using StringTie. Now I want to estimate the alternative splicing (AS) events in each file, so I have used ASTALAVISTA. It gave me an output image illustrating the different AS events and an output GTF file.

Is it possible to do this with some sort of script that takes each input file, processes it with ASTALAVISTA, and saves the output image and GTF file?

I have tried to process the files using the standalone version of ASTALAVISTA, but the number of AS events varies between the two commands below.

Command used and output:

astalavista -t asta -i 763 -d 0
[INFO] Astalavista v4.0 (Flux Library: 1.30)

# started   Wed Oct 09 09:45:47 BST 2019
# CHR_SEQ   null
# EDGE_CONFIDENCE   127
# EVENTS    [ASI]
# EVENTS_ATR    []
# EVENTS_DIMENSION  0
# EVENTS_FILE   null
# INTRON_CONFIDENCE 127
# IN_FILE   763
# PAR_FILE  null
# TMP_DIR   /tmp
Checking GTF *[WARN] Unsorted in line 12 - cannot perform gene clustering: 1 - 763.4.1 @ 6788 after 763.3.1 @ 11672
sorting GTF file  OK (00:00:06)
[WARN] Overwriting output file /media/waqas/waqas_third/AS_727/Assembly_first_new/central_asia_new/763_sorted_astalavista.gtf.gz.
Iterating Annotation ********** done. (00:00:03)
[INFO] took 9 sec.
[INFO] found 5877 events.
AStalavista.


Second command:

astalavista -t asta -i 763 -e [ASE,ASI]
[INFO] Astalavista v4.0 (Flux Library: 1.30)

# started   Wed Oct 09 09:44:48 BST 2019
# CHR_SEQ   null
# EDGE_CONFIDENCE   127
# EVENTS    [ASE, ASI]
# EVENTS_ATR    []
# EVENTS_DIMENSION  2
# EVENTS_FILE   null
# INTRON_CONFIDENCE 127
# IN_FILE   763
# PAR_FILE  null
# TMP_DIR   /tmp
Checking GTF *[WARN] Unsorted in line 12 - cannot perform gene clustering: 1 - 763.4.1 @ 6788 after 763.3.1 @ 11672
sorting GTF file  OK (00:00:06)
[WARN] Overwriting output file /media/waqas/waqas_third/AS_727/Assembly_first_new/central_asia_new/763_sorted_astalavista.gtf.gz.
Iterating Annotation ********** done. (00:00:03)
[INFO] took 9 sec.
[INFO] found 16528 events.
AStalavista.
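
The per-file runs shown above could be wrapped in a small batch script. The following is only a minimal sketch: it reuses the `astalavista -t asta -i <file>` call and the `-e [ASE,ASI]` option exactly as they appear in this thread, and everything else (the function names, the log directory, the `.astalavista.log` suffix) is a hypothetical choice made for illustration. It saves the console output of each run; the GTF output is left wherever ASTALAVISTA writes it, and the output image is not handled here.

```python
import subprocess
from pathlib import Path


def build_command(gtf_path, events=None):
    """Build the astalavista call used in this thread for one input file."""
    cmd = ["astalavista", "-t", "asta", "-i", str(gtf_path)]
    if events:
        # ["ASE", "ASI"] becomes "-e [ASE,ASI]", as in the second command above
        cmd += ["-e", "[" + ",".join(events) + "]"]
    return cmd


def run_batch(gtf_paths, log_dir, events=None, runner=subprocess.run):
    """Run astalavista once per file and save each run's console output.

    `runner` defaults to subprocess.run; it is a parameter only so the loop
    can be exercised without astalavista installed.
    """
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    logs = {}
    for gtf in sorted(Path(p) for p in gtf_paths):
        result = runner(build_command(gtf, events), capture_output=True, text=True)
        # Keep stdout and stderr together, since either may carry the messages
        log_file = log_dir / (gtf.name + ".astalavista.log")
        log_file.write_text(result.stdout + result.stderr)
        logs[gtf.name] = log_file
    return logs
```

Assuming one StringTie GTF per accession in a directory called `assemblies` (a made-up name), `run_batch(Path("assemblies").glob("*.gtf"), "astalavista_logs", events=["ASE", "ASI"])` would process every file with the same options, which also keeps the event counts comparable across accessions.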


RNA-Seq alternative splicing • 1.0k views

Why would you want to analyse the distribution in each file? To look for differences between groups?


These are geographically distributed accessions, so I am interested in the number of alternative splicing events in each accession, to see how the pattern of alternative splicing varies among geographically spread accessions.
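
Per-accession counts of this kind can be read from the saved console output. The sketch below assumes that every run prints an `[INFO] found N events.` line like the ones in the logs above; the function names and the idea of keeping the logs in a dictionary keyed by accession are hypothetical.

```python
import re

# Matches the summary line printed at the end of the runs shown above
EVENT_LINE = re.compile(r"\[INFO\] found (\d+) events\.")


def count_events(log_text):
    """Return the event count reported in one log, or None if no count is found."""
    match = EVENT_LINE.search(log_text)
    return int(match.group(1)) if match else None


def tabulate(logs):
    """Turn {accession: log text} into (accession, count) rows sorted by accession."""
    return sorted((acc, count_events(text)) for acc, text in logs.items())
```

Returning `None` for a log without a summary line makes failed runs visible in the table instead of silently counting them as zero.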

2.3 years ago

I cannot answer the question about ASTALAVISTA, but I will try to answer the question about comparing alternative splicing across the geographically spread accessions (which I interpret as grouped data) with other tools, as I think that might be a better solution for your problem.

The reasons I would not use ASTALAVISTA (or other tools that describe splicing in a single sample via the GTF/BAM file) are:

1. The majority of events between groups will be unchanged (either because they are not expressed or because the expression does not change between conditions). This means that the actual changes will be masked by the non-changes.
2. A global analysis like that will not let you look into what is actually changing, and it will also miss events that are masked by opposite changes (e.g. one gene losing an intron retention while another gene gains one).
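
The second point can be illustrated with a toy calculation. The gene names and counts below are made up purely for illustration: the genome-wide totals of intron retention (IR) events are identical in the two groups, yet two genes have changed in opposite directions.

```python
# Hypothetical IR event counts per gene in two groups (illustrative numbers only)
group_a = {"geneA": 1, "geneB": 0, "geneC": 2}
group_b = {"geneA": 0, "geneB": 1, "geneC": 2}


def total(counts):
    """Genome-wide number of events, as a global summary would report it."""
    return sum(counts.values())


def per_gene_changes(a, b):
    """Genes whose event count differs between the groups, with the difference b - a."""
    genes = sorted(set(a) | set(b))
    return {g: b.get(g, 0) - a.get(g, 0) for g in genes if b.get(g, 0) != a.get(g, 0)}


total_diff = total(group_b) - total(group_a)   # the global view sees no difference
changes = per_gene_changes(group_a, group_b)   # the per-gene view sees two changes
```

Here `total_diff` is 0 while `changes` lists geneA (loss) and geneB (gain), which is the masking effect described in point 2.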

What I propose instead is that you use the quantification from each group in a statistical analysis that identifies which events change. This will let you look into exactly which genes are changing, and you can still sum the events to get the genome-wide overview. Such an analysis of splicing can be done at three different levels: exon, splice event, or transcript. You can read more about each type of analysis, as well as multiple suggested bioinformatic tools, here.

In particular, I want to highlight the transcript-level analysis (full disclosure: I am the author of the tool described next). The advantage of transcript-level analysis is that biological interpretation becomes a lot easier, since you can predict functional consequences (e.g. protein domain gain/loss) as well as analyse specific splicing patterns. A tool that can do this with the data you already have (StringTie quantifications) is the R package IsoformSwitchAnalyzeR - this section of the vignette illustrates the output of the IsoformSwitchAnalyzeR workflow. In addition to identifying isoform switches with functional consequences in individual genes, we recently published an extension that performs a genome-wide assessment of changes in splicing and consequences to identify systematic changes between groups (e.g. more frequent use of intron retention), and it does not suffer from either of the limitations described above - you can find that article here, and online you can find examples of the genome-wide analysis of consequences and splicing.


@Kristoffer Many thanks for this information. I have processed around 700 A. thaliana accessions (without biological replicates) and am interested in seeing patterns of alternative splicing among them, which is why I was using ASTALAVISTA to estimate the AS events in each accession. Besides this, the accessions were divided into 11 groups based on genetic similarity > 60. Can I use the same grouping to perform the differential AS analysis among these groups?

Or what could be the best way to perform AS analysis among these groups?


I think comparing the accessions by genetic similarity group sounds like an excellent idea (and it would solve the two problems described above). Btw, IsoformSwitchAnalyzeR can help you do this.


@Kristoffer Many thanks for the update. I have read the IsoformSwitchAnalyzeR tutorial, which mentions that biological replicates are required ("IsoformSwitchAnalyzeR requires independent biological replicates"). I don't have biological replicates, so I will treat the accessions within a group as biological replicates. The number of accessions in each group varies: for example, one group contains 50 accessions, another contains 100, and another contains 78. Is it fine to perform differential AS analysis by comparing these groups regardless of the number of accessions in each group?


In this case, treating the accessions within a group as biological replicates is a good (and valid) approach. Having a different number of samples in each group is not much of a problem - you can read more here.