How to determine alternative splicing read counts
2
5
Entering edit mode
8.1 years ago

Hello,

I am attempting to target an alternative transcript of a gene and generate an expression level read count for this one specific gene. I have BAM files already aligned to Hg19. For the gene I have the genome location and the entire sequence. I have attempted the following:

Creating a Bed file with the start and end sequence of each of the alternative splice variants and the number of exons each one has. When utilizing bedtools and CoverageBed this results in identical read counts for each transcript due to the large overlap in chromosomal location.

I have additionally looked into Miso and the gff-alternative event annotation format. This seems like it may work though I do not understand how to create such a file. I have looked into the premade gff annotation files made by Miso but when I searched for the specific variant of my gene. It showed up in multiple entries, with none of them matching the chromosomal location of the gene I am looking for. I would really appreciate some help!!!

RNA-Seq • 14k views
1
Entering edit mode

Do you want raw counts or normalized expression values will do? If the latter is fine then you can use Cuffquant & Cuffnorm to get isoform expression of a gene. It will not give you raw read counts but will give normalized expression values & FPKM values. To get raw counts, DEXSeq may be your problem's solution.

0
Entering edit mode

The issue I am running into which both cuffquant and cuffnorm also have is how to create an annotation file, either BED or GFF because these file formats only require the chromosomal location of the gene. In my case the different alternative splice variants of this gene all overlap. So with only that information the reads between all the alternative splice variants will be identical. Is there a away around this?

11
Entering edit mode
8.1 years ago
EagleEye 7.3k

Hello punjabdhaputar,

You could first try with the following pipeline. And please have a look at this paper

I) GESS
II) MISO

===============================================================================================

I) General procedure:

1. Use GESS to get all expressed isoforms from your experiment (your aligned BAM file). You can find exon-skipping events for individual BAM files. If you want to check isoforms in the whole experimental condition, you can merge all BAM files from one condition and find the exon skipping events for corresponding conditions.

Output: GTF file with exon-skipping events.

2. Index the GTF files obtained from GESS for individual BAM or for each condition.

3. Run MISO on indexed GTF files. This will give you results for individual genes and their score (Ψ calculation) for each of its isoform (or skipping events).

4. Plot results for individual gene (which plots all possible skipping events or isoforms) using sashimi plot from MISO. NOTE: Edit sashimi_plot_settings.txt file to configure your samples and provide the path to sashimi_plot_settings.txt in the command line.

Sample plot:

===============================================================================================

II) Detailed steps to be followed:

1. python GESS.py -i SAMPLE1.bam -g /dbs/hg19/ -o SAMPLE1/GESS_out -n SAMPLE1 -l 5
2. /misopy/index_gff.py --index SAMPLE1/GESS_OUT/GESS_OUT.GTF SAMPLE1/indexed/
3. python miso.py --run SAMPLE1/indexed/ SAMPLE1.bam --output-dir SAMPLE1/miso_out/ --read-len 35
4. python sashimi_plot.py --plot-event "ENSGxxxxxxxxxx.1" SAMPLE1/miso_out/ ./sashimi_plot_settings.txt --output-dir plots/

I hope this will help you finding the alternative splicing events. Please let me know if you need any further help.

References:

0
Entering edit mode

Thank you very much for your reply. I will try this. I have a small question, if I already know what alternative splice I want to target including its sequence and location can I start at step two, and if so how can I create that GTF file?

0
Entering edit mode

My suggestion is to better run all the above steps. And later you can check whether your alternative splice site has come as significant in the above analysis.

Or if you still want to run you can convert your alternative splice events (Gene with all its isoforms) as standard GTF format (http://www.ensembl.org/info/website/upload/gff.html#fields) and start with Step 2.

0
Entering edit mode

Hello, So I have gotten through steps 1-3 but cannot seem to get step 4 to work

I get this error when I run the command

Exception: Cannot find file /home/ubuntu/SAMPLE1/miso_out/genes_to_filenames.shelve. Are you sure the events were indexed with the latest version of index_gff.py?


When I move the genes_to_filenames.shelve file over to the miso_out folder I get the following error:

Exception: Event ENSGXXXXXXXXXX.1 not found in pickled directory /home/pranavubuntu/SAMPLE1/miso_out. Are you sure this is the right directory for the event?


(I actually plugged in the gene I am targeting in the Xs.) What is causing this error?

0
Entering edit mode

I also tried to replace the ENSGXXXXXXXXXX with chr9:32985969:32986028:-@chr9:32984629:32984855:-@chr9:32974456:32974559:- a random splice I chose from the subfolders just to see if it worked and then I got this error:

IOError: file ./test-data/bam-data/heartWT1.sorted.bam not found


I never had a BAM file called this and I do not know where it is getting this name

1
Entering edit mode

Can you please post your sashimi_plot_settings.txt file? Because I guess you have not edited the file to configure your samples. Sorry I forget to mention that in the above steps.

This is ./test-data/bam-data/heartWT1.sorted.bam test datasets for sashimi plot.

0
Entering edit mode

Hello I edited the sashimi_plot_settings.txt file but I still get this error

Warning: MISO file not found
Posterior plot failed.
Saving plot to: /home/ubuntu/plots/chr11:35226059:35226187:+@chr11:35227659|35227662:35227790:+.pdf


Here is the entire sashimi_plot_settings.txt file that with the changes I made:

[data]
# directory where BAM files are
bam_prefix = /media/ubuntu/LaCie1/Bams
# directory where MISO output is
miso_prefix = /home/ubuntu/SAMPLE1/miso_out

bam_files = [
"test1.sorted_genome_alignments.bam"]

miso_files = [
"chr10:255829:255988:+@chr10:267135:267296:+@chr10:282778:282855:+"]

[plotting]
# Dimensions of figure to be plotted (in inches)
fig_width = 7
fig_height = 5
# Factor to scale down introns and exons by
intron_scale = 30
exon_scale = 4
# Whether to use a log scale or not when plotting
logged = False
font_size = 6

bar_posteriors = False

# Max y-axis
ymax = 150

# Axis tick marks
nyticks = 3
nxticks = 4

# Whether to show axis labels
show_ylabel = True
show_xlabel = True

# Whether to plot posterior distributions inferred by MISO
show_posteriors = True

# Whether to plot the number of reads in each junction
number_junctions = True

resolution = .5
posterior_bins = 40
gene_posterior_ratio = 5

# List of colors for read denisites of each sample
colors = [
"#CC0011"]

# Number of mapped reads in each sample
# (Used to normalize the read density for RPKM calculation)
coverages = [
6830944,]

# Bar color for Bayes factor distribution
# plots (--plot-bf-dist)
# Paint them blue
bar_color = "b"

# Bayes factors thresholds to use for --plot-bf-dist
bf_thresholds = [0, 1, 2, 5, 10, 20]

##
## Names of colors for plotting
##
# "b" for blue
# "k" for black
# "r" for red
# "g" for green
#
# Hex colors are accepted too.

0
Entering edit mode

Try to change top #9 lines with these:

[data]
# directory where BAM files are
bam_prefix = /media/ubuntu/LaCie1/Bams
# directory where MISO output is
miso_prefix = /home/ubuntu/SAMPLE1/miso_out

bam_files = [ "test1.sorted_genome_alignments.bam" ]

miso_files = [ "test1.sorted" ]


Please check whether you are providing proper event name as in the folder /home/ubuntu/SAMPLE1/miso_out/chr**/. And run this command with indexed GESS output

python sashimi_plot.py --plot-event "**ENSGxxxxxxxxxx.1**" /home/ubuntu/SAMPLE1/**indexed **./sashimi_plot_settings.txt --output-dir /home/ubuntu/plots/

0
Entering edit mode

None of my folders are of the form ENSGxxxxxxxxxx.1. I am unsure whether GESS saved its indexed files as such.

0
Entering edit mode

GESS does not create any index files. You will get a GFF file as output from GESS, you have to use index_gff.py script to index. It will create individual files for each gene with .pickle extension

And ENSGxxxxxxxxxx.1 is just a example I gave. It might have different name depends on the GFF file (may be named as chromosomal locations).

Output FORMATS you get:

GESS: .GFF file

MISO: chr[##]/[geneName].MISO (Individual folders for each chromosomes with geneName or chromosome location)

index_gff.py: indexed/chr[##]/[geneName].pickle

References:

0
Entering edit mode

Hi,

I'm trying out GESS on my set of data. They are mouse mm10. I used JAGuaR (Junction Alignments to Genome for RNA-Seq Reads) for alignment. I have installed all required packages. When I run GESS I get the following warning:

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/Seq.py:341: BiopythonDeprecationWarning: This method is obsolete; please use str(my_seq) instead of my_seq.tostring().
BiopythonDeprecationWarning)


And then I get the following error after Constructing SpliceGraph: Exon gap linking...:

"Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/gess/GESS.py", line 122, in <module>
main()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/gess/GESS.py", line 64, in main
global_data = GESS_patternScanning(options,global_data) ####
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/gess/impl/SkipPatternProc.py", line 168, in GESS_patternScanning
global_data = skipPatternProc.process()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/gess/impl/SkipPatternProc.py", line 35, in process
patterns = self.getSkipPattern(spliceGraph)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/gess/impl/SkipPatternProc.py", line 74, in getSkipPattern
chrid = component[0].split(':')[0]
TypeError: 'set' object does not support indexing"


Do you have any idea what I am missing here?

Any help would be much appreciated.

Thank you,
Ali

1
Entering edit mode
5.5 years ago

Hi,

I'm trying out GESS on my set of data. They are mouse mm10.

I used JAGuaR (Junction Alignments to Genome for RNA-Seq Reads) for alignment.

I have installed all required packages.

When I run GESS I get the following warning:

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/Seq.py:341: BiopythonDeprecationWarning: This method is obsolete; please use str(my_seq) instead of my_seq.tostring().
BiopythonDeprecationWarning)


And then I get the following error after Constructing SpliceGraph: Exon gap linking...:

Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/gess/GESS.py", line 122, in <module>
main()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/gess/GESS.py", line 64, in main
global_data = GESS_patternScanning(options,global_data) ####
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/gess/impl/SkipPatternProc.py", line 168, in GESS_patternScanning
global_data = skipPatternProc.process()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/gess/impl/SkipPatternProc.py", line 35, in process
patterns = self.getSkipPattern(spliceGraph)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/gess/impl/SkipPatternProc.py", line 74, in getSkipPattern
chrid = component[0].split(':')[0]
TypeError: 'set' object does not support indexing


Do you have any idea what I am missing here?

Any help would be much appreciated.

Thank you,
Ali