Question: How to determine alternative splicing read counts
5
gravatar for punjabdhaputar
4.7 years ago by
punjabdhaputar50 wrote:

Hello,

I am attempting to target an alternative transcript of a gene and generate an expression level read count for this one specific gene. I have BAM files already aligned to Hg19. For the gene I have the genome location and the entire sequence. I have attempted the following:

Creating a Bed file with the start and end sequence of each of the alternative splice variants and the number of exons each one has. When utilizing bedtools and CoverageBed this results in identical read counts for each transcript due to the large overlap in chromosomal location.

I have additionally looked into http://miso.readthedocs.org/en/fastmiso/#gff-event-annotation Miso and the gff-alternative event annotation format. This seems like it may work though I do not understand how to create such a file. I have looked into the premade gff annotation files made my Miso http://miso.readthedocs.org/en/fastmiso/annotation.html but when I searched for the specific variant of my gene. It showed up in multiple entries, with none of them matching the chromosomal location of the gene I am looking for. I would really appreciate some help!!!

 

rna-seq • 10k views
ADD COMMENTlink modified 2.1 years ago by alorzadeh10 • written 4.7 years ago by punjabdhaputar50
1

Do you want raw counts or normalized expression values will do? If the latter is fine then you can use Cuffquant & Cuffnorm to get isoform expression of a gene. It will not give you raw read counts but will give normalized expression values & FPKM values. To get raw counts, DEXSeq may be your problem's solution.

ADD REPLYlink modified 4.7 years ago • written 4.7 years ago by komal.rathi3.4k

The issue I am running into which both cuffquant and cuffnorm also have is how to create an annotation file, either BED or GFF because these file formats only require the chromosomal location of the gene. In my case the different alternative splice variants of this gene all overlap. So with only that information the reads between all the alternative splice variants will be identical. Is there a away around this?

ADD REPLYlink written 4.7 years ago by punjabdhaputar50
9
gravatar for EagleEye
4.7 years ago by
EagleEye6.2k
Sweden
EagleEye6.2k wrote:

Hello punjabdhaputar,

You could first try with the following pipeline. And please have a look at this paper: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3950716/

I) GESS: http://compbio.uthscsa.edu/GESS_Web/

II) MISO: http://genes.mit.edu/burgelab/miso/

 

===============================================================================================

I) General procedure:

1) Use GESS to get all expressed isoforms from your experiment (your aligned BAM file). You can find exon-skipping events for individual BAM files. If you want to check isoforms in the whole experimental condition, you can merge all BAM files from one condition and find the exon skipping events for corresponding conditions.

Output: GTF file with exon-skipping events.

2) Index the GTF files obtained from GESS for individual BAM or for each condition. 

3) Run MISO on indexed GTF files. This will give you results for individual genes and their score (Ψ calculation) for each of its isoform (or skipping events).

4) Plot results for individual gene (which plots all possible skipping events or isoforms) using sashimi plot from MISO. NOTE: Edit 'sashimi_plot_settings.txt' file to configure your samples and provide the path to 'sashimi_plot_settings.txt' in the command line.

Sample plot:

sashimi plot

===============================================================================================

II) Detailed steps to be followed:

1) python GESS.py -i SAMPLE1.bam -g /dbs/hg19/ -o SAMPLE1/GESS_out -n SAMPLE1 -l 5

2) /misopy/index_gff.py --index SAMPLE1/GESS_OUT/GESS_OUT.GTF SAMPLE1/indexed/

3)  python miso.py --run SAMPLE1/indexed/ SAMPLE1.bam --output-dir SAMPLE1/miso_out/ --read-len 35

4) python sashimi_plot.py --plot-event "ENSGxxxxxxxxxx.1" SAMPLE1/miso_out/ ./sashimi_plot_settings.txt --output-dir plots/

 

I hope this will help you finding the alternative splicing events. Please let me know if you need any further help.

 

-----------------------------------------------------------------------------------

References:

http://compbio.uthscsa.edu/GESS_Web/

http://miso.readthedocs.org/en/fastmiso/#preparing-the-alternative-isoforms-annotation

 

 

ADD COMMENTlink modified 4.3 years ago • written 4.7 years ago by EagleEye6.2k

Thank you very much for your reply. I will try this. I have a small question, if I already know what alternative splice I want to target including its sequence and location can I start at step two, and if so how can I create that GTF file?

ADD REPLYlink written 4.7 years ago by punjabdhaputar50

My suggestion is to better run all the above steps. And later you can check whether your alternative splice site has come as significant in the above analysis.

Or if you still want to run you can convert your alternative splice events (Gene with all its isoforms) as standard GTF format (http://www.ensembl.org/info/website/upload/gff.html#fields) and start with Step 2.

ADD REPLYlink modified 4.7 years ago • written 4.7 years ago by EagleEye6.2k

Hello, So I have gotten through steps 1-3 but cannot seem to get step 4 to work

I get this error when I run the command

"Exception: Cannot find file /home/ubuntu/SAMPLE1/miso_out/genes_to_filenames.shelve. Are you sure the events were indexed with the latest version of index_gff.py?"

When I move the genes_to_filenames.shelve file over to the miso_out folder I get the following error:

Exception: Event ENSGXXXXXXXXXX.1 not found in pickled directory /home/pranavubuntu/SAMPLE1/miso_out. Are you sure this is the right directory for the event?

(I actually plugged in the gene I am targeting in the Xs.) What is causing this error?

 

 

ADD REPLYlink written 4.7 years ago by punjabdhaputar50

I also tried to replace the ENSGXXXXXXXXXX with "chr9:32985969:32986028:-@chr9:32984629:32984855:-@chr9:32974456:32974559:-" a random splice I chose from the subfolders just to see if it worked and then I got this error: IOError: file `./test-data/bam-data/heartWT1.sorted.bam` not found. I never had a BAM file called this and I do not know where it is getting this name

 

ADD REPLYlink written 4.6 years ago by punjabdhaputar50
1

Can you please post your 'sashimi_plot_settings.txt' file? Because I guess you have not edited the file to configure your samples. Sorry I forget to mention that in the above steps.

This is './test-data/bam-data/heartWT1.sorted.bam' test datasets for sashimi plot.

ADD REPLYlink written 4.6 years ago by EagleEye6.2k

Hello I edited the sashimi_plot_settings.txt file but I still get this error

"Warning: MISO file  not found
Loading MISO file: 
Posterior plot failed.
Saving plot to: /home/ubuntu/plots/chr11:35226059:35226187:+@chr11:35227659|35227662:35227790:+.pdf"

 

 

Here is the entire sashimi_plot_settings.txt file that with the changes I made:

"[data]
# directory where BAM files are
bam_prefix = /media/ubuntu/LaCie1/Bams
# directory where MISO output is
miso_prefix = /home/ubuntu/SAMPLE1/miso_out

bam_files = [
  "test1.sorted_genome_alignments.bam"]

miso_files = [
    "chr10:255829:255988:+@chr10:267135:267296:+@chr10:282778:282855:+"]

[plotting]
# Dimensions of figure to be plotted (in inches)
fig_width = 7
fig_height = 5 
# Factor to scale down introns and exons by
intron_scale = 30
exon_scale = 4
# Whether to use a log scale or not when plotting
logged = False 
font_size = 6

bar_posteriors = False

# Max y-axis
ymax = 150

# Axis tick marks
nyticks = 3
nxticks = 4

# Whether to show axis labels
show_ylabel = True
show_xlabel = True

# Whether to plot posterior distributions inferred by MISO
show_posteriors = True 

# Whether to plot the number of reads in each junction
number_junctions = True

resolution = .5
posterior_bins = 40
gene_posterior_ratio = 5

# List of colors for read denisites of each sample
colors = [
    "#CC0011"]

# Number of mapped reads in each sample
# (Used to normalize the read density for RPKM calculation)
coverages = [
    6830944,]

# Bar color for Bayes factor distribution
# plots (--plot-bf-dist)
# Paint them blue
bar_color = "b"

# Bayes factors thresholds to use for --plot-bf-dist
bf_thresholds = [0, 1, 2, 5, 10, 20]

##
## Names of colors for plotting
##
# "b" for blue
# "k" for black
# "r" for red
# "g" for green
#
# Hex colors are accepted too.

 "

ADD REPLYlink modified 4.6 years ago • written 4.6 years ago by punjabdhaputar50

Try to change top #9 lines with these:

----------------------------------------------------------------------------------------------

[data]
# directory where BAM files are
bam_prefix = /media/ubuntu/LaCie1/Bams
# directory where MISO output is
miso_prefix = /home/ubuntu/SAMPLE1/miso_out

bam_files = [ "test1.sorted_genome_alignments.bam" ]

miso_files = [ "test1.sorted" ]

 

------------------------------------------------------------------

Please check whether you are providing proper event name as in the folder '/home/ubuntu/SAMPLE1/miso_out/chr**/'. And run this command with indexed GESS output

python sashimi_plot.py --plot-event "ENSGxxxxxxxxxx.1" /home/ubuntu/SAMPLE1/indexed ./sashimi_plot_settings.txt --output-dir /home/ubuntu/plots/

 

ADD REPLYlink modified 4.6 years ago • written 4.6 years ago by EagleEye6.2k

None of my folders are of the form "ENSGxxxxxxxxxx.1" I am unsure whether GESS saved its indexed files as such. 

ADD REPLYlink written 4.6 years ago by punjabdhaputar50

GESS does not create any index files. You will get a GFF file as output from GESS, you have to use index_gff.py script to index. It will create individual files for each gene with '.pickle' extension 

And 'ENSGxxxxxxxxxx.1' is just a example I gave. It might have different name depends on the GFF file (may be named as chromosomal locations).

 

Output FORMATS you get:

GESS: .GFF file

MISO: chr[##]/[geneName].MISO (Individual folders for each chromosomes with geneName or chromosome location)

index_gff.py: indexed/chr[##]/[geneName].pickle

-----------------------------------------------------------------------------------

References:

http://compbio.uthscsa.edu/GESS_Web/

http://miso.readthedocs.org/en/fastmiso/#preparing-the-alternative-isoforms-annotation

 

ADD REPLYlink modified 4.6 years ago • written 4.6 years ago by EagleEye6.2k

Hi,

I'm trying out GESS on my set of data. They are mouse mm10. I used JAGuaR (Junction Alignments to Genome for RNA-Seq Reads) for alignment. I have installed all required packages. When I run GESS i get the following warning: "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/Seq.py:341: BiopythonDeprecationWarning: This method is obsolete; please use str(my_seq) instead of my_seq.tostring(). BiopythonDeprecationWarning)"

And then I get the following error after Constructing SpliceGraph: Exon gap linking...:

"Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/gess/GESS.py", line 122, in <module> main() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/gess/GESS.py", line 64, in main global_data = GESS_patternScanning(options,global_data) #### File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/gess/impl/SkipPatternProc.py", line 168, in GESS_patternScanning global_data = skipPatternProc.process() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/gess/impl/SkipPatternProc.py", line 35, in process patterns = self.getSkipPattern(spliceGraph) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/gess/impl/SkipPatternProc.py", line 74, in getSkipPattern chrid = component[0].split(':')[0] TypeError: 'set' object does not support indexing"

Do you have any idea what I am missing here?

Any help would be much appreciated.

Thank you,

Ali

ADD REPLYlink written 2.1 years ago by alorzadeh10
1
gravatar for alorzadeh
2.1 years ago by
alorzadeh10
alorzadeh10 wrote:

Hi,

I'm trying out GESS on my set of data. They are mouse mm10. I used JAGuaR (Junction Alignments to Genome for RNA-Seq Reads) for alignment. I have installed all required packages. When I run GESS i get the following warning: "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/Seq.py:341: BiopythonDeprecationWarning: This method is obsolete; please use str(my_seq) instead of my_seq.tostring(). BiopythonDeprecationWarning)"

And then I get the following error after Constructing SpliceGraph: Exon gap linking...:

"Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/gess/GESS.py", line 122, in <module> main() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/gess/GESS.py", line 64, in main global_data = GESS_patternScanning(options,global_data) #### File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/gess/impl/SkipPatternProc.py", line 168, in GESS_patternScanning global_data = skipPatternProc.process() File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/gess/impl/SkipPatternProc.py", line 35, in process patterns = self.getSkipPattern(spliceGraph) File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/gess/impl/SkipPatternProc.py", line 74, in getSkipPattern chrid = component[0].split(':')[0] TypeError: 'set' object does not support indexing"

Do you have any idea what I am missing here?

Any help would be much appreciated.

Thank you,

Ali

ADD COMMENTlink written 2.1 years ago by alorzadeh10
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 867 users visited in the last hour