Question

Can bowtie detect exon-intron splicing?


11.0 years ago

biolab ★ 1.4k

Dear all,

I am a new user of bowtie2 and have two questions.

1. I used bowtie2 to map reads to a genome and generate a SAM file. Can this SAM file indicate exon-intron splicing? In other words, does a SAM file contain exon-intron junction information?
2. How can I convert a SAM file to a GFF file that can be loaded into a genome browser?

I would much appreciate your help. Thanks for any comments!

bowtie2 • 5.5k views

ADD COMMENT • link updated 3.7 years ago by Ram 45k • written 11.0 years ago by biolab ★ 1.4k

Ram · Answer 1 · 2014-11-19


11.0 years ago

GouthamAtla 12k

Bowtie2 does not distinguish between exons and introns, because it does not look at annotations while aligning reads.

To view your alignment in a genome browser, convert the SAM file to BAM, then sort and index it. You can also load a BED file into a genome browser.

If you have RNA-seq data, use tophat2 for spliced (exon-intron) mapping.
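The convert, sort, and index steps can be sketched with samtools driven from Python. This is only a sketch: it assumes samtools 1.x is installed and on the PATH, and the function names and the `aln.sam` file name are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def sam_to_bam_commands(sam_path):
    """Build the samtools commands for SAM -> BAM -> sorted BAM -> index."""
    sam = Path(sam_path)
    bam = sam.with_name(sam.stem + ".bam")
    sorted_bam = sam.with_name(sam.stem + ".sorted.bam")
    return [
        # Convert the text SAM file to binary BAM.
        ["samtools", "view", "-b", "-o", str(bam), str(sam)],
        # Sort by reference coordinate, which indexing requires.
        ["samtools", "sort", "-o", str(sorted_bam), str(bam)],
        # Write the index so a browser can fetch reads by region.
        ["samtools", "index", str(sorted_bam)],
    ]


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools was not found on PATH")
    for command in commands:
        subprocess.run(command, check=True)
```

Calling `run_commands(sam_to_bam_commands("aln.sam"))` would leave `aln.sorted.bam` and its index next to the input, which is what a genome browser typically expects to load.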

ADD COMMENT • link 11.0 years ago by GouthamAtla 12k


Geek_y, thanks a lot for your reply. However, I have one further question: if a read (e.g. 100 bp) spans two genomic regions, bowtie can map the two segments to the genome (I checked that bowtie can do this). That would indicate splicing, and the exon-intron positions could be identified. So bowtie can find splice sites, am I right?

Additionally, I found that a line in the SAM file looks like the one below. As far as I can tell, column four ("317") is the start position and the MD:Z value is the read length, so I could find the splice sites from these. I am really new to bowtie, so any suggestions are helpful. Thanks!

s1  0  chr1  317  42  79M *  0  0  CCATGCGAGT......CGGTAGTA  IIIIIIIIIII.....IIIIIIIIIII  ......  MD:Z:56

ADD REPLY • link updated 3.8 years ago by Ram 45k • written 11.0 years ago by biolab ★ 1.4k


Bowtie2 will assume that any spliced-over region is a deletion and will heavily penalize an alignment across it. If an intron is very short (e.g., a microintron), then you might find alignments that span it (though even then you might have to tweak the score settings). If you really want to find splice junctions, you're better off with STAR or tophat2, as Geek_y suggested.
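For comparison, splice-aware aligners such as STAR or tophat2 report an intron as an `N` operation in the CIGAR string (column 6 of a SAM line), so junctions can be read off directly. Below is a minimal Python sketch of that idea, assuming 1-based POS values as in SAM; the function name `splice_junctions` and the spliced CIGAR `40M1000N39M` are made up for illustration.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def splice_junctions(pos, cigar):
    """Return (intron_start, intron_end) pairs, 1-based and inclusive."""
    junctions = []
    ref = pos  # reference coordinate of the next base to be consumed
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            # N marks a skipped reference region, i.e. a candidate intron.
            junctions.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return junctions


# An unspliced alignment like the 79M read in this thread has no junctions.
print(splice_junctions(317, "79M"))
# A hypothetical spliced read starting at the same position.
print(splice_junctions(317, "40M1000N39M"))
```

A bowtie2 alignment will never contain `N`, which is why its SAM output cannot carry junction information in this form.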

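The MD aux tag describes where the read mismatches the reference; it is not the read length. MD:Z:56 is not a complete MD string here, because the CIGAR is 79M and a complete MD string must account for all 79 aligned bases. Either you cut off part of the line (I can tell that you didn't copy everything over anyway) or the line got truncated by bowtie2. The full string would look something like MD:Z:56A22, meaning 56 matching bases, one mismatch where the reference has an A, then 22 more matching bases.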
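The consistency check described above can be written down in a few lines of Python. This is a sketch under the usual SAM conventions (MD covers matches, mismatches, and deletions; the CIGAR operations M, =, X, and D consume the same reference bases); both function names are hypothetical.

```python
import re


def md_reference_length(md):
    """Count the reference bases described by an MD value (without 'MD:Z:')."""
    total = 0
    # Tokens are: a run of matches, a deletion (^BASES), or one mismatch base.
    for number, deletion, mismatch in re.findall(
        r"(\d+)|\^([A-Za-z]+)|([A-Za-z])", md
    ):
        if number:
            total += int(number)
        elif deletion:
            total += len(deletion)
        else:
            total += 1
    return total


def cigar_aligned_reference_length(cigar):
    """Count the reference bases covered by M, =, X, and D operations."""
    return sum(
        int(length)
        for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in "MD=X"
    )


cigar_length = cigar_aligned_reference_length("79M")
for md in ("56", "56A22"):
    print(md, md_reference_length(md) == cigar_length)
```

With the 79M CIGAR from the quoted line, `56` fails the check while `56A22` passes, since 56 + 1 + 22 = 79.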

ADD REPLY • link 11.0 years ago by Devon Ryan 105k


Thanks a lot, Devon, I really learned something. Cheers!

ADD REPLY • link 11.0 years ago by biolab ★ 1.4k


As far as I know, splice sites have characteristic sequence signatures. A read that spans two genomic regions may also indicate an insertion or deletion in the data or the reference, so you may need to look at the annotations, if available, to confirm that those sites are intron/exon boundaries.

It would help if you could share more information about your data: DNA or RNA, organism, platform, etc.
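One widely used signature is the canonical GT...AG dinucleotide pair at the two ends of an intron. As a rough sketch (not a full splice-site model), a candidate intron can be checked against it in Python. The sketch assumes the chromosome sequence is already loaded as a string and that coordinates are 1-based and inclusive; the function name is hypothetical.

```python
def has_canonical_splice_signal(reference, intron_start, intron_end):
    """Check a candidate intron for GT...AG on either strand.

    reference: forward-strand chromosome sequence as a string.
    intron_start, intron_end: 1-based inclusive coordinates.
    """
    intron = reference[intron_start - 1:intron_end].upper()
    if len(intron) < 4:
        return False
    ends = (intron[:2], intron[-2:])
    # GT...AG is the forward-strand pattern; an intron of a gene on the
    # reverse strand appears as CT...AC when read from the forward strand.
    return ends in {("GT", "AG"), ("CT", "AC")}


# Toy sequence: flanking bases around a made-up 8 bp "intron".
toy = "AAAA" + "GTCCCCAG" + "TTTT"
print(has_canonical_splice_signal(toy, 5, 12))
```

A gap that lacks this pattern is not necessarily a false junction, because non-canonical splice sites exist, but it is a reason to look at that site more closely.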

ADD REPLY • link updated 3.7 years ago by Ram 45k • written 11.0 years ago by GouthamAtla 12k


Hi Geek_y, thanks for your advice. My data is RNA-seq data. The organism I am working on does have a reference genome and annotations. However, I am looking at non-coding regions, some of which may not be well annotated. So I suppose I can only predict splice sites from read mapping (splicing signatures do help with this). Thanks again for your useful comments!

ADD REPLY • link updated 3.8 years ago by Ram 45k • written 11.0 years ago by biolab ★ 1.4k