9.4 years ago
biolab
Dear all,
I am a new user of bowtie2 and have two questions.
- I used bowtie2 to map reads to a genome and generate a SAM file. Can this SAM file indicate exon-intron splicing? In other words, does a SAM file include exon-intron junction information?
- How can I convert a SAM file to a GFF file that can be loaded directly into a genome browser?
I would much appreciate your help. Thanks for any comments!
Geek_y, thanks a lot for your reply. However, I have one further question: if a read (e.g. 100 bp) spans two genomic regions, bowtie can map the two segments to the genome (I did check that bowtie can do this). This is indicative of splicing, and the exon-intron positions can be identified. So bowtie can find splice sites, am I right?
Additionally, the SAM file format looks like the line below. As far as I can tell, column four ("317") is the start position and the MD:Z value is the read length, so I should be able to find the splice sites from that. I am really new to bowtie, so any suggestions are helpful. Thanks!
As far as I know, splice sites have characteristic signatures/patterns. If a read spans two genomic regions, it may also indicate insertions/deletions in the data or the reference. You may need to look at the annotations, if available, to make sure those sites are intron/exon boundaries.
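To make the "signatures" point concrete: most introns start with GT and end with AG on the transcribed strand (CT...AC when read from the forward strand of a reverse-strand gene). Below is a minimal sketch of such a check. The function name, the toy sequence, and the coordinates are made up for illustration, and a matching motif is only a hint, not proof of a real junction.

```python
def intron_motif_strand(genome_seq, intron_start, intron_end):
    """Guess the strand of a candidate intron from its terminal dinucleotides.

    intron_start and intron_end are 1-based, inclusive, forward-strand
    coordinates. Returns "+" for GT...AG, "-" for CT...AC (the reverse
    complement of GT...AG), and None for anything else.
    """
    intron = genome_seq[intron_start - 1:intron_end].upper()
    if len(intron) < 4:
        return None
    ends = (intron[:2], intron[-2:])
    if ends == ("GT", "AG"):
        return "+"
    if ends == ("CT", "AC"):
        return "-"
    return None


# Hypothetical toy sequence: 4 exon bases, a 12 bp "intron", 4 exon bases.
toy = "AAAC" + "GTAAGTTTTCAG" + "GGGT"
print(intron_motif_strand(toy, 5, 16))
```

Non-canonical introns exist, so a None result does not rule out splicing by itself.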
It would be good if you could share more info about your data: DNA/RNA, organism, platform, etc.
Hi Geek_y, thanks for your advice. My data is RNA-seq data. The organism I am working on does have a reference genome and annotations. However, I am looking at non-coding regions, and some of them may not be well annotated. So I suppose I can only predict splice sites from read mapping (splicing signatures do help with this). Thanks again for your useful comments!
Bowtie2 will assume that any spliced-over region is a deletion and heavily penalize an alignment to it. If an intron is very short (e.g., a microintron), then you might find alignments that span it (though even then you might have to tweak the score settings). If you really want to find splice junctions then you're better off with STAR or TopHat2, as Geek_y suggested.
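In SAM terms, the difference shows up in the CIGAR string (column 6): a spliced aligner writes a skipped intron as an `N` operation, while a gap reported by bowtie2 appears as `D`. Here is a rough sketch of pulling both kinds of gap out of a record, given POS (column 4) and the CIGAR. The function name and the CIGAR strings are hypothetical; only the start position 317 and the 79 bp read length come from this thread.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def find_gaps(pos, cigar):
    """Return (op, start, end) for every N or D operation in a CIGAR string.

    pos is the 1-based leftmost mapping position (SAM column 4);
    start and end are 1-based, inclusive reference coordinates.
    """
    gaps = []
    ref_pos = pos
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "ND":
            gaps.append((op, ref_pos, ref_pos + length - 1))
        if op in REF_CONSUMING:
            ref_pos += length
    return gaps


# Hypothetical CIGARs for a 79 bp read starting at position 317.
print(find_gaps(317, "50M200N29M"))  # gap written as a skipped region (N)
print(find_gaps(317, "50M3D29M"))    # gap written as a deletion (D)
print(find_gaps(317, "79M"))         # ungapped alignment, no gaps reported
```

An `N` gap is a candidate intron; a `D` gap from a non-spliced aligner could be either a true deletion or a very short intron, so it needs checking against annotation or splice motifs.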
The MD aux tag gives mismatching bases in the read vs. the reference. MD:Z:56 is actually not a full MD string, since the full 79 bp of the read mapped. Either you cut off part of the line (I can tell that you didn't copy everything over anyway) or that line got truncated by bowtie2. The full string is something more like MD:Z:56A22.
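The arithmetic behind that can be sketched in a few lines: in an MD string, numbers count matching bases, single letters are reference bases at mismatches, and `^` followed by letters marks bases deleted from the reference. The helper below is an illustrative assumption, not part of any library, and its `read_bases` count ignores inserted and soft-clipped bases, which the MD tag does not record.

```python
import re

# Tokens of an MD string: a match count, a deletion (^ plus bases),
# or a single mismatched reference base.
MD_RE = re.compile(r"(\d+)|(\^[A-Z]+)|([A-Z])")


def md_summary(md):
    """Summarise an MD value (the part after 'MD:Z:')."""
    matches = mismatches = deleted = 0
    for num, deletion, mismatch in MD_RE.findall(md):
        if num:
            matches += int(num)
        elif deletion:
            deleted += len(deletion) - 1  # drop the leading '^'
        else:
            mismatches += 1
    return {
        "matches": matches,
        "mismatches": mismatches,
        "deleted": deleted,
        "read_bases": matches + mismatches,
    }


print(md_summary("56A22"))  # 56 + 1 + 22 read bases
print(md_summary("56"))     # accounts for only 56 read bases
```

With `56A22` the counts add up to the 79 aligned bases, whereas `56` alone does not, which is why it looks truncated.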
Thanks a lot, Devon, I really learned something. Cheers!