I used bowtie2 with the --local option to locally align many sequences against a single reference sequence.
I want to find the position where the alignment starts in both the query and the target, but I am having a hard time working this out from the SAM output. Could you please help me find it?
Every line of my SAM file is as follows:
"M04141:149:000000000-K3JVJ:1:1101:12608:2943 0 rightSide 319 22 140S46M * 0 0 TTATATTTTTTTTTGACAAGCCTTCCTATTATTCTTTTATATATAAATTGATTAAAACTATTATAAATAAAATAAAATAAAAAATTAATAAAAATATTAAAAAATAAAAATAAATTAATATATAAAAAATAAATTATTTATATTTTGGTTTTATAAAATGTTTTTTCTATGTCTTGTGTGCTTAAG IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:92 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:46 YT:Z:UU"
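You should read the SAM specification for an explanation of the fields in the SAM file. In the line above, your target (field 3, the reference sequence name) is called "rightSide", and the number before it (field 2) is the bitwise FLAG describing the alignment. Since it is 0, no flag bits are set, which tells you that your sequence mapped to the forward strand of your reference. The number after the target name (field 4, POS) is the 1-based position on the reference where the alignment starts, here 319. The CIGAR string (field 6), 140S46M, tells you that the first 140 bases of your query are soft clipped, that is, present in the read but not part of the alignment, and that the following 46 bases are aligned. So the alignment starts at base 141 of your query and at base 319 of your reference, and it covers query bases 141 to 186 and reference bases 319 to 364.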
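If you want to do this for every line, the arithmetic can be scripted. Here is a minimal sketch in Python, assuming you pass in POS (field 4) and the CIGAR string (field 6) yourself; the function name alignment_span is made up for this example and is not part of bowtie2 or any library. It reports 1-based, inclusive coordinates, and the query coordinates refer to the sequence as printed in the SAM line (which is reverse complemented when FLAG has bit 16 set).

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_span(pos, cigar):
    """Return (query_start, query_end, target_start, target_end).

    All coordinates are 1-based and inclusive. Query coordinates are
    relative to the SEQ field of the SAM line, so hard-clipped bases
    (H), which are absent from SEQ, are not counted.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]

    # Soft-clipped bases before the first aligned operation.
    leading_clip = 0
    for length, op in ops:
        if op == "H":
            continue
        if op == "S":
            leading_clip += length
            continue
        break

    # M, I, =, X consume query bases; M, D, N, =, X consume reference bases.
    query_len = sum(n for n, op in ops if op in "MI=X")
    target_len = sum(n for n, op in ops if op in "MDN=X")

    query_start = leading_clip + 1
    query_end = leading_clip + query_len
    target_start = pos
    target_end = pos + target_len - 1
    return query_start, query_end, target_start, target_end


# The alignment from the question: POS 319, CIGAR 140S46M.
print(alignment_span(319, "140S46M"))  # (141, 186, 319, 364)
```

To run it over a whole file, split each non-header line on tabs and pass int(fields[3]) and fields[5]; unmapped reads have a CIGAR of "*" and should be skipped.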
It also helps to try some toy examples with known sequences, so you can see how the values change as you align them. You can take a few bases from "rightSide" and give them to bowtie2 on the command line using the -c parameter (e.g. -c ATTTATATTTTGGTTTTATAAAATGTTTTTTCTATGT). Then change a few bases, or take the reverse complement, and see how the SAM output changes.
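For the reverse-complement test, a small helper saves doing it by hand. This is only a sketch; the function name reverse_complement is made up for this example, and it handles just A, C, G, T, and N.

```python
# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement every base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


# Toy query taken from the example above; paste the result after -c.
print(reverse_complement("ATTTATATTTTGGTTTTATAAAATGTTTTTTCTATGT"))
```

Aligning the output should give FLAG 16 (reverse strand) instead of 0, with the same reference position.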
In case anyone else has the same question and is looking for an easy way to find the start and end of the alignment: I just found a Python script that converts SAM format to .psl.
PSL is the output format of BLAT, and it lists query start, query end, target start, and target end in separate columns.