Question: SAM reference sequence length does not match the actual FASTA sequence input
Asked 4.4 years ago by ashishtx (United States)

Hello everyone,

I am trying to understand the SAM format specification along with the BWA program. I see that the reference sequence length reported in the SAM file does not match the length of my input sequence.

So my question is: why does the length of SEQ1, which is 407 bp, not match the SAM header, which reports a reference length of 402 bp?

Am I missing something very basic? 

Thank you.

>SEQ1
ATGCAGCTGTTCATCCACTGTCAAGGGGTTCATACCGTTGAAGTTACAGGTGAAGAGGAAGTTGCTTTCC
TCAAGCAATACCTCGAGCAGGCCGAGGGCATTGCACCTGCTGATCAAGTCCTCTACCATTCTGGCAAGCC
CCTGAGCGACGAGCTTTCTCTCTCCTGCCTGGAGAATGGTGCTTATGTTGAAGCTGTCGTCCCTCTTCTT
GGAGGTAAGGTCCATGGCTCCCTGGCTCGTGCCGGCAAGGTCAAGGGCCAGACACCGAAGGTAGAGAAAC
AGGAGAAGCGCAAGAAGAAGACCGGCCGTGCCCAGAGGCGCATGCAGTACAACAGGCGGGTCGTGAATGC
CGTTGCCACCTTCGGGCGCANGAGAGGACCCAATGCAAACCAAACTGCATAG

SAM file header and alignment record:

@SQ    SN:SEQ1   LN:402
NODE32439length524cov2064.38ID64877    0    SEQ1    1    60    61S161M3I241M58S    *    0    0    CAGCATTTTTTTTGTTATTTGGTTCGTGGGTTGCTGGACGTGTGTACACGTTTGCAAGAAGATGCAGCTGTTCATTCACTGTCAAGAAGTTCACACCGTAGAAGTTACAGGCGACGAGAATGTCGCCTTCCTCAAGGAAGTTCTTGAGCAGGCCGAAGGCATTGCACCTGTTGATCAGGTCCTCTACAACTCTGGCAAGCCCCTGAGTGATGATGTTTCTCTGTCCTCCTGCCTTGAGGATGGTGCTCATGTCGAGGCCGTTGTTCCTCTGCTCGGAGGTAAGGTCCACGGCTCACTGGCTCGTGCTGGCAAAGTGAAGGGCCAGACACCGAAGGTGGAGAAACAGGAGAAACGCAAGAAGAAGACTGGCCGTGCCAAGAGGCGCATGCAGTACAACAGGCGGTTTGTGAATGCTGTTGCCACCTTTGGCCGCAGGAGGGGACCCAATGCAAACCAAACTTCATAGAGAGATGGGCCTGTGACAAATAAAATTTGTATGGTGCGTTCCTGGACGTGGTGCTCAC    *    NM:i:55    MD:Z:14C10G0G5T5T11T2A3G1A2T2T9C2T0A0C2C11G13C6A9C1T17C2C2G0C16G3A8T4T2A2T2C2C5T2T14T5C11C5G2C20A14G14C9C26G1C8C11C2G4C3A21G5    AS:i:133    XS:i:0
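
The alignment record itself is consistent with the header. The contig aligns at POS 1 and its two M operations cover 161 + 241 = 402 reference bases, so it spans positions 1 to 402, the full declared LN. The soft-clipped (S) and inserted (I) bases count towards the query length (61 + 161 + 3 + 241 + 58 = 524) but not towards the reference. A minimal sketch of that arithmetic in plain Python, assuming only the standard library (the helper names `cigar_lengths` and `sq_lengths` are made up for illustration; for real files a SAM/BAM library such as pysam would be the usual choice):

```python
import re

# Per the SAM specification, M, D, N, = and X consume reference bases;
# M, I, S, = and X consume query bases; H and P consume neither.
REF_OPS = set("MDN=X")
QUERY_OPS = set("MIS=X")

def cigar_lengths(cigar):
    """Return (reference span, query length) for a CIGAR string."""
    ref_len = query_len = 0
    for count, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        if op in REF_OPS:
            ref_len += int(count)
        if op in QUERY_OPS:
            query_len += int(count)
    return ref_len, query_len

def sq_lengths(header_lines):
    """Map reference name (SN) to declared length (LN) from @SQ lines.

    Splits on any whitespace because the post shows spaces; real SAM files
    separate header fields with tabs, which split() also handles.
    """
    lengths = {}
    for line in header_lines:
        if line.startswith("@SQ"):
            tags = dict(field.split(":", 1) for field in line.split()[1:])
            lengths[tags["SN"]] = int(tags["LN"])
    return lengths

# Values copied from the post: the @SQ line plus the RNAME, POS and CIGAR
# fields of the alignment record.
ref_lengths = sq_lengths(["@SQ    SN:SEQ1   LN:402"])
rname, pos, cigar = "SEQ1", 1, "61S161M3I241M58S"

ref_span, query_len = cigar_lengths(cigar)
end = pos + ref_span - 1  # 1-based, inclusive end of the alignment on SEQ1
print(ref_span, query_len, end, ref_lengths[rname])  # 402 524 402 402
```

The query length of 524 also agrees with the "length524" embedded in the contig name.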

Your sequence as shown in this post is really 402 bp.

written 4.4 years ago by lh3

Whoops. You guys are right. I was using Sublime Text to count characters and it kept showing 407. Thanks, Ashutosh Pandey and Heng Li (I really admire your software).

written 4.4 years ago by ashishtx

It's an honor to be mentioned in the same line as Dr. Li :-)

written 4.4 years ago by Ashutosh Pandey

The length of a reference sequence does not include its header line. I guess you are counting ">SEQ1" as part of the length, which is wrong.
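
To make the point concrete, here is a minimal sketch in plain Python (`fasta_lengths` is a hypothetical helper, not part of any tool mentioned in the thread) that counts residues per record while skipping the `>` header line and ignoring line breaks. Taking the record name as the first whitespace-delimited word after `>` is an assumption made for this example.

```python
def fasta_lengths(text):
    """Return {record name: sequence length} for FASTA-formatted text.

    Header lines (starting with '>') are not counted, and line breaks and
    other whitespace on sequence lines are ignored.
    """
    lengths = {}
    name = None
    for line in text.splitlines():
        if line.startswith(">"):
            # Assumption: the record name is the first word after '>'.
            words = line[1:].split()
            name = words[0] if words else ""
            lengths[name] = 0
        elif name is not None:
            # Count only non-whitespace characters (residues, including N).
            lengths[name] += len("".join(line.split()))
    return lengths

print(fasta_lengths(">SEQ1 some description\nACGTACGT\nACG\n"))  # {'SEQ1': 11}
```

Run on the SEQ1 record as posted in the question, this gives 402, matching LN:402.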

written 4.4 years ago by Ashutosh Pandey

I am pretty sure I did not include the header. Thanks for the response. 

written 4.4 years ago by ashishtx

atgcagctgt tcatccactg tcaaggggtt cataccgttg aagttaCAGG  50
TGAaGAGGAA GTTGCTTTCC TCAAgcaata cctcgagcag gccgagggca  100
ttgcacctgc tgatcaagtc ctctaccatt ctggcaagcc cctgagcgac  150
gagctttctc tctcctgcct ggagaatggt gcttatgttg aagctgtcgt  200
ccctcttctt ggaggtaagg tccatggctc cctggctcgt gccggcaagg  250
tcaagggcca gacaccgaag gtagagaaac aggagaagcg caagaagaag  300
accggccgtg cccagaggcg catgcagtac aacaggcggg tcgtgaatgc  350
cgttgccacc ttcgggcgca ngagaggacc caatgcaaac caaactgcat  400
ag
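
The listing above splits the sequence into groups of 10 with a running count after every 50 bases, which makes it easy to see that the last full line ends at 400 and only "ag" remains, for 402 in total. The thread does not say which tool produced it; a minimal plain-Python sketch of the same layout (the function `numbered_blocks` and its parameters are hypothetical) could look like this:

```python
def numbered_blocks(seq, group=10, per_line=5):
    """Lay out `seq` in groups of `group` residues, `per_line` groups per line.

    Each full line ends with the running residue count, as in the listing
    above; a final partial line (like "ag") is printed without a count.
    """
    width = group * per_line
    lines = []
    for start in range(0, len(seq), width):
        chunk = seq[start:start + width]
        text = " ".join(chunk[i:i + group] for i in range(0, len(chunk), group))
        if len(chunk) == width:
            text += f"  {start + width}"
        lines.append(text)
    return "\n".join(lines)

print(numbered_blocks("ACGT" * 13))  # 52 residues: one numbered line, then "GT"
```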

If you are using a script, you may have forgotten to strip the "\n" (newline) characters before counting.
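
The 5 bp difference fits this explanation exactly if the sequence lines were laid out as in the question (five lines of 70 residues plus one of 52): the five line breaks between those six lines bring the character count to 402 + 5 = 407. That this is what the editor counted is an inference, not something confirmed in the thread. A minimal plain-Python sketch of the difference, using placeholder residues rather than the real sequence:

```python
# Stand-in with the same layout as SEQ1 in the post: five lines of 70
# residues and a last line of 52 (402 residues). Letters are placeholders.
seq_lines = ["A" * 70] * 5 + ["A" * 52]

unix_text = "\n".join(seq_lines)       # sequence lines only, LF line endings
windows_text = "\r\n".join(seq_lines)  # same lines with CRLF line endings

# Naive counts include the line-break characters between lines.
print(len(unix_text), len(windows_text))  # 407 412

# Dropping all whitespace (spaces, \n, \r) leaves only the residues.
print(len("".join(unix_text.split())), len("".join(windows_text.split())))  # 402 402
```

Splitting on all whitespace rather than only "\n" also covers files with Windows line endings, where each break adds two characters.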

written 4.4 years ago by Ashutosh Pandey
Powered by Biostar version 2.3.0