Find reference base for a position in a given read from a BAM file
2
0
Entering edit mode
8.8 years ago
Andrew ▴ 60

If I take a random read from a BAM file, and want to know what the reference base would be at position 1 of that read, is there a way to do this? I was thinking I could do something with the POS column of the BAM file and then add to it the position of the base in the read I am looking at to get the position on the reference sequence.

Maybe a different way to say this is does there exist a way to get the nth base from a specific chromosome in a reference sequence (Fasta).

BAM • 5.8k views
ADD COMMENT
4
Entering edit mode
8.8 years ago

There is another way to get the reference base from just the BAM file, by using the MD tag: https://github.com/vsbuffalo/devnotes/wiki/The-MD-Tag-in-BAM-Files. I've implemented this as the .parse_md method in the simplesam Sam class.

ADD COMMENT
0
Entering edit mode

Thanks for the explanation of the MD flag and the detailed examples Matt :)

It blows my mind that the MD tag doesn't include a way to encode insertions... but I can't say it surprises me :P

Your parser, if his data contains the MD flag, is the way to go I think!

ADD REPLY
1
Entering edit mode
8.8 years ago
Andrew ▴ 60

Found the solution: http://seqanswers.com/forums/showthread.php?t=17315

In case anyone finds this in the future just do the following

samtools faidx <file.fa> chr1:500-500

This will get you the 500th base in the given fasta file.

ADD COMMENT
2
Entering edit mode

Take care that if the read aligns with indels then the n-th position in the read will not correspond to the the POS+n position in the reference.

ADD REPLY

Login before adding your answer.

Traffic: 1473 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6