How to extract the chromosome name of each read from a BAM file using pysam?
9 weeks ago
ja4123 • 0

Hey! I am using a pysam iterator like this:

import pysam

alignments = pysam.AlignmentFile("file.bam", "rb")
for line in alignments.fetch(until_eof=True):
    print(line)
    break


Output looks like this:

HISEQ:157:HAM0GADXX:1:1101:1635:2143    16  15  73530482    42  102M    -1  -1  102 TGGTGGGAAGGTTTGCTCTTCACCAATTAACGAAGGATGGGTAAGGAAGTTAGTTGGTGGTTGGACTCTGCTCTCAGATTCAACCCTCCCTAGCCTTCTATT  array('B', [22, 33, 33, 33, 37, 37, 37, 37, 37, 37, 37, 37, 40, 40, 37, 37, 27, 37, 33, 33, 33, 27, 37, 37, 33, 37, 40, 40, 40, 40, 40, 40, 37, 40, 40, 40, 37, 33, 40, 40, 40, 40, 40, 40, 40, 37, 40, 40, 40, 40, 37, 37, 40, 37, 40, 37, 37, 27, 37, 37, 33, 37, 37, 33, 27, 37, 37, 37, 37, 37, 37, 37, 33, 37, 37, 37, 37, 33, 33, 33, 37, 37, 37, 37, 40, 40, 37, 33, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 37, 33, 33, 33])    [('AS', 0), ('XN', 0), ('XM', 0), ('XO', 0), ('XG', 0), ('NM', 0), ('MD', '102'), ('YT', 'UU')]


I thought that the chromosome number was in the third position of the line, which in this example is 15, but after further analysis I think I am wrong. Does anyone know where to find it? Kindly help.

import pysam

samfile = pysam.AlignmentFile("test.bam", "rb")
# Print the reference (chromosome/contig) name of every read
for read in samfile:
    print(read.reference_name)


Check if this gives you the result you expected.


Your code gives me all chromosomes from 1 to 23, X and Y, and some None. What does None mean? That the read is not mapped? I also got a few other names that appear in the header of the SAM file, such as chrUn_gl000234 and chr1_gl000191_random, but not many. Do you know what they mean? Thanks for the answer.
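
To see how many reads fall under each name, including None, the reads can be tallied. This is only a minimal sketch: `tally_references` and `tally_bam` are hypothetical helper names, and the "unassigned" label is my own choice. pysam returns None for `reference_name` when a read has no reference assigned (`reference_id` of -1), which is typically the case for unmapped reads.

```python
from collections import Counter


def tally_references(reads):
    """Count reads per reference name.

    Reads whose reference_name is None (no reference assigned,
    typically unmapped reads) are counted under "unassigned".
    """
    counts = Counter()
    for read in reads:
        name = read.reference_name
        counts[name if name is not None else "unassigned"] += 1
    return counts


def tally_bam(path):
    """Tally every read in a BAM file (hypothetical convenience wrapper)."""
    import pysam  # imported here so tally_references works without pysam

    with pysam.AlignmentFile(path, "rb") as bam:
        # until_eof=True also yields unmapped reads and needs no index
        return tally_references(bam.fetch(until_eof=True))
```

Calling `tally_bam("file.bam")` would then return a Counter keyed by names such as chr15 or chrUn_gl000234, plus "unassigned" for the None cases.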


Please refer to the headers of the reference FASTA file used for the alignment (the one that produced this BAM).
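
The same sequence names are also stored in the BAM header, so they can be listed without the FASTA file. The sketch below assumes UCSC hg19-style naming, where `chrUn_*` sequences are unplaced contigs and `*_random` sequences are unlocalized contigs tied to a known chromosome. `classify_contig` and `list_header_contigs` are hypothetical helpers, and the category labels are mine.

```python
def classify_contig(name):
    """Roughly classify a reference name, assuming UCSC hg19-style names."""
    if name.startswith("chrUn_"):
        return "unplaced"      # not assigned to any chromosome
    if name.endswith("_random"):
        return "unlocalized"   # chromosome known, exact position unknown
    return "other"             # regular chromosomes and anything else


def list_header_contigs(path):
    """Print every reference in the BAM header with its length and class."""
    import pysam  # imported here so classify_contig works without pysam

    with pysam.AlignmentFile(path, "rb") as bam:
        for name, length in zip(bam.references, bam.lengths):
            print(name, length, classify_contig(name))
```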


But in general I have got what I wanted. Thanks!


Yes, it is 15.

"but after further analysis I think I am wrong."


Later I saw something like "chr66", for example, and then I noticed that the numbers in this position go above 46. I am working on a human sample.
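
Values that high are consistent with the third field of the printed read being pysam's `reference_id`. That is a 0-based index into the list of sequences in the BAM header, not a chromosome number, so it also covers extra contigs, and 15 is simply the 16th sequence in the header. The exact text that `print(read)` produces may differ between pysam versions, so treat this as a sketch. `tid_to_name` is a hypothetical helper that mirrors what `AlignmentFile.get_reference_name` and `read.reference_name` do.

```python
def tid_to_name(references, tid):
    """Map a 0-based reference index to its name; -1 means no reference."""
    if tid < 0:
        return None
    return references[tid]


def show_first_read_reference(path):
    """Print the index and the name of the first read's reference."""
    import pysam  # imported here so tid_to_name works without pysam

    with pysam.AlignmentFile(path, "rb") as bam:
        for read in bam.fetch(until_eof=True):
            # bam.references is a tuple of names in header order
            print(read.reference_id,
                  tid_to_name(bam.references, read.reference_id),
                  read.reference_name)
            break
```

In practice `read.reference_name` already performs this lookup, so the helper is only there to make the index-to-name relationship explicit.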