Entering edit mode
5.6 years ago
vaslanzadeh
▴
20
Hi, pysam prints me reads that has different chromosome (ref_ID) than the one that I specify with samfile.fetch("chromosome", 100, 20000). The following is my example code:
import pysam
samfile = pysam.AlignmentFile("ex1.bam", "rb")
reads = samfile.fetch("chromosome", 100, 20000)
for item in reads:
print (str(x))
When I use samfile.fetch("X", 100, 20000) it prints reads that their chromosome number is 10, this is one of the reads for example:
K00114:267:H3YL7BBXX:4:2122:14874:29659 163 10 130 255 14M83903N136M 10 84334 150 GAAAATGAAATAAGCAAAGTTAGGACTCGTAAAAATGGCAATCAAACCAAGAACGAAGGGCAAAACGTACTCCTCAAGATCGGTGGGTTCGCAGTGGTTCAACAGGCTTGGTTTCAAGCAGAACAAGTACGGAACTTGTAAATTTTTGTC
But when I use samfile.fetch("IX", 100, 20000) it prints reads on chromosome 4!, one example:
K00114:267:H3YL7BBXX:4:1124:28980:37712 419 4 17 3 22S12M86722N116M 4 86873 128 TGCAACGTCCACAAGCAGTTCTCACCACACCACATCAACAAGTTCTTACACATCTTCTACTTACACTGCACAAATATCTTCTACCTCCGCTGCTGCTACTTCTTCTGCTCCAGCAGCCCTGCCAGCAGCCCATAAAACTTCATCTCACAA
or samfile.fetch("V", 100, 20000) print reads for chromosome 6! one example:
K00114:267:H3YL7BBXX:4:1201:16346:16805 419 6 137 0 148M2S 6 217 148 CAAACACTAAATCAAAACAGTGAAATACTACTACATCAAAACGCATATTCCCTAGAAAAAAAAATTTCTTACAATATACTATACTACACAATACATAATCACTGACTTTCGTAACAACAATTTCCTTCACTCTCCAACTTCTCTGCTCTA
There is something strange here, does anybody knows what is the possible source of this error?
What are the reference sequences at the bam header?
The bam file contains mapped RNAseq data from S. cervesiae
This is the heather:
The numbers match the order of the chromosomes in the header. I would make sure it's okay by looking at the reads you got using
samtools view |grep [read_name]
This is the output for: samtools view | grep "K00114:267:H3YL7BBXX:4:1124:28980:37712"
The chromosome name is correct this time, it is IX not 4. Should I update my bam file's heather?
So it's indeed mapped to IX chromosome