I've looked around a bit but couldn't find an answer for this and it's driving me nuts.
I have aligned some Nanopore reads to a reference genome using Minimap2 but instead of the usual human genome chromosome name, it uses a custom reference name (as the reference is a custom plasmid that we constructed).
I am now trying to extract reads from the resulting bam file using something like the following:
samtools view <in.bam> non_standard_chrm:2000-4500 > out.bam
The resulting bam file doesn't give me reads that fall within that particular region, it just prints out all reads mapped from position 1 onward. Does this have anything to do with the chromosome name being non-standard? If so, is there a way I can extract reads from the region?
I've also tried using pysam with the following:
import pysam
bamfile = pysam.AlignmentFile("bamfile.bam", "rb")
for read in bamfile.fetch("non_standard_chrm", 2000, 4500):
print(read)
The output given is still reads from position 1 onward. Am I doing something wrong?
What you are trying to do should have worked fine .. as long as
non_standard_chrm
is present in your BAM file i.e. the name of the reference needs to match in both places.One thing to try is to enclose the interval in quotes:
Exactly.
samtools idxstats input.bam
I did try to use quotes for the interval but that didn't work either, unfortunately. Running samtools idxstats also showed that the reads were aligned to the contig properly and the reference name shows up.
I'm not sure if special characters exist in the reference name...the only thing that I can think of would be the presence of underscores ("_") or periods (".") in the reference name but other that those, it is alphnumerical. One other thing would be there is some sort of length/size limit for the reference name? I haven't come across that issue so far though.