Question

Extracting reads from bam file using samtools view for non-standard reference names

0

Entering edit mode

21 months ago

JmzW • 0

I've looked around a bit but couldn't find an answer for this and it's driving me nuts.

I have aligned some Nanopore reads to a reference genome using Minimap2 but instead of the usual human genome chromosome name, it uses a custom reference name (as the reference is a custom plasmid that we constructed).

I am now trying to extract reads from the resulting bam file using something like the following:

samtools view <in.bam> non_standard_chrm:2000-4500 > out.bam

The resulting bam file doesn't give me reads that fall within that particular region, it just prints out all reads mapped from position 1 onward. Does this have anything to do with the chromosome name being non-standard? If so, is there a way I can extract reads from the region?

I've also tried using pysam with the following:

import pysam
bamfile = pysam.AlignmentFile("bamfile.bam", "rb")
for read in bamfile.fetch("non_standard_chrm", 2000, 4500):
      print(read)

The output given is still reads from position 1 onward. Am I doing something wrong?

samtools • 999 views

ADD COMMENT • link updated 10 months ago by Ram 43k • written 21 months ago by JmzW • 0

2

Entering edit mode

What you are trying to do should have worked fine .. as long as non_standard_chrm is present in your BAM file i.e. the name of the reference needs to match in both places.

One thing to try is to enclose the interval in quotes:

samtools view input.bam "Chr10:18000-45500" > output.bam

ADD REPLY • link 21 months ago by GenoMax 141k

0

Entering edit mode

Exactly.

Check the name of your non_standard_chrm contains no special characters (spaces, * "§$!"&%$&/&/ etc ).
also run this to check the name of your bam, and that reads were actually aligned there

samtools idxstats input.bam

ADD REPLY • link 21 months ago by colindaven 6.3k

0

Entering edit mode

I did try to use quotes for the interval but that didn't work either, unfortunately. Running samtools idxstats also showed that the reads were aligned to the contig properly and the reference name shows up.

I'm not sure if special characters exist in the reference name...the only thing that I can think of would be the presence of underscores ("_") or periods (".") in the reference name but other that those, it is alphnumerical. One other thing would be there is some sort of length/size limit for the reference name? I haven't come across that issue so far though.

ADD REPLY • link 21 months ago by JmzW • 0