pysam fetch with partial reference name
4.4 years ago
yarmda ▴ 40

I'm new to pysam and trying to parse a bamfile by a RefSeq Accession number. However, that accession number is only part of the reference name (column 3 in the bamfile header) and pysam fetch seems to need the whole reference name in order to search.

Is there a way I can search on a substring of the reference name?

For example, a reference name may look like "gi|158333233|custom|NC_009925.1|", where NC_009925 is the accession number I want to search on.

Thanks!

Edit: Also, how would I go about parsing the output? It looks like it is the detail of the bamfile, just without the third column (that I searched on). I want to get the first column and store it as a new variable. How could I do that?

for read in samfile.fetch("etc."):


doesn't let me subset like that. So, I'm guessing the read object can't be indexed.

Trying the above gives the error: 'pysam.calignedsegment.AlignedSegment' object has no attribute '__getitem__'

4.4 years ago

Regarding using a partial contig name, there's no built-in way to do that. You'll need to write a function that iterates over the contigs and determines which one you want.
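Something along these lines might work as a starting point. It's only a sketch: `match_reference` and `fetch_by_accession` are names I made up, the match is a plain substring test (so "NC_00992" would also hit "NC_009925.1"), and the BAM file is assumed to be indexed so that fetch works.

```python
def match_reference(references, accession):
    """Return the one reference name that contains `accession` as a substring.

    Hypothetical helper, not part of pysam. `references` is any iterable of
    names, for example the `references` attribute of a pysam.AlignmentFile.
    """
    hits = [name for name in references if accession in name]
    if len(hits) != 1:
        # Refuse to guess when nothing matches or the match is ambiguous.
        raise ValueError(
            "expected exactly one reference containing %r, found %d"
            % (accession, len(hits))
        )
    return hits[0]


def fetch_by_accession(bam_path, accession):
    """Yield the reads aligned to the contig whose name contains `accession`."""
    import pysam  # imported here so match_reference works without pysam installed

    with pysam.AlignmentFile(bam_path, "rb") as samfile:
        contig = match_reference(samfile.references, accession)
        for read in samfile.fetch(contig):
            yield read
```

You would then loop over `fetch_by_accession("your.bam", "NC_009925")` (file name is a placeholder) the same way you loop over `samfile.fetch(...)`.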

Regarding getting read names, it's read.query_name. Please see the documentation for more information.
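The `__getitem__` error comes from indexing the read (something like `read[0]`): fetch yields AlignedSegment objects, and their fields are exposed as attributes rather than by position. A minimal sketch for storing the first column, where `read_names` and `read_names_for_contig` are made-up helper names and the BAM file is assumed to be indexed:

```python
def read_names(reads):
    """Collect the query name (SAM column 1) of each read.

    Hypothetical helper. `reads` is any iterable of objects with a
    `query_name` attribute, such as the result of samfile.fetch(...).
    """
    return [read.query_name for read in reads]


def read_names_for_contig(bam_path, contig):
    """Open a BAM file and return the names of all reads on `contig`."""
    import pysam  # imported here so read_names works without pysam installed

    with pysam.AlignmentFile(bam_path, "rb") as samfile:
        return read_names(samfile.fetch(contig))
```

Other columns are available the same way, for example `read.flag`, `read.reference_name`, `read.reference_start`, `read.mapping_quality` and `read.cigarstring`.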