I'm new to pysam and am trying to parse a BAM file by RefSeq accession number. However, the accession number is only part of the reference name (column 3 in the BAM file), and pysam's fetch seems to need the whole reference name in order to search.
Is there a way I can search on a substring of the reference name?
For example, a reference name may look like "gi|158333233|custom|NC_009925.1|", where NC_009925 is the accession number I want to search on.
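One possible way to do the substring search, sketched here rather than taken from the pysam docs as a recipe: filter the list of full reference names yourself, then pass each full name to fetch. The helper name match_references and the second, made-up reference name are hypothetical and only there for illustration.

```python
def match_references(reference_names, accession):
    """Return every reference name that contains the accession as a substring."""
    return [name for name in reference_names if accession in name]


# Stand-in for samfile.references: the example name from the question
# plus one invented name (hypothetical) that should not match.
example_names = [
    "gi|158333233|custom|NC_009925.1|",
    "gi|000000000|custom|NC_000000.1|",
]

matches = match_references(example_names, "NC_009925")
print(matches)
```

With an open pysam.AlignmentFile, the same helper could be called as match_references(samfile.references, "NC_009925"), and each returned name passed to samfile.fetch(name). This assumes the BAM file has an index, which fetch needs for region lookups.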
Edit: Also, how would I go about parsing the output? Each printed read looks like a record from the BAM file, just without the third column (the one I searched on). I want to get the first column and store it in a new variable. How could I do that?
    for read in samfile.fetch("etc."):
        print(read)

doesn't let me subset a read like that, so I'm guessing the read object can't be indexed.
Trying the above gives the error: 'pysam.calignedsegment.AlignedSegment' object has no attribute '__getitem__'
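That error suggests each read is an object rather than a list, so its fields would be reached by attribute name instead of by position. A minimal sketch, assuming the first column is the read name and that the installed pysam version exposes it as query_name on AlignedSegment (older releases used qname). FakeRead and collect_query_names are hypothetical names, with FakeRead standing in for real reads so the snippet runs without a BAM file.

```python
from collections import namedtuple


def collect_query_names(reads):
    """Gather the query_name attribute (SAM column 1) from each read."""
    return [read.query_name for read in reads]


# Hypothetical stand-in for pysam AlignedSegment objects; only the
# query_name attribute is modelled here.
FakeRead = namedtuple("FakeRead", ["query_name"])
demo_reads = [FakeRead("read_1"), FakeRead("read_2")]

names = collect_query_names(demo_reads)
print(names)
```

With real data, the same function could be given the iterator returned by samfile.fetch(...) in place of demo_reads.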