Question: pysam fetch: find desired position within short read
1
gravatar for c.wymant
6.2 years ago by
c.wymant10
European Union
c.wymant10 wrote:

Using pysam fetch, I can find & iterate over all short reads that map to a given position/column (1000, say) in the reference genome:

BamFile = pysam.AlignmentFile(FileName, "rb")
position=1000
ShortReadsMappingHere = BamFile.fetch("MyReference", position-1, position)
for ShortRead in ShortReadsMappingHere:
  # do something

What's the most efficient way to find where the position I'm interested in (here 1000) is within each ShortRead? i.e. ShortRead.seq[0] will give me the first element of this short read, but which element is the position I want?
This line of code must get it right:

  ThisPositionInThisRead = ShortRead.get_reference_positions(full_length=True).index(position-1)

because it finds, for every position in the short read, the position in the reference genome that it maps to (or 'None' if it doesn't map) and retrieves the index of the one mapping to the position I want. It's clearly super inefficient however. The following line

  ThisPositionInThisRead = position - ShortRead.reference_start + ShortRead.query_alignment_start - 1

almost always gets it right, by comparing coordinates, but can get it wrong if this short read has an indel not present in the consensus.
Is there a simple (and correct) way of determining this number based on coordinates?

pysam mapping short reads fetch • 4.3k views
ADD COMMENTlink modified 6.2 years ago by Devon Ryan98k • written 6.2 years ago by c.wymant10
0
gravatar for Devon Ryan
6.2 years ago by
Devon Ryan98k
Freiburg, Germany
Devon Ryan98k wrote:

Have a look at the mpileup functions (I've not used it with pysam, just the base C API). It'll let you iterate over reads covering a given position and indicate for you which base in a read actually covers that position (it also indicates insertions/deletions).

An example of doing this with the C API (the function names are more or less the same, since pysam is a wrapper around the htslib C API) can be found in PileOMeth (within each pileup, the qpos value indicates the actual position within the alignment).

ADD COMMENTlink written 6.2 years ago by Devon Ryan98k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2037 users visited in the last hour
_