Infer order of supplementary alignments [pysam]
1
2
Entering edit mode
5.0 years ago

I have an alignment of an assembled contig with the reference genome. The contigs are very long (PacBio + HiC assembly), and one "query" will generate multiple alignments with the "reference", as a consequence of structural variation, which breaks up the alignments.

I want to figure out if those alignments are in the right order relative to the contig they originate from. Next step would be to check if any of these alignments have deletions - leading to gaps, or change direction (due to inversions). I can't find any pysam attributes which seem do to the trick, right now, and I'm not sure if it's encoded in some obscure tag of the bam?

Suggestions?

pysam alignment • 1.6k views
ADD COMMENT
2
Entering edit mode
5.0 years ago
IP ▴ 760

Hi,

We (in our lab) had a similar problem, were we have reads with many supplementary alignments. Apparently, more people has the same problem as you and me

In order to get the supplementary tag as a function of the query (read), what I did was the following:

  1. Align the reads with minimap2 (I have nanopore data), and output in paf format
  2. Sort the data by column 1 and then by column 2 (sort -k 1,1 -k 2,2). In this case, you are sorting the alignments first by query name (you get all the alignments for a read one after each other) and then when you sort by column two you get the supplementary alignments one after each other as a function of the query

Let me know if this helps,

ADD COMMENT
1
Entering edit mode

Thanks a lot, that sounds very useful. I haven't looked at PAF alignments yet, only in bam format. What I ended up doing was to write a Python script and parse the CIGAR strings to get the order of the supplementary alignments. A preliminary script can be found here.

I'll definitely take a look at your approach the next time I need to do a similar task.

ADD REPLY
0
Entering edit mode

Looks cool! I just had a quick look at the code and it seems that it is a better approach for high throughput processing of the nanopore complex reads. Thanks a lot for sharing!

ADD REPLY
0
Entering edit mode

Comparing the results with your method would definitely be the next step, as I'm not 100% confident I did not mess up somewhere!

ADD REPLY

Login before adding your answer.

Traffic: 1875 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6