Identify parent of each read in a GAF
8 months ago
cfourps ▴ 10

I have a .gfa created by running the FASTA files of two genomes through the Minigraph-Cactus pipeline. I am aligning PacBio HiFi reads to that graph using GraphAligner. Column 6 of the resulting GAF file lists the segments on the alignment path of each read.

I expect to see reads that come exclusively from one genome or the other, and also 'recombined' reads. What is an efficient way to classify each read as coming from Genome A, from Genome B, or as recombined/mixed? Relatedly, how do I get the list of segment IDs belonging to each genome, so that I can use it to 'decode' the parent from the segment IDs in column 6 of the GAF file?

gaf vgteam vg • 494 views

It's not exactly clean, but one option is to run vg paths -A with -p set, once per input genome, to get each genome's paths individually as GAF, and then run vg pack -d on each genome's path GAF to get a table of the nodes that genome includes.
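Once you have the two node lists, the decoding step could look roughly like the sketch below. This is only an illustration, not something vg provides: the function names and the "A" / "B" / "mixed" / "ambiguous" labels are made up here, and it assumes you have already loaded each genome's node IDs into a Python set of strings (for example from the tables above). Because most nodes in a two-genome graph are shared by both genomes, it only treats nodes private to one genome as informative, which is my own assumption about how you want to define the parent.

```python
import re

# A GAF path (column 6) is a series of oriented steps such as ">12<34>56".
STEP = re.compile(r"[<>]([^<>\s]+)")


def path_segments(path_field):
    """Return the segment IDs visited by a GAF column-6 path, in order."""
    # A path given as a plain stable ID (no '>' or '<') yields an empty list.
    return STEP.findall(path_field)


def classify_segments(segments, private_a, private_b):
    """Label one alignment from the genome-private nodes it touches."""
    has_a = any(s in private_a for s in segments)
    has_b = any(s in private_b for s in segments)
    if has_a and has_b:
        return "mixed"
    if has_a:
        return "A"
    if has_b:
        return "B"
    # Only shared (or unknown) nodes: the path cannot tell the parents apart.
    return "ambiguous"


def classify_gaf(lines, nodes_a, nodes_b):
    """Yield (read_name, label) for every alignment line of a GAF.

    nodes_a / nodes_b: sets of node IDs (strings) on each genome's paths.
    These are assumed inputs, prepared beforehand.
    """
    private_a = nodes_a - nodes_b
    private_b = nodes_b - nodes_a
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        yield fields[0], classify_segments(
            path_segments(fields[5]), private_a, private_b
        )
```

You would call it as classify_gaf(open("reads.gaf"), nodes_a, nodes_b), where reads.gaf is a placeholder file name. Note that a read can have more than one alignment line in the GAF, so you may want to combine the labels per read name afterwards.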

