Best way to obtain reference and non-reference k-mers
22 months ago

Hi everyone,

I'm wondering what the best strategy would be to obtain a list of all k-mers in the graph, with each k-mer labeled as reference (i.e., coming from a specific path) or non-reference. A very nice-to-have extra feature would be to have, for each k-mer, the index(es) of its occurrence(s) in the reference (linear) space.

I see two possible roads:

vg kmers ...
vg find -k ...

Both strategies have pros and cons. The former produces more succinct output, but I haven't found a way to tell whether a k-mer belongs to a path or not (nor have I found an easy way to obtain the index). The latter gives more detailed information, but its output is much larger.

Thanks for any suggestions, Michele S.

22 months ago
Jouni Sirén

The easiest way is probably doing it outside vg:

  1. Extract all kmers (kmer occurrences) with vg kmers.
  2. Extract the selected paths in FASTA format with vg paths -v -F -p path-names.txt > output.fa.
  3. Determine the kmers in the paths using an external tool.
  4. Compare the kmer sets with an external tool.
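Steps 3 and 4 can be sketched in a few lines of Python, assuming the path FASTA from step 2 and the `vg kmers` output from step 1 are available as lines of text. This is only an illustration, not a vg feature: it assumes the first whitespace-separated field of each `vg kmers` line is the k-mer sequence, compares k-mers on the forward strand only, and records 0-based offsets within each path as the "index in the linear space". The function names are hypothetical.

```python
"""Sketch of steps 3 and 4: index path k-mers, then label graph k-mers.

Assumptions (not vg behaviour): the first whitespace-separated field of each
`vg kmers` line is the k-mer; comparison is forward-strand only; offsets are
0-based positions within each path sequence.
"""
from collections import defaultdict


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the path name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def index_path_kmers(fasta_lines, k):
    """Step 3: map each path k-mer to its (path name, 0-based offset) list."""
    index = defaultdict(list)
    for name, seq in read_fasta(fasta_lines):
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((name, i))
    return index


def label_graph_kmers(kmer_lines, path_index):
    """Step 4: return {kmer: occurrences}; an empty list = non-reference."""
    labels = {}
    for line in kmer_lines:
        fields = line.split()
        if not fields:
            continue
        kmer = fields[0].upper()
        # .get() avoids inserting new keys into the defaultdict.
        labels[kmer] = path_index.get(kmer, [])
    return labels


if __name__ == "__main__":
    path_fasta = [">ref", "ACGTAC", "GT"]      # toy path sequence ACGTACGT
    graph_kmers = ["ACGT", "CGTA", "TTTT"]     # toy k-mer lines
    idx = index_path_kmers(path_fasta, 4)
    for kmer, occ in label_graph_kmers(graph_kmers, idx).items():
        print(kmer, "reference" if occ else "non-reference", occ)
```

Repeated k-mer occurrences from `vg kmers` collapse into a single dictionary entry, which matches the goal of labeling each distinct k-mer. If the graph k-mers are reported on both strands, comparing canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement) would be a natural extension. For whole genomes, an in-memory dictionary like this gets large, and a dedicated k-mer counting tool is the more practical choice for step 3.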

The path positions for non-path kmers are not always well-defined. There is some machinery for determining them (for alignments, not kmers), but even then, we often have to make arbitrary choices or give up trying.


Thank you very much for the prompt and detailed reply. I was considering using external tools, but I wanted to be sure that there were no better ways built into the toolkit.

