Question: Best way to obtain reference and non-reference k-mers
6 months ago by
michele.schimd10 wrote:

Hi everyone,

I'm wondering what the best strategy would be to obtain the list of all k-mers in the graph, with each k-mer labeled as reference (i.e., from a specific path) or non-reference. A nice extra feature would be to have, for each k-mer, the index(es) of its occurrence(s) in the reference (linear) space.

I see two possible roads:

vg kmers ...
vg find -k ...

Both strategies have pros and cons. The former produces more succinct output, but I haven't found a way to tell whether a k-mer belongs to a path (nor have I found an easy way to obtain the index). The latter gives more detailed output, but that output is also much larger.

Thanks for any suggestion, Michele S.

6 months ago by
Jouni Sirén130 (UCSC Genomics Institute) wrote:

The easiest way is probably doing it outside vg:

  1. Extract all kmers (kmer occurrences) with vg kmers.
  2. Extract the selected paths in FASTA format with vg paths -v -F -p path-names.txt > output.fa.
  3. Determine the kmers in the paths using an external tool.
  4. Compare the kmer sets with an external tool.

The path positions for non-path kmers are not always well-defined. There is some machinery for determining them (for alignments, not kmers), but even then, we often have to make arbitrary choices or give up trying.
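Steps 3 and 4 could be sketched in Python roughly as follows. This is only a minimal sketch and not part of vg. It assumes that the k-mer sequence is the first tab-separated column of each line of the vg kmers output, and that the FASTA file holds one record per selected path. The function names are hypothetical. It compares forward-strand strings only, so if the k-mer list also contains reverse-strand k-mers, some canonicalization would be needed first. Linear positions are reported only for k-mers found on a path, since positions for the others are not well-defined.

```python
from collections import defaultdict


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def path_kmer_positions(fasta_lines, k):
    """Map each k-mer on a path to its (path name, 0-based offset) occurrences."""
    positions = defaultdict(list)
    for name, seq in read_fasta(fasta_lines):
        for i in range(len(seq) - k + 1):
            positions[seq[i:i + k]].append((name, i))
    return positions


def label_graph_kmers(kmer_lines, positions):
    """Label each distinct graph k-mer: True if it occurs on a path, else False.

    Assumption: the k-mer is the first tab-separated field of each line.
    """
    labels = {}
    for line in kmer_lines:
        fields = line.rstrip("\n").split("\t")
        if fields and fields[0]:
            kmer = fields[0].upper()
            labels[kmer] = kmer in positions
    return labels
```

To use it, open the FASTA file and the k-mer list, pass the file objects to path_kmer_positions and label_graph_kmers with the same k that was given to vg kmers, and look up positions[kmer] for every k-mer labeled True. For large graphs, holding all k-mers in memory this way may not be practical, and a dedicated k-mer counting tool would likely be a better fit.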


Thank you very much for the prompt and detailed reply. I was considering using external tools, but I wanted to be sure that there was no better way built into the toolkit.

Reply written 5 months ago by michele.schimd10