Using VG to extract path sequences/sub-sequences
23 months ago
Andy Yates ▴ 120

Hi there. I'm looking into adapting refget into something that can work with graph genomes and not just linear genomes. Specifically, I'd like to extract the sequence of a full path, and a subsequence of a path, given a path identifier. I've spent this afternoon dipping in and out of vg and other related tools and believe this is possible, but I'm just not seeing how. If it is, would I need to code something custom, or is there a vg command that could do this?

Thanks in advance

vg • 628 views
13 months ago
glenn.hickey ▴ 520

There's no interface that does exactly that. You can get a whole path sequence with `vg paths -F`. You can get part of a sequence with `vg chunk -x graph.vg -p chr10:200-300 -c 0 -E range.bed | vg paths -F -Q chr10 -v -`

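But the sequence returned will be rounded to the nearest graph node boundaries, so it may actually cover, say, 180-400 rather than 200-300. The range that was really extracted is written to range.bed above. You'd then need to put the output through samtools faidx, with the coordinates shifted to be relative to the start of the chunk, to get the exact range you want. Very clunky, but doable in theory I guess.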
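
For what it's worth, the trimming step is just offset arithmetic, so it could also be done without samtools. Below is a rough Python sketch of that idea only. The function name `trim_to_range` is made up for illustration, and it assumes all coordinates are 0-based, that the chunk start is the value reported in range.bed, and that the end of the wanted range is inclusive. Check those conventions against your vg version before relying on it.

```python
def trim_to_range(chunk_seq, chunk_start, want_start, want_end):
    """Cut the requested interval out of a node-aligned chunk sequence.

    Assumptions (not taken from vg itself): all coordinates are 0-based,
    chunk_start is the path offset of chunk_seq[0], want_end is inclusive.
    """
    chunk_end = chunk_start + len(chunk_seq) - 1  # last path offset covered
    if want_start > want_end:
        raise ValueError("want_start must not exceed want_end")
    if want_start < chunk_start or want_end > chunk_end:
        raise ValueError("requested range is not inside the chunk")
    offset = want_start - chunk_start  # position of want_start in the chunk
    length = want_end - want_start + 1
    return chunk_seq[offset:offset + length]


# Toy data: the chunk starts at path offset 180 although 183-186 was requested.
chunk = "AAACCGGTTT"
print(trim_to_range(chunk, 180, 183, 186))
```

In practice `chunk_seq` would be the FASTA sequence printed by the `vg paths -F` step and `chunk_start` the start column of the matching line in range.bed.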
