Question: Extracting specific regions (e.g., chr1:100-200 of hg38) from whole-genome .vg and .gam files
hd00ljy3 wrote (9 months ago):

I want to extract specific regions, such as chr1:100-200 of hg38, from whole-genome .vg and .gam files.

I built a .vg graph from a structural variation VCF file, and I want to check whether the records in the VCF are well represented in the graph.

What I want to do is as follows.

I have an insertion (INS) at chr1:100 on hg38, and I want to get the nodes in chr1:50-200 along with the nodes representing the insertion.

But I could not find any labels (variant IDs or chromosomal coordinates) on the graph when I converted the .vg to .dot, so I was not able to extract specific regions or the nodes adjacent to specific SVs.

I looked through the wiki and the GitHub issues, but I was not able to find a solution.

Could you help me with this?

glenn.hickey wrote (8 months ago):

You can extract regions from a GAM with vg chunk but it's clunky.

Index the GAM (very slow):

vg gamsort aln.gam -i aln.sorted.gam.gai > aln.sorted.gam

Extract the chunk:

vg chunk -x graph.vg -a aln.sorted.gam -g -p chr1:100-200 -c 10 > chunk.vg

The -c 10 option specifies how far out from the reference path to extend the search, so that nodes off the path (insertions, for example) are covered as well.
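For the case in the question (an insertion at chr1:100, examined within chr1:50-200), the region string can be derived from the variant position instead of being typed by hand. The following is only a sketch: region_around and chunk_command are hypothetical helper names, the flank sizes are arbitrary, and the file names are the same placeholders used in the commands above. chunk_command only prints the vg chunk command line so it can be checked before it is run.

```shell
#!/usr/bin/env bash

# Hypothetical helper: build a CONTIG:START-END region around a variant.
# Usage: region_around CONTIG POS LEFT_FLANK RIGHT_FLANK
region_around() {
    local contig=$1 pos=$2 left=$3 right=$4
    local start=$(( pos - left ))
    local end=$(( pos + right ))
    # Clamp at 0 so a variant near the contig start never gives a negative
    # coordinate. Assumption: 0 is an acceptable lower bound for the region;
    # check this against your vg version.
    if (( start < 0 )); then
        start=0
    fi
    printf '%s:%d-%d\n' "$contig" "$start" "$end"
}

# Hypothetical helper: print (not run) the vg chunk command from the answer
# above for a given region and -c value. File names are placeholders.
# Usage: chunk_command REGION CONTEXT
chunk_command() {
    local region=$1 context=$2
    printf 'vg chunk -x graph.vg -a aln.sorted.gam -g -p %s -c %s\n' \
        "$region" "$context"
}
```

For example, chunk_command "$(region_around chr1 100 50 100)" 10 prints the command for chr1:50-200 with -c 10, which can then be run with its output redirected to chunk.vg.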
