Question: Extracting specific regions (e.g., chr1:100-200 of hg38) from whole-genome .vg and .gam files
12 months ago, hd00ljy3 wrote:

I want to extract specific regions, such as chr1:100-200 of hg38, from whole-genome .vg and .gam files.


I built a .vg from a structural variation VCF file, and I want to check whether the records in the VCF are well represented in the .vg file.

What I want to do is as follows.

I have an insertion (INS) at chr1:100 on hg38. I want to get the nodes in chr1:50-200 along with the nodes representing the insertion.

However, when I converted the .vg to .dot, I could not find any labels (variant IDs or chromosomal coordinates) on the graph, so I was unable to extract specific regions or the nodes adjacent to specific SVs.

I looked through the wiki and the GitHub issues but could not find a solution.

Could you help me with this?

12 months ago, glenn.hickey wrote:

You can extract regions from a GAM with vg chunk, but it's clunky.

Sort and index the GAM (very slow):

vg gamsort aln.gam -i aln.sorted.gam.gai > aln.sorted.gam

Extract the chunk:

vg chunk -x graph.vg -a aln.sorted.gam -g -p chr1:100-200 -c 10 > chunk.vg

The -c 10 option specifies how far out from the reference path the search expands, so that the chunk also covers nodes that are not on the path, such as insertions.
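
To apply this to a window around a known variant (for example the INS at chr1:100 from the question), the region string can be built from the variant position. The following is only a rough sketch: region_around is a hypothetical helper, not part of vg, and the file names are the placeholders used above. It prints the two commands rather than running them. The thread does not say whether vg chunk treats coordinates as 0- or 1-based, so check that before relying on exact boundaries.

```shell
#!/bin/sh
# Hypothetical helper (not part of vg): build a PATH:START-END region
# string around a variant position.
# Usage: region_around PATH_NAME POS LEFT_FLANK RIGHT_FLANK
region_around() {
    path_name=$1
    pos=$2
    left=$3
    right=$4
    start=$((pos - left))
    # Clamp so the start never becomes negative near the beginning of a path.
    if [ "$start" -lt 0 ]; then
        start=0
    fi
    end=$((pos + right))
    printf '%s:%s-%s\n' "$path_name" "$start" "$end"
}

# Window for the insertion at chr1:100, 50 bases upstream and 100 downstream.
region=$(region_around chr1 100 50 100)

# Dry run: print the commands from the answer with the computed region.
# Remove the echo wrappers to actually run them (requires vg on PATH).
echo "vg gamsort aln.gam -i aln.sorted.gam.gai > aln.sorted.gam"
echo "vg chunk -x graph.vg -a aln.sorted.gam -g -p $region -c 10 > chunk.vg"
```

With the values above, the printed chunk command uses the region chr1:50-200, which is the window asked about in the question.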
