Extracting specific regions (e.g., chr1:100-200 of hg38) from whole-genome .vg and .gam files
4.3 years ago
hd00ljy3 • 0

I want to extract specific regions, such as chr1:100-200 of hg38, from whole-genome .vg and .gam files.


I built a .vg graph from a structural variation VCF file, and I want to check whether the records in the VCF are well represented in the .vg file.

What I want to do is as follows.

I have an insertion (INS) at chr1:100 on hg38, and I want to get the nodes in chr1:50-200 along with the nodes representing the insertion.

But since I could not find any labels (variant IDs or chromosomal coordinates) on the graph when I converted the .vg to .dot, I was not able to extract specific regions or the nodes adjacent to specific SVs.

I looked through the wiki and the GitHub issues, but I was not able to find a solution.

Could you help me with this?

4.3 years ago
glenn.hickey ▴ 520

You can extract regions from a GAM with vg chunk but it's clunky.

Sort and index the GAM (very slow):

vg gamsort aln.gam -i aln.sorted.gam.gai > aln.sorted.gam

Extract the chunk:

vg chunk -x graph.vg -a aln.sorted.gam -g -p chr1:100-200 -c 10 > chunk.vg

The -c 10 sets how many node steps away from the reference path the chunk is expanded, so that it also covers, say, insertions.
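
For the case in the question (an insertion at chr1:100, with the window chr1:50-200 around it), the two steps could be wrapped roughly like this. This is only a sketch: region_around is a hypothetical helper, not part of vg, and the file names are the same placeholders as in the commands above. The script just prints the commands so they can be checked before running them. Whether vg chunk reads the range as 0-based or 1-based is worth confirming in vg chunk --help before trusting exact boundaries.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of vg): print CHROM:START-END for a window
# reaching LEFT bases before POS and RIGHT bases after it.
region_around() {
    local chrom=$1 pos=$2 left=$3 right=$4
    local start=$(( pos - left ))
    local end=$(( pos + right ))
    # Clamp so the window never starts before the beginning of the path.
    if (( start < 0 )); then
        start=0
    fi
    printf '%s:%d-%d\n' "$chrom" "$start" "$end"
}

# Insertion at chr1:100, 50 bases upstream and 100 downstream, as in the question.
region=$(region_around chr1 100 50 100)

# Dry run: print the two commands from this answer with the computed region.
# File names are placeholders; adjust them to your data.
echo "vg gamsort aln.gam -i aln.sorted.gam.gai > aln.sorted.gam"
echo "vg chunk -x graph.vg -a aln.sorted.gam -g -p ${region} -c 10 > chunk.vg"
```

With the values above the region comes out as chr1:50-200. If the insertion is long, a larger -c value may be needed to pull in all of its nodes.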
