Identify coordinates from the VCF generated from Vg call
1
0
Entering edit mode
5 months ago

Hi,

I have generated a VCF from vg call, and here is few lines of the VCF file I got.

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  AU
12      5958    40863053_40863056       G       A       11.0614 PASS    DP=8    GT:DP:AD:GL:GQ:GP:XD:MAD        0/1:8:7,1:-2.918876,-2.406165,-15.847851:5:-1.366429:7.019608:1
12      5965    40863056_41238580       G       A       12.5346 PASS    DP=7    GT:DP:AD:GL:GQ:GP:XD:MAD        0/1:7:6,1:-2.800967,-2.104173,-13.548030:6:-1.281771:7.019608:1
12      6194    40863067_40863068       C       A       22.3351 PASS    DP=4    GT:DP:AD:GL:GQ:GP:XD:MAD        0/1:4:3,1:-3.267579,-1.518863,-7.439325:17:-1.116292:2.378082:1
12      6224    40863071_41233469       G       A       26.3073 PASS    DP=10   GT:DP:AD:GL:GQ:GP:XD:MAD        0/1:10:8,2:-4.516325,-2.365773,-18.047362:21:-1.105658:8.157895:2
12      6704    40863102_41245696       A       ACCC    25.9236 lowad   DP=1    GT:DP:AD:GL:GQ:GP:XD:MAD        1/0:1:0,1:-2.782378,-0.784550,-1.306884:5:-1.368966:0.603774:0
12      6712    41245696_41245698       G       C       17.2222 PASS    DP=5    GT:DP:AD:GL:GQ:GP:XD:MAD        0/1:5:4,1:-2.930286,-1.710612,-9.160134:12:-1.157166:6.998316:1
12      6714    41245698_41245700       A       C       34.7345 PASS    DP=5    GT:DP:AD:GL:GQ:GP:XD:MAD        0/1:5:3,2:-5.773195,-2.777300,-8.469752:29:-1.099623:13.392858:2
12      6718    41245700_41245702       T       C       17.2222 PASS    DP=5    GT:DP:AD:GL:GQ:GP:XD:MAD        0/1:5:4,1:-4.298005,-3.078331,-10.245972:12:-1.157166:13.392858:1
12      20103   40863559_40863562       G       A       49.499  lowdepth        DP=2    GT:DP:AD:GL:GQ:GP:XD:MAD        1/1:2:0,2:-5.457024,-1.461369,-1.160338:3:-1.504111:4.094972:2

I have no idea that using the POS and ID here, can I locate the coordinate of each variant on the chromosome (e.g. Chr12: 23240000)? Also, what is the meaning of lowad and lowdepth under FILTER column? Any answer will be very welcome!

Monica

call vg vgtool • 261 views
ADD COMMENT
1
Entering edit mode
5 months ago
glenn.hickey ▴ 240

POS is the position along the path in the graph. If that path corresponds to a chromosome, then it will also be the position along that chromosome.

ID is the pair of nodes ids in the graph that define the bubble of variation corresponding to the site. You can visualize that first variant in your example by running vg find -x graph.vg -p 12:5958 -c 3 | vg view -pd - | dot -Tpng > site.png. This extracts the subgraph a the chromosome position (12:5958). It will show you the g->A snp, as well as the nodes with the ids given ( 40863053 and 40863056) that surround it.

ADD COMMENT
0
Entering edit mode

Hi, many thanks for replying. It's now much clearer what these parameters mean, but I still have a question. I generated the VCF I showed above using one sample A, targeting on a region on Chr12. I used the codes:

Construct graph and index (according to whole genome variation graph)

echo constructing graph
(seq 1 21; echo X; echo MT) | parallel -j 23 "time vg construct -C -R {} -r $ref -v $vars -t 1 -m 32 > $dir/{}.vg"

vg ids -j $(for i in $(seq 1 21; echo X; echo MT); do echo $i.vg; done)
vg index -x wg.xg $(for i in $(seq 1 21; echo X; echo MT); do echo $i.vg; done)

for chr in $(seq 1 21; echo X; echo MT);
do
  vg prune -r $chr.vg > $chr.pruned.vg
done

vg index -g wg.gcsa $(for i in $(seq 21; echo X; echo MT); do echo $i.pruned.vg; done)

Map and call variant (according to Whole genome calling and genotyping)

vg map -x $dir/wg.xg -g $dir/wg.gcsa -f $fastq/A_R1.fastq.gz -f $fastq/A_R2.fastq.gz > $dir/A.gam 
vg gamsort $dir/A.gam -i $dir/A_sorted.gam.gai > $dir/A_sorted.gam
vg chunk -x wg.xg -c 10 -p 12:20140000-23600000 -g -a A_sorted.gam -O pg > A_sorted_chunk.pg 
vg augment A_sorted_chunk.pg A_chunk_0_12_20139664_23600147.gam -s -A A_final.gam > A_final.pg
vg snarls A_final.pg > A_final.snarls
vg pack -x A_final.pg -g A_final.gam -o A_final.pack
vg call A_final.pg -r A_final.snarls -k A_final.pack -s sample > sample_calls.vcf

Everything went fine, except from the POS in the VCF seems not to be the coordinates along the Chr12. The POS should be around this 12:20,140,000-23,600,000 region...

Did I do something wrong in the code by missing some parameters? Thanks in advance.

ADD REPLY

Login before adding your answer.

Traffic: 1588 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6