Question

Extract heterozygous genotype (GT: 0/1 and 1/2) from the vcf file and calculate allelebalance.

0

Entering edit mode

8.1 years ago

kirannbishwa01 ★ 1.6k

Hi,

I need to extract heterozygous genotype from my vcf file. The genotype GT:0/1 and 1/2 should be extracted separately and is under the FORMAT field in vcf file. I am not posting the data from vcf file since there is been some formatting issue while pasting the values lately (not sure why???).

Also, I need to calculate the AB (allele balance) values for these heterozygous genotype (by sample). I have tried several vcf manipulator utilites but its not been very much helpful.

Can someone please assist me regarding the problem.

Thanks,

genotype vcf heterozygous allele balance SNP • 4.1k views

ADD COMMENT • link 8.1 years ago by kirannbishwa01 ★ 1.6k

score 0 · Answer 1 · 2016-03-24

0

Entering edit mode

8.1 years ago

Pierre Lindenbaum 161k

using Bioalcidae: https://github.com/lindenb/jvarkit/wiki/BioAlcidae

$ curl -s "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/ALL.chrX.BI_Beagle.20100804.genotypes.vcf.gz" | gunzip -c |\
java -jar ~/src/jvarkit-git/dist/bioalcidae.jar -F vcf -e 'while(iter.hasNext()) {var ctx=iter.next(); for(var i=0;i< ctx.getNSamples();++i) {var g=ctx.getGenotype(i); if(!g.isHet()) continue; out.println(ctx.getContig()+" "+ctx.getStart()+" "+g.getSampleName()+" "+g.getAlleles()); }}'



X 60009 HG00142 [A*, C]
X 60009 HG00148 [A*, C]
X 60009 HG00231 [A*, C]
X 60009 HG00638 [A*, C]
X 60009 NA12046 [A*, C]
X 60009 NA12249 [A*, C]
X 60009 NA12813 [A*, C]
X 60009 NA18519 [A*, C]
X 60009 NA18541 [A*, C]
X 60009 NA18633 [A*, C]

I leave calculation of AB as an exercise :-)

ADD COMMENT • link 8.1 years ago by Pierre Lindenbaum 161k

0

Entering edit mode

Hi @Pierre'

Thanks for the support. I will try your script tomorrow, also I want to get only the GT:0/1 or GT/1/2 separately. Something for calculating AB will be great.

Thanks,

ADD REPLY • link 8.1 years ago by kirannbishwa01 ★ 1.6k

0

Entering edit mode

Hi Pierre, I installed and ran the program but getting some error.

Command: curl -s "raw01_variants_S1-forTest.vcf" | gunzip -c | java -jar /home/everestial007/jvarkit/dist/bioalcidae.jar -F vcf -e 'while(iter.hasNext()) {var ctx=iter.next(); for(var i=0;i< ctx.getNSamples();++i) {var g=ctx.getGenotype(i); if(!g.isHet()) continue; out.println(ctx.getContig()+" "+ctx.getStart()+" "+g.getSampleName()+" "+g.getAlleles()); }}'

Error message: [main] ERROR jvarkit - Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file [main] ERROR jvarkit - Command failed

I think the header of the VCF file always starts with #. Since CHROM is in the first column its typically #CHROM.

Thanks,

ADD REPLY • link 8.1 years ago by kirannbishwa01 ★ 1.6k

0

Entering edit mode

why did you run 'gunzip -c ' if the input is not a gzipped source ?

ADD REPLY • link 8.1 years ago by Pierre Lindenbaum 161k

2

Entering edit mode

why do you use curl ? please, try to understand the command line before running it.

ADD REPLY • link 8.1 years ago by Pierre Lindenbaum 161k

0

Entering edit mode

Hi Pierre,

Thanks for the command but I a little not savvy with manipulating the commands. I will give a try. I had gzipped the file so used gunzip, but the posted command just had .vcf.

Will let you know !

ADD REPLY • link 8.1 years ago by kirannbishwa01 ★ 1.6k