Question: Extract heterozygous genotypes (GT 0/1 and 1/2) from a VCF file and calculate allele balance
kirannbishwa01 (United States) wrote, 3.7 years ago:

Hi,

I need to extract the heterozygous genotypes from my VCF file. The genotypes are stored under the GT key of the FORMAT field, and the 0/1 and 1/2 calls should be extracted separately. I am not posting data from the VCF file because I have recently had formatting problems when pasting the values (I am not sure why).
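A minimal sketch of the extraction step in plain Python, not using any of the tools mentioned in this thread. It assumes an uncompressed, tab-delimited VCF where column 9 is FORMAT and columns 10 onwards are samples; the function name `classify_het_calls` is hypothetical. Phased (`|`) and unphased (`/`) calls are treated alike, and missing calls (`./.`) are skipped.

```python
def classify_het_calls(vcf_lines):
    """Yield (chrom, pos, sample, gt_class) for 0/1 and 1/2 calls.

    Hypothetical helper: assumes a plain-text VCF 4.x layout.
    """
    samples = []
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names come from the header line
            continue
        if len(fields) < 10:
            continue  # sites-only record: no FORMAT/sample columns
        keys = fields[8].split(":")
        if "GT" not in keys:
            continue
        gt_idx = keys.index("GT")
        for name, value in zip(samples, fields[9:]):
            values = value.split(":")
            if gt_idx >= len(values):
                continue
            # Normalise phased calls so 1|0 and 0/1 are classified the same.
            alleles = sorted(values[gt_idx].replace("|", "/").split("/"))
            if alleles == ["0", "1"]:
                yield fields[0], int(fields[1]), name, "0/1"
            elif alleles == ["1", "2"]:
                yield fields[0], int(fields[1]), name, "1/2"


vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "chr1\t100\t.\tA\tC,G\t50\tPASS\t.\tGT:AD\t0/1:6,4,0\t1/2:0,5,5",
]
for call in classify_het_calls(vcf):
    print(call)  # ('chr1', 100, 'S1', '0/1') then ('chr1', 100, 'S2', '1/2')
```

Filtering on the `gt_class` value then gives the 0/1 and 1/2 sets separately.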

I also need to calculate the allele balance (AB) for these heterozygous genotypes, per sample. I have tried several VCF manipulation utilities, but they have not been very helpful.
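The thread never defines AB precisely. One common convention, assumed here rather than taken from the post, divides the read depth of one called allele (from the per-sample AD field) by the combined depth of the two called alleles, so a well-supported heterozygous call sits near 0.5. Some tools report the reference fraction (1 - AB) for 0/1 calls instead.

```latex
% n_k = read count for allele k from the AD field (REF is allele 0)
\mathrm{AB}_{0/1} = \frac{n_1}{n_0 + n_1}, \qquad
\mathrm{AB}_{1/2} = \frac{n_2}{n_1 + n_2}

% Worked example: 0/1 with AD = 6,4   gives AB = 4 / (6 + 4) = 0.4
%                 1/2 with AD = 0,5,5 gives AB = 5 / (5 + 5) = 0.5
```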

Can someone please help me with this problem?

Thanks,

Answer by Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087), 3.7 years ago:

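Using BioAlcidae (https://github.com/lindenb/jvarkit/wiki/BioAlcidae), with a JavaScript expression (-e) that loops over every variant, skips genotypes that are not heterozygous (g.isHet()), and prints the contig, position, sample name and alleles of each remaining call: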

$ curl -s "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/ALL.chrX.BI_Beagle.20100804.genotypes.vcf.gz" | gunzip -c |\
java -jar ~/src/jvarkit-git/dist/bioalcidae.jar -F vcf -e 'while(iter.hasNext()) {var ctx=iter.next(); for(var i=0;i< ctx.getNSamples();++i) {var g=ctx.getGenotype(i); if(!g.isHet()) continue; out.println(ctx.getContig()+" "+ctx.getStart()+" "+g.getSampleName()+" "+g.getAlleles()); }}'



X 60009 HG00142 [A*, C]
X 60009 HG00148 [A*, C]
X 60009 HG00231 [A*, C]
X 60009 HG00638 [A*, C]
X 60009 NA12046 [A*, C]
X 60009 NA12249 [A*, C]
X 60009 NA12813 [A*, C]
X 60009 NA18519 [A*, C]
X 60009 NA18541 [A*, C]
X 60009 NA18633 [A*, C]

In the output, the asterisk marks the reference allele (A*). I leave the calculation of AB as an exercise :-)
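A minimal sketch of that exercise in Python, working on one sample's FORMAT values rather than through BioAlcidae. It assumes the VCF carries a per-sample AD field with one read count per allele (REF first) and uses the convention AB = depth of the higher-numbered called allele / depth of both called alleles; that definition and the function names `allele_balance` and `sample_allele_balance` are assumptions, not something stated in the thread.

```python
def allele_balance(gt, ad_field):
    """AB for a diploid heterozygous call (e.g. 0/1 or 1/2), else None.

    Assumed definition: AD[high] / (AD[low] + AD[high]), where low/high
    are the two called allele indices.
    """
    parts = gt.replace("|", "/").split("/")
    if len(parts) != 2 or not all(p.isdigit() for p in parts):
        return None  # missing call (./.) or non-diploid genotype
    low, high = sorted(int(p) for p in parts)
    if low == high:
        return None  # homozygous: AB is not meaningful here
    try:
        depths = [int(x) for x in ad_field.split(",")]
    except ValueError:
        return None  # AD missing (".") or malformed
    if high >= len(depths):
        return None
    total = depths[low] + depths[high]
    if total == 0:
        return None  # no reads for either called allele
    return depths[high] / total


def sample_allele_balance(format_str, sample_str):
    """Look up GT and AD by name in one sample column and compute AB."""
    # zip() tolerates trailing sample fields that were dropped in the record
    fields = dict(zip(format_str.split(":"), sample_str.split(":")))
    if "GT" not in fields or "AD" not in fields:
        return None
    return allele_balance(fields["GT"], fields["AD"])


print(sample_allele_balance("GT:AD:DP", "0/1:6,4:10"))  # 0.4
print(sample_allele_balance("GT:AD", "1/2:0,5,5"))      # 0.5
```

Looking up GT and AD by name, instead of by position, keeps the sketch working when the FORMAT keys appear in a different order.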


Hi @Pierre,

Thanks for the support. I will try your script tomorrow. I also want to extract the GT 0/1 and GT 1/2 calls separately. Something for calculating AB would be great.
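To keep 0/1 and 1/2 calls apart while reporting AB by sample, one option is to group per-call values by (sample, genotype class). This Python sketch assumes the calls are already available as hypothetical `(sample, gt_class, ab)` tuples; the function name `summarize_by_sample` is also hypothetical, and calls without an AB value are ignored.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_sample(records):
    """Return {(sample, gt_class): (n_calls, mean_ab)} from (sample, gt_class, ab) tuples."""
    groups = defaultdict(list)
    for sample, gt_class, ab in records:
        if ab is not None:  # skip calls where AB could not be computed
            groups[(sample, gt_class)].append(ab)
    return {key: (len(vals), mean(vals)) for key, vals in sorted(groups.items())}


records = [
    ("S1", "0/1", 0.4),
    ("S1", "0/1", 0.6),
    ("S1", "1/2", 0.5),
    ("S2", "0/1", None),
]
for (sample, gt_class), (n, mean_ab) in summarize_by_sample(records).items():
    print(sample, gt_class, n, round(mean_ab, 3))
```

Because the genotype class is part of the key, the 0/1 and 1/2 summaries for each sample stay separate.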

Thanks,

Reply by kirannbishwa01, 3.7 years ago

Hi Pierre, I installed and ran the program, but I am getting an error.

Command: curl -s "raw01_variants_S1-forTest.vcf" | gunzip -c | java -jar /home/everestial007/jvarkit/dist/bioalcidae.jar -F vcf -e 'while(iter.hasNext()) {var ctx=iter.next(); for(var i=0;i< ctx.getNSamples();++i) {var g=ctx.getGenotype(i); if(!g.isHet()) continue; out.println(ctx.getContig()+" "+ctx.getStart()+" "+g.getSampleName()+" "+g.getAlleles()); }}'

Error message:
[main] ERROR jvarkit - Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file
[main] ERROR jvarkit - Command failed

As far as I know, the header of a VCF file always starts with #. Since CHROM is the first column, the column header line is typically #CHROM.
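The error means the parser reached data (or the end of its input) without seeing a line that starts with #CHROM. An empty stream, for example from a failed upstream command in a pipe, triggers the same message. A minimal Python sketch for checking what a VCF stream actually contains; the function name `find_chrom_header` is hypothetical.

```python
def find_chrom_header(lines):
    """Return the sample names from the #CHROM line, or None if it is absent."""
    for line in lines:
        if line.startswith("#CHROM"):
            # Columns 10 onwards of the header line are the sample names.
            return line.rstrip("\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # a data line came before any #CHROM line
    return None


print(find_chrom_header([]))  # None: empty input, e.g. from a broken pipe
print(find_chrom_header([
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n",
]))  # ['S1']
```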

Thanks,

Reply by kirannbishwa01, 3.7 years ago

Why did you run 'gunzip -c' if the input is not gzipped?
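One way to avoid guessing whether a file needs gunzip is to check for the gzip magic bytes (0x1f 0x8b) before opening it. A minimal Python sketch; the function name `open_vcf` is hypothetical. bgzip output is gzip-compatible, so Python's `gzip` module can read it too.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip (and bgzip) file


def open_vcf(path):
    """Open a VCF as text, decompressing only if it is actually gzipped."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt")
    return open(path, "rt")
```

For example, `open_vcf("raw01_variants_S1-forTest.vcf")` returns a readable text handle whether or not that file was compressed.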

Reply by Pierre Lindenbaum, 3.7 years ago

Why do you use curl? In my example, curl only downloads the remote 1000 Genomes file. Please try to understand the command line before running it.

Reply by Pierre Lindenbaum, 3.7 years ago

Hi Pierre,

Thanks for the command, but I am not very savvy with the command line. I will give it a try. I had gzipped the file, so I used gunzip, but the command I posted just had .vcf.

I will let you know!

Reply by kirannbishwa01, 3.7 years ago