Question: Eliminating calls for a particular chromosome, per sample, in a multisample VCF
14 months ago, mmats010 wrote:

Hello,

I generated a multisample VCF file using GATK's HaplotypeCaller/GenotypeGVCFs workflow. In my multisample VCF file, some samples have aneuploid chromosome numbers. For example, Sample1 might be 2N for chr1, chr2, and chr3. Sample2, however, might be 2N for chr1 and chr3, but 3N for chr2.

Is there a way to selectively exclude all of the sample genotypes for Sample2/chr2, while leaving all other sample genotypes for chr2 (and all other chromosomes for Sample2) intact?

I already tried removing the individual chromosomes from the per-sample .g.vcf files using vcftools with the "--not-chr" option, then re-running GenotypeGVCFs. However, the output still included calls on the excluded chromosomes for all of the specified samples, and I can't figure out where those calls came from.

Perhaps there is a way to set sample genotypes to "NoCall" for individual chromosomes?

Thanks, Mike

(As a bonus, could a given solution take advantage of a list of chromosomes? Each "chromosome" (chr2, in the above example) is a pseudochromosome scaffold of contigs, and the genotype calls in the VCF refer to the contig names, not the pseudochromosome name. Each pseudochromosome consists of ~20 contigs.)

>less chr2.list
ctg3
ctg78
ctg23
ctg746
...
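
For the contig-list case described in the bonus question, one possible approach is to load the contig names from the list file and blank a single sample's genotype field on every matching record. The sketch below is an illustration under stated assumptions, not a tested pipeline step: the function name mask_sample_on_contigs and the file names are hypothetical, the list file is assumed to hold one contig name per line (as in chr2.list), and the whole sample field is replaced with "./.", which assumes diploid no-call notation is acceptable downstream. INFO-column values such as DP are not recalculated.

```shell
# Hypothetical helper: mask_sample_on_contigs VCF SAMPLE CONTIG_LIST
# Prints the VCF to stdout with SAMPLE set to "./." on every record whose
# CHROM (column 1) is named in CONTIG_LIST (one contig name per line).
mask_sample_on_contigs() {
    awk -F '\t' -v OFS='\t' -v sample="$2" -v list="$3" '
        BEGIN {
            # Load the contig names to mask into a lookup table.
            while ((getline name < list) > 0)
                if (name != "") contigs[name] = 1
            close(list)
        }
        /^##/ { print; next }                     # meta lines pass through
        /^#CHROM/ {                               # locate the sample column (10 or later)
            for (i = 10; i <= NF; i++) if ($i == sample) col = i
            print; next
        }
        col && ($1 in contigs) { $col = "./." }   # assumption: diploid no-call
        { print }
    ' "$1"
}

# Example call (file names are placeholders):
# mask_sample_on_contigs allsamples.vcf Sample2 chr2.list > allsamples.masked.vcf
```

If the sample name is not found in the #CHROM header line, the sketch leaves every record unchanged, so it is worth checking the output for the expected "./." entries.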
14 months ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

An awk script along these lines will remove all the genotypes for sample "S1" from your VCF on every chromosome except chromosome 2. Adjust it to your needs. Note that it does not update values such as "DP" in the INFO column.

awk -F '\t' '/^#CHROM/ {split($0,header);} /^#/ {print;next;} /^2\t/ { print;next;} {for(i=1;i<=NF;++i) {printf("%s%s",(i>1?"\t":""),(i>9 && header[i]=="S1"?".":$i));}printf("\n"); }'

Thanks, but does this mean that it will remove every call for every chromosome EXCEPT for Sample1/chr2? I need to remove the Sample1/chr2 calls themselves, not everything else.

Further, where in your script would I refer to the main "allsamples.vcf" file? Finally, I imagine that the "2" in this part of the script "/^2\t/" is where the chromosome number is specified?

Reply written 14 months ago by mmats010
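
On the follow-up questions, a minimal sketch of the opposite selection (mask one sample on one chromosome, leave everything else intact) might look like the following. It is an assumption-laden illustration rather than the answerer's script: the function name mask_gt_on_chrom is hypothetical, the input VCF is passed as the last argument to awk, the chromosome is compared against column 1 as an exact string instead of a regular expression, and only the first FORMAT subfield (assumed to be GT) is overwritten with "./." so that subfields such as DP stay in place.

```shell
# Hypothetical helper: mask_gt_on_chrom VCF SAMPLE CHROM
# Prints the VCF to stdout with the GT subfield of SAMPLE set to "./."
# on records whose CHROM (column 1) equals CHROM exactly.
mask_gt_on_chrom() {
    awk -F '\t' -v OFS='\t' -v sample="$2" -v chrom="$3" '
        /^#CHROM/ {                       # locate the sample column (10 or later)
            for (i = 10; i <= NF; i++) if ($i == sample) col = i
        }
        /^#/ { print; next }              # header lines pass through
        col && $1 == chrom {
            # Assumption: GT is the first colon-separated subfield.
            sub(/^[^:]*/, "./.", $col)
        }
        { print }
    ' "$1"
}

# Example call (file names are placeholders):
# mask_gt_on_chrom allsamples.vcf Sample1 chr2 > allsamples.masked.vcf
```

Because the chromosome is matched as a whole string, "chr2" does not also match "chr20", which a pattern such as /^chr2/ would.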