Question: Removing alternate alleles from .vcf
Asked 15 months ago by ucbtsm8:

I have whole-genome sequence data in VCF format from several individuals. I extracted a SNP set from each individual, removed any SNPs with more than two alleles, and then merged them all together with bcftools.

Everything seems OK, except that several sites in the merged dataset have more than one alternate allele. For example:

1       776546  .       A       G,T,C   287     .       GG=257,297,0,297,730,285,730,221,285,730;DP=135 GT:PL   0/1:257,0,221   0/1:86,0,133    0/0:0,12,165    0/1:255,0,325   0/1:337,0,77    0/0:0,3,46      0/0:0,129,1000  0/0:0,42,291

However, if you look at the genotypes, they are all either 0/1 or 0/0, i.e. only two alleles are actually present in the callset. The excess alternate alleles are breaking a new merge that I want to do, because bcftools reports four alleles but only three PL entries.
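For context, the mismatch follows from how per-genotype fields such as PL are sized in VCF: a diploid site with n alleles (REF plus all ALT alleles) has n(n+1)/2 possible genotypes, and PL is expected to carry one value per genotype. A minimal sketch of that arithmetic, where the helper name `expected_pl_count` is made up for illustration:

```python
from math import comb

def expected_pl_count(n_alleles: int, ploidy: int = 2) -> int:
    """Number of possible genotypes, i.e. expected PL entries, at a site."""
    # Unordered genotypes of `ploidy` alleles drawn from `n_alleles` with repetition.
    return comb(n_alleles + ploidy - 1, ploidy)

# The example site lists REF=A and ALT=G,T,C, so four alleles in total.
print(expected_pl_count(4))  # 10 entries expected, but each sample has only 3
print(expected_pl_count(2))  # 3 entries, which matches the PL values present
```

So the three PL values per sample are consistent with a biallelic site, and it is the ALT column that is out of step.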

Does anyone know of a way to trim the values in the ALT column of the VCF so that it lists only the alleles that actually appear in the genotypes?

EDIT: I've just found the command bcftools view --trim-alt-alleles. I think it has done what I hoped, but the documentation isn't very descriptive. Could someone confirm what it does? Thanks.
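As I understand it, the option (short form -a) removes ALT alleles that are not seen in any sample's GT field. The logic can be sketched roughly as below. This is only an illustration of the idea and not the bcftools implementation: the function name `trim_alt_alleles` is made up, the record is a shortened copy of the example above, and the sketch rewrites only ALT and GT, leaving INFO and other FORMAT values untouched.

```python
def trim_alt_alleles(record: str) -> str:
    """Drop ALT alleles that no genotype refers to and renumber GT indices."""
    fields = record.split()
    alts = fields[4].split(",")
    gt_pos = fields[8].split(":").index("GT")

    # Collect every allele index used by any sample (0 = REF, 1.. = ALT).
    used = set()
    for sample in fields[9:]:
        gt = sample.split(":")[gt_pos]
        for allele in gt.replace("|", "/").split("/"):
            if allele != ".":
                used.add(int(allele))

    # Keep only the ALT alleles that are seen, preserving their order.
    kept = [i for i in range(1, len(alts) + 1) if i in used]
    remap = {0: 0, **{old: new for new, old in enumerate(kept, start=1)}}
    fields[4] = ",".join(alts[i - 1] for i in kept) or "."

    # Rewrite GT with the new allele numbering; other FORMAT values are left as-is.
    for n, sample in enumerate(fields[9:], start=9):
        values = sample.split(":")
        gt = values[gt_pos]
        sep = "|" if "|" in gt else "/"
        values[gt_pos] = sep.join(
            a if a == "." else str(remap[int(a)])
            for a in gt.replace("|", "/").split("/")
        )
        fields[n] = ":".join(values)
    return "\t".join(fields)


# Shortened version of the example record (three samples, INFO reduced to DP).
record = "1\t776546\t.\tA\tG,T,C\t287\t.\tDP=135\tGT:PL\t0/1:257,0,221\t0/0:0,12,165\t0/1:255,0,325"
trimmed = trim_alt_alleles(record)
print(trimmed.split("\t")[4])  # G
```

On the real file the equivalent call would be something like bcftools view --trim-alt-alleles in.vcf -o out.vcf, where in.vcf and out.vcf are placeholder file names.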


I will check later

Comment written 15 months ago by Kevin Blighe