Removing alternate alleles from .vcf
5.2 years ago
ucbtsm8 ▴ 20

I had whole genome sequence data in .vcf format from several different individuals. I extracted a SNP set from each individual, removed any SNPs with more than 2 alleles, and then merged them all together in bcftools.

Everything seems OK, except that several sites in the merged dataset have more than 1 alternate allele. For example:

1       776546  .       A       G,T,C   287     .       GG=257,297,0,297,730,285,730,221,285,730;DP=135 GT:PL   0/1:257,0,221   0/1:86,0,133    0/0:0,12,165    0/1:255,0,325   0/1:337,0,77    0/0:0,3,46      0/0:0,129,1000  0/0:0,42,291

However, if you look at the genotypes, they are all either 0/1 or 0/0, i.e. only 2 alleles are actually present in the callset. The excess alternate alleles are breaking a new merge that I want to do, because bcftools says that there are 4 alleles but only 3 PL entries.
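For reference, the mismatch can be checked by hand. PL is a per-genotype field, so for diploid samples it needs one entry per unordered genotype: 3 entries for 2 alleles, but 10 entries for 4 alleles (which matches the 10 values in the GG tag above). A minimal Python sketch of that arithmetic; the function name `expected_pl_count` is only illustrative and not part of bcftools:

```python
from math import comb


def expected_pl_count(n_alleles: int, ploidy: int = 2) -> int:
    """Number of entries a per-genotype field (e.g. PL) should have.

    n_alleles counts REF plus every ALT allele listed at the site.
    The number of unordered genotypes is C(n_alleles + ploidy - 1, ploidy).
    """
    return comb(n_alleles + ploidy - 1, ploidy)


print(expected_pl_count(2))  # REF + 1 ALT  -> 3
print(expected_pl_count(4))  # REF + 3 ALTs -> 10
```

So each sample above carries the 3 PL values appropriate for A and G only, while the ALT column claims 4 alleles.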

Does anyone know of a way to trim the values in the ALT column of the VCF, so that it lists only the 'correct' number of alleles, given the genotypes that are actually present?

EDIT: I've just found the command bcftools view --trim-alt-alleles. I think it has done what I hoped, but the documentation isn't very descriptive. Could someone confirm what it does? Thanks.
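To make the intended operation concrete, here is a rough Python sketch of trimming ALT alleles that no genotype refers to. It is not the bcftools implementation: the function name `trim_unused_alts` is hypothetical, it assumes a tab-delimited record with a GT key in FORMAT, and it only rewrites ALT and GT. Per-allele and per-genotype INFO/FORMAT values such as PL are left untouched here, whereas a real tool would have to trim those too.

```python
import re


def trim_unused_alts(record: str) -> str:
    """Drop ALT alleles never seen in any sample GT and renumber the GTs."""
    fields = record.rstrip("\n").split("\t")
    alts = fields[4].split(",")
    gt_pos = fields[8].split(":").index("GT")  # assumes GT is present

    # Collect every allele index used by any sample, skipping missing calls.
    used = set()
    for sample in fields[9:]:
        gt = sample.split(":")[gt_pos]
        used.update(int(a) for a in re.split(r"[/|]", gt) if a != ".")

    # Keep only the ALT indices that are used; map old index -> new index.
    kept = [i for i in range(1, len(alts) + 1) if i in used]
    remap = {0: 0}
    remap.update({old: new for new, old in enumerate(kept, start=1)})
    fields[4] = ",".join(alts[i - 1] for i in kept) or "."

    # Rewrite the allele indices inside each sample's GT.
    for j in range(9, len(fields)):
        parts = fields[j].split(":")
        parts[gt_pos] = re.sub(
            r"\d+", lambda m: str(remap[int(m.group())]), parts[gt_pos]
        )
        fields[j] = ":".join(parts)
    return "\t".join(fields)


# Shortened version of the record above (two samples, GG tag omitted).
record = "\t".join([
    "1", "776546", ".", "A", "G,T,C", "287", ".", "DP=135",
    "GT:PL", "0/1:257,0,221", "0/0:0,12,165",
])
print(trim_unused_alts(record).split("\t")[4])  # G
```

On the example site only allele 1 (G) is used, so T and C are dropped and the genotypes stay as they are.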

next-gen • 2.0k views

I will check later

2.8 years ago
zjardyn • 0

In R you can check which genotypes occur in the .vcf, and so whether any alternate allele beyond the first is actually used, with the following code:

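library(vcfR)

# Read the VCF and extract the GT field as a variant-by-sample matrix
vcf <- read.vcfR("example.vcf")
gt <- extract.gt(vcf)

# List every distinct genotype string, scanning the matrix row by row
# (missing calls show up as NA)
unique(as.vector(t(gt)))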

[1] "0/0" "1/1" "0/1" "2/2" "0/2" "1/2" "2/1"

In this case 0 is the reference, 1 is the first alternate and 2 is the second alternate.
