Subsetting a VCF with bcftools
7.3 years ago
miaowzai ▴ 390

I'm trying to keep only single nucleotide variants (what some call SNPs) from the 1000 Genomes Project data, for example the latest phase 3 release (2013).

From bcftools manual under "view", it says:

-v, --types snps|indels|mnps|other comma-separated list of variant types to select. Site is selected if any of the ALT alleles is of the type requested. Types are determined by comparing the REF and ALT alleles in the VCF record not INFO tags like INFO/INDEL or INFO/VT. Use --include to select based on INFO tags.

I haven't checked the subset file yet, but according to this description, -v compares the REF and ALT alleles to decide whether a variant is a SNP. I was worried that multi-allelic single nucleotide sites would also be excluded, because the ALT column then holds a string longer than one character (e.g. "A,T").
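The manual text quoted above says a site is selected if any ALT allele is of the requested type, so a multi-allelic ALT such as "A,T" is evaluated allele by allele rather than as one long string. The sketch below imitates that rule with plain awk on a made-up five-column table. It is a simplified illustration, not bcftools' actual implementation: the file name demo_sites.tsv and the function keep_snp_sites are hypothetical, and the check only treats "single-base REF replaced by a single-base ALT" as a SNP.

```shell
# Tiny made-up site table (tab-separated): CHROM POS ID REF ALT
printf '1\t100\t.\tA\tG\n1\t200\t.\tA\tG,T\n1\t300\t.\tAT\tA\n1\t400\t.\tC\tCA,T\n' > demo_sites.tsv

# Keep a site if ANY of its ALT alleles is a single base replacing a single-base REF.
keep_snp_sites() {
  awk -F'\t' '
    /^#/ { print; next }                 # pass header lines through unchanged
    {
      n = split($5, alt, ",")            # ALT is comma-separated, one entry per allele
      for (i = 1; i <= n; i++)
        if (length($4) == 1 && alt[i] ~ /^[ACGT]$/) { print; next }
    }' "$1"
}

keep_snp_sites demo_sites.tsv
```

Under this rule the multi-allelic site at position 200 (ALT "G,T") is kept, the deletion at position 300 is dropped, and the mixed site at position 400 is kept because one of its ALT alleles is a single base.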

I tried bcftools view --include 'VT=SNP' to filter on the INFO column, but it failed with this error message:

the tag "INFO/SNP" is not defined in the VCF header

My questions are: (1) How can I obtain only variants that have "VT=SNP" in the INFO column? (2) Does -v snps retain variants with low allele frequency? I ask because "SNP" is sometimes taken to mean a common single nucleotide variant (allele frequency > 0.1% or 0.2%).

Thanks!

bcftools VCF • 3.0k views
7.3 years ago
guillaume.rbt ★ 1.0k

Hi,

(1) To get only SNPs from a VCF, I use vcftools:

vcftools --vcf your_vcf.vcf --recode --remove-indels --out output.vcf

(2) I believe -v snps retains all SNPs regardless of allele frequency (in a VCF file, even singletons are called "SNPs").


Thanks for the answer! I have been using vcftools and it works great, but bcftools seems to run faster. I'm still unsure how to set up the bcftools command, so I'll wait for other answers. Thanks!
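For the bcftools route discussed in this thread, here is a minimal sketch of the two selections. The function names and file names are hypothetical. The first uses -v snps, which compares REF and ALT and applies no allele-frequency filter. The second filters on the INFO/VT tag; the string value is quoted inside the expression, since an unquoted SNP is likely what bcftools tried to read as a tag name, producing the "INFO/SNP is not defined" error. This assumes the VCF header declares INFO/VT, as the 1000 Genomes phase 3 files do.

```shell
# Select sites by comparing REF and ALT alleles (ignores INFO/VT).
# $1 = input VCF/BCF, $2 = output file (bgzip-compressed VCF).
select_snps_by_alleles() {
  bcftools view -v snps -Oz -o "$2" "$1"
}

# Select sites whose INFO/VT tag equals the string "SNP".
# Note the double quotes around SNP inside the single-quoted expression.
select_snps_by_vt() {
  bcftools view -i 'INFO/VT="SNP"' -Oz -o "$2" "$1"
}

# Usage (hypothetical file names):
#   select_snps_by_alleles input.vcf.gz snps_by_alleles.vcf.gz
#   select_snps_by_vt      input.vcf.gz snps_by_vt.vcf.gz
```

The two selections are not guaranteed to return the same sites: -v snps keeps a site when any ALT allele is a SNP, while the tag filter depends on how VT was annotated for each record.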
