extracting SNPs from .vcf file along with CHR and POS
3.9 years ago
anamaria ▴ 220

Hello,

I downloaded the file 00-All.vcf.gz from https://ftp.ncbi.nih.gov/snp/pre_build152/organisms/human_9606_b150_GRCh37p13/VCF/

I need to extract a list of SNPs (which I put in META_rs) from it, along with their CHR# and position. I am doing this:

vcftools --gzvcf 00-All.vcf.gz --snps META_rs --recode --out  META_RSID

Is this the correct way to go about it? The command takes a long time to execute...

Thanks, Ana

vcftools • 3.3k views

Seems that you got the answer here!

3.9 years ago
Ram 43k

vcftools is obsolete, I think. Use bcftools instead. You should be able to use

bcftools query -Hf '%CHROM\t%POS\t%REF\t%ALT\t%ID\n' 00-All.vcf.gz

Customize the format string to get exactly what you need.
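
Note that the query above prints every record, so the output still has to be restricted to the rsIDs in META_rs. bcftools documents an `ID=@file` filter expression (for example `-i 'ID=@META_rs'`), which is probably the more direct route. As a dependency-free illustration of the same idea, here is a minimal sketch using gzip and awk on toy data. The file names toy.vcf.gz, toy_rs.txt and toy_hits.tsv are made up for the example, and it assumes one rsID per line with no trailing whitespace.

```shell
# Toy stand-ins for 00-All.vcf.gz and META_rs (hypothetical data, illustration only)
printf '##fileformat=VCFv4.0\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\trs1\tA\tG\t.\t.\t.\n1\t200\trs2\tC\tT\t.\t.\t.\n2\t300\trs3\tG\tA\t.\t.\t.\n' | gzip -c > toy.vcf.gz
printf 'rs1\nrs3\n' > toy_rs.txt

# First input (toy_rs.txt): load the rsIDs into a lookup table.
# Second input (stdin, "-"): stream the VCF once, skip header lines,
# and print CHROM, POS, ID for records whose ID column is in the table.
gzip -dc toy.vcf.gz |
  awk 'BEGIN { FS = OFS = "\t" }
       NR == FNR { ids[$1]; next }
       /^#/ { next }
       ($3 in ids) { print $1, $2, $3 }' toy_rs.txt - > toy_hits.tsv

cat toy_hits.tsv
```

On the real data, the same pipeline would read 00-All.vcf.gz and META_rs instead of the toy files. It is a single pass over the VCF, so the run time is dominated by decompressing the file.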

EDIT (29-Nov-2021)

I might be wrong about vcftools being obsolete. There is also a catch with extracting loci from VCF files: the VCF may not be left-aligned, and it might have multi-allelic entries, both of which interfere with operations that rely on matching CHR-POS-REF-ALT entries. One should pre-process the VCF before extracting fields, using either vt (vt decompose, then vt normalize) or bcftools norm (with -m to split multi-allelic entries), as POS, REF and ALT for certain entries might change after these operations. vt is better than bcftools IMO, as it retains the old multi-allelic/pre-left-aligned information as INFO entries.
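
To show why multi-allelic entries get in the way of CHR-POS-REF-ALT matching, here is a deliberately simplified awk sketch that splits a comma-separated ALT column into one line per allele. It is not a substitute for vt or bcftools norm: it does not left-align or trim alleles, and it ignores per-allele INFO values. The input file toy_multi.tsv is made-up, headerless, sites-only data.

```shell
# Toy sites-only records; the first one has two ALT alleles (hypothetical data)
printf '1\t100\trs1\tA\tG,T\t.\t.\t.\n1\t200\trs2\tC\tT\t.\t.\t.\n' > toy_multi.tsv

# Split column 5 (ALT) on commas and emit CHROM, POS, ID, REF, ALT
# once per ALT allele, so each output line carries exactly one allele.
awk 'BEGIN { FS = OFS = "\t" }
     !/^#/ {
       n = split($5, alt, ",")
       for (i = 1; i <= n; i++) print $1, $2, $3, $4, alt[i]
     }' toy_multi.tsv > toy_split.tsv

cat toy_split.tsv
```

Before splitting, a lookup for 1-100-A-T would miss the rs1 record because its ALT field is the string "G,T". After splitting, each allele is on its own line and can be matched directly.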
