Question: best way of filtering a VCF file using a list of SNP IDs and ref/alt alleles
3.0 years ago, auraf85 wrote:


I need to filter a VCF file, keeping only those SNPs that match a separate list containing 3 columns: the SNP ID, the reference allele and the alternate allele.

I am very new to this kind of procedure, so I am trying to understand the most effective strategy for it.

It has been suggested that I use VCFtools or BCFtools, but I am not sure whether they can also select variants on the basis of their ref/alt alleles. Is it possible to do this using just the command line?

Thank you

Tags: bcftools, vcf
3.0 years ago, harold.smith.tarheel (United States) wrote:

If your separate list with IDs is formatted the same way as the VCF, then a simple grep should work:

grep -f ID.list full.vcf > filtered.vcf

Edit: I just realized that this command will remove the header lines. The quickest solution is to add a line containing only the '#' character at the top of ID.list, so that the header lines match as well.
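Because grep matches a pattern anywhere on the line, it does not check that the ID, REF and ALT columns each match. One possible alternative is an awk sketch along these lines. It assumes an uncompressed VCF and a tab-separated list with the columns ID, REF, ALT; the file name snps.txt and the toy records are made up for illustration only.

```shell
# Toy inputs (made up for illustration); replace with your own files.
printf 'rs1\tA\tG\nrs3\tC\tT\n' > snps.txt
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > full.vcf
printf '1\t100\trs1\tA\tG\t.\t.\t.\n1\t200\trs2\tC\tT\t.\t.\t.\n1\t300\trs3\tC\tA\t.\t.\t.\n' >> full.vcf

awk -F'\t' '
    # First file (the list): remember each ID/REF/ALT triple.
    NR == FNR { keep[$1, $2, $3] = 1; next }
    # Second file (the VCF): pass header lines through unchanged.
    /^#/ { print; next }
    # Print only records whose ID, REF and ALT columns are all in the list.
    ($3, $4, $5) in keep
' snps.txt full.vcf > filtered.vcf
```

With the toy data, only the rs1 record survives: rs2 is not in the list, and rs3 is listed with a different ALT allele. The comparison is an exact string match, so a multi-allelic ALT such as G,T would have to appear in the list written the same way.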


Reply by auraf85, 2.9 years ago:

Hey, this works, but it takes a very long time. What I did instead was append the reference and alternate allele letters to the SNP ID column and then use VCFtools to make the selection.
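The reply does not show the commands used, so the following is only a guess at what the ID-rewriting step might look like. It assumes an uncompressed VCF, a tab-separated list (ID, REF, ALT), and an underscore as the separator; the file names snps.txt, tagged.vcf and keep.ids, and the toy records, are hypothetical.

```shell
# Toy inputs (made up for illustration); replace with your own files.
printf 'rs1\tA\tG\nrs3\tC\tT\n' > snps.txt
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > full.vcf
printf '1\t100\trs1\tA\tG\t.\t.\t.\n1\t300\trs3\tC\tA\t.\t.\t.\n' >> full.vcf

# Rewrite the ID column (3) as ID_REF_ALT; header lines are left untouched.
awk 'BEGIN { FS = OFS = "\t" } !/^#/ { $3 = $3 "_" $4 "_" $5 } { print }' full.vcf > tagged.vcf

# Build the matching keep list in the same format, one ID_REF_ALT per line.
awk -F'\t' '{ print $1 "_" $2 "_" $3 }' snps.txt > keep.ids
```

The combined IDs can then be selected by ID alone, for example with something like `vcftools --vcf tagged.vcf --snps keep.ids --recode --out filtered`. Note that the output keeps the rewritten IDs, so the suffix would need to be stripped afterwards if the original IDs are required.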
3.0 years ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:


Using GATK SelectVariants, keep the IDs listed in fileKeep and exclude the IDs listed in fileExclude:

 java -jar GenomeAnalysisTK.jar \
   -R ref.fasta \
   -T SelectVariants \
   --variant input.vcf \
   -o output.vcf \
   -IDs fileKeep \
   -excludeIDs fileExclude