Question: Output bigger than input. VCFtools
11 months ago, victor.agrs wrote:

Hello, I'm trying to extract a subset of SNPs using vcftools. I have a list of 2474008 SNPs and a 90 GB vcf file. I used this command:

vcftools --vcf GCF_000001405.25.vcf --snps rsLeptin_adj.txt --recode --recode-INFO-all --out match_rsLeptin_adj.txt

But my output file has 2562258 lines, which is 88250 more than the number of SNPs in my list, so I'm not sure whether the command is not specific enough or whether some error during processing produces the extra lines. I have also tried awk, loading the IDs into an array:

awk '{array[$1]}' rsLeptin_adj.txt

and matching it against the 3rd column of the VCF file, which contains the SNP IDs:

awk 'FNR==NR {array[$1]; next}; $3 in array' rsLeptin_adj.txt GCF_000001405.25.vcf
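For reference, a minimal sketch of how this two-file awk idiom behaves on toy data. The file names (ids.txt, toy.vcf) and their contents are made up for illustration. It keeps the header lines explicitly and then counts only the data lines, since header lines starting with # are part of the line count of any VCF output but are not SNPs.

```shell
# Toy inputs; file names and contents are hypothetical.
tmp=$(mktemp -d)
printf 'rs1\nrs3\n' > "$tmp/ids.txt"
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\trs1\tA\tG\t.\t.\t.\n1\t200\trs2\tC\tT\t.\t.\t.\n2\t300\trs3\tG\tA\t.\t.\t.\n' > "$tmp/toy.vcf"

# First file: load the IDs into an array (FNR==NR is only true while
# reading the first file). Second file: keep header lines and any record
# whose ID column (field 3) is in the array.
awk -F'\t' 'FNR==NR {ids[$1]; next} /^#/ || ($3 in ids)' \
    "$tmp/ids.txt" "$tmp/toy.vcf" > "$tmp/subset.vcf"

# Compare the raw line count with the number of data lines.
total_lines=$(wc -l < "$tmp/subset.vcf" | tr -d ' ')
data_lines=$(grep -vc '^#' "$tmp/subset.vcf")
echo "total lines: $total_lines, records: $data_lines"
```

On a real file, comparing the number of records (not the raw line count) with the number of IDs in the list gives the fairer comparison.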

Has anyone experienced the same issue? Any comment will help. Thanks in advance


If I understand correctly, you have a file rsLeptin_adj.txt containing IDs that may occur in the ID column of your VCF file, and you would like to keep only the variants with those IDs.

For this use bcftools instead of vcftools. vcftools is deprecated.

bcftools view -i "ID=@rsLeptin_adj.txt" GCF_000001405.25.vcf > out.vcf
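To see where extra lines come from, one option is to count the matching records and check whether any ID occurs on more than one record. The following is a minimal sketch on toy data, under the assumption (not confirmed in this thread) that the same rs ID can appear on several records, for example on more than one contig. The file names and contents are hypothetical.

```shell
# Toy inputs; rs3 deliberately appears on two records.
tmp=$(mktemp -d)
printf 'rs1\nrs3\n' > "$tmp/ids.txt"
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\trs1\tA\tG\t.\t.\t.\n1\t200\trs2\tC\tT\t.\t.\t.\n2\t300\trs3\tG\tA\t.\t.\t.\nalt1\t50\trs3\tG\tA\t.\t.\t.\n' > "$tmp/toy.vcf"

# Number of data records whose ID is in the list (header lines skipped).
matched=$(awk -F'\t' 'FNR==NR {ids[$1]; next} !/^#/ && ($3 in ids)' \
    "$tmp/ids.txt" "$tmp/toy.vcf" | wc -l | tr -d ' ')

# Number of distinct IDs among those records.
distinct=$(awk -F'\t' 'FNR==NR {ids[$1]; next} !/^#/ && ($3 in ids) {print $3}' \
    "$tmp/ids.txt" "$tmp/toy.vcf" | sort -u | wc -l | tr -d ' ')

# IDs that occur on more than one data record, with their counts.
repeated=$(awk -F'\t' '!/^#/ {n[$3]++} END {for (id in n) if (n[id] > 1) print id, n[id]}' \
    "$tmp/toy.vcf")

echo "matched records: $matched, distinct IDs: $distinct"
echo "repeated IDs: $repeated"
```

If the number of matched records is larger than the number of distinct IDs, the difference is explained by repeated IDs rather than by the filter matching IDs that are not in the list.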

fin swimmer

written 11 months ago by finswimmer