Question: Output bigger than input. VCFtools
5 weeks ago, victor.agrs wrote:

Hello, I'm trying to extract a subset of SNPs using vcftools. I have a list of 2474008 SNPs and a 90 GB vcf file. I used this command:

vcftools --vcf GCF_000001405.25.vcf --snps rsLeptin_adj.txt --recode --recode-INFO-all --out match_rsLeptin_adj.txt

But my output file has 2562258 lines (apparently 88250 more SNPs than in my list), so I'm not sure whether the command is not matching specifically enough or whether some error during processing produces the extra lines. I have also tried with awk, using an array:

awk '{array[$1]}' rsLeptin_adj.txt

and matching against the 3rd column of the VCF file, which contains the SNP IDs:

awk 'FNR==NR {array[$1]; next}; $3 in array' rsLeptin_adj.txt GCF_000001405.25.vcf
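
Note that the awk filter above prints only the matching data lines, so the VCF header is lost. A minimal sketch of a variant that also passes the header through, assuming header lines start with `#` as the VCF format specifies; the toy files and their names here are hypothetical stand-ins for rsLeptin_adj.txt and the real VCF:

```shell
tmp=$(mktemp -d)

# Hypothetical toy inputs: an ID list and a tiny tab-delimited VCF.
printf 'rs2\nrs3\n' > "$tmp/ids.txt"
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\trs1\tA\tG\t.\t.\t.\n1\t200\trs2\tC\tT\t.\t.\t.\n1\t300\trs3\tG\tA\t.\t.\t.\n' > "$tmp/in.vcf"

# First file: remember every ID in the list.
# Second file: print header lines unchanged, plus data lines whose
# 3rd column (ID) is one of the remembered IDs.
awk 'FNR==NR {ids[$1]; next} /^#/ || ($3 in ids)' "$tmp/ids.txt" "$tmp/in.vcf" > "$tmp/subset.vcf"

cat "$tmp/subset.vcf"
```

On the toy data this keeps both header lines and the records for rs2 and rs3. Like the original command, it matches the ID column as a whole string, so a record whose ID field holds several IDs would not match.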

Has anyone experienced the same issue? Any comment will help. Thanks in advance


If I understand correctly, you have a file rsLeptin_adj.txt containing IDs that may occur in the ID column of your VCF file, and you would like to extract the variants with those IDs.

For this use bcftools instead of vcftools. vcftools is deprecated.

bcftools view -i "ID=@rsLeptin_adj.txt" GCF_000001405.25.vcf > out.vcf
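
To compare the result with the size of the ID list, it may help to count header lines, data records, and distinct IDs separately rather than total lines. This is only a sketch: it assumes the surplus could come from header lines (which start with `#`) or from several records sharing one ID, neither of which is confirmed for this dataset. The toy file below is a hypothetical stand-in for out.vcf:

```shell
tmp=$(mktemp -d)

# Hypothetical toy output file; rs2 deliberately appears on two records.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\trs1\tA\tG\t.\t.\t.\n1\t200\trs2\tC\tT\t.\t.\t.\n1\t250\trs2\tC\tA\t.\t.\t.\n' > "$tmp/out.vcf"

# Every line in the file, header included (what a plain line count reports).
total_lines=$(wc -l < "$tmp/out.vcf" | tr -d ' ')
# Header lines only.
header_lines=$(grep -c '^#' "$tmp/out.vcf")
# Data records only.
records=$(grep -vc '^#' "$tmp/out.vcf")
# Distinct values in the ID column (3rd column) among the data records.
unique_ids=$(awk '!/^#/ && !seen[$3]++ { n++ } END { print n + 0 }' "$tmp/out.vcf")

echo "lines=$total_lines header=$header_lines records=$records unique_ids=$unique_ids"
```

If `records` is larger than `unique_ids`, some IDs occur on more than one record, which would make the output longer than the list even when every match is correct.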

fin swimmer

written 5 weeks ago by finswimmer