Hi all, new to vcftools. I have a vcf file that I need to 1) filter via sample ID and 2) run frequency and LD analysis.
I have no problems running --freq and --hap-r2 on the vcf file; however, I can't seem to get the filtering on sample ID to work.
I tried using a txt file with 950 sample IDs I want to keep:
vcftools --vcf AA_TAS2R38_HAPLO.vcf --freq --out AA_HAPLO_12M_freq --keep AA_12M.txt
VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009
Parameters as interpreted:
--vcf AA_TAS2R38_HAPLO.vcf
--keep AA_12M.txt
--freq
--out AA_HAPLO_12M_freq
Keeping individuals in 'keep' list
After filtering, kept 1 out of 1305 Individuals
Outputting Frequency Statistics...
After filtering, kept 1 out of a possible 1 Sites
Run Time = 0.00 seconds
So I'm not sure what's happening? My "keep" file is just one sample name per line in a txt file. If I just use the --remove-indv it works just fine:
vcftools --vcf AA_TAS2R38_HAPLO.vcf --freq --out AA_HAPLO_12M_freq --remove-indv 4_1969-01D_02-013-1
VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009
Parameters as interpreted:
--vcf AA_TAS2R38_HAPLO.vcf
--freq
--out AA_HAPLO_12M_freq
--remove-indv 4_1969-01D_02-013-1
Excluding individuals in 'exclude' list
After filtering, kept 1304 out of 1305 Individuals
Outputting Frequency Statistics...
After filtering, kept 1 out of a possible 1 Sites
Run Time = 0.00 seconds
This makes me think it is the format of my keep file? Or am I missing something in the header somewhere?
Thanks all!
NitDawg
I can't be sure but it could be due to the end-line character in your keep text file. In which editor and operating system did you create it?
In the past, I have had issues with text files created in Windows that I then transferred to Linux. There's a very simple program called dos2unix that could help. Here it is for Ubuntu: https://launchpad.net/ubuntu/trusty/+package/dos2unix
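If dos2unix is not to hand, the same check can be done with standard tools. This is only a minimal sketch, assuming a POSIX shell; the function names count_crlf and strip_cr are made up for illustration and are not part of vcftools or dos2unix.

```shell
# count_crlf FILE: print how many lines end in a carriage return (Windows-style CRLF)
count_crlf() {
    cr=$(printf '\r')
    # grep -c exits non-zero when the count is 0, so ignore its exit status
    grep -c "${cr}\$" "$1" || true
}

# strip_cr IN OUT: write a copy of IN to OUT with every carriage return removed
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}
```

For example, count_crlf AA_12M.txt should print 0 for a clean Unix file; if it does not, strip_cr AA_12M.txt AA_12M_unix.txt (the output name is hypothetical) writes a cleaned copy that can be passed to --keep.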
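Also check the hyphens in your keep file and make sure that they are exactly the same characters as in the VCF. I have had issues time and time again with different types of hyphen: ASCII has only the plain hyphen-minus, but Unicode has several look-alike dash characters (en dash, em dash, non-breaking hyphen) that will not match it.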
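One way to see which IDs fail to match is to compare the keep file against the sample names in the VCF header. Again just a sketch, assuming a POSIX shell with awk and grep, and an uncompressed VCF whose #CHROM line lists the samples from column 10 onwards; vcf_samples and unmatched_ids are hypothetical helper names, not vcftools commands.

```shell
# vcf_samples VCF: print one sample ID per line, taken from the #CHROM header line
vcf_samples() {
    awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$1"
}

# unmatched_ids KEEP VCF: print keep-file lines that match no VCF sample ID exactly
unmatched_ids() {
    # -F fixed strings, -x whole-line match, -v invert, -f - reads the IDs from stdin;
    # grep exits non-zero when nothing is printed, so ignore its exit status
    vcf_samples "$2" | grep -vxFf - "$1" || true
}
```

Running unmatched_ids AA_12M.txt AA_TAS2R38_HAPLO.vcf lists every line of the keep file that vcftools would not find among the samples. Piping that output through cat -v makes stray carriage returns (shown as ^M) and non-ASCII bytes visible.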
Yup! That was it, the encoding.
This is on an OS X machine using TextMate. The default encoding was Unicode - UTF-8. I switched to Western - Windows and voilà.
Thanks a ton. I banged my head for an afternoon.
No problem. Yes, you wouldn't believe how much this issue frustrated me the first time that I encountered it. I think that it was back in 2014.
Good luck! Kevin
Not all hyphens are created equal, apparently. Filed away in the tip jar. Cheers, Kevin.