Help with vcftools filtering on individuals
3.8 years ago
schisler

Hi all, I'm new to vcftools. I have a VCF file that I need to 1) filter by sample ID and 2) run frequency and LD analyses on.

I have no problems running --freq and --hap-r2 on the VCF file; however, I can't seem to get the filtering by sample ID to work.

I tried using a txt file with 950 sample IDs I want to keep:

vcftools --vcf AA_TAS2R38_HAPLO.vcf --freq --out AA_HAPLO_12M_freq --keep AA_12M.txt

VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
--vcf AA_TAS2R38_HAPLO.vcf
--keep AA_12M.txt
--freq
--out AA_HAPLO_12M_freq

Keeping individuals in 'keep' list
After filtering, kept 1 out of 1305 Individuals
Outputting Frequency Statistics...
After filtering, kept 1 out of a possible 1 Sites
Run Time = 0.00 seconds


So I'm not sure what's happening. My "keep" file is just a .txt file with one sample name per line. If I use --remove-indv instead, it works just fine:

vcftools --vcf AA_TAS2R38_HAPLO.vcf --freq --out AA_HAPLO_12M_freq --remove-indv 4_1969-01D_02-013-1

VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
--vcf AA_TAS2R38_HAPLO.vcf
--freq
--out AA_HAPLO_12M_freq
--remove-indv 4_1969-01D_02-013-1

Excluding individuals in 'exclude' list
After filtering, kept 1304 out of 1305 Individuals
Outputting Frequency Statistics...
After filtering, kept 1 out of a possible 1 Sites
Run Time = 0.00 seconds


This makes me think the problem is the format of my keep file. Or am I missing something in the header somewhere?
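One quick way to test the "format of my keep file" theory is to look for bytes that have no business in a plain list of sample IDs. The sketch below assumes bash and grep; `check_keep_file` is a hypothetical helper name, not part of vcftools. It prints, with line numbers, every line containing a carriage return (Windows line ending), a tab, or any non-ASCII byte such as a Unicode look-alike dash:

```shell
#!/usr/bin/env bash
# check_keep_file: hypothetical helper (not part of vcftools).
# Prints "line_number:line" for every line of a --keep file that contains a
# byte outside printable ASCII (0x20-0x7E): carriage returns from Windows
# line endings, tabs, or multi-byte UTF-8 characters such as look-alike dashes.
check_keep_file() {
    # LC_ALL=C makes grep treat the file as raw bytes, so the bracket range
    # [ -~] means exactly the printable ASCII characters.
    LC_ALL=C grep -n '[^ -~]' "$1"
}
```

Usage: `check_keep_file AA_12M.txt`. No output means every line is plain printable ASCII. Running `file AA_12M.txt` is another quick check; it reports "with CRLF line terminators" for Windows-style files.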

Thanks all!

NitDawg


I can't be sure, but it could be due to the line-ending characters in your keep file. In which editor and operating system did you create it?

In the past, I have had issues with text files created in Windows that I then transferred to Linux: Windows ends each line with a carriage return plus line feed (CRLF), whereas Linux uses just a line feed (LF). There's a very simple program called dos2unix that can convert them. Here it is for Ubuntu: https://launchpad.net/ubuntu/trusty/+package/dos2unix
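If the keep file does turn out to have CRLF line endings, `dos2unix AA_12M.txt` converts it in place. Where dos2unix isn't installed, a minimal alternative, assuming only the standard `tr` utility, is to delete the carriage returns; `strip_crlf` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# strip_crlf: hypothetical helper. Copies IN_FILE to OUT_FILE with every
# carriage return (\r) removed, turning CRLF line endings into LF.
# tr is POSIX, so this works on both Linux and macOS.
strip_crlf() {
    local in_file="$1" out_file="$2"
    tr -d '\r' < "$in_file" > "$out_file"
}
```

Usage: `strip_crlf AA_12M.txt AA_12M_unix.txt`, then pass the new file to --keep. Writing to a separate file matters: `tr -d '\r' < f > f` would have the shell truncate `f` before tr reads it.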

Also check the hyphens in your keep file and make sure that they are exactly the same characters as in the VCF. I have had issues time and time again with different types of hyphens: besides the plain ASCII hyphen-minus, Unicode has several look-alike dash characters.
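To check the hyphen theory directly, you can compare the keep file against the sample names in the VCF header, which are the tab-separated columns after FORMAT (column 10 onward) on the `#CHROM` line. The sketch below assumes bash (for process substitution) and an uncompressed VCF; `missing_ids` is a hypothetical helper name. It prints every ID in the keep file that has no byte-for-byte match among the VCF samples:

```shell
#!/usr/bin/env bash
# missing_ids: hypothetical helper. Prints keep-file IDs that do not exactly
# match any sample name in the VCF header. A Unicode dash or a stray \r in
# the keep file makes an ID show up here even if it looks identical on screen.
# The body runs in a subshell ( ... ) so LC_ALL=C does not leak to the caller.
missing_ids() (
    export LC_ALL=C   # byte-wise ordering, so sort and comm agree
    vcf="$1"
    keep_file="$2"
    # comm -23 prints only lines that appear in the first (sorted) input.
    comm -23 \
        <(sort -u "$keep_file") \
        <(grep -m1 '^#CHROM' "$vcf" | cut -f10- | tr '\t' '\n' | sort -u)
)
```

Usage: `missing_ids AA_TAS2R38_HAPLO.vcf AA_12M.txt`. For a .vcf.gz, replace the `grep` input with `gzip -cd "$vcf" | grep -m1 '^#CHROM'`.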


Yup! That was it, the encoding.

This is on an OS X machine using TextMate. The default encoding was Unicode - UTF-8. I switched to Western - Windows and voilà.

Thanks a ton. I banged my head against this for an afternoon.


No problem. Yes, you wouldn't believe how much this issue frustrated me the first time that I encountered it. I think that it was back in 2014.

Good luck! Kevin


Not all hyphens are created equal, apparently. Filed away in the tip jar. Cheers, Kevin.

23 months ago
pduluth96

For anyone who has a similar problem (I had it reversed: --keep worked, but --remove-indv didn't), remember that you can just give the name of the individual you want to keep or exclude instead of a text file containing its name, e.g.:

vcftools --gzvcf my_vcf.vcf.gz --remove-indv S12794  --recode


EDIT: apparently this only works with --remove-indv, not with --keep. --keep expects a file of IDs; the option that keeps an individual by name is --indv.