help with vcftools filtering on individuals
3.8 years ago
schisler ▴ 30

Hi all, new to vcftools. I have a vcf file that I need to 1) filter via sample ID and 2) run frequency and LD analysis.

I have no problems running --freq and --hap-r2 on the vcf file; however, I can't get the filtering on sample ID to work.

I tried using a txt file with 950 sample IDs I want to keep:

vcftools --vcf AA_TAS2R38_HAPLO.vcf --freq --out AA_HAPLO_12M_freq --keep AA_12M.txt

VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
    --vcf AA_TAS2R38_HAPLO.vcf
    --keep AA_12M.txt
    --freq
    --out AA_HAPLO_12M_freq

Keeping individuals in 'keep' list
After filtering, kept 1 out of 1305 Individuals
Outputting Frequency Statistics...
After filtering, kept 1 out of a possible 1 Sites
Run Time = 0.00 seconds

So I'm not sure what's happening? My "keep" file is just one sample name per line in a txt file. If I just use the --remove-indv it works just fine:

vcftools --vcf AA_TAS2R38_HAPLO.vcf --freq --out AA_HAPLO_12M_freq --remove-indv 4_1969-01D_02-013-1

VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
    --vcf AA_TAS2R38_HAPLO.vcf
    --freq
    --out AA_HAPLO_12M_freq
    --remove-indv 4_1969-01D_02-013-1

Excluding individuals in 'exclude' list
After filtering, kept 1304 out of 1305 Individuals
Outputting Frequency Statistics...
After filtering, kept 1 out of a possible 1 Sites
Run Time = 0.00 seconds

This makes me think it is the format of my keep file? Or am I missing something in the header somewhere?

Thanks all!

NitDawg


I can't be sure but it could be due to the end-line character in your keep text file. In which editor and operating system did you create it?

In the past, I have had issues with text files created on Windows and then transferred to Linux. There's a very simple program called dos2unix that converts the line endings. Here it is for Ubuntu: https://launchpad.net/ubuntu/trusty/+package/dos2unix

Also check the hyphens in your keep file and be sure that they are exactly the same characters as in the VCF. I have had issues time and time again with different types of hyphens: ASCII has only the plain hyphen-minus, but editors can silently substitute look-alike Unicode dashes (such as the en dash).
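A quick way to test both suspicions (line endings and look-alike characters) is to compare the keep list byte-for-byte against the sample names in the VCF header. The following is only a rough sketch: the function names are made up for illustration, and it assumes an uncompressed VCF and a keep file with one ID per line.

```shell
#!/bin/sh
# Hypothetical helpers for checking a vcftools --keep list (names are illustrative).

# List the sample IDs of an uncompressed VCF: columns 10 onward of the #CHROM line.
vcf_samples() {
    awk -F '\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$1"
}

# Print a cleaned copy of an ID list: drop carriage returns (Windows line
# endings) and a leading UTF-8 byte-order mark, if present.
clean_ids() {
    bom=$(printf '\357\273\277')
    tr -d '\r' < "$1" | LC_ALL=C sed "1s/^$bom//"
}

# Print every line of the ID list ($2) that does not exactly match a sample
# name in the VCF ($1). No output means every ID was found.
unmatched_ids() {
    tmp=$(mktemp) || return 1
    vcf_samples "$1" > "$tmp"
    grep -v -x -F -f "$tmp" "$2" || true
    rm -f "$tmp"
}

# Example usage (file names taken from the question):
#   unmatched_ids AA_TAS2R38_HAPLO.vcf AA_12M.txt
#   clean_ids AA_12M.txt > AA_12M.clean.txt
#   unmatched_ids AA_TAS2R38_HAPLO.vcf AA_12M.clean.txt
```

If the first call lists most of the file and the call on the cleaned copy lists nothing, line endings or a byte-order mark were the likely cause. IDs that are still listed after cleaning probably differ in some other character, such as a different kind of dash.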


Yup! That was it, the encoding.

This is on an OS X machine using TextMate. The default encoding was Unicode - UTF-8. I switched to Western - Windows and voilà.

Thanks a ton. I banged my head for an afternoon.


No problem. Yes, you wouldn't believe how much this issue frustrated me the first time that I encountered it. I think that it was back in 2014.

Good luck! Kevin


Not all hyphens are created equal apparently. Filed away in the tip jar. Cheers Kevin.

23 months ago
pduluth96 ▴ 10

For anyone who has a similar problem (I had it reversed: keep worked, but remove-indv didn't), remember that you can give the name of the individual you want to exclude directly on the command line, instead of a text file containing its name, e.g.:

vcftools --gzvcf my_vcf.vcf.gz --remove-indv S12794  --recode

EDIT: apparently this only works with --remove-indv, not with --keep.
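A likely explanation, going by the VCFtools manual: --keep and --remove expect a file name, while the by-name counterpart of --remove-indv is --indv, and both by-name options can be repeated for several individuals. Below is a small sketch of building those options from a list of names. The helper name is made up, and it assumes the sample IDs contain no whitespace.

```shell
#!/bin/sh
# Hypothetical helper: turn sample IDs into repeated "--indv <ID>" options.
# Assumes IDs contain no whitespace, since the result is word-split by the shell.
indv_args() {
    for id in "$@"; do
        printf '%s %s ' --indv "$id"
    done
}

# Example usage (file and sample names taken from the post above):
#   vcftools --gzvcf my_vcf.vcf.gz $(indv_args S12794) --recode
```

For more than a handful of individuals, a keep file passed to --keep is probably still the more convenient route.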
