Question: help with vcftools filtering on individuals
2.2 years ago, schisler wrote:

Hi all, new to vcftools. I have a vcf file that I need to 1) filter via sample ID and 2) run frequency and LD analysis.

I have no problems with --freq and --hap-r2 on the vcf file; however, I can't seem to get the filtering on sample ID to work.

I tried using a txt file with 950 sample IDs I want to keep:

vcftools --vcf AA_TAS2R38_HAPLO.vcf --freq --out AA_HAPLO_12M_freq --keep AA_12M.txt

VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
    --vcf AA_TAS2R38_HAPLO.vcf
    --keep AA_12M.txt
    --freq
    --out AA_HAPLO_12M_freq

Keeping individuals in 'keep' list
After filtering, kept 1 out of 1305 Individuals
Outputting Frequency Statistics...
After filtering, kept 1 out of a possible 1 Sites
Run Time = 0.00 seconds

So I'm not sure what's happening? My "keep" file is just one sample name per line in a txt file. If I just use the --remove-indv it works just fine:

vcftools --vcf AA_TAS2R38_HAPLO.vcf --freq --out AA_HAPLO_12M_freq --remove-indv 4_1969-01D_02-013-1

VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
    --vcf AA_TAS2R38_HAPLO.vcf
    --freq
    --out AA_HAPLO_12M_freq
    --remove-indv 4_1969-01D_02-013-1

Excluding individuals in 'exclude' list
After filtering, kept 1304 out of 1305 Individuals
Outputting Frequency Statistics...
After filtering, kept 1 out of a possible 1 Sites
Run Time = 0.00 seconds

This makes me think it is the format of my keep file? Or am I missing something in the header somewhere?

Thanks all!

NitDawg


I can't be sure, but it could be due to the line-ending characters in your keep text file. In which editor and operating system did you create it?

In the past, I have had issues with text files created on Windows that I then transferred to Linux. There's a very simple program called dos2unix that can help. Here it is for Ubuntu: https://launchpad.net/ubuntu/trusty/+package/dos2unix

Also check the hyphens in your keep file and be sure that they are exactly the same as in the VCF. I have had issues time and time again with different types of hyphens: ASCII has only the plain hyphen-minus, but Unicode has several look-alikes (for example the en dash, U+2013) that editors can substitute silently.
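
A rough way to check both points from the shell is sketched below. The function names are made up for illustration, and it assumes the keep file has one ID per line and that the VCF has a standard tab-separated #CHROM header line. It counts keep-file lines ending in a carriage return and lists any keep-file IDs with no exact match among the VCF sample names.

```shell
# Sketch only: the function names below are illustrative, not part of vcftools.

# Print the sample IDs from the VCF header (columns 10 onward of the #CHROM line).
vcf_samples() {
  grep -m1 '^#CHROM' "$1" | cut -f10- | tr '\t' '\n'
}

# Count keep-file lines that end in a carriage return (Windows CRLF endings).
count_crlf_lines() {
  LC_ALL=C grep -c "$(printf '\r')\$" "$1"
}

# List keep-file lines that have no exact match among the VCF sample IDs.
# Lines with a trailing carriage return or a look-alike hyphen show up here.
unmatched_ids() {
  keep_file=$1
  vcf_file=$2
  samples=$(mktemp)
  vcf_samples "$vcf_file" > "$samples"
  grep -vxF -f "$samples" "$keep_file"
  rm -f "$samples"
}

# Example usage with the file names from the question:
#   count_crlf_lines AA_12M.txt
#   unmatched_ids AA_12M.txt AA_TAS2R38_HAPLO.vcf
```

If count_crlf_lines reports a non-zero count, running dos2unix on the keep file (or deleting the carriage returns with tr -d '\r') should make the IDs comparable again.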

written 2.2 years ago by Kevin Blighe

Yup! That was it, the encoding.

This is on an OS X machine using TextMate. The default encoding was Unicode - UTF-8. I switched to Western - Windows and voilà.
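
The thread does not say which character was actually at fault, so the following is only a sketch for spotting encoding problems before changing editor settings. The helper names are hypothetical. It flags any keep-file line containing a byte outside printable ASCII, which covers carriage returns, tabs, a UTF-8 byte-order mark, and Unicode dashes.

```shell
# Sketch only: helper names are illustrative.

# List lines (with line numbers) that contain any byte outside printable ASCII.
# LC_ALL=C makes grep compare raw bytes; -a forces the file to be treated as text.
non_ascii_lines() {
  LC_ALL=C grep -a -n '[^ -~]' "$1"
}

# Dump the first line character by character so invisible bytes become visible
# (a UTF-8 byte-order mark appears as the octal bytes 357 273 277).
first_line_bytes() {
  head -n 1 "$1" | od -c
}

# Example usage with the keep file from the question:
#   non_ascii_lines AA_12M.txt
#   first_line_bytes AA_12M.txt
```

An empty result from non_ascii_lines suggests the file is plain ASCII with Unix line endings, in which case the mismatch probably lies elsewhere, for example in the IDs themselves.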

Thanks a ton. I banged my head for an afternoon.

written 2.2 years ago by schisler

No problem. Yes, you wouldn't believe how much this issue frustrated me the first time that I encountered it. I think that it was back in 2014.

Good luck! Kevin

written 2.2 years ago by Kevin Blighe

Not all hyphens are created equal apparently. Filed away in the tip jar. Cheers Kevin.

written 2.2 years ago by schisler

5 months ago, pduluth96 wrote:

For anyone who has a similar problem (I had it reversed: keep worked, but remove-indv didn't), remember that you can give the name of the individual you want to keep / exclude directly on the command line, instead of a text file containing its name, e.g.:

vcftools --gzvcf my_vcf.vcf.gz --remove-indv S12794 --recode

EDIT: apparently this only works with --remove-indv, not with --keep, which expects a file name. The single-sample counterpart for keeping an individual is --indv.
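
The vcftools manual lists --indv and --remove-indv as options that can be repeated, so one possible workaround for a troublesome list file is to expand it into repeated --indv options. The helper below is a hypothetical sketch, not part of vcftools. It assumes sample IDs contain no whitespace, and it strips carriage returns and blank lines along the way.

```shell
# Sketch only: indv_args is an illustrative helper, not a vcftools feature.

# Turn a list of sample IDs (one per line) into repeated "--indv ID" options,
# dropping carriage returns and skipping blank lines.
indv_args() {
  tr -d '\r' < "$1" | awk 'NF { printf "--indv %s\n", $1 }'
}

# Example usage (the unquoted substitution relies on IDs having no whitespace):
#   vcftools --gzvcf my_vcf.vcf.gz $(indv_args ids.txt) --recode --out subset
```

For a few samples this is convenient. For hundreds of IDs, fixing the list file and using --keep is probably the tidier route.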
