Question: help with vcftools filtering on individuals
18 months ago, schisler wrote:

Hi all, new to vcftools. I have a vcf file that I need to 1) filter via sample ID and 2) run frequency and LD analysis.

I have no problems running --freq and --hap-r2 on the vcf file; however, I can't seem to get the filtering on sample ID to work.

I tried using a txt file with 950 sample IDs I want to keep:

vcftools --vcf AA_TAS2R38_HAPLO.vcf --freq --out AA_HAPLO_12M_freq --keep AA_12M.txt

VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
    --vcf AA_TAS2R38_HAPLO.vcf
    --keep AA_12M.txt
    --freq
    --out AA_HAPLO_12M_freq

Keeping individuals in 'keep' list
After filtering, kept 1 out of 1305 Individuals
Outputting Frequency Statistics...
After filtering, kept 1 out of a possible 1 Sites
Run Time = 0.00 seconds

So I'm not sure what's happening. My "keep" file is just one sample name per line in a txt file. If I use --remove-indv instead, it works just fine:

vcftools --vcf AA_TAS2R38_HAPLO.vcf --freq --out AA_HAPLO_12M_freq --remove-indv 4_1969-01D_02-013-1

VCFtools - 0.1.15
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
    --vcf AA_TAS2R38_HAPLO.vcf
    --freq
    --out AA_HAPLO_12M_freq
    --remove-indv 4_1969-01D_02-013-1

Excluding individuals in 'exclude' list
After filtering, kept 1304 out of 1305 Individuals
Outputting Frequency Statistics...
After filtering, kept 1 out of a possible 1 Sites
Run Time = 0.00 seconds

This makes me think the problem is the format of my keep file. Or am I missing something in the header somewhere?

Thanks all!

NitDawg


I can't be sure but it could be due to the end-line character in your keep text file. In which editor and operating system did you create it?

In the past, I have had issues with text files created in Windows that I then transferred to Linux. There's a very simple program called dos2unix that could help. Here it is for Ubuntu: https://launchpad.net/ubuntu/trusty/+package/dos2unix
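If dos2unix is not installed, the same line-ending check can be roughed out with standard tools. This is only a sketch: the function names and the demo sample IDs are made up for illustration, and it looks only for carriage returns, not other encoding problems.

```shell
#!/bin/sh
# Count lines that end in a carriage return (i.e. Windows CRLF line endings).
# grep -c exits non-zero when the count is 0, so callers may want "|| true".
count_crlf_lines() {
    cr=$(printf '\r')
    grep -c "${cr}\$" "$1"
}

# Write a copy of the file ($1) with every carriage return removed ($2).
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}

# Demo on a throwaway file; these sample IDs are hypothetical.
demo=$(mktemp)
printf 'sample_A-1\r\nsample_B-2\r\nsample_C-3' > "$demo"

count_crlf_lines "$demo" || true        # lines still carrying a trailing \r
strip_cr "$demo" "$demo.unix"
count_crlf_lines "$demo.unix" || true   # should now report 0
```

On a real keep list you would point count_crlf_lines at the file passed to --keep and, if the count is non-zero, feed the cleaned copy to vcftools instead.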

Also check those hyphens in your keep file and be sure that they are exactly the same as in the VCF. I have also had issues time and time again with different types of hyphens: Unicode has several look-alike dash characters (en dash, em dash, non-breaking hyphen), whereas ASCII has only the plain hyphen-minus.
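One way to test the "exactly the same as in the VCF" part is to compare the keep list byte for byte against the sample names on the VCF's #CHROM header line (columns 10 onward). The sketch below assumes a tab-delimited, uncompressed VCF; the function names and the demo IDs are hypothetical and not part of vcftools.

```shell
#!/bin/sh
# Print the sample IDs from the #CHROM header line of a VCF, one per line.
vcf_samples() {
    awk -F '\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$1"
}

# Print keep-list lines ($2) that have no exact, whole-line match among the
# sample IDs of the VCF ($1). LC_ALL=C makes grep compare raw bytes.
unmatched_ids() {
    samples=$(mktemp)
    vcf_samples "$1" > "$samples"
    # grep exits 1 when every ID matches; that is not an error here.
    LC_ALL=C grep -vxFf "$samples" "$2" || true
    rm -f "$samples"
}

# Demo with made-up data: the second keep ID uses an en dash (U+2013),
# which looks like the hyphen in the VCF but is a different byte sequence.
vcf=$(mktemp)
keep=$(mktemp)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1-A\tS2-B\n' > "$vcf"
printf 'S1-A\nS2\342\200\223B\n' > "$keep"

# Dump the offending lines byte by byte so hidden characters become visible.
unmatched_ids "$vcf" "$keep" | od -c
```

In the od -c dump a look-alike dash shows up as multi-byte octal values (342 200 223 for an en dash) rather than a plain "-", and a stray carriage return shows up as \r.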

Written 18 months ago by Kevin Blighe

Yup! That was it, the encoding.

This is on an OS X machine using TextMate. The default encoding was Unicode - UTF-8. I switched to Western - Windows and voilà.

Thanks a ton. I banged my head for an afternoon.

Written 18 months ago by schisler

No problem. Yes, you wouldn't believe how much this issue frustrated me the first time that I encountered it. I think that it was back in 2014.

Good luck! Kevin

Written 18 months ago by Kevin Blighe

Not all hyphens are created equal apparently. Filed away in the tip jar. Cheers Kevin.

Written 18 months ago by schisler