Getting nan value in vcftool relatedness command?
0
0
Entering edit mode
6.3 years ago
Sharon ▴ 610

Hi all

I understand that snp array is better for proving relatedness, but we have 2 vcfs generated from exom data, and we need to show if there is any relatedness between them.

I tried these two commands from vcftools and both get the same output:

vcftools --vcf p1p2.vcf --relatedness
vcftools --vcf p1p2.vcf --relatedness2

The output is:

INDV1   INDV2   RELATEDNESS_AJK 
 P1       P1       -nan 
 P1       P2       -nan 
 P2       P2       -nan

Any hint why I am getting nans?

Thanks

vcftools relatedness • 4.0k views
ADD COMMENT
0
Entering edit mode

Can you paste the header of your VCF(s) and also some of the variants?

ADD REPLY
0
Entering edit mode

Thanks Kevin, here it is:

##fileformat=VCFv4.2
##fileOrigin=/New_CLC_Server/New_NGS_Project2/SNPs/SNPs2/SNPs3/SNPs4/SNPs5/SNPs6/p1  
##fileDate=20170315
##fileEncoding=windows-1252
##fileOrigin=/New_CLC_Server/New_NGS_Project2//SNPs/SNPs2/SNPs3/SNPs4/SNPs5/SNPs6/p2
##source=CLC Genomics Workbench 9.0.1 build 20160606111333
##reference=/CLC_Server/ReferenceGenome/Human/Homo sapiens (hg19) sequence
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Total number of filtered reads per sample used by variant caller">
##FORMAT=<ID=CLCAD2,Number=.,Type=Integer,Description="Allelic depth, number of filtered reads supporting the alleles where the first element represents the reference and subsequent elements represent the alternatives in the order listed in the ALT column">
##FORMAT=<ID=VF,Number=1,Type=Float,Description="Variant frequency, the percentage of reads supporting the alternate allele">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depth, unfiltered count of all reads">
##FORMAT=<ID=AO,Number=A,Type=Integer,Description="Alternate allele observation count">
##FORMAT=<ID=RO,Number=1,Type=Integer,Description="Reference allele observation count">
##source_20180202.1=vcf-merge(v0.1.14-12-gcdb80b8) p1.vcf.gz p2.vcf.gz
##sourceFiles_20180202.1=0:p1.vcf.gz,1:p2vcf.gz
##INFO=<ID=SF,Number=.,Type=String,Description="Source File (index to sourceFiles, f when filtered)">
##INFO=<ID=AC,Number=.,Type=Integer,Description="Allele count in genotypes">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  p1       p2
1       69511   rs2691305       A       G       .       .       AC=2;AN=2;SF=0,1        GT:DP:CLCAD2    1:174:0,174     1:114:0,114
1       762345  rs74045218      A       G       .       .       AC=1;AN=1;SF=1  GT:DP:CLCAD2    .       1:111:0,6
1       866527  rs192246959     A       T       .       .       AC=1;AN=1;SF=1  GT:DP:CLCAD2    .       1:15:0,4
1       874816  rs200996316     C       CT      .       .       AC=1;AN=1;SF=0  GT:DP:CLCAD2    1:27:0,13       .
1       877831  rs6672356       T       C       .       .       AC=2;AN=2;SF=0,1        GT:DP:CLCAD2    1:9:0,9 1:13:0,13
1       883625  rs4970378       A       G       .       .       AC=2;AN=2;SF=0,1        GT:DP:CLCAD2    1:54:0,54       1:63:0,63
1       884091  rs7522415       C       G,N     .       .       AC=2,1;AN=3;SF=0,1      GT:DP:CLCAD2    1/2:92:0,19,10  1:105:0,17
1       884101  rs4970455       A       C,N     .       .       AC=2,1;AN=3;SF=0,1      GT:DP:CLCAD2    1/2:74:0,47,7   1:86:0,23
ADD REPLY
0
Entering edit mode

Hi Sharon! Okay, let me take a look in a while. I can already see potential problems.

ADD REPLY
0
Entering edit mode

Sure, thanks Kevin. Take your time. Much appreciated. Thanks so much.

ADD REPLY
1
Entering edit mode

Could you try the following steps:

  1. gzip the VCF using gzip MyVars.vcf
  2. tab-index using tabix -p vcf MyVars.vcf.gz
  3. Normalise and split multi-alleles using bcftools norm -m-any -Oz MyVars.vcf.gz > MyVars.norm.vcf.gz

Then, retry the relatedness part.

ADD REPLY
0
Entering edit mode

Thanks Kevin, I tried this now:

${TABIX}/bgzip -c myvcf.vcf > myvcf.vcf.gz
${TABIX}/tabix -p vcf myvcf.vcf.gz
bcftools norm -m-any -Ozmyvcf.vcf.gz > norm.vcf.gz
vcftools --gzvcf norm.vcf.gz --relatedness

Just gzip doesn't work and I had to change to --gzvcf, I got this:

Parameters as interpreted: --gzvcf norm.vcf.gz --relatedness

Using zlib version: 1.2.7 After filtering, kept 2 out of 2 Individuals Outputting Individual Relatedness Relatedness: Only using fully diploid sites. Error: Polyploidy found, and not supported by vcftools: 1:17086003 make: * [out.relatedness] Error 1

ADD REPLY
1
Entering edit mode

Oh, some amount of progress... and, apologies, yes, we use bgzip and not gzip for this.

Can you take a look at the site 1:17086003 just to see what's there? You may have to manually remove it but, better to check the variant call to see what exactly is happening.

You could do something quick like bcftools view norm.vcf.gz | grep -e "17086003"

ADD REPLY
0
Entering edit mode

Thanks Kevin so much. I keep getting positions that the error says are ployploidy that looks like this:

1       79358743        rs10640737      CA      C       .       .       AC=1,1,1,0;AN=3;SF=0,1  GT:DP:CLCAD2    0/0/0:58:0,5,5,4        1:92:0,19
1       79358743        rs10640737      CA      CAA     .       .       AC=1,1,1,0;AN=3;SF=0,1  GT:DP:CLCAD2    1/0/0:58:0,5,5,4        0:92:0,19
1       79358743        rs10640737      CA      CAAA    .       .       AC=1,1,1,0;AN=3;SF=0,1  GT:DP:CLCAD2    0/1/0:58:0,5,5,4        0:92:0,19
1       79358743        rs10640737      CA      CAAAA   .       .       AC=1,1,1,0;AN=3;SF=0,1  GT:DP:CLCAD2    0/0/1:58:0,5,5,4        0:92:0,19
1       79358743        rs10640737      C       CA      .       .       AC=1;AN=1;SF=1  GT:DP:CLCAD2    .:.:.   1:84:0,8

I keep removing and I keep getting more, they seem too much. Should I continue doing this manully?
I am also not sure why they show up, this is human? How do you think? Thanks

ADD REPLY
1
Entering edit mode

How did you call these variants? To me, this looks like issues with the variant caller (or aligner in aligning reads to a homopolymeric region).

If you want a quick fix just for the relatedness command, then use bcftools in order to only retain a maximum of 2 alleles (ref / alt) at each site: how to remove multiallelic from VCF (I assume that it will just retain the first listed in the VCF).

ADD REPLY
1
Entering edit mode

I did not do the call, they are old files in the lab, generated 5 years ago or so, and professor wanted to check relatedness lately. So, I will try that step of removing multiallelic from VCF.

Thanks Kevin so much, very much appreciated.

ADD REPLY
0
Entering edit mode

I removed the multiallelic from vcf, I am back now to the original issue :(

After filtering, kept 2 out of 2 Individuals Outputting Individual Relatedness Relatedness: Only using fully diploid sites. After filtering, kept 40993 out of a possible 40993 Sites Run Time = 0.00 seconds

INDV1   INDV2   N_AaAa  N_AAaa  N1_Aa   N2_Aa   RELATEDNESS_PHI
p11_001       p11_001       0       0       0       0       -nan
p11_001      p21_001       0       0       0       0       -nan
p21_001       p21_001       0       0       0       0       -nan
p21_001      p11_001       0       0       0       0       -nan
ADD REPLY
1
Entering edit mode

That's strange because it kept all 40993 sites in your file this time. Where are your samples from Do you expect relatedness?

I ran this on monozygotic twins and other brohter-pairs in the past, by the way, and the metric does work. There could be something quirky about your VCF..

If you want to paste some of it again, I can check, or upload it somewhere online so that I can try.

ADD REPLY
0
Entering edit mode

Thanks Kevin. Yes, it works with me too on other data we have. I am also surprised it is like this here. I posted my question just in case there is an issue I did not see. But yes, it works with other stuff I have.

It should be a father/son relationship. But for some other evidence, we claim that there is no relatedness between them. So it is weird/risky question too ! Here is a sample:

##bcftools_normCommand=norm -m-any -Oz norm.vcf.gz; Date=Mon Feb 12 12:28:50 2018
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  pR1_001       p2R1_001
1       69511   rs2691305       A       G       .       .       AC=2;AN=2;SF=0,1        GT:DP:CLCAD2    1:174:0,174     1:114:0,114
1       762345  rs74045218      A       G       .       .       AC=1;AN=1;SF=1  GT:DP:CLCAD2    .:.:.   1:111:0,6
1       866527  rs192246959     A       T       .       .       AC=1;AN=1;SF=1  GT:DP:CLCAD2    .:.:.   1:15:0,4
1       874816  rs200996316     C       CT      .       .       AC=1;AN=1;SF=0  GT:DP:CLCAD2    1:27:0,13       .:.:.
1       877831  rs6672356       T       C       .       .       AC=2;AN=2;SF=0,1        GT:DP:CLCAD2    1:9:0,9 1:13:0,13
1       883625  rs4970378       A       G       .       .       AC=2;AN=2;SF=0,1        GT:DP:CLCAD2    1:54:0,54       1:63:0,63
1       884091  rs7522415       C       G       .       .       AC=1;AN=1;SF=1  GT:DP:CLCAD2    .:.:.   1:105:0,17
1       884101  rs4970455       A       C       .       .       AC=1;AN=1;SF=1  GT:DP:CLCAD2    .:.:.   1:86:0,23
1       897465  rs767043015     A       C       .       .       AC=1;AN=1;SF=0  GT:DP:CLCAD2    1:50:0,11       .:.:.
1       902177  rs565222190     G       T       .       .       AC=1;AN=1;SF=1  GT:DP:CLCAD2    .:.:.   1:81:0,44
1       912103  rs7366480       G       A       .       .       AC=2;AN=2;SF=0,1        GT:DP:CLCAD2    1:12:0,5        1:10:0,10
1       970549  rs70949545      TGG     T       .       .       AC=2;AN=2;SF=0,1        GT:DP:CLCAD2    1:12:0,8        1:13:0,7
1       976514  rs79290478      C       A       .       .       AC=2;AN=2;SF=0,1        GT:DP:CLCAD2    1:9:0,9 1:10:0,10
1       984147  .       CA      C       .       .       AC=1;AN=1;SF=1  GT:DP:CLCAD2    .:.:.   1:10:0,3
1       985443  rs71576592;rs774490460  AG      A       .       .       AC=1;AN=1;SF=1  GT:DP:CLCAD2    .:.:.   1:9:0,4
1       986545  .       C       N       .       .       AC=1;AN=1;SF=0  GT:DP:CLCAD2    1:27:0,3        .:.:.
1       986549  .       T       N       .       .       AC=1;AN=1;SF=0  GT:DP:CLCAD2    1:20:0,3        .:.:.
1       989815  .       C       A       .       .       AC=1;AN=1;SF=0  GT:DP:CLCAD2    1:42:0,3        .:.:.
1       1019565 .       G       A       .       .       AC=2;AN=2;SF=0,1        GT:DP:CLCAD2    1:89:0,42       1:81:0,38
1       1103471 rs12732261      C       T       .       .       AC=1;AN=1;SF=1  GT:DP:CLCAD2    .:.:.   1:21:0,5
1       1104316 .       C       T       .       .       AC=1;AN=1;SF=0  GT:DP:CLCAD2    1:24:0,3        .:.:.
1       1139887 .       A       G       .       .       AC=1;AN=1;SF=0  GT:DP:CLCAD2    1:24:0,3        .:.:.
1       1196863 rs6659787       T       C       .       .       AC=2;AN=2;SF=0,1        GT:DP:CLCAD2    1:26:0,26       1:35:0,35
1       1229346 .       C       N       .       .       AC=1;AN=1;SF=0  GT:DP:CLCAD2    1:45:0,3        .:.:.
1       1231203 rs143854090     G       A       .       .       AC=1;AN=1;SF=0  GT:DP:CLCAD2    1:96:0,40       .:.:.
1       1257286 .       A       C       .       .       AC=1;AN=1;SF=0  GT:DP:CLCAD2    1:39:0,6        .:.:.
1       1271420 rs181665524     G       A       .       .       AC=2;AN=2;SF=0,1        GT:DP:CLCAD2    1:30:0,13       1:43:0,24
1       1273614 .       T       C       .       .       AC=2;AN=2;SF=0,1        GT:DP:CLCAD2    1:31:0,4        1:24:0,6
1       1273622 .       G       N       .       .       AC=2;AN=2;SF=0,1        GT:DP:CLCAD2    1:36:0,5        1:46:0,4
1       1275264 rs12037363      A       G       .       .       AC=2;AN=2;SF=0,1        GT:DP:CLCAD2    1:50:0,4        1:68:0,4
1       1275291 rs11582808      G       A       .       .       AC=2;AN=2;SF=0,1        GT:DP:CLCAD2    1:30:0,23       1:43:0,39
1       1275329 rs11585063;rs11585534   TA      CG      .       .       AC=2;AN=2;SF=0,1        GT:DP:CLCAD2    1:10:0,5        1:15:0,9
1       1290592 .       C
ADD REPLY
1
Entering edit mode

Oh!, there you go (I think...)! There are no properly encoded genotypes? I just see a '1' and not 1/1, 0/1, etc. What happened there?

ADD REPLY
0
Entering edit mode

Oh, yes, THANK YOU KEVIN, did not notice. If this is a sample of the original vcf before merging or anything, then this vcf has to be regenerated, right? I still see "1" instead of "1/0" or "1/1" in some cases. Thanks ALOT

1       884101  rs4970455       A       C       .       .       .       GT:CLCAD2:DP    1:0,32:90
1       897460  .       A       C,N     .       .       .       GT:CLCAD2:DP    1/2:0,14,5:64
1       897465  rs767043015     A       C       .       .       .       GT:CLCAD2:DP    1:0,12:60
1       897470  rs765256819     A       C,N     .       .       .       GT:CLCAD2:DP    1/2:0,12,7:69
1       902177  rs565222190     G       T       .       .       .       GT:CLCAD2:DP    1:0,30:56
1       912103  rs7366480       G       A       .       .       .       GT:CLCAD2:DP    1:0,6:6
1       970549  rs70949545      TGG     T       .       .       .       GT:CLCAD2:DP    1:0,6:11
1       976514  rs79290478      C       A       .       .       .       GT:CLCAD2:DP    1:0,5:6
1       985443  rs71576592;rs774490460  AG      A       .       .       .       GT:CLCAD2:DP    1:0,7:13
1       986545  .       C       N       .       .       .       GT:CLCAD2:DP    1:0,4:35
1       986549  .       T       G,N     .       .       .       GT:CLCAD2:DP    1/2:0,3,3:27
ADD REPLY
1
Entering edit mode

Yes, where did you generate it? Custom annotations are permitted in the VCF format but not for the GT, I think.

ADD REPLY
1
Entering edit mode

I did not generate the original vcfs, I was given them. i just merged them. I will ask for the fastq if possible to regenerate and double check this issue.

Thanks so much Kevin. VERY MUCH APPRECIATED.

ADD REPLY

Login before adding your answer.

Traffic: 2155 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6