Question: "Unrecognized token: C" error when using emmax-kin to make kinship matrix for GWAS
8 months ago, michael.nagle wrote:

I first obtained .tped and .tfam files from a .vcf genotype file for our GWAS population, using PLINK. I'm now trying to use the .tped, .tfam files to make a kinship matrix with EMMAX.

I'm getting an error that I'm not familiar with, and I can't find any relevant discussion of it online.

Input: emmax-kin -v -s -d 10 [prefix for input .tped and .tfam]

The input files (obtained via PLINK) appear to be consistent with how .tped and .tfam files are supposed to look: https://www.cog-genomics.org/plink2/formats

Bottom 5 rows, first 12 columns of input .tped file:
scaffold_338 . 0 19212 0 0 0 0 0 0 0 0
scaffold_338 . 0 19274 0 0 0 0 0 0 0 0
scaffold_338 . 0 19312 0 0 0 0 0 0 0 0
scaffold_338 . 0 19426 0 0 0 0 0 0 T T
scaffold_338 . 0 19428 0 0 0 0 0 0 C C

Bottom 5 rows of input .tfam file:
852 1015268 0 0 0 -9
852 1015271 0 0 0 -9
852 1015274 0 0 0 -9
852 1015277 0 0 0 -9
852 1015280 0 0 0 -9

Output:
Reading TFAM file [my input file prefix].tfam ....
Reading TPED file [my input file prefix].tped ....
Unrecognized token C

Desired output: A .kinf file (kinship matrix)

I'm at a loss as to how to address this problem, so any help is greatly appreciated. Thanks for your time.

genomics emmax kinship gwas • 535 views

I have no experience with emmax-kin, but the source code pasted below suggests it expects genotypes encoded as 0, 1, 2, and it encountered the letter base code 'C' in your TPED. Does that make sense?

== emmax-kin.c lines 430-

    // if zero_miss_flag is set, assume the genotypes are encoded 0,1,2
    // Additively encodes the two genotypes in the following way
    // when (j-nheadercols) is even, 0->MISSING, add 1->0, 2->1
    // when (j-nheadercols) is odd, check 0-0 consistency, and add 1->0, 2->1
    else {
      ctoken = (unsigned char)(token[0]-'0');

      if ( ctoken > 2 ) {
        fprintf(stderr,"Unrecognized token %s\n",token);
        abort();
      }

== end code
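
To see which genotype trips a check like this, a quick scan of the TPED can help. The following is a minimal sketch, not part of EMMAX: it assumes the first four columns are the TPED header columns and flags the first genotype field whose first character is not 0, 1, or 2, which is roughly what the excerpt above tests. The demo file and its contents are made up for illustration.

```shell
# Build a tiny demo TPED (temporary file, made-up contents) with one lettered genotype pair.
tped=$(mktemp)
printf '%s\n' \
  'scaffold_338 . 0 19312 0 0 1 2' \
  'scaffold_338 . 0 19428 0 0 C C' > "$tped"

# Report the first genotype field (columns 5 onward) whose first character
# is not 0, 1, or 2, then stop, as emmax-kin aborts on the first bad token.
first_bad=$(awk '{
  for (i = 5; i <= NF; i++)
    if (substr($i, 1, 1) !~ /^[012]$/) {
      print "line " NR ", column " i ": " $i
      exit
    }
}' "$tped")
echo "$first_bad"
rm -f "$tped"
```

On a real dataset, point the awk command at your own .tped file instead of the demo file. If it prints nothing, every genotype field already starts with 0, 1, or 2.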
Reply written 8 months ago by Ahill

I've looked at this part of the source code, both alone and in its broader context, and I don't understand why it would want genotypes encoded as 0, 1 or 2 (or how that is possible) when a .tped file has A/C/G/T/0 for each allele.

Hope somebody can clarify...

Reply written 8 months ago by michael.nagle

When generating your .tped, did you use the PLINK options --recode12 --output-missing-genotype 0? I'm going by the EMMAX wiki page and the PLINK documentation:

https://genome.sph.umich.edu/wiki/EMMAX#Preparing_Input_Genotype_Files
http://zzz.bwh.harvard.edu/plink/dataman.shtml#recode

--recode12 will recode the alleles as 1 and 2.
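
As a rough sketch of how the two steps might fit together when starting from a VCF: the function name, file names, and prefix below are hypothetical, and the use of --vcf assumes PLINK 1.9 or later. The --recode12 and --output-missing-genotype 0 options are the ones named above, --transpose requests .tped/.tfam output, and the emmax-kin options are copied from the question.

```shell
# Sketch only: wraps the two steps in a function so nothing runs until called.
# Assumes PLINK 1.9 (for --vcf) and emmax-kin are both on PATH.
recode_for_emmax() {
  vcf=$1      # input VCF (placeholder name supplied by the caller)
  prefix=$2   # output prefix shared by the .tped and .tfam files

  # Write a transposed fileset with alleles coded 1/2 and missing coded 0.
  plink --vcf "$vcf" \
        --recode12 --output-missing-genotype 0 --transpose \
        --out "$prefix" || return 1

  # Same emmax-kin options as in the question.
  emmax-kin -v -s -d 10 "$prefix"
}

# Example call (file names are placeholders):
# recode_for_emmax genotypes.vcf gwas_recode12
```

With contig names such as scaffold_338, PLINK 1.9 may additionally need --allow-extra-chr. Whether that applies depends on your PLINK version and how the original .tped was produced.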

Reply written 8 months ago (later modified) by Ahill

This solved the problem. Thank you!

Reply written 8 months ago by michael.nagle