Question: Extra header column stopping EMMAX (C), but not seen in input files
Asked 14 months ago by michael.nagle:

I'm using a C implementation of EMMAX for GWAS and am getting an error that I can't get to the bottom of because I don't know enough about C. I hope there's a C expert who can take a quick look at this and let me know which of my input files has the problem, and what is causing the error.

Input:

emmax -v -d 10 -t [tped/tfam file prefix] -p [phenotype file] -k [kinship matrix] -o [output]

Source code: Actual code can be downloaded here (http://csg.sph.umich.edu//kang/emmax/download/index.html) and equivalent code for a slightly different version that can give the same error is on Github (https://github.com/slowkoni/EPI-EMMAX)

There's a problem with the variable nheadercols that leads to the error I will show below.

Input files:
Top 5 rows, 7 columns of .tfam file (tab-delimited):
1 . 0 67 0 0 C
1 . 0 92 0 0 0
1 . 0 95 0 0 0
1 . 0 102 0 0 0
1 . 0 103 0 0 0

The kinship matrix looks as expected, with 1.00 down the diagonal because every genotype has perfect kinship with itself. The numbers of rows and columns match the number of genotypes in the phenotype and .tfam files.

I've tried phenotype files with and without an extra column of genotype labels (shown below as the first column; this is the version with no extra column). Everything is tab-delimited. The two genotype IDs in the first two columns are the same for this population.
PhenolabelA1 PhenolabelA2 NA
PhenolabelB1 PhenolabelB2 0
PhenolabelC1 PhenolabelC2 1
PhenolabelD1 PhenolabelD2 0
PhenolabelE1 PhenolabelE2 NA

Standard out:

Reading TFAM file [tfam/tped prefix].tfam ....


Reading kinship file [prefix].kinf...

  882 rows and 882 columns were observed with 0 missing values.


Reading the phenotype file [phenotype input].txt...

ERROR: Number of header columns are 2, but only 1 columns were observed

Thanks much for helping me decipher this!


This probably has nothing to do with the code and everything to do with how the file is formatted. If the file is expected to be tab-delimited, check that spaces haven't been used instead of tabs.

Reply written 14 months ago by Jean-Karim Heriche

The code works. I'm trying to look at the code to figure out what the problem with input is. The files are tab-delimited text as per EMMAX instructions.

Reply written 14 months ago by michael.nagle

Just to clarify, there's most likely nothing to be learned from the code. It expects tab-delimited input, but your input file is not fully tab-delimited. This is what the error message suggests. The most common cause of this kind of thing is tab characters being replaced by some other whitespace characters. You can check with a Perl one-liner whether your input file is really tab-delimited with the expected number of columns, e.g. checking for 7 columns (change the 7 to match the file you are checking):

perl -ne '@row = split(/\t/); $n++; $l = @row; if ($l != 7) { print "Line $n has $l columns and is probably not tab-delimited.\n";}; END{print "Done. Checked $n lines.\n";}' input.txt
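
For a somewhat more detailed check, here is a minimal sketch in Python. It flags lines whose tab-separated column count differs from the expected number, and it also notes carriage returns and lines that only split correctly on generic whitespace. The function name `inspect_delimited` and its parameters are hypothetical and chosen for illustration. Whether EMMAX itself is sensitive to carriage returns is not established in this thread; that is an assumption worth ruling out, not a diagnosis.

```python
def inspect_delimited(text, expected_cols, delimiter="\t"):
    """Return (line_number, column_count, notes) for each suspicious line.

    Hypothetical helper: it only inspects the text and knows nothing
    about how EMMAX parses its input.
    """
    problems = []
    # Split on "\n" only, so stray "\r" characters remain visible.
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty string left by a final newline
    for number, line in enumerate(lines, start=1):
        notes = []
        if "\r" in line:
            notes.append("carriage return found (CR or CRLF line ending)")
        fields = line.rstrip("\r").split(delimiter)
        if len(fields) != expected_cols:
            notes.append("expected %d columns" % expected_cols)
            # If generic whitespace splitting gives the right count,
            # spaces are probably standing in for the delimiter.
            if len(line.split()) == expected_cols:
                notes.append("splits correctly on whitespace; spaces may be in use")
        if notes:
            problems.append((number, len(fields), notes))
    return problems
```

To run it on a file, read the file with newline translation disabled so that carriage returns are preserved, e.g. `inspect_delimited(open("input.txt", newline="").read(), 3)` for a three-column phenotype file. An empty list means every line had the expected number of tab-separated columns and no carriage returns.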
Reply written 13 months ago by Jean-Karim Heriche

This one-liner reports that the input files are tab-delimited. I also confirmed this by replacing all tabs with $ and by double-checking the Perl code used to format the tab-delimited input files.

Reply written 13 months ago by michael.nagle (modified 13 months ago)