Extra header column stopping EMMAX (C), but not seen in input files
3.6 years ago
michael.nagle ▴ 100

I'm using a C implementation of EMMAX for GWAS and am getting an error that I can't get to the bottom of because I don't know enough about C. I hope there's a C expert who can take a quick look at this and let me know which of my input files has the problem, and what is causing the error.

Input:

emmax -v -d 10 -t [tped/tfam file prefix] -p [phenotype file] -k [kinship matrix] -o [output]


Source code: The actual code can be downloaded here (http://csg.sph.umich.edu//kang/emmax/download/index.html), and equivalent code for a slightly different version that can give the same error is on GitHub (https://github.com/slowkoni/EPI-EMMAX).

There's a problem with the variable nheadercols that leads to the error I will show below.

Input files:
Top 5 rows, 7 columns of .tfam file (tab-delimited):
1 . 0 67 0 0 C
1 . 0 92 0 0 0
1 . 0 95 0 0 0
1 . 0 102 0 0 0
1 . 0 103 0 0 0

The kinship matrix looks as expected, with 1.00 down the diagonal because every genotype has perfect kinship with itself, and its numbers of rows and columns match the number of genotypes in the phenotype and .tfam files.

I've tried phenotype files with and without an extra column of genotype labels (shown below as the first column, with nothing else added). Everything is tab-delimited, and the two genotype IDs in the first two columns are the same for this population.
PhenolabelA1 PhenolabelA2 NA
PhenolabelB1 PhenolabelB2 0
PhenolabelC1 PhenolabelC2 1
PhenolabelD1 PhenolabelD2 0
PhenolabelE1 PhenolabelE2 NA

Standard out:

Reading TFAM file [tfam/tped prefix].tfam ....

882 rows and 882 columns were observed with 0 missing values.

Reading the phenotype file [phenotype input].txt...

ERROR: Number of header columns are 2, but only 1 columns were observed


Thanks much for helping me decipher this!

C++ emmax plink GWAS genomics • 1.2k views

This probably has nothing to do with the code and everything to do with the way the file is formatted. If the file is expected to be tab-delimited, check that spaces haven't been used instead of tabs.


The code works. I'm trying to look at the code to figure out what the problem with the input is. The files are tab-delimited text as per the EMMAX instructions.


Just to clarify, there's most likely nothing to be learned from the code. It expects tab-delimited input, but your input file is not fully tab-delimited; that is what the error message suggests. The most common cause of this kind of thing is tab characters having been replaced by some other whitespace characters. You can check with a Perl one-liner whether your input file is really tab-delimited with the expected number of columns, e.g. checking for 7 columns:

perl -ne '@row = split(/\t/); $n++; $l = @row; if ($l != 7) { print "Line $n has $l columns and is probably not tab-delimited.\n";}; END{print "Done. Checked $n lines.\n";}' input.txt
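
If Perl isn't handy, roughly the same check can be sketched in Python. This is only an illustration and not part of EMMAX: the names `find_bad_lines` and `check_file` are made up here, and the expected column count is whatever your file should have (for example 3 for a phenotype file with two ID columns and one value). Flagged lines are printed with `repr()` so that spaces, tabs and stray carriage returns are visible.

```python
def find_bad_lines(lines, expected, delimiter="\t"):
    """Return (line_number, field_count, raw_line) for every line whose
    field count differs from `expected` when split on `delimiter`."""
    problems = []
    for number, line in enumerate(lines, start=1):
        # Drop only the line terminator; an empty line still counts as 1 field.
        fields = line.rstrip("\r\n").split(delimiter)
        if len(fields) != expected:
            problems.append((number, len(fields), line))
    return problems


def check_file(path, expected, delimiter="\t"):
    """Run find_bad_lines on a file (hypothetical helper, not an EMMAX function)."""
    # newline="" keeps any "\r" so it shows up in the repr() of a flagged line.
    with open(path, newline="") as handle:
        return find_bad_lines(handle, expected, delimiter)


def report(problems):
    """Print each flagged line with its invisible characters made visible."""
    for number, count, raw in problems:
        print(f"line {number}: {count} field(s): {raw!r}")
    print(f"Done. {len(problems)} suspicious line(s).")


# Small in-memory demo: line 2 uses spaces instead of tabs, line 3 is blank.
sample = [
    "PhenolabelA1\tPhenolabelA2\tNA\n",
    "PhenolabelB1 PhenolabelB2 0\n",
    "\n",
]
report(find_bad_lines(sample, expected=3))
```

For a real file you would call something like `report(check_file("input.txt", 3))`. Unlike a pure tab count, this also flags blank lines, which split into a single empty field and so are worth ruling out too.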


This one-liner says the input files are tab-delimited, and I also made sure by replacing all tabs with $ and by double-checking the Perl code used to format the tab-delimited input files.