I am using FastQTL (http://fastqtl.sourceforge.net/) for eQTL mapping in our genotyping (~20K SNPs) & RNA-Seq data. It basically uses a VCF file for the genotypes and a BED file for gene expression levels.
For genome-wide analysis it works very well so far; but I am missing information for some regions when it gives errors as it uses the GSL for maximum likelihood estimations during the permutation step. The error is as below:
As I said, it doesn't occur frequently and for all the genetic regions. I checked the values in VCF and BED files many times, and they all seem OK. I analyze the genome in chunks, and if this error occurs in one of the chunks, I lose all the information in that chunk.
Since it says ''beta'', I think it might be about the beta functions, but I couldn't find more information to fix the problem. Any suggestions on this error?
Can someone please explain in detail the meaning of the following section of the paper
2.2 Finding a candidate QTL per phenotype For simplicity, we will focus on a single molecular phenotype P quantified in a set of N
samples. Let G be the set of genotype dosages at L variant sites
located within a cis-window of 6 W Mb of the genomic location of P. To
discover the best candidate QTL for P, FastQTL measures Pearson
product-moment correlation coefficients between P and all L variants
in G, stores the most strongly correlated variant q [ G as candidate
QTL, and assesses its significance by calculating a nominal P-value pn
with standard significance tests for Pearson correlation.
How can this be affected by a test where only one SNP and its LD proxies (r=0.8) is tested.
Thank you very much for your answer mobius, I have a follow-up question. If I'm only testing a restricted number of SNPs going into the analysis e.g 240 will program computed the BETA p-value based on the number of SNPs in my vcf file (0.05/240= 0.0002083333 )? Or in some way compute a genome-wide BETA p-value?
Most likely your parameters for the beta distribution are wrong i.e. negative. This could be due to numerical stability issues in the algorithm.
I don't know how FastQTL works but from the paper:
so another possibility is that the beta distribution is not a good approximation for those cases.
Thank you very much for your reply. Indeed it could be due to the negative parameters, but as far as I know beta parameters are estimated using permutations (in FastQTL case, an adaptive permutation scheme) for generating a set of P-values (and using these values in a log-likelihood equation). So I don't think it would generate negative parameters. Maybe I can try updating the latest version of GSL for a better numerical stability.
Another option might be (somehow) disabling the ''default GSL error handler'' that aborts the computation for the related chunk and deletes all other information, what do you think?
From a quick look at the source code of beta.c, it fails because the function gsl_sf_lnbeta_sgn_e returns -1. This functions appears to check the sign and the amplitude of the parameters so this confirms that it's failing because FastQTL can't find adequate beta distribution parameters. You could suppress the error but would you trust the results ?
Thanks again for your help! Source code of beta.c explains all, you were right. Also I managed to contact the author of FastQTL paper, he also says that it's probably due to the low variability of some of the transcript expression levels between the samples. I've checked the failing ones, and I can confirm this. It's either due to many 0 values, or almost same expression levels for all samples.
Maybe if I can filter out these transcripts, the tool would work properly.
And regarding your last remark: I don't think this error is occurring for a big portion of my phenotype data, so it would be still OK to suppress the error somehow. As I said, even if it occurs for one transcript, it deletes all other mapped eQTLs in the same chunk.
What did you end up using as a workaround? Did you filter out the transcripts? I'm encountering the same issue.
Hi! Yes, I filtered out the transcripts that have low variability (= expression values of 0 for all the individuals) with an awk one-liner. Same goes for the genotypes - if they are always the same genotype for a variant - filter that variant as well. Then the tool works without an error. That wouldn't affect your results or create bias, since you just remove non-variable genotypes/phenotypes.
Let me know if that doesn't solve your problem, I will be happy to help further.
This solution is really helpful. I was running using filtered data, but then I used all the coding genes and got this error, so I think that this has something to do with lowly expressed data.
However, maybe you could try nominal option instead of permutation.
FastQTL (permutation pass) is displaying the following error message:
gsl: beta.c:44: ERROR: domain error Default GSL error handler invoked. Aborted
Any help will be appreciated.