10.6 years ago
lhusselmann
I'm experiencing the following problem using the pre-compiled binary packages of Tophat-2.0.9 and Cufflinks-2.1.1: Cuffdiff reports many genes and transcripts with expression levels of zero, and with "inf" or "nan" in the fold change and test statistic columns. Any suggestions as to what I should do?
test_id gene_id gene locus sample_1 sample_2 status value_1 value_2 log2(fold_change) test_stat p_value q_value significant
XLOC_008530 XLOC_008530 - MDC006635.211:38544-38620 C0 E0 OK 30221.4 1.11591e+07 8.52844 903.902 5e-05 0.0176643 yes
XLOC_011890 XLOC_011890 - MDC009345.248:7633-8811 C0 E0 OK 0 273.11 inf nan 5e-05 0.0176643 yes
XLOC_014073 XLOC_014073 - MDC010817.271:25183-25478 C0 E0 OK 0 140.21 inf nan 0.00015 0.0417085 yes
XLOC_019038 XLOC_019038 - MDC015012.71:1213-1481 C0 E0 OK 0 178.129 inf nan 0.00015 0.0417085 yes
XLOC_020039 XLOC_020039 - MDC015910.528:10447-10548 C0 E0 OK 12325.1 2.3107e+06 7.55059 378.977 5e-05 0.0176643 yes
XLOC_023582 XLOC_023582 - MDC019007.400:2447-2753 C0 E0 OK 0 644.473 inf nan 5e-05 0.0176643 yes
XLOC_024891 XLOC_024891 - MDC020182.198:2666-2981 C0 E0 OK 0 171.57 inf nan 5e-05 0.0176643 yes
XLOC_025035 XLOC_025035 - MDC020310.146:5206-5628 C0 E0 OK 0 53.5601 inf nan 0.0001 0.0322248 yes
XLOC_025460 XLOC_025460 - MDC020722.137:4959-5145 C0 E0 OK 0 706.613 inf nan 0.00015 0.0417085 yes
XLOC_026573 XLOC_026573 - MDC021889.237:5595-5690 C0 E0 OK 9753.29 3.7682e+06 8.59377 482.263 5e-05 0.0176643 yes
XLOC_027081 XLOC_027081 - MDC022358.373:27042-27378 C0 E0 OK 0 210.996 inf nan 5e-05 0.0176643 yes
XLOC_000015 XLOC_000015 - MDC000017.398:2019-2460 C0 C2 OK 0 77.279 inf nan 5e-05 0.0176643 yes
XLOC_000080 XLOC_000080 - MDC000071.210:1371-1990 C0 C2 OK 0 782.288 inf nan 5e-05 0.0176643 yes
XLOC_000924 XLOC_000924 - MDC000691.163:12249-13292 C0 C2 OK 0 26.0045 inf nan 5e-05 0.0176643 yes
XLOC_001030 XLOC_001030 - MDC000760.256:22135-23927 C0 C2 OK 0 47.105 inf nan 5e-05 0.0176643 yes
XLOC_001274 XLOC_001274 - MDC000953.460:2713-3264 C0 C2 OK 0 422.783 inf nan 5e-05 0.0176643 yes
XLOC_001440 XLOC_001440 - MDC001075.160:2046-6158 C0 C2 OK 0 21.3232 inf nan 5e-05 0.0176643 yes
XLOC_001506 XLOC_001506 - MDC001128.127:17389-19697 C0 C2 OK 0 29.9881 inf nan 5e-05 0.0176643 yes
XLOC_001700 XLOC_001700 - MDC001307.434:8029-10455 C0 C2 OK 0 29.6674 inf nan 5e-05 0.0176643 yes
XLOC_002080 XLOC_002080 - MDC001577.2963:9743-10308 C0 C2 OK 0 55.0199 inf nan 0.0002 0.0495222 yes
XLOC_002159 XLOC_002159 - MDC001635.618:38277-40655 C0 C2 OK 0 548.554 inf nan 5e-05 0.0176643 yes
XLOC_002576 XLOC_002576 - MDC001927.172:1937-4253 C0 C2 OK 0 9.30383 inf nan 0.0001 0.0322248 yes
XLOC_002715 XLOC_002715 - MDC002007.156:3230-4772 C0 C2 OK 0 20.7091 inf nan 0.0001 0.0322248 yes
XLOC_002856 XLOC_002856 - MDC002113.246:200-1028 C0 C2 OK 0 39.9545 inf nan 5e-05 0.0176643 yes
XLOC_002872 XLOC_002872 - MDC002121.323:4512-5988 C0 C2 OK 0 22.4906 inf nan 0.0001 0.0322248 yes
XLOC_003047 XLOC_003047 - MDC002235.543:23412-24616 C0 C2 OK 0 23.2862 inf nan 5e-05 0.0176643 yes
XLOC_003153 XLOC_003153 - MDC002325.383:3175-4516 C0 C2 OK 0 22.8465 inf nan 5e-05 0.0176643 yes
XLOC_003453 XLOC_003453 - MDC002536.231:5046-9852 C0 C2 OK 0 195.692 inf nan 5e-05 0.0176643 yes
XLOC_004230 XLOC_004230 - MDC003196.304:8022-10369 C0 C2 OK 0 41.8157 inf nan 5e-05 0.0176643 yes
Could you (or one of the editors) fix this so it has the right format?
BTW, there will be plenty of genes and transcripts with no expression. Do you expect those to be otherwise? In general, if you have 0 expression in one group of samples and any expression at all in another, the fold change will be infinite and you'll get results like these.
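To make the arithmetic concrete, here is a minimal sketch of how a log2 fold change behaves when one side is zero. The function name `log2_fold_change` is made up for illustration, and the handling of the case where both values are zero is an arbitrary choice of this sketch, not a claim about what Cuffdiff does internally.

```python
import math

def log2_fold_change(value_1, value_2):
    """Illustrative log2(value_2 / value_1); not Cuffdiff's actual code."""
    if value_1 == 0 and value_2 == 0:
        # Arbitrary choice for this sketch: no change between two zeros.
        return 0.0
    if value_1 == 0:
        # Dividing by zero expression: the ratio is unbounded.
        return math.inf
    if value_2 == 0:
        # log2(0) tends to minus infinity.
        return -math.inf
    return math.log2(value_2 / value_1)

# Values taken from the first and second rows of the table in the question.
print(round(log2_fold_change(30221.4, 1.11591e+07), 3))
print(log2_fold_change(0, 273.11))
```

The first call reproduces the 8.528 shown in the log2(fold_change) column for XLOC_008530, which suggests the column is log2(value_2 / value_1); the second gives `inf`, as in the XLOC_011890 row.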
Hi dpryan, thanks for the reply. The output is from cuffdiff, the gene_exp.diff file. I posted the same question on seqanswers: http://seqanswers.com/forums/showthread.php?t=33733&highlight=Lizex . The Cufflinks editors did not respond at all to the question I sent them. From the entire experiment, i.e. five time points (control and treatment), I got 3645 significant transcripts ("yes" in the gene_exp.diff file). Of these, only 34, i.e. 0.9%, have an expression value. I agree with you that there will be genes and transcripts with no expression, but is it normal to see only 0.9% of the significant transcripts from an entire experiment with an expression value? How do I address this situation where I have 0 expression in one group and expression in another group?
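A count like the 34 out of 3645 above can be reproduced with a short script. This is only a sketch: it assumes gene_exp.diff is tab-delimited with the column names shown in the header of the table in the question, and the function name `summarize_significant` is hypothetical. The three example rows are copied from that table.

```python
import csv
import io

def summarize_significant(handle):
    """Return (n_significant, n_significant_with_expression_in_both_samples)."""
    reader = csv.DictReader(handle, delimiter="\t")
    n_significant = 0
    n_both_expressed = 0
    for row in reader:
        if row["significant"] != "yes":
            continue
        n_significant += 1
        # A finite fold change needs non-zero expression in both samples.
        if float(row["value_1"]) > 0 and float(row["value_2"]) > 0:
            n_both_expressed += 1
    return n_significant, n_both_expressed

header = ("test_id gene_id gene locus sample_1 sample_2 status value_1 "
          "value_2 log2(fold_change) test_stat p_value q_value significant")
rows = [
    "XLOC_008530 XLOC_008530 - MDC006635.211:38544-38620 C0 E0 OK 30221.4 1.11591e+07 8.52844 903.902 5e-05 0.0176643 yes",
    "XLOC_011890 XLOC_011890 - MDC009345.248:7633-8811 C0 E0 OK 0 273.11 inf nan 5e-05 0.0176643 yes",
    "XLOC_014073 XLOC_014073 - MDC010817.271:25183-25478 C0 E0 OK 0 140.21 inf nan 0.00015 0.0417085 yes",
]
# Rebuild tab-delimited text from the whitespace-separated rows above.
example = "\n".join("\t".join(line.split()) for line in [header] + rows)

print(summarize_significant(io.StringIO(example)))  # (3, 1)
```

On a real file, pass an open file handle for gene_exp.diff instead of the `io.StringIO` example.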
That does sound a bit off. I actually wonder if some of your libraries are just crap (it happens). It'd be helpful to know the actual nature of the experiment, as 3645 DE genes could either be a reasonable number or way too many. Also, you might open your BAM files in IGV or another browser and just see if these calls seem reasonable given the data you have.
Thanks I'll look into these.
If you are asking why you are getting those values, I can't help, but I can explain what "nan" and "inf" mean. inf: infinite; you will only see it in the column representing fold change. If the denominator sample has zero expression for the gene, the fold change will be inf. nan: "not a number"; this tag appears in the test statistic column because the test statistic was either infinity, -infinity, or something else that was not a number. I don't know how they calculate the test statistic or what its possible values are.
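When post-processing the file, the two tags can be told apart programmatically. The snippet below is a small sketch using only the Python standard library; `classify` is a hypothetical helper name. The last lines show ordinary IEEE 754 floating-point behaviour, not a statement about how Cuffdiff computes its test statistic.

```python
import math

def classify(field):
    """Label a numeric field from the table as 'inf', 'nan', or 'finite'."""
    value = float(field)  # float() accepts the strings "inf", "-inf" and "nan"
    if math.isnan(value):
        return "nan"
    if math.isinf(value):
        return "inf"
    return "finite"

print(classify("inf"), classify("nan"), classify("8.52844"))

# Arithmetic on infinities can yield nan, and nan never equals itself,
# so use math.isnan() rather than == when filtering rows.
undefined = math.inf - math.inf
print(math.isnan(undefined), undefined == undefined)
```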
EDITED -to make it more clear
Thanks for the reply.
There were a few typos, so I edited it. Nothing substantial.