Dear bioinformaticians, I am posting this question on behalf of another researcher, who needs help.
I have a set of co-expressed genes from the human genome and would like to find common transcription factor binding sites among them (or a subset). My biological story is already written up and, based on it, I would like a certain set of genes to show up in the analysis. I am therefore considering the following strategy: I will try various online databases and services with all of my co-expressed genes, then pick and cite the one that returns the highest number of my preferred genes. Is that acceptable? How do reviewers verify the predicted transcription factor binding sites, or do they accept the program's claims at face value? I come from a psychology background and do not know any statistics or bioinformatics. Any help is welcome.
Edit: I am trying to learn how bioinformaticians handle the above kind of 'scam' when they read or review a paper. Take, for example, the first suggestion of oPOSSUM or MEME. An author tries both programs and sees the 'expected' result only with MEME. In the paper, he reports that MEME gave him a motif covering a certain short list of 'expected' genes, and he ignores the oPOSSUM result. The paper will look more sophisticated, in terms of bioinformatic analysis, than one from someone who never looked for promoter binding sites at all. Given that so many software programs have been published for every stage of analysis, an unethical user can bias each step toward the 'right' biological result and publish in a top journal. How do you handle such issues? In my experience so far, most (bioinformatics) reviewers are happy if the paper speaks the right statistical/bioinformatic lingo, and they leave the biological or medical part to the 'biologist expert'. With so many tools out there, isn't there room for huge subjective bias in the whole process? What are the rules for evaluating the judgement of the expert biologists? How do we know that an entire subfield is not being biased by the opinions of a few experts?
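To make the statistical side of this concrete: the tool-shopping strategy above is a form of multiple testing. Even if every tool were unbiased, running several and reporting only the one that "confirms" the story inflates the chance of a spurious hit. Here is a toy simulation of that effect; the 5% per-tool false-positive rate is an assumption for illustration, not a measured property of MEME, oPOSSUM, or any real program.

```python
import random

random.seed(0)

def prob_at_least_one_hit(n_tools, per_tool_fpr, n_trials=100_000):
    """Estimate the chance that at least one of n_tools 'confirms'
    the preferred gene list purely by chance, given each tool's
    false-positive rate (assumed independent across tools)."""
    hits = 0
    for _ in range(n_trials):
        if any(random.random() < per_tool_fpr for _ in range(n_tools)):
            hits += 1
    return hits / n_trials

# With a 5% false-positive rate per tool, shopping across more
# tools makes a chance 'confirmation' increasingly likely:
# analytically, P(at least one) = 1 - (1 - fpr)^n_tools.
for k in (1, 5, 10, 20):
    print(f"{k:2d} tools -> P(spurious hit) ~ {prob_at_least_one_hit(k, 0.05):.3f}")
```

With 20 tools the chance of at least one chance "confirmation" approaches two in three, which is why reporting only the favourable tool, and not how many were tried, misleads the reader.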
On the other hand, we do not (and probably cannot) require each author to run every software tool, report all results, and then ask the expert biologist to evaluate every option. That would require the biologist to learn and understand the algorithmic differences between programs, which is nearly impossible. Nor can we require the biologist to validate each selected gene in the lab before saying anything about the experiment. Moreover, with the biologist typically in control of the grant, and thus the entire process, the bioinformatician has little room to push back or voice a dissenting opinion.
Under those considerations, how do we make sure that an entire subfield is not being created to 'defraud' the larger scientific community?
Among the various types of popular programs, (a) TF binding site prediction software, (b) miRNA target prediction software and (c) analyses of genes under positive selection often appear, in my opinion, to be biased in this way.