7 weeks ago
Hello all,

I am trying to run Homer's findGO.pl and am confused about which files the program needs. For what it's worth, I am working with a non-model species, and I think my issues are quite basic at this point.

This is the brief description of the command:

findGO.pl <input file of Entrez Gene IDs>  <organism> <output directory> [-bg <background ID file>] [-cpu #] [-human]

I am embarrassingly confused at step 1! What exactly is this <input file of Entrez Gene IDs>? Is it the file I receive from, say, annotatePeaks.pl? And for <organism>, am I required to use loadGenome.pl in Homer first? I thought I had done this already, using this code:

loadGenome.pl -name perch -org null -fasta /ATACseq/FASTQ_ATAC_43955/pfluv-genome/pfluv-genome.fa -gtf /ATACseq/FASTQ_ATAC_43955/pfluv-genome/pfluv-genome.gtf

To my understanding, this should create a genome called "perch". Yet when I use "perch" for <organism>, it does not work.

As I understand it, -bg should be used to reference the full list of genes... but I could easily be wrong.

I also have a list of all my genes, as well as a list of the human orthologs. Here is a quick example of my file:

PFLUV_G0027820  MT-ND1
PFLUV_G0027830  MT-ND2
PFLUV_G0027840  MT-CO1
PFLUV_G0027850  MT-CO2
PFLUV_G0027860  MT-ATP8
PFLUV_G0027870  MT-ATP6
PFLUV_G0027880  MT-CO3
PFLUV_G0027890  MT-ND3
PFLUV_G0027900  MT-ND4L
PFLUV_G0027910  MT-ND4

Is this type of file something I can use in this command?

Apologies for all the stupid questions, but I am very stuck and I don't see anything on the Homer website that sufficiently answers my questions, so I would appreciate any insights you all might be able to provide.


7 weeks ago

My guess, based purely on past experience with filling in missing documentation: the gene ID file will be a plain text file of numerical Entrez Gene IDs, with one ID on each line.

The organism will be a shorthand notation like hg19 or hg38.

But note that you would need to use their config tool to set it all up, and that can be quite a convoluted process.
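
For what it's worth, here is a minimal Python sketch of turning a two-column ortholog table like the one in the question into a list with one identifier per line. The function names are made up for illustration. I have not verified whether findGO.pl accepts gene symbols in place of numeric Entrez IDs, so you may still need a separate symbol-to-Entrez lookup on top of this.

```python
def read_ortholog_map(lines):
    """Parse 'species_gene_id <whitespace> human_symbol' lines into a dict."""
    mapping = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        mapping[fields[0]] = fields[1]
    return mapping


def unique_in_order(values):
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(values))


# A few rows copied from the example file in the question.
example = """\
PFLUV_G0027820  MT-ND1
PFLUV_G0027830  MT-ND2
PFLUV_G0027840  MT-CO1
""".splitlines()

orthologs = read_ortholog_map(example)
id_list = unique_in_order(orthologs.values())

# One identifier per line, which is the layout guessed at above.
print("\n".join(id_list))
```

In practice you would read the lines from your own table and write the result to a text file rather than printing it.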

Thanks for the input. In your opinion, do you think it would be easier to just compile a list of, say, the top 100 or so genes that pop up from Homer's annotatePeaks.pl, convert those genes to their human ortholog, and then use something like Metascape to get an analysis of that gene list? Homer's findGO.pl seems to be a really convenient tool in theory, but from where I am now it seems to not translate that well into practice.
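
A rough sketch of that fallback workflow in Python, to make the idea concrete: count how often each gene appears among the annotated peaks, keep the top N genes that have a human ortholog, and collect the ortholog symbols for pasting into an external enrichment tool. The function name, the `NO_ORTHOLOG_GENE` placeholder, and ranking by peak count are all my own assumptions, not anything Homer or Metascape prescribes. The gene IDs would have to be pulled out of the annotatePeaks.pl output separately.

```python
from collections import Counter


def top_genes_as_orthologs(peak_gene_ids, ortholog_map, n=100):
    """Rank genes by how many peaks hit them, then return up to n
    human ortholog symbols, skipping genes with no known ortholog."""
    counts = Counter(g for g in peak_gene_ids if g)
    symbols = []
    for gene_id, _ in counts.most_common():
        if len(symbols) >= n:
            break
        symbol = ortholog_map.get(gene_id)
        if symbol and symbol not in symbols:
            symbols.append(symbol)
    return symbols


# Ortholog pairs taken from the example file earlier in the thread.
ortholog_map = {
    "PFLUV_G0027820": "MT-ND1",
    "PFLUV_G0027840": "MT-CO1",
}

# Hypothetical per-peak gene assignments; NO_ORTHOLOG_GENE is a
# placeholder for a gene that has no human ortholog in the table.
peak_gene_ids = [
    "PFLUV_G0027840",
    "PFLUV_G0027820",
    "PFLUV_G0027840",
    "NO_ORTHOLOG_GENE",
]

top = top_genes_as_orthologs(peak_gene_ids, ortholog_map, n=100)
print("\n".join(top))
```

Note that this keeps the top N genes that have an ortholog, rather than taking the top N first and then dropping unmapped ones, so the final list does not shrink when some genes cannot be converted.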


