Hello,
I am using the mm9 UCSC gene annotation with HOMER to quantify repeats using the analyzeRepeats command. I am using the -count cds, -tpm, and -d options for this. All of those work just fine and I get output, but I want to condense my genes so that all isoforms are reported on one annotation line per gene. The program provides a command-line option, -condenseGenes, which should condense them by gene_id. My GTF file doesn't appear to have a gene_id field, and I've had to add it in afterwards.
When I run with the -condenseGenes option, the error output tells me that the mm9.fa FASTA file I am using can no longer be found, whereas it has no trouble finding mm9.fa when I run without -condenseGenes.
Does anyone have a suggestion for how I can condense these genes?
Can you assist us by pasting some of the commands that you are using? Also, some entries from your GTF?
Note that the GTFs from GENCODE have gene_id: http://www.gencodegenes.org/mouse_releases/
Thank you for your response!
I am using the line of code
So this set of commands works perfectly fine, but when I add in the following:
I get an error that says "cant find mm9 genome, assuming mm9 is the name of the organism", which then causes it to fail down the line.
That's strange - looks like a bug in the coding of the program.
Did you try moving the -condenseGenes parameter to different positions in the command line?
I tried different positions in the command line, and that had no effect. However, I am running this on a computer cluster, and maybe that is causing some problems. My university has a few different clusters, and when I ran it on another cluster it gave me a different error: it was unable to recognize the command at all and said unable to find command -c, as if it had cut off the command. I think I will just have to find a way around this and use the raw reads to normalize and condense the genes.
Okay, are you pasting the commands from a Windows / Mac text file into a terminal window connected to the cluster? Formatting issues are common, like hidden line endings, tabs, etc. Also, sometimes a hyphen is not quite a hyphen...
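If you want to check a pasted command for that kind of thing, here is a minimal Python sketch. The function name and the sample string are made up for illustration (it is not your actual command); it simply flags carriage returns, tabs, and any non-ASCII character such as an en dash standing in for a hyphen.

```python
import unicodedata

def find_suspicious_chars(command):
    """Return (index, character, description) for characters that often
    break commands pasted from word processors or Windows text files."""
    hits = []
    for i, ch in enumerate(command):
        if ch == "\r":
            hits.append((i, ch, "CARRIAGE RETURN"))
        elif ch == "\t":
            hits.append((i, ch, "TAB"))
        elif ord(ch) > 127:
            # Non-ASCII, e.g. an en dash that only looks like a hyphen
            hits.append((i, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits

# Hypothetical pasted text: an en dash before the option and a trailing CR
pasted = "tool \u2013condenseGenes\r"
for index, char, description in find_suspicious_chars(pasted):
    print(index, repr(char), description)
```

Anything it reports is worth retyping by hand in the terminal rather than pasting.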
You should literally just check to see if the reference genome FASTA exists where you are running the command. You may not have root access, but you should be able to read file listings.
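A plain file listing is enough for this, but if you would rather check from a script on the cluster, something like the following sketch works. The path is a placeholder, so substitute wherever your mm9.fa actually lives.

```python
import os

def describe_path(path):
    """Report whether a path exists, is a regular file, and is readable
    by the current user (no root access needed)."""
    return {
        "exists": os.path.exists(path),
        "is_file": os.path.isfile(path),
        "readable": os.access(path, os.R_OK),
    }

# Placeholder location, not a real path from this thread
print(describe_path("/path/to/mm9.fa"))
```

Run it from the same working directory and the same node where the failing command runs, since a relative path can resolve differently on each cluster.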
I'm just going to get the raw read counts so I can add the genes up myself. This seems to be the best option at this point. I can calculate TPM from there.
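For anyone wanting to do the same, here is a rough Python sketch of that workaround. It assumes you already have, for each isoform, a gene_id, a raw read count, and a length in bp; the tuple layout and the function name are my own and not anything HOMER produces directly. It sums length-normalized counts per gene and then scales so that all genes add up to one million, which may not match what -condenseGenes would report.

```python
from collections import defaultdict

def gene_tpm(isoforms):
    """isoforms: iterable of (gene_id, raw_count, length_bp) tuples.
    Returns {gene_id: TPM}, where a gene's TPM is the sum over its isoforms."""
    rates = defaultdict(float)
    for gene_id, raw_count, length_bp in isoforms:
        # Reads per kilobase for this isoform, accumulated per gene
        rates[gene_id] += raw_count / (length_bp / 1000.0)
    total = sum(rates.values())
    if total == 0:
        return {gene_id: 0.0 for gene_id in rates}
    # Scale so that the per-gene values sum to one million
    return {gene_id: rate / total * 1e6 for gene_id, rate in rates.items()}

# Made-up numbers purely for illustration
rows = [("geneA", 100, 1000), ("geneA", 50, 500), ("geneB", 200, 2000)]
print(gene_tpm(rows))
```

One caveat: if the same reads are counted for several overlapping isoforms, adding isoforms together will count those reads more than once, so check how the raw counts were assigned first.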
I'm starting to think that may be the problem.
You should report this to the developers as a potential bug. Just be sure that -condenseGenes is indeed compatible with the command that you're running.
I figured it out: I had to download some dependencies that I did not have. Basically, I had to walk through the configuration file, and there were optional downloads that did not come with the base download.
Glad that you got it sorted out.