Hi everyone. I am running Salmon in Galaxy to quantify gene expression from mouse RNA-Seq data (6 samples). As input, I am providing a reference transcriptome (cDNA, in FASTA format), the processed reads of one of these samples (in fastqsanger.gz format, after running Trim Galore) and a .gtf file. Salmon produces two output files in Galaxy: quant.sf and quant.genes.sf, which hold the transcript-level and the gene-level quantification, respectively. As expected, the first column of quant.sf contains transcript IDs. However, the first column of quant.genes.sf also contains transcript IDs, when it should contain gene IDs. I have not been able to find out why this happens.
This is part of the content of the reference (cDNA) file I am providing to Salmon in Galaxy:
This is part of the content of the processed reads of one of the samples of this RNA-Seq experiment:
This is part of the content of the .gtf file:
And these are the settings I am using for Salmon in Galaxy:
After being executed, Salmon provides these output files:
quant.sf (Quantification)
quant.genes.sf (Gene quantification)
As you can see, the first column of quant.genes.sf still has transcript IDs (although the content changes in comparison to quant.sf), when it should have gene IDs. Could you help me find out what's the reason behind this?
Thank you so much.
Regards.
People generally do this after the fact using BioMart (see "ENSMUSG number convert to gene name"), but since you are using Galaxy, that may be a limitation. You could do this outside of Galaxy.
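If you do take quant.sf outside of Galaxy, here is a minimal sketch of summing transcript-level values per gene using the same .gtf file. It relies on some assumptions that are not confirmed by the post: that the GTF has `transcript_id` and `gene_id` attributes, and that the transcript names in quant.sf may carry a version suffix (e.g. `.1`) that the GTF lacks. A mismatch like that is one possible reason for transcript IDs showing up in a gene-level table, but I have not verified it on your data. The function names are made up for illustration.

```python
import csv
import re
from collections import defaultdict


def strip_version(identifier):
    """Drop a trailing version suffix, e.g. 'ENSMUST00000000001.4' -> 'ENSMUST00000000001'."""
    return re.sub(r"\.\d+$", "", identifier)


def read_tx2gene(gtf_handle):
    """Build a transcript -> gene map from the attribute column of a GTF file."""
    tx2gene = {}
    for line in gtf_handle:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        tx = re.search(r'transcript_id "([^"]+)"', fields[8])
        gene = re.search(r'gene_id "([^"]+)"', fields[8])
        if tx and gene:
            # Assumption: versions are stripped so both files use the same key.
            tx2gene[strip_version(tx.group(1))] = gene.group(1)
    return tx2gene


def summarize_to_genes(quant_handle, tx2gene):
    """Sum TPM and NumReads of quant.sf rows per gene.

    Transcripts missing from the map keep their own (version-stripped) name,
    so they stay visible instead of being silently dropped.
    """
    totals = defaultdict(lambda: {"TPM": 0.0, "NumReads": 0.0})
    for row in csv.DictReader(quant_handle, delimiter="\t"):
        # Assumption: keep only the first '|'-separated field of the name.
        name = strip_version(row["Name"].split("|")[0])
        gene = tx2gene.get(name, name)
        totals[gene]["TPM"] += float(row["TPM"])
        totals[gene]["NumReads"] += float(row["NumReads"])
    return dict(totals)
```

To use it, open your downloaded .gtf and quant.sf with `open(...)`, pass the first handle to `read_tx2gene` and the second, together with the resulting map, to `summarize_to_genes`. Any key in the result that still looks like a transcript ID is a transcript that was not found in the GTF, which is worth checking by hand.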