Question

Using salmon in Galaxy

3.6 years ago

anonymous • 0

Hi everyone. I am running Salmon in Galaxy to quantify gene expression from mouse RNA-Seq data (6 samples). As input, I provide a reference (cDNA, in FASTA format), the processed reads of one of the samples (in fastqsanger.gz format, after running Trim Galore) and a .gtf file. Salmon produces two output files in Galaxy: quant.sf and quant.genes.sf, which hold the transcript-level quantification and the gene-level quantification, respectively. The first column of quant.sf contains transcript IDs, as expected, but the first column of quant.genes.sf also contains transcript IDs when it should contain gene IDs. I have not been able to find the reason for this.

This is part of the content of the reference file I am providing to Salmon in Galaxy: [screenshot]

This is part of the content of the processed reads of one of the samples of this RNA-Seq experiment: [screenshot]

This is part of the content of the .gtf file: [screenshot]

And this is how I have configured Salmon in Galaxy: [screenshot]

After being executed, Salmon provides these output files:

quant.sf (Quantification): [screenshot]

quant.genes.sf (Gene quantification): [screenshot]

As you can see, the first column of quant.genes.sf still has transcript IDs (although the content changes in comparison to quant.sf), when it should have gene IDs. Could you help me find out what's the reason behind this?

Thank you so much.

Regards.

salmon galaxy rna-seq • 1.5k views

updated 3.6 years ago by GenoMax 154k • written 3.6 years ago by anonymous • 0

People generally do this after the fact using BioMart (see "ENSMUSG number convert to gene name"), but since you are using Galaxy that may be a limitation. You could do this step outside of Galaxy.
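If you do go outside of Galaxy, here is a rough Python sketch of one way to get gene-level numbers from the transcript-level quant.sf, using the same .gtf file. It is not what Salmon or BioMart does internally. It assumes the GTF carries `gene_id` and `transcript_id` attributes in column 9, and that quant.sf has the usual `Name`, `TPM` and `NumReads` columns. The function names and the IDs in the demo are made up for illustration. The `strip_version` option is a guess at one possible cause I cannot confirm from the screenshots: if the FASTA headers carry version suffixes (for example `.1`) that the GTF lacks, the IDs will not match, and Salmon tends to report an unmatched transcript as its own gene.

```python
import csv
import re
from collections import defaultdict


def transcript_to_gene(gtf_lines, strip_version=True):
    """Build a {transcript_id: gene_id} map from GTF lines (hypothetical helper)."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        # Column 9 holds attributes such as: gene_id "X"; transcript_id "Y";
        attrs = dict(re.findall(r'(\w+) "([^"]*)"', fields[8]))
        tx, gene = attrs.get("transcript_id"), attrs.get("gene_id")
        if tx and gene:
            if strip_version:
                tx = tx.split(".")[0]  # assumption: ".N" is a version suffix
            mapping[tx] = gene
    return mapping


def gene_level(quant_lines, tx2gene, strip_version=True):
    """Sum TPM and NumReads per gene; also return transcripts with no gene."""
    totals = defaultdict(lambda: {"TPM": 0.0, "NumReads": 0.0})
    unmapped = []
    for row in csv.DictReader(quant_lines, delimiter="\t"):
        tx = row["Name"]
        key = tx.split(".")[0] if strip_version else tx
        gene = tx2gene.get(key)
        if gene is None:
            unmapped.append(tx)  # worth inspecting: ID mismatch with the GTF?
            continue
        totals[gene]["TPM"] += float(row["TPM"])
        totals[gene]["NumReads"] += float(row["NumReads"])
    return dict(totals), unmapped


# Tiny in-memory demo with made-up IDs (not real Ensembl identifiers).
GTF = (
    '1\tsrc\ttranscript\t1\t100\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_A1";\n'
    '1\tsrc\ttranscript\t1\t100\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_A2";\n'
    '2\tsrc\ttranscript\t1\t100\t.\t+\t.\tgene_id "GENE_B"; transcript_id "TX_B1";\n'
)
QUANT = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "TX_A1.2\t1000\t800.0\t10.0\t100.0\n"
    "TX_A2.1\t900\t700.0\t5.0\t50.0\n"
    "TX_B1.1\t500\t300.0\t2.5\t25.0\n"
    "TX_X9.1\t400\t200.0\t1.0\t3.0\n"
)

tx2gene = transcript_to_gene(GTF.splitlines())
genes, unmapped = gene_level(QUANT.splitlines(), tx2gene)
for gene_id in sorted(genes):
    print(gene_id, genes[gene_id]["TPM"], genes[gene_id]["NumReads"])
print("unmapped:", unmapped)
```

With real files you would pass open file handles instead of the demo strings. The `unmapped` list is a quick check of whether the transcript IDs in quant.sf actually match the ones in the .gtf file. Note that this only sums TPM and read counts per gene; it does not compute a gene-level length or effective length.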

3.6 years ago by GenoMax 154k