Hi everyone! Please help me. I am new to this field and have a question about mapping and counting. I mapped mouse RNA-seq data to the genome (mm10) using STAR. To save time, I did not build the STAR index myself but used the mm10 genome index built by my colleague. For counting, I used the genome GTF file downloaded from http://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/genes/mm10.knownGene.gtf, which seems to use Ensembl IDs. However, in the counting results the gene IDs look like UniProt IDs, not Ensembl IDs or gene symbols. Did I do something wrong in the procedure? If not, how can I get gene symbols from these IDs?
Here is some information:
head mm10.knownGene.gtf
chr1 knownGene transcript 3073253 3074322 . + . gene_id "ENSMUST00000193812.1"; transcript_id "ENSMUST00000193812.1";
chr1 knownGene exon 3073253 3074322 . + . gene_id "ENSMUST00000193812.1"; transcript_id "ENSMUST00000193812.1"; exon_number "1"; exon_id "ENSMUST00000193812.1.1";
chr1 knownGene transcript 3102016 3102125 . + . gene_id "ENSMUST00000082908.1"; transcript_id "ENSMUST00000082908.1";
head genes
A0A023T778
A0A075B5I2
A0A075B5J3
A0A075B5J4
A0A075B5K6
A0A075B5L1
A0A075B5L2
A0A075B5L3
A0A075B5L7
A0A075B5L8
A0A075B5M4
A0A075B5P0
A0A075B5P1
A0A075B5P4
A0A075B5P6
A0A075B5P8
A0A075B5P9
A0A075B5Q0
Please help me!
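One way to see what kind of IDs the GTF carries is to parse its attribute column directly. The sketch below is only an illustration: it uses the three records from the `head mm10.knownGene.gtf` output above (with the tab separators restored), and the function names are made up for this example. In these records the `gene_id` value is identical to the `transcript_id` value, and the `ENSMUST` prefix denotes an Ensembl mouse transcript rather than a gene (gene IDs start with `ENSMUSG`).

```python
import re

# Records copied from the `head mm10.knownGene.gtf` output above.
# GTF fields are tab-separated; the ninth field holds the attributes.
GTF_LINES = [
    'chr1\tknownGene\ttranscript\t3073253\t3074322\t.\t+\t.\t'
    'gene_id "ENSMUST00000193812.1"; transcript_id "ENSMUST00000193812.1";',
    'chr1\tknownGene\texon\t3073253\t3074322\t.\t+\t.\t'
    'gene_id "ENSMUST00000193812.1"; transcript_id "ENSMUST00000193812.1"; '
    'exon_number "1"; exon_id "ENSMUST00000193812.1.1";',
    'chr1\tknownGene\ttranscript\t3102016\t3102125\t.\t+\t.\t'
    'gene_id "ENSMUST00000082908.1"; transcript_id "ENSMUST00000082908.1";',
]

# Matches attribute pairs of the form: key "value";
ATTR_RE = re.compile(r'(\w+) "([^"]*)";')


def parse_attributes(line):
    """Return the key/value pairs of the ninth GTF column as a dict."""
    fields = line.rstrip("\n").split("\t")
    return dict(ATTR_RE.findall(fields[8]))


def gene_id_equals_transcript_id(lines):
    """Return True if every record has gene_id identical to transcript_id."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        attrs = parse_attributes(line)
        if attrs.get("gene_id") != attrs.get("transcript_id"):
            return False
    return True


print(gene_id_equals_transcript_id(GTF_LINES))  # prints True for these records
```

If the same check holds for the whole file, counting "per gene" with this GTF would effectively count per transcript, so a separate transcript-to-gene mapping would be needed to get gene symbols.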
Thank you rpolicastro!! I would like to perform GO enrichment analysis downstream. Can I do that after using Salmon for counting?
A pipeline we use often is: Salmon -> tximeta -> DESeq2 / edgeR / limma -> goseq.
Tximeta will also automatically handle gene IDs for you (gene-level summarization and ID mapping).
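To give a rough idea of what gene-level summarization means, here is a minimal Python sketch that sums transcript-level counts into gene-level counts through a transcript-to-gene table. It is not how tximeta is implemented (tximeta is an R package and does more than this); the function name, the IDs (`tx1`, `geneA`, and so on), and the count values are all hypothetical placeholders.

```python
from collections import defaultdict


def summarize_to_gene(tx_counts, tx2gene):
    """Sum transcript-level counts per gene using a transcript-to-gene map.

    Transcripts that have no entry in tx2gene are skipped.
    """
    gene_counts = defaultdict(float)
    for tx_id, count in tx_counts.items():
        gene_id = tx2gene.get(tx_id)
        if gene_id is None:
            continue  # no gene assignment for this transcript
        gene_counts[gene_id] += count
    return dict(gene_counts)


# Hypothetical transcript-level estimates and mapping, for illustration only.
tx_counts = {"tx1": 10.0, "tx2": 5.5, "tx3": 2.0}
tx2gene = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}

print(summarize_to_gene(tx_counts, tx2gene))
# prints {'geneA': 15.5, 'geneB': 2.0}
```

The gene-level table is what then goes into DESeq2 / edgeR / limma, and the resulting gene lists are what goseq tests for GO enrichment.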