I just performed an RNA-seq experiment, using HISAT2 for the alignment, StringTie for the assembly and the R package Ballgown for the differential expression (DE) analysis (protocol published here: http://www.nature.com/nprot/journal/v11/n9/full/nprot.2016.095.html).
The default output of the DE analysis is a fairly standard table with a transcript ID, a fold change, a p-value and a q-value for each feature. The transcript ID is either an Ensembl ID or a MSTRG ID, which is an identifier StringTie generates during transcriptome assembly. However, the gene name/symbol corresponding to each feature does not appear in the table.
The protocol I followed provides a command to add the gene names to the results table:
```r
results_transcripts = data.frame(geneNames=ballgown::geneNames(bg_chrX_filt), geneIDs=ballgown::geneIDs(bg_chrX_filt), results_transcripts)
```
This works when the DE analysis is performed at the transcript level, but not at the gene level. The reason is that geneNames() and geneIDs() return one entry per transcript, while the gene-level results table has far fewer rows (one per gene). data.frame() therefore cannot combine the columns and typically stops with an "arguments imply differing number of rows" error.
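One possible workaround, sketched here under assumptions rather than taken from the protocol: collapse the per-transcript gene IDs and names into a lookup table with one row per gene, then use match() to attach a name to each gene-level result. The mock vectors gene_ids and gene_names are hypothetical stand-ins for ballgown::geneIDs(bg_chrX_filt) and ballgown::geneNames(bg_chrX_filt), and results_genes is a hypothetical stand-in for the gene-level stattest() output, assumed to store gene IDs in its id column. All IDs and values are made up.

```r
# Hypothetical per-transcript vectors standing in for
# ballgown::geneIDs(bg_chrX_filt) and ballgown::geneNames(bg_chrX_filt):
# one entry per transcript, so a gene with several transcripts repeats.
gene_ids   <- c("MSTRG.1", "MSTRG.1", "MSTRG.2", "MSTRG.3", "MSTRG.4")
gene_names <- c("GENE_A",  "GENE_A",  "GENE_B",  "GENE_C",  "GENE_D")

# Hypothetical gene-level results (one row per gene); gene IDs are
# assumed to be in the `id` column.
results_genes <- data.frame(
  feature = "gene",
  id      = c("MSTRG.2", "MSTRG.4", "MSTRG.1", "MSTRG.3"),
  fc      = c(1.8, 0.6, 2.3, 1.1),
  pval    = c(0.01, 0.20, 0.003, 0.50),
  qval    = c(0.05, 0.40, 0.02, 0.60),
  stringsAsFactors = FALSE
)

# Collapse the transcript-level vectors to unique (gene ID, gene name) pairs.
# If one gene ID maps to several names, match() below picks the first pair.
gene_lookup <- unique(data.frame(geneIDs = gene_ids, geneNames = gene_names,
                                 stringsAsFactors = FALSE))

# Attach a name to each gene-level row; IDs absent from the lookup get NA
# instead of breaking the table.
results_genes$geneNames <- gene_lookup$geneNames[match(results_genes$id, gene_lookup$geneIDs)]

print(results_genes[, c("id", "geneNames", "fc", "qval")])
```

With real data, the mock vectors would be replaced by the two ballgown calls above. Because match() works row by row on the results table, the row count of the gene-level results is preserved regardless of how many transcripts each gene has.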
Has anyone run into this issue before, and is there a straightforward solution? It could in principle be done manually, but adding ~20k gene names that way would be a nightmare! :S
Thanks in advance for your help!