I have a tabular .m8 file from running a DIAMOND annotation on a "reference" transcriptome for a non-model species (assembled with Trinity).
How do I get gene names from the GenBank IDs?
Example from the data sheet: *gi|736186330|ref|XP_010770183.1|*
The goal is to lift a differential gene expression analysis from the transcript level to the gene level, and to do that I would really like to use gene names rather than the IDs shown in the example.
The file consists of the following columns (in case it helps to picture what the data sheet looks like):
# qseqid means Query Seq-id
# sseqid means Subject Seq-id
# pident means Percentage of identical matches
# length means Alignment length
# mismatch means Number of mismatches
# gapopen means Number of gap openings
# qstart means Start of alignment in query
# qend means End of alignment in query
# sstart means Start of alignment in subject
# send means End of alignment in subject
# evalue means Expect value
# bitscore means Bit score
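
To make the layout concrete, here is a minimal Python sketch of how such a table could be read, assuming the file is tab-separated with the twelve columns listed above and that subject IDs follow the `gi|<number>|<db>|<accession>|` pattern from the example. The function names (`accession_from_sseqid`, `best_hits`) are made up for illustration. It only pulls out the protein accession of the best-scoring hit per transcript; turning those accessions into gene names would still need a separate lookup, which is the part I am asking about.

```python
import csv

# Column order as listed above (assumed to match the file exactly).
M8_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
              "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def accession_from_sseqid(sseqid):
    """Return the accession from an id such as gi|736186330|ref|XP_010770183.1|."""
    fields = [f for f in sseqid.split("|") if f]
    # In the gi|<number>|<db>|<accession>| layout the accession is the 4th field;
    # any other layout is returned unchanged.
    if len(fields) >= 4 and fields[0] == "gi":
        return fields[3]
    return sseqid


def best_hits(handle):
    """Map each query (transcript) to the accession of its highest-bitscore hit."""
    best = {}
    reader = csv.DictReader(handle, fieldnames=M8_COLUMNS, delimiter="\t")
    for row in reader:
        if row["qseqid"].startswith("#"):
            continue  # skip header/comment lines
        score = float(row["bitscore"])
        query = row["qseqid"]
        if query not in best or score > best[query][1]:
            best[query] = (accession_from_sseqid(row["sseqid"]), score)
    return {query: acc for query, (acc, _) in best.items()}
```

It could be called as `best_hits(open("annotation.m8"))`, where `annotation.m8` is a placeholder file name. Keeping only the top hit per transcript is a simplification; several transcripts mapping to the same accession is what would later allow collapsing to gene level.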