Hello, I'm hoping someone can provide some insight or point me in the right direction. I have very little programming knowledge and am fairly new to RNA-seq, but I'm sure there must be an easier way to do what I need.
I have assembled the paired-end reads from my RNA-seq data de novo with Trinity, and I have used the RSEM utility bundled with Trinity to calculate transcript abundance. I would now like to annotate, or at least identify by protein name, the transcripts that are most highly expressed in certain samples.
Currently, I import the RSEM output file (RSEM.genes.results), which contains the FPKM values, into Excel as a tab-delimited file and sort it by FPKM, highest first. Then I take the gene ID for each high-FPKM entry and search for it in the Trinity assembly output (.fasta) to find the corresponding sequence. Finally, I manually paste that sequence into the nucleotide BLAST search on the NCBI website, one gene at a time.
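To make the manual steps above concrete, they amount to roughly the following in script form. This is only a minimal sketch in Python, not an established Trinity or RSEM tool. It assumes that the results file is tab-separated with the standard RSEM header columns `gene_id`, `transcript_id(s)` and `FPKM`, and that the first word of each FASTA header in the Trinity assembly is the transcript ID. The function names and the default cutoff of 50 genes are made up for illustration.

```python
import csv


def top_genes(rsem_path, n=50):
    """Return the n rows of an RSEM genes.results file with the highest FPKM."""
    with open(rsem_path, newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))
    # FPKM is read as text, so convert to a number before sorting.
    rows.sort(key=lambda row: float(row["FPKM"]), reverse=True)
    return rows[:n]


def read_fasta(fasta_path):
    """Read a FASTA file into a dict of {sequence ID: sequence}."""
    chunks = {}
    name = None
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Assumption: the ID is the first word after '>'.
                name = line[1:].split()[0]
                chunks[name] = []
            elif name is not None:
                chunks[name].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in chunks.items()}


def write_top_sequences(rsem_path, fasta_path, out_path, n=50):
    """Write every transcript of the n top-FPKM genes to one FASTA file.

    Returns the number of sequences written.
    """
    sequences = read_fasta(fasta_path)
    written = 0
    with open(out_path, "w") as out:
        for row in top_genes(rsem_path, n):
            # RSEM lists a gene's transcripts as a comma-separated string.
            for transcript_id in row["transcript_id(s)"].split(","):
                if transcript_id in sequences:
                    header = f">{transcript_id} gene={row['gene_id']} FPKM={row['FPKM']}"
                    out.write(f"{header}\n{sequences[transcript_id]}\n")
                    written += 1
    return written
```

A call such as `write_top_sequences("RSEM.genes.results", "Trinity.fasta", "top_genes.fasta", n=50)` (the file names here are placeholders) would produce a single multi-sequence FASTA file. That file could then be submitted to BLAST in one batch instead of pasting sequences in one at a time.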
This is a very cumbersome and tedious approach, and I am certain there is a more automated way to do it. I have very limited programming experience, so I cannot quickly write a script to do the above for me, but I'm almost positive there must be some built-in Trinity function or other established script that can. What is the approach that is generally taken? I would be extremely grateful if you could point me in the right direction. Thank you for any help!