Hello everyone,
I'm working on Brassica napus and sent my samples to a company for RNA-seq analysis. Unfortunately, the final results came back with very little annotation for the differentially expressed genes: most of them are simply labelled "Bnaxxxx Protein", which is insufficient for downstream analysis if I want to look closely at genes with specific functions or within particular pathways. As far as I can tell, the company only compared the genes against the UniProtKB database, so I'd like to know what kind of data I need in order to align against more databases, such as nr, and get more detailed gene annotations.
I'd appreciate any help!
Thank you for your reply.
I would also like some further advice. For example, I'm about to BLAST against the nr protein database: which software or approach would let me do this in a high-throughput fashion? There are more than 100,000 genes, which cannot be done by pasting them one by one into NCBI blastx. Also, the data I got from the company consists of only two parts, raw data (reads) and IGV data. Can the latter be used directly to get the annotation, or should I start from the very beginning with the raw data?
you will need to do this using a local BLAST install ... get all the Brassica protein sequences and blast those against nr (protein). The names the company uses in their lists should be the same as in the official Brassica release (I assume)
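To make the local BLAST step above a bit more concrete, here is a minimal Python sketch, not a definitive pipeline. It builds a `blastp` command with tabular output (`-outfmt 6`) and then keeps the best-scoring hit per query. The file names (`brassica_proteins.fasta`, `hits.tsv`), the e-value cutoff, and the thread count are placeholders assumed for illustration, and it assumes BLAST+ is on your PATH and an nr protein database is installed locally.

```python
import csv
import subprocess


def build_blastp_command(query_fasta, db, out_tsv, threads=4, evalue="1e-5"):
    """Return the argument list for a tabular (outfmt 6) blastp run."""
    return [
        "blastp",
        "-query", query_fasta,      # protein FASTA (placeholder name)
        "-db", db,                  # name/path of the local BLAST database
        "-out", out_tsv,
        "-outfmt", "6",             # 12-column tab-separated output
        "-evalue", evalue,          # assumed cutoff, adjust as needed
        "-max_target_seqs", "5",
        "-num_threads", str(threads),
    ]


def run_blast(cmd):
    """Run the command; raises CalledProcessError if BLAST fails."""
    subprocess.run(cmd, check=True)


def best_hits(tabular_handle):
    """Keep the highest-bitscore hit per query from default outfmt 6 rows."""
    best = {}
    for row in csv.reader(tabular_handle, delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        # Default columns: qseqid is column 1, sseqid column 2, bitscore column 12.
        query, subject, bitscore = row[0], row[1], float(row[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


if __name__ == "__main__":
    cmd = build_blastp_command("brassica_proteins.fasta", "nr", "hits.tsv")
    print(" ".join(cmd))
    # Once BLAST+ and the database are in place:
    # run_blast(cmd)
    # with open("hits.tsv") as handle:
    #     print(best_hits(handle))
```

The best-hit table can then be joined to the company's gene ID list, since the query IDs in the BLAST output are taken from the FASTA headers.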
did you not get a list of gene IDs with expression values?
Yes, I got a list of gene IDs with expression values, but without the actual sequences. I also looked into the so-called 'Standalone BLAST': I've downloaded and installed it on my Windows laptop and am still figuring out how it works...
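Since the list contains gene IDs but no sequences, the sequences would have to come from the reference protein FASTA. A rough sketch of that lookup in plain Python follows. It assumes the ID is the first word of each FASTA header and that the IDs in the list match the reference release; the IDs in the demo are made up for illustration.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def record_id(header):
    """Assumption: the sequence ID is the first whitespace-separated word."""
    parts = header.split()
    return parts[0] if parts else ""


def select_records(fasta_handle, wanted_ids):
    """Keep only the records whose ID appears in wanted_ids."""
    wanted = set(wanted_ids)
    return [(h, s) for h, s in read_fasta(fasta_handle) if record_id(h) in wanted]


if __name__ == "__main__":
    # Tiny in-memory example with invented IDs and sequences.
    demo = io.StringIO(
        ">BnaA01g00010D example description\nMKT\nLLV\n"
        ">BnaA01g00020D\nMSS\n"
    )
    for header, seq in select_records(demo, ["BnaA01g00020D"]):
        print(f">{header}\n{seq}")
```

With real data, the handle would be the opened reference FASTA and `wanted_ids` the ID column of the expression table; the selected records can then be written out as the BLAST query file.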
ugh, windows :/
you will also have to download the BLAST DB. Those are very large files, so first check how much storage you have available. Also, running BLAST on your laptop will take some time
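For the storage check, a small Python sketch using only the standard library is shown here. The required size is a placeholder you would replace with the actual size of the database you intend to download; it is not a statement about how large nr currently is.

```python
import shutil


def has_enough_space(path, required_gb):
    """Return (ok, free_gb) for the filesystem that contains `path`."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return free_gb >= required_gb, free_gb


if __name__ == "__main__":
    REQUIRED_GB = 500  # hypothetical figure, check the real download size first
    ok, free_gb = has_enough_space(".", REQUIRED_GB)
    status = "enough" if ok else "NOT enough"
    print(f"{free_gb:.1f} GB free, {status} for {REQUIRED_GB} GB")
```

Keep in mind that compressed downloads need extra room while they are being unpacked, so leave some margin on top of the database size.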
no access to an HPC system or such (compute server, ... ) ?
Yeah, I've also installed Linux on my laptop, where I originally (and naively) thought the mapping of the raw reads could be done. That seems impossible right now with an okay CPU and limited RAM and storage space. Your reminder also made me realize there is a supercomputer in our lab where all of this preliminary work could be done, so I'll probably go there and see if it works. Thank you so much!!!