Question

What kind of data is needed for gene annotation?

0

4.1 years ago

citronxu ▴ 20

Hello everyone,

I'm working on Brassica napus and sent my samples to a company for RNA-seq analysis. Unfortunately, the final results contained very little annotation for the differentially expressed genes: most were simply labelled "Bnaxxxx Protein", which is insufficient for downstream analysis if I want to look closely at genes with specific functions or within particular pathways. As far as I can tell, the company only compared the genes against the UniProtKB database. So I'd like to know what kind of data I need in order to align the genes against more databases, such as the nr database, and get more detailed gene annotations.

I'd appreciate any help!

RNA-Seq • 1.0k views

ADD COMMENT • link updated 4.1 years ago by lieven.sterck 15k • written 4.1 years ago by citronxu ▴ 20

score 1 · Accepted Answer · 2020-04-12

1

4.1 years ago

lieven.sterck 15k

There are several approaches and/or resources you can use; the key term here is "functional annotation".

You could, for instance, run Blast2GO to get some idea of the GO assignments of the genes; you could indeed blast all your proteins against nr_prot (but take into account a rather high level of false positives); or you could do a KEGG analysis, ...
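One way to limit the false positives from a BLAST search is to keep only the best hit per query that passes some cut-offs. Below is a minimal sketch, assuming the BLAST results were saved in the default tabular format (`-outfmt 6`). The function name `best_hits` and the thresholds are illustrative assumptions, not recommended values, and the IDs in the demo are made up.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, max_evalue=1e-10, min_identity=40.0):
    """Return {query_id: subject_id} for the highest-bitscore hit per query.

    Hits with an e-value above max_evalue or a percent identity below
    min_identity are discarded. Both thresholds are assumptions.
    """
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(COLUMNS, row))
        if float(hit["evalue"]) > max_evalue:
            continue
        if float(hit["pident"]) < min_identity:
            continue
        bitscore = float(hit["bitscore"])
        current = best.get(hit["qseqid"])
        if current is None or bitscore > current[1]:
            best[hit["qseqid"]] = (hit["sseqid"], bitscore)
    return {query: subject for query, (subject, _) in best.items()}


# Made-up example rows, only to show the expected input layout.
demo = io.StringIO(
    "g1\thitA\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-50\t200\n"
    "g1\thitB\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-60\t250\n"
    "g2\thitC\t30.0\t100\t70\t0\t1\t100\t1\t100\t1e-20\t80\n"
)
print(best_hits(demo))
```

In the demo, `g2` is dropped because its only hit falls below the identity cut-off. For real data you would pass an open file handle instead of the `io.StringIO` object.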

Resource-wise, you could have a look at the official Brassica DBs, or perhaps Ensembl Plants. PLAZA would also be a good starting point, ...

The bottom line is that no single approach or resource will tell you everything; you will most likely need to live with a lot of unknowns (which is not uncommon for functional annotations). Have a look at several and then combine the results.
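Combining the results can be as simple as collecting, per gene ID, whatever each source had to say. A minimal sketch follows, assuming each source has already been reduced to a `{gene_id: description}` mapping. The function name, the source names, and the gene IDs are all hypothetical.

```python
def combine_annotations(sources):
    """Merge {source: {gene_id: description}} into {gene_id: {source: description}}.

    Empty descriptions are skipped, so a gene only lists the sources
    that actually returned something for it.
    """
    combined = {}
    for source_name, table in sources.items():
        for gene_id, description in table.items():
            if description:
                combined.setdefault(gene_id, {})[source_name] = description
    return combined


# Made-up input: two annotation sources covering partly different genes.
sources = {
    "blast_nr": {"geneA": "putative kinase", "geneB": ""},
    "kegg": {"geneA": "K00001", "geneC": "K00002"},
}
for gene_id, notes in sorted(combine_annotations(sources).items()):
    print(gene_id, notes)
```

Genes that appear in none of the sources (here `geneB`, whose only description is empty) stay unannotated, which matches the point that some unknowns will remain.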

ADD COMMENT • link 4.1 years ago by lieven.sterck 15k

0

Thank you for your reply.

I would also like to ask for some further advice. For example, if I blast against the nr_prot database, which software or method lets me do it in a high-throughput fashion? There are more than 100,000 genes, so I cannot submit them one by one to NCBI blastx. Also, the data I got from the company consists of only two parts: raw data (reads) and IGV data. Could the latter be used directly to get the annotation, or should I start from the very beginning with the raw data?

ADD REPLY • link 4.1 years ago by citronxu ▴ 20

0

You will need to do this using a local BLAST install: get all the Brassica protein sequences and blast those against nr_prot. The names the company uses in their lists should be the same as in the official Brassica release (I assume).

Did you not get a list of gene IDs with expression values?
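A local run could be scripted roughly as follows. This is only a sketch under several assumptions: BLAST+ is installed and `blastp` is on the PATH, a protein database named `nr` has been downloaded and is findable by BLAST, and the query FASTA files (for example the proteome split into chunks) already exist. The helper names `blastp_command` and `run_all` are hypothetical, and the option values are placeholders.

```python
import subprocess
from pathlib import Path


def blastp_command(query, db="nr", threads=4, evalue=1e-5, max_targets=5):
    """Build the argument list for one blastp run with tabular output.

    The default values are illustrative assumptions; adjust them to
    your own database location and hardware.
    """
    query = Path(query)
    # Write results next to the query, e.g. chunk_001.fasta -> chunk_001.blast.tsv
    out = query.with_name(query.stem + ".blast.tsv")
    return [
        "blastp",
        "-query", str(query),
        "-db", db,
        "-out", str(out),
        "-outfmt", "6",
        "-evalue", str(evalue),
        "-max_target_seqs", str(max_targets),
        "-num_threads", str(threads),
    ]


def run_all(query_files, **options):
    """Run blastp on each query file in turn; stop at the first failure."""
    for query in query_files:
        subprocess.run(blastp_command(query, **options), check=True)
```

Calling `run_all(["chunk_001.fasta", "chunk_002.fasta"], threads=8)` would then process the files one after another. On a cluster it is usually more practical to submit one job per chunk instead of looping in a single process.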

ADD REPLY • link 4.1 years ago by lieven.sterck 15k

0

Yes, I got a list of gene IDs with expression values, but without the actual sequences. I also looked into the so-called 'Standalone BLAST'; I've downloaded and installed it on my Windows laptop and am still figuring out how it works...
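With only a list of gene IDs, the sequences have to come from the reference protein FASTA of the same annotation release. A minimal sketch of pulling out the records for a set of IDs is shown below, assuming the ID is the first whitespace-separated token of each FASTA header and matches the IDs in the expression table exactly. The function names and the sequences in the demo are made up.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA file handle."""
    header, parts = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def select_records(handle, wanted_ids):
    """Yield only the records whose ID (first header token) is wanted."""
    wanted = set(wanted_ids)
    for header, sequence in read_fasta(handle):
        tokens = header.split()
        if tokens and tokens[0] in wanted:
            yield header, sequence


# Made-up FASTA content and ID list, only to show the call pattern.
fasta = io.StringIO(">g1 some protein\nMKT\nAAA\n>g2\nMMM\n>g3 other\nCCC\n")
for header, sequence in select_records(fasta, ["g1", "g3"]):
    print(f">{header}\n{sequence}")
```

If the expression table lists gene IDs while the FASTA uses transcript or protein IDs with a suffix, the matching rule in `select_records` would need to be adapted accordingly.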

ADD REPLY • link 4.1 years ago by citronxu ▴ 20

0

ugh, windows :/

You will also have to download the BLAST DB. Those are very large files, so first check the storage you have available. Also, running BLAST on your laptop will take some time.

No access to an HPC system or the like (compute server, ...)?
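A quick way to check the available storage before downloading anything is sketched below. The required size is left as a parameter because it depends on which database and version you download; the function name and the value used in the example call are placeholder assumptions.

```python
import shutil


def has_enough_space(path, required_gb):
    """Return (enough, free_gb) for the filesystem that contains path."""
    free_gb = shutil.disk_usage(path).free / 1024 ** 3
    return free_gb >= required_gb, free_gb


# 500 is a placeholder, not the actual size of any database.
enough, free_gb = has_enough_space(".", required_gb=500)
print(f"free: {free_gb:.1f} GiB, enough: {enough}")
```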

ADD REPLY • link 4.1 years ago by lieven.sterck 15k

1

Yeah, I've also installed Linux on my laptop, where I originally, naively, thought the mapping of raw reads could be done; that seems impossible right now given its modest CPU and limited RAM and storage space. Your reminder also made me realize there is a supercomputer in our lab on which all of this preliminary work could be done, so I will probably go there and see if it works. Thank you so much!!!

ADD REPLY • link 4.1 years ago by citronxu ▴ 20