Is there a good service for annotating transcripts?
9.9 years ago
will • 0

After getting, say, 100k transcripts from an RNA-seq project, one generally wants to annotate them against a database such as nr, using, say, blastx. The problem is that this is very slow, taking e.g. a week with 24 CPUs. What have people done to overcome this?

blast rna-seq • 1.9k views
9.9 years ago

I use GNU Parallel to run several BLAST jobs at once, with each job getting one CPU. Have a look here: Gnu Parallel - Parallelize Serial Command Line Programs Without Changing Them

That way it should only take a day or so.
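
For anyone who prefers to script the same idea rather than use GNU Parallel, here is a minimal Python sketch of it: split the query FASTA into chunks and run one single-threaded blastx per chunk. The function names (`split_fasta`, `blastx_command`, `run_chunks`), the `blast_chunks` directory, and the tabular output format are my own assumptions for illustration, not something from the linked post. It also assumes BLAST+ is on your PATH and the database is already formatted.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split_fasta(text, n_chunks):
    """Split FASTA text into at most n_chunks pieces, round-robin by record."""
    records = []
    for line in text.splitlines():
        if line.startswith(">"):
            records.append([line])          # start a new record at each header
        elif records and line.strip():
            records[-1].append(line)        # sequence line of the current record
    chunks = [[] for _ in range(n_chunks)]
    for i, rec in enumerate(records):
        chunks[i % n_chunks].append("\n".join(rec) + "\n")
    return ["".join(c) for c in chunks if c]


def blastx_command(query_path, db, out_path):
    """Build a blastx command line that uses one thread and tabular output."""
    return ["blastx", "-query", query_path, "-db", db,
            "-outfmt", "6", "-num_threads", "1", "-out", out_path]


def run_chunks(fasta_path, db, n_jobs, workdir="blast_chunks"):
    """Write chunk files and run up to n_jobs blastx processes at once."""
    work = Path(workdir)                    # hypothetical scratch directory
    work.mkdir(exist_ok=True)
    commands = []
    chunks = split_fasta(Path(fasta_path).read_text(), n_jobs)
    for i, chunk in enumerate(chunks):
        query = work / f"chunk_{i}.fa"
        query.write_text(chunk)
        out = work / f"chunk_{i}.tsv"
        commands.append(blastx_command(str(query), db, str(out)))
    # Threads only wait on the external processes, so they are cheap here.
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        list(pool.map(lambda cmd: subprocess.run(cmd, check=True), commands))
    return commands
```

Something like `run_chunks("transcripts.fa", "nr", 24)` would then leave one tabular result file per chunk, which you can concatenate afterwards. The file name and job count are placeholders.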

9.9 years ago

Check out these guidelines: http://trinotate.sourceforge.net/. In some cases blastp plus domain prediction is quite enough for annotation. By the way, have you considered using cloud services?

9.9 years ago
Prakki Rama ★ 2.7k

Another possibility is to reduce the size of the database you search. Instead of taking the complete NR database, you can take sequences from species that are close to yours in the tree, plus some more distant species that are comprehensively studied and have substantial information, such as human and mouse.

This reduces the search space by orders of magnitude, and most of your sequences should still get annotated. However, a small fraction of your 100K transcripts may remain unannotated.
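
As a rough illustration of the subsetting step, the Python sketch below keeps only the FASTA records whose header carries one of the chosen species names. It assumes the organism appears in square brackets in the header line, e.g. `[Homo sapiens]`, which is the usual style for NCBI protein FASTA files but should be checked against your own download. The function name `filter_by_species` is hypothetical.

```python
def filter_by_species(lines, species):
    """Yield the lines of FASTA records whose header names a wanted species.

    Assumes the organism is written in square brackets in the header,
    e.g. '>ACC description [Homo sapiens]'.
    """
    wanted = {f"[{name}]" for name in species}
    keep = False
    for line in lines:
        if line.startswith(">"):
            # Decide once per record, based on its header line.
            keep = any(tag in line for tag in wanted)
        if keep:
            yield line
```

You could stream a large file through it, for example `out.writelines(filter_by_species(open("nr.fa"), ["Homo sapiens", "Mus musculus"]))` with `out` being an open output file, and then format the smaller file as a BLAST database. The file name here is a placeholder.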

~Rama.


Expanding on this answer: have a look at SwissProt. It is manually curated, so you get less noisy results, and it is much smaller than nr, so you get fewer results in much less time.
