Customise blastn to exclude key words
4.2 years ago
ando.kelli ▴ 60

Hi All,

I'm annotating a transcriptome against NCBI's nt database, and was wondering if I could get some advice regarding customisation.

A lot of the hits are to genomes: e.g. 'Salmo trutta genome assembly, chromosome: 13' which doesn't tell me anything about what the transcript might be. (I'm doing several types of annotation, and I use this one to fill some of the gaps that are left after using other methods).

An example of my code is:

ls trinity_out_dir.Trinity.*.fasta | parallel --eta -j 14 --load 80% --noswap 'blastn -db /volume/BlastDBs/nt -query {} -out blastn_outfiles/{.}.tabular -evalue 1e-5 -outfmt "6 std stitle staxids sscinames sskingdom" -max_target_seqs 1 -max_hsps 1 -num_threads 2'

Any ideas on how I can get blastn to ignore key words? Like 'genome' and 'predicted'?

Cheers, Kelli

blast linux blastn ncbi • 1.0k views

You may need to post-filter your results to ignore things you are not interested in.


I don't think one can get blast to ignore keywords: it is a sequence search tool rather than a keyword parser. Separately, I don't think it would be a good idea even if it were possible, as I know from experience that genuine hits sometimes have words like genome or predicted in their descriptions. As @genomax suggested, you can filter out the unwanted hits after the search is complete.
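For illustration, post-filtering could look something like the sketch below. It assumes the output format from the question ("6 std stitle staxids sscinames sskingdom"), where stitle is column 13, and it assumes the search was rerun with -max_target_seqs set higher than 1, since with a value of 1 there is only one hit per query and nothing to fall back on. The function name filter_informative_hits and the keyword pattern are hypothetical, not part of BLAST.

```shell
# Hypothetical helper: for each query, print the first reported hit whose
# title (column 13 = stitle in the outfmt used above) does not match the
# unwanted keywords. BLAST lists hits for a query best-first, so the first
# surviving line is the best remaining hit.
filter_informative_hits() {
    # $1 = BLAST tabular file
    # $2 = lowercase regex of unwanted words, e.g. 'genome|predicted'
    awk -F'\t' -v pat="$2" '
        tolower($13) !~ pat && !seen[$1]++
    ' "$1"
}

# Example call (file names are placeholders):
# filter_informative_hits blastn_outfiles/sample.tabular 'genome|predicted' > sample.filtered.tabular
```

Queries whose hits all match the pattern produce no line in the output, so they can be identified by comparing query IDs before and after filtering rather than being silently lost.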


Thanks for your input genomax and Mansur Dlakic.

Filtering the offending hits out of the database isn't what I want to do, because that's the same as deleting them from my dataset. I'd rather annotate them if possible because many of them are differentially expressed.


We are suggesting that you filter the hits you can't use out of your blast results, not remove sequences from your database.


Thanks Genomax. If I can't annotate them I can't use them, so for me filtering them is the equivalent of deleting them.


I hope you realize that not every transcript put together by Trinity is real. No single experiment ever recovers 100% of transcripts.


Yep, for sure, Genomax. I'm talking about transcripts that have been annotated using blastn with quite stringent parameters, but where the annotation is not informative. I want to improve the existing annotation if possible.

I'm happy to filter out transcripts that don't have high quality hits.
