Customise blastn to exclude key words
16 months ago
ando.kelli ▴ 40

Hi All,

I'm annotating a transcriptome against NCBI's nt database, and was wondering if I could get some advice regarding customisation.

A lot of the hits are to genomes: e.g. 'Salmo trutta genome assembly, chromosome: 13' which doesn't tell me anything about what the transcript might be. (I'm doing several types of annotation, and I use this one to fill some of the gaps that are left after using other methods).

An example of my code is:

ls trinity_out_dir.Trinity.*.fasta | parallel --eta -j 14 --load 80% --noswap 'blastn -db /volume/BlastDBs/nt -query {} -out blastn_outfiles/{.}.tabular -evalue 1e-5 -outfmt "6 std stitle staxids sscinames sskingdom" -max_target_seqs 1 -max_hsps 1 -num_threads 2'


Any ideas on how I can get blastn to ignore key words? Like 'genome' and 'predicted'?

Cheers, Kelli

blast linux blastn ncbi • 279 views

You may need to post-filter your results to ignore things you are not interested in.

I don't think one can get blast to ignore keywords: it is a sequence search tool rather than a keyword parser. Separately, I don't think it would be a good idea even if it were possible, as I know from experience that even genuine hits sometimes have words like 'genome' or 'predicted' in their descriptions. As @genomax suggested, you can filter out the unwanted hits after the search is complete.
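
As a rough illustration of the post-filtering suggested here, the sketch below drops tabular hits whose subject title contains an unwanted keyword. It assumes the output format from the question (`-outfmt "6 std stitle staxids sscinames sskingdom"`), where `std` contributes 12 columns and `stitle` is therefore the 13th. The function name `drop_keyword_hits` and the default keyword list are illustrative choices, not part of BLAST.

```python
STITLE_COL = 12  # 0-based index; "std" is 12 columns, so stitle is the 13th


def drop_keyword_hits(lines, keywords=("genome", "predicted")):
    """Yield tabular BLAST lines whose subject title contains none of the keywords.

    Matching is case-insensitive, so 'PREDICTED:' is caught as well.
    """
    lowered = [k.lower() for k in keywords]
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= STITLE_COL:
            continue  # skip blank or malformed lines
        title = fields[STITLE_COL].lower()
        if not any(k in title for k in lowered):
            yield line
```

Passing an open file handle for one of the `.tabular` files as `lines` and writing the yielded lines to a new file would leave the original BLAST output untouched, so the filter can be rerun with a different keyword list.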

Thanks for your input genomax and Mansur Dlakic.

Filtering the offending hits out of the database isn't what I want to do, because that's the same as deleting them from my dataset. I'd rather annotate them if possible because many of them are differentially expressed.

We are suggesting that you filter the hits you can't use from your blast results, not the sequences from your database.

Thanks Genomax. If I can't annotate them I can't use them, so for me filtering them is the equivalent of deleting them.

I hope you realize that not every transcript put together by Trinity is real. 100% of transcripts are never found in one experiment.

Yep, for sure, Genomax. I'm talking about transcripts that have been annotated using blastn with quite stringent parameters, but where the annotation is not informative. I want to improve the existing annotation if possible.

I'm happy to filter out transcripts that don't have high quality hits.
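
One way to reconcile post-filtering with the wish to keep every transcript is to choose, for each query, the best hit whose title is informative, and to fall back to the top hit when none is. The sketch below is only that idea under stated assumptions: the search would have to be rerun with `-max_target_seqs` raised above 1 (the command in the question keeps a single hit per query, so there is nothing to fall back from), hits for each query are assumed to appear best-first in the tabular output, and `stitle` is assumed to be the 13th column as in the question's `-outfmt`. The function name `best_informative_hits` and the keyword list are hypothetical.

```python
def best_informative_hits(lines, keywords=("genome", "predicted"), title_col=12):
    """Return {query id: chosen line}, preferring hits without unwanted keywords.

    For each query the first hit whose title lacks every keyword is chosen;
    if all of its hits contain a keyword, the first (top) hit is kept so that
    no query is dropped.
    """
    lowered = [k.lower() for k in keywords]
    chosen = {}          # query id -> line currently selected
    informative = set()  # queries whose selected hit is keyword-free
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= title_col:
            continue  # skip blank or malformed lines
        query = fields[0]
        title = fields[title_col].lower()
        is_informative = not any(k in title for k in lowered)
        if query not in chosen:
            # First hit seen for this query: keep it as the fallback.
            chosen[query] = line
            if is_informative:
                informative.add(query)
        elif is_informative and query not in informative:
            # Replace an uninformative fallback with the first informative hit.
            chosen[query] = line
            informative.add(query)
    return chosen
```

Queries that end up with an uninformative fallback hit are exactly those missing from the internal `informative` set, so a variant of this function could return that set as well to flag transcripts that still need annotation by other methods.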
