Confused about which database to use - nt or nr?
2
0
Entering edit mode
9 weeks ago
DNAngel ▴ 240

Hi all,

I work with metagenomic and metatranscriptomic data. I know the nt/nr database is for nucleotide reads which my data is all in nucleotide form (not in protein form so I wouldn't be using blastp). But when I do a local blast, I am just wondering how to know if I should use nt or nr. Because with blastp, you only use nr for protein nucleotide sequences. If I have metatranscrptomic reads, I did use blastn against the nt database and I got a lot of hits so I felt that was okay.

When I use FragGeneScan to extract protein reads from the RNA sequences I did use nr database and just wondering if this makes sense or should I have used the nr database from the start?

I've read a lot on this on NCBI but it is very clear to me in practise.

blast • 342 views
0
Entering edit mode
9 weeks ago
Mensur Dlakic ★ 21k

Because with blastp, you only use nr for protein nucleotide sequences.

There is no such thing as protein nucleotide sequences. The sequences are either nucleotide (nt) or protein (nr). blastn is for comparing nucleotide queries to nt database, and blastp for comparing proteins to nr database.

0
Entering edit mode

While unrelated to this question peptide nucleic acids exist: https://en.wikipedia.org/wiki/Peptide_nucleic_acid

0
Entering edit mode

...or blastx for translated dna against the nr

0
Entering edit mode

... or tblastn for protein against translated DNA in nt. I don't think the original question was intended for all BLAST flavors and all search permutations.

0
Entering edit mode
9 weeks ago

"translated DNA:protein (BLASTX) searches are far far more sensitive than DNA:DNA searches" - Bill Pearson

if you're happy with your blastn results that's good but blastx to the nr would be advisable

0
Entering edit mode

If you have transcripts (3 potential reading frames) it's rather wasteful to do blastx with its 6 reading frames. It's far more computationally efficient to predict proteins and then do blastp. No loss in sensitivity. It doesn't really matter if you're querying just a few hundred sequences or whatever but with large sets there will be a huge cost..