Confused about which database to use - nt or nr?
18 months ago
DNAngel ▴ 250

Hi all,

I work with metagenomic and metatranscriptomic data. I know the nt/nr databases are for nucleotide reads, and my data is all in nucleotide form (not in protein form, so I wouldn't be using blastp). But when I run a local BLAST, how do I know whether I should use nt or nr? Because with blastp, you only use nr for protein nucleotide sequences. For my metatranscriptomic reads, I used blastn against the nt database and got a lot of hits, so I felt that was okay.

When I used FragGeneScan to extract protein reads from the RNA sequences, I searched them against the nr database. Does this make sense, or should I have used the nr database from the start?

I've read a lot about this on NCBI, but it is not very clear to me in practice.

blast • 1.8k views
18 months ago
Mensur Dlakic ★ 27k

Because with blastp, you only use nr for protein nucleotide sequences.

There is no such thing as protein nucleotide sequences. Sequences are either nucleotide (the nt database) or protein (the nr database). blastn compares nucleotide queries against the nt database, and blastp compares protein queries against the nr database.
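As a rough illustration of that pairing, here is a minimal Python sketch that only assembles BLAST+ command lines and does not run them. The file names (`reads.fasta`, `predicted_proteins.faa`, the output paths) and the e-value cutoff are hypothetical, and it assumes local copies of the databases are installed under the names `nt` and `nr`.

```python
def build_blast_command(program, query_fasta, database, out_path, evalue="1e-5"):
    """Return the argument list for a BLAST+ search (nothing is executed here)."""
    return [
        program,
        "-query", query_fasta,   # FASTA file with the query sequences
        "-db", database,         # database name; assumed to be installed locally
        "-out", out_path,        # where BLAST+ would write its results
        "-outfmt", "6",          # tabular output
        "-evalue", evalue,       # hypothetical cutoff, adjust to your data
    ]


# Nucleotide reads go to the nucleotide database with blastn.
reads_cmd = build_blast_command("blastn", "reads.fasta", "nt", "reads_vs_nt.tsv")

# Predicted proteins go to the protein database with blastp.
proteins_cmd = build_blast_command(
    "blastp", "predicted_proteins.faa", "nr", "proteins_vs_nr.tsv"
)

print(" ".join(reads_cmd))
print(" ".join(proteins_cmd))
```

The returned list could be passed to `subprocess.run` once the paths point at real files.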


While unrelated to this question, peptide nucleic acids do exist: https://en.wikipedia.org/wiki/Peptide_nucleic_acid


...or blastx for translated dna against the nr


... or tblastn for protein against translated DNA in nt. I don't think the original question was intended for all BLAST flavors and all search permutations.
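For anyone who wants the combinations mentioned in this thread in one place, a small lookup sketch in Python follows. The function name and the type labels are made up for illustration; the mapping itself only restates which program pairs with which kind of query and database.

```python
# (query type, database type) -> BLAST program
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",  # e.g. reads against nt
    ("protein", "protein"): "blastp",        # e.g. predicted proteins against nr
    ("nucleotide", "protein"): "blastx",     # query translated, searched against nr
    ("protein", "nucleotide"): "tblastn",    # database translated, e.g. nt
}
# Not covered: tblastx, which translates both a nucleotide query and a
# nucleotide database, so it shares the key of blastn.


def choose_blast_program(query_type, database_type):
    """Return the BLAST program for a query/database pairing."""
    try:
        return BLAST_PROGRAMS[(query_type, database_type)]
    except KeyError:
        raise ValueError(
            f"unknown combination: {query_type!r} query, {database_type!r} database"
        ) from None


print(choose_blast_program("nucleotide", "protein"))
```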

18 months ago

"translated DNA:protein (BLASTX) searches are far far more sensitive than DNA:DNA searches" - Bill Pearson

If you're happy with your blastn results, that's good, but running blastx against nr would be advisable.


If you have transcripts (3 potential reading frames), it's rather wasteful to run blastx with its 6 reading frames. It's far more computationally efficient to predict proteins and then run blastp, with no loss in sensitivity. It doesn't really matter if you're querying just a few hundred sequences, but with large sets the cost will be huge.
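To make the 3-versus-6 frame count concrete, here is a small Python sketch that enumerates reading frames for a toy sequence. It assumes the transcript strand is known (as with strand-specific data), which is what limits the search to three frames; the sequence and function names are invented for illustration, and nothing here translates or searches anything.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def reading_frames(seq, both_strands):
    """Return each reading frame, trimmed to a whole number of codons."""
    strands = [seq, reverse_complement(seq)] if both_strands else [seq]
    frames = []
    for strand in strands:
        for offset in range(3):
            usable = len(strand) - offset
            usable -= usable % 3  # drop the incomplete trailing codon
            frames.append(strand[offset:offset + usable])
    return frames


toy_transcript = "ATGGCCAAGTAA"  # made-up sequence

# Strand known: three frames on the given strand only.
print(len(reading_frames(toy_transcript, both_strands=False)))

# blastx-style: three frames on each strand.
print(len(reading_frames(toy_transcript, both_strands=True)))
```

The second call produces twice as many frames to translate and compare, which is the overhead described above.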
