Proteins db vs. nucleotides db
2.2 years ago
valentinavan ▴ 50

Hi,

This is a very basic question, so forgive me if it seems naive.

In whole-genome sequencing and metagenomics, does it make a difference (for example, in the accuracy of the results) whether classification is run against a protein database such as nr or a nucleotide database such as nt?

Thanks

classification database metagenomics • 746 views
2.2 years ago
Mensur Dlakic ★ 27k

This is difficult to answer without more details. If your sequences happen to be similar to genomes already in public databases, you may not need to search nt or nr at all. Instead, you could try sendsketch.sh from the BBTools package. It searches your sequence remotely against either nucleotide or protein databases and returns a result in about 5-10 seconds.

Here is an example for one of my metagenomic bins versus the nucleotide database:

Query: group_045.fa     DB: nt  SketchLen: 9793 Seqs: 530       Bases: 3608495  gSize: 3143572  GC: 0.305       File: group_045.fa
WKID    KID     ANI     SSU     Complt  Contam  Matches Unique  TaxID   gSize   gSeqs   taxName
56.25%  28.50%  97.93%  84.42%  100.00% 0.46%   2791    2757    795359  1612118 4       Thermodesulfobacterium geofontis OPF15
0.22%   0.12%   80.08%  83.35%  100.00% 28.84%  12      5       2234087 1746157 2       Thermodesulfobacterium sp. TA1
0.15%   0.08%   78.74%  83.53%  100.00% 28.88%  8       0       289377  1703141 5       Thermodesulfobacterium commune DSM 2178

And versus the protein database:

Query: group_045.fa     DB: ProkProt    SketchLen: 17939        Seqs: 3586      SeqLen: 1090784 gSize: 705574   File: group_045.fa
WKID    KID     AAI     SSU     Complt  Contam  Matches Unique  TaxID   gSize   gSeqs   taxName
76.12%  50.00%  97.39%  84.35%  100.00% 7.07%   8970    6334    795359  459769  1511    Thermodesulfobacterium geofontis OPF15
10.26%  7.01%   80.26%  83.87%  73.09%  50.06%  1257    74      161156  482978  1548    Thermodesulfobacterium hydrogeniphilum
7.64%   5.56%   77.99%  83.28%  66.59%  51.51%  998     45      2234087 510876  1642    Thermodesulfobacterium sp. TA1
7.83%   5.44%   78.19%  83.48%  69.63%  51.63%  976     14      289377  486048  3236    Thermodesulfobacterium commune DSM 2178
7.43%   5.15%   77.76%  82.75%  69.39%  51.92%  924     6       1123372 484331  1613    Thermodesulfobacterium hveragerdense DSM 12571
7.25%   5.20%   77.58%  82.92%  67.12%  51.87%  933     4       1123373 500560  1645    Thermodesulfobacterium thermophilum DSM 1276
4.87%   3.50%   74.65%  82.87%  64.72%  53.58%  627     20      1653476 503318  1637    Caldimicrobium thiodismutans
1.74%   1.51%   67.60%  82.88%  51.44%  55.57%  270     6       999894  606333  2026    Thermosulfurimonas dismutans
1.36%   1.22%   66.02%  81.94%  49.56%  55.86%  218     0       667014  627712  2121    Thermodesulfatator indicus DSM 15286
1.36%   1.20%   66.00%  81.76%  49.74%  55.87%  216     0       1795632 627034  2097    Thermodesulfatator autotrophicus
1.28%   1.19%   65.61%  82.06%  47.49%  55.88%  214     0       1123371 656772  2128    Thermodesulfatator atlanticus DSM 21156
0.79%   0.60%   62.61%  81.40%  57.30%  56.47%  107     17      1871110 535441  1838    Thermodesulfovibrio sp. N1
0.80%   0.64%   62.67%  81.74%  55.00%  56.44%  114     4       86166   558478  1839    Thermodesulfovibrio aggregans
0.67%   0.55%   61.54%  81.32%  52.43%  56.52%  99      2       1123375 584489  1928    Thermodesulfovibrio islandicus DSM 12570
0.67%   0.54%   61.62%  81.25%  54.74%  56.54%  96      0       2580394 557606  1808    Thermodesulfovibrio sp. Kuro-1
0.67%   0.54%   61.59%  81.37%  54.38%  56.54%  96      0       289376  561402  1876    Thermodesulfovibrio yellowstonii DSM 11347
0.64%   0.48%   61.33%  80.65%  56.95%  56.59%  87      1       1123376 535566  1738    Thermodesulfovibrio thiophilus DSM 17215
0.42%   0.40%   58.86%  79.21%  44.94%  56.67%  72      0       1156395 680300  2090    Dissulfuribacter thermophilus
0.31%   0.27%   57.19%  81.34%  38.17%  56.74%  51      0       39841   800456  2579    Thermodesulforhabdus norvegica
0.28%   0.26%   56.50%  79.35%  45.96%  56.81%  47      1       1621989 663655  2241    Candidatus Desulfofervidus auxilii
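
If you want to compare the two reports programmatically, something like the following might work. It is only a minimal sketch: it assumes the report layout shown above (a `Query:` line, a header row starting with `WKID`, then one whitespace-separated row per hit with the taxon name in the last column), and `parse_sketch_report` is a hypothetical helper name, not part of BBTools. The sample rows are copied from the first two hits of each report above.

```python
def parse_sketch_report(text):
    """Parse a sendsketch-style text report into a list of dicts, one per hit.

    Assumes the layout shown in this thread; this is not an official parser.
    """
    hits = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("Query:"):
            continue  # skip blank lines and the query summary line
        if line.startswith("WKID"):
            header = line.split()  # column names, e.g. WKID, KID, ANI or AAI, ...
            continue
        if header is None:
            continue  # ignore anything before the header row
        # Split into as many fields as there are columns; the last column
        # (taxName) may itself contain spaces, so it takes the remainder.
        fields = line.split(None, len(header) - 1)
        if len(fields) != len(header):
            continue  # skip rows that do not match the header
        row = {}
        for name, value in zip(header, fields):
            # Percent columns become floats; everything else stays a string.
            row[name] = float(value[:-1]) if value.endswith("%") else value
        hits.append(row)
    return hits


nt_report = """Query: group_045.fa     DB: nt  SketchLen: 9793 Seqs: 530
WKID    KID     ANI     SSU     Complt  Contam  Matches Unique  TaxID   gSize   gSeqs   taxName
56.25%  28.50%  97.93%  84.42%  100.00% 0.46%   2791    2757    795359  1612118 4       Thermodesulfobacterium geofontis OPF15
0.22%   0.12%   80.08%  83.35%  100.00% 28.84%  12      5       2234087 1746157 2       Thermodesulfobacterium sp. TA1
"""

protein_report = """Query: group_045.fa     DB: ProkProt    SketchLen: 17939        Seqs: 3586
WKID    KID     AAI     SSU     Complt  Contam  Matches Unique  TaxID   gSize   gSeqs   taxName
76.12%  50.00%  97.39%  84.35%  100.00% 7.07%   8970    6334    795359  459769  1511    Thermodesulfobacterium geofontis OPF15
10.26%  7.01%   80.26%  83.87%  73.09%  50.06%  1257    74      161156  482978  1548    Thermodesulfobacterium hydrogeniphilum
"""

nt_hits = parse_sketch_report(nt_report)
protein_hits = parse_sketch_report(protein_report)

# Print the best hit from each report with its identity value (ANI or AAI).
for label, hits, identity_column in (("nt", nt_hits, "ANI"),
                                     ("protein", protein_hits, "AAI")):
    top = hits[0]
    print(label, top["taxName"], identity_column, top[identity_column])
```

Splitting each row into a fixed number of fields keeps multi-word names such as "Thermodesulfobacterium geofontis OPF15" intact in the `taxName` column.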

Thanks for your reply.

I found the answer to my question here, where it is explained very nicely: https://www.nature.com/articles/ncomms11257
