Question

Help understanding methods?

4.0 years ago

paul.donat • 0

I'm trying to do something similar to what the authors did in this paper. They are looking for TAAR genes in various genomes using tblastn and validating the hits using blastp against the NR protein database at NCBI. I have a general question about the effective length of database parameter. I understand why they are standardizing it. What I don't understand is how they came up with 1.1 x 10^10. Does this have to do with the size of the NR database, or is there some other reason they might have used this specific value?

Any help would be great, thanks!

tblastn effective length of database • 489 views

updated 4.0 years ago by Mensur Dlakic ★ 27k • written 4.0 years ago by paul.donat • 0

score 0 · Answer 1 · 2020-04-20

There is no prescribed number to use here, and you should contact the authors if you want to know exactly why they used this one. The non-redundant protein database holds ~275 million sequences at the moment, which is still smaller than their number. Maybe they meant to standardize for the number of letters in the non-redundant protein database (~9.8 x 10^10), though in that case I would expect it to be stated differently. Either way, the choice of number is arbitrary, because it serves as an internal reference that lets a research group compare its own BLAST results over a period of time. It is not meant to be an absolute reference for everyone out there, and nobody else needs to stick to that number unless they specifically want to compare their results to that particular study.
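To make the "internal reference" point concrete, here is a rough sketch of how the E-value of one and the same hit changes with the database length. It uses the textbook approximation E = m * n * 2^(-S'), where m is the query length, n is the database length and S' is the bit score, and it ignores the length adjustment that BLAST applies in practice. The query length, bit score and the smaller database size are made-up values for illustration only; 1.1 x 10^10 is the value from the question. In BLAST+ the fixed length is passed with the `-dbsize` option.

```python
def evalue(bit_score: float, query_len: int, db_len: float) -> float:
    """Approximate E-value from a bit score: E = m * n * 2**(-S').

    Simplified: real BLAST uses effective (edge-corrected) lengths.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical hit, not taken from the paper.
QUERY_LEN = 330    # assumed query length in residues
BIT_SCORE = 60.0   # assumed bit score of the hit

databases = [
    ("actual size of one genome database (assumed)", 1.0e9),
    ("fixed effective length from the question", 1.1e10),
]

for label, db_len in databases:
    e = evalue(BIT_SCORE, QUERY_LEN, db_len)
    print(f"{label}: n = {db_len:.1e}, E = {e:.2e}")
```

The same alignment gets an E-value 11 times larger under the bigger n. If every search is run with the same fixed n, hits from genomes of very different sizes end up on one scale, and that works for any fixed value, not just 1.1 x 10^10.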