Can you combine the nt and WGS nucleotide databases for a BLAST search and make the result non-redundant?
8.6 years ago
lauraeiv • 0

Hi,

I am a beginner in bioinformatics and I want to BLAST a nucleotide sequence against a nucleotide database, but the nucleotide collection (nt) database excludes WGS, which I would like to include.

Is there a way I can join these two databases to give me the output I want while remaining non-redundant?

If this is even possible, can you do this using web BLAST rather than the command line?

blast-database blast • 2.9k views
8.6 years ago
pld 5.1k

The WGS database contains 'in progress' genomes. Since people might be working on the same organisms, similar strains, and metagenomic samples, there's plenty of opportunity for the same sequence to show up more than once.

You could possibly remove redundancy using the nt database, but any redundant sequences not presently in nt wouldn't be removed from WGS.

Either way, I don't think WWWBLAST will work.

Thanks for that. I am doing both a nucleotide and a protein BLAST, so I am hoping to get the same genomes in the nucleotide search as I get with the nr/nt protein database.

Is it possible to do this with the command line? I am very short on time, so I don't think this is an option for me anyway.

8.6 years ago

nt is already internally redundant. You can remove redundancy from one or more FASTA files using Dedupe, part of the BBMap package, like this:

dedupe.sh in=file1.fa,file2.fa out=deduped.fa
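
To make the idea concrete, here is a minimal Python sketch of exact-duplicate removal across several FASTA inputs. It is not how Dedupe is implemented and covers only identical sequences (compared case-insensitively). The function names `read_fasta` and `dedupe_exact` and the toy records are made up for illustration. It also holds every distinct sequence in memory, so treat it as a toy for small files rather than something to run on nt.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def dedupe_exact(handles):
    """Keep the first record seen for each distinct sequence."""
    seen = set()
    kept = []
    for handle in handles:
        for header, seq in read_fasta(handle):
            key = seq.upper()  # treat acgt and ACGT as the same sequence
            if key not in seen:
                seen.add(key)
                kept.append((header, seq))
    return kept

# Toy in-memory stand-ins for file1.fa and file2.fa (hypothetical data).
file1 = io.StringIO(">a\nACGT\nACGT\n>b\nGGCC\n")
file2 = io.StringIO(">c\nacgtacgt\n>d\nTTAA\n")

for header, seq in dedupe_exact([file1, file2]):
    print(f">{header}\n{seq}")
```

In this toy input, record `c` is dropped because its sequence matches record `a` once line breaks and case are ignored.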