3.5 years ago
lauraeiv 0


I am a beginner in bioinformatics and I want to BLAST a nucleotide sequence against a nucleotide database, but the nucleotide collection (nt) database excludes WGS, which I would like to include.

Is there a way I can join these two databases to give me the output I want while remaining non-redundant?

If this is even possible, can I do it using web BLAST rather than the command line?

3.5 years ago
pld 4.8k

The WGS database contains 'in progress' genomes. Since people might be working on the same organisms, similar strains, and metagenomic samples, there's plenty of opportunity for the same sequence to show up more than once.

You could possibly remove redundancy from WGS using the nt database, but any redundant sequences not present in nt wouldn't be removed from WGS.

Either way, I don't think WWWBLAST will work.

Thanks for that. I am doing both a nucleotide and a protein BLAST, so I am hoping to get the same genomes in the nucleotide search as I get with the nr/nt protein database.

Is it possible to do this with the command line? I am very short on time, so I don't think this is an option for me anyway.

3.5 years ago
Brian Bushnell 16k

nt is already internally redundant. You can remove redundancy from one or more FASTA files using Dedupe, part of the BBMap package, like this: dedupe.sh in=file1.fa,file2.fa out=deduped.fa
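To make the idea of removing redundancy across several FASTA files concrete, here is a minimal Python sketch of exact-duplicate removal. It is not how Dedupe works internally, and it only catches sequences that are identical (or identical as reverse complements), not contained or near-identical ones. The function names (read_fasta, canonical, dedupe_exact) are made up for this illustration, and holding every sequence in memory would likely be impractical at the scale of nt or WGS.

```python
import io

# Translation table for complementing plain A/C/G/T bases (assumption:
# ambiguity codes such as N are left unchanged).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def canonical(seq):
    """Return a strand-independent key: the smaller of seq and its reverse complement."""
    seq = seq.upper()
    reverse_complement = seq.translate(_COMPLEMENT)[::-1]
    return min(seq, reverse_complement)


def dedupe_exact(handles):
    """Yield the first record seen for each distinct sequence across all handles."""
    seen = set()
    for handle in handles:
        for header, seq in read_fasta(handle):
            key = canonical(seq)
            if key in seen:
                continue  # exact duplicate (either strand) of an earlier record
            seen.add(key)
            yield header, seq


if __name__ == "__main__":
    # Two tiny in-memory "files" standing in for file1.fa and file2.fa.
    file1 = io.StringIO(">s1\nACGT\nAC\n>s2\nGGGG\n")
    file2 = io.StringIO(">s3\nacgtac\n>s4\nCCCC\n>s5\nTTTT\n")
    for header, seq in dedupe_exact([file1, file2]):
        print(f">{header}\n{seq}")
```

In the demo, s3 is dropped as a case-insensitive copy of s1, and s4 is dropped because it is the reverse complement of s2. For real files, open each path with open(path) and pass the handles in the same way.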

