Hello, I'm sure this isn't possible, but I want to clear up my doubts. Is there a way to add some sequences generated by my lab to my downloaded copy of the nt database, without having to submit them to NCBI? Thanks, and sorry for this question.
Given you have nt downloaded and in the BLASTDB path, and your additional sequences are in mysequences.fna in your working directory, the following should work:
makeblastdb -in mysequences.fna -dbtype nucl -title "some sequences I found" -out mysequences -parse_seqids
blastdb_aliastool -dblist nt mysequences -dbtype nucl -title "nt database + my own sequences" -out ntandmore
After that you can run for example:
blastn -db ntandmore ...
Assuming you are talking about the BLAST nt db:
Off the top of my head I can think of the following:
Edit: These are separate options for doing what you've asked. Also, this is not a complete, foolproof tutorial; you need to check the manual for the tools.
1) You can unpack nt to FASTA (blastdbcmd), append your sequences, and create a new db (makeblastdb).
2) Create a db from the new sequences and combine the databases using blastdb_aliastool (https://www.ncbi.nlm.nih.gov/books/NBK279693/).
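Option 2 is shown with commands elsewhere in this thread, so here is a rough sketch of option 1 only. It assumes BLAST+ is installed and nt is reachable through BLASTDB; the function name and the file names (nt_dump.fna, mysequences.fna, nt_plus_mine) are placeholders of mine, not anything the tools require. Be aware that the FASTA dump of nt is very large, and that, as far as I know, the rebuilt database will not carry nt's taxids unless you also supply a -taxid_map.

```shell
# Sketch of option 1: dump nt to FASTA, append your sequences, rebuild one db.
# Assumes BLAST+ is on PATH and nt can be found via $BLASTDB.
rebuild_nt_with_mine() {
    extra_fasta="$1"   # your own multi-FASTA file (placeholder name)
    out_name="$2"      # name of the new combined database (placeholder name)

    # Dump every sequence in nt as FASTA (slow, needs a lot of disk space).
    blastdbcmd -db nt -entry all -out nt_dump.fna || return 1

    # Append your own sequences to the dump.
    cat nt_dump.fna "$extra_fasta" > "${out_name}.fna" || return 1

    # Build a single new database from the combined FASTA.
    makeblastdb -in "${out_name}.fna" -dbtype nucl -parse_seqids \
        -title "nt + my own sequences" -out "$out_name"
}

# Usage (not run automatically):
#   rebuild_nt_with_mine mysequences.fna nt_plus_mine
#   blastn -db nt_plus_mine ...
```

For most people option 2 is the cheaper route, since it leaves nt untouched and only builds a small database from the new sequences.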
Yes, but I still have my doubts. When I run the commands like @Michael, I get 5 files. So my question is: after that, how can I run a blast command against my sequences and the nt database? If I name it like ntandmore, is this file the compilation of the nt database and my own sequences?
If I name it like ntandmore, is this file the compilation of the nt database and my own sequences?
Correct. @Michael did show you how to search with the new combined database alias with an example in the answer below.
Note: Make sure you did not mess up your original nt.nal file, which describes all parts of the nt database, since you used the name -out nt for the combined database based on a post above.
Hi, it would be good if you kept each comment on the thread it relates to. It certainly matters how you generate your own database. If it is generated with correct taxids, as you were used to, the combined alias db will also have correct taxids. I don't quite understand your problem: are you saying that you are trying to add sequences for species that do not have an entry in the NCBI taxonomy? But this seems to be a question that goes beyond what was originally asked.
I now need the sequence ID of the new insertions
You did not actually insert the sequences into the nt database. You searched against an alias that included both nt + your data. So the results you get by searching against this combined alias should have your IDs (they were different from what exists in nt, correct?).
If you are not seeing them, it is possible that the limit on how many alignments are reported in your results is excluding hits from your data.
I don't know if the order of databases specified in the alias makes a difference (it may) so instead of
blastdb_aliastool -dblist nt mysequences -dbtype nucl -title "nt database + my own sequences" -out ntandmore
you may want to create an alias where your sequences are listed first in the alias.
blastdb_aliastool -dblist mysequences nt -dbtype nucl -title "nt database + my own sequences" -out ntandmore
See if that helps bring results from your ID's up first.
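If the report limit is the issue, one way to check is to ask for more hits in tabular output and then pull out only the rows that hit your own sequences. The sketch below assumes a file with your IDs, one per line, and that the subject ID column of -outfmt 6 shows those IDs exactly as written in that file; own_hits, my_ids.txt, query.fna and hits.tsv are made-up names.

```shell
# Keep only tabular BLAST hits (-outfmt 6) whose subject ID (column 2)
# appears in a list of your own IDs (one ID per line, list must not be empty).
own_hits() {
    ids_file="$1"    # e.g. my_ids.txt
    hits_file="$2"   # e.g. hits.tsv
    awk 'NR == FNR { own[$1] = 1; next } ($2 in own)' "$ids_file" "$hits_file"
}

# Usage (not run automatically; 500 is an arbitrary, larger-than-default limit):
#   blastn -db ntandmore -query query.fna -outfmt 6 -max_target_seqs 500 -out hits.tsv
#   own_hits my_ids.txt hits.tsv
```

If this prints nothing even with a generous -max_target_seqs, the query probably does not match your sequences well enough to be reported at all.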
So I don't think I did this right... 1) I ran blastdbcmd and got 6 files with an extension similar to nt. 2) I ran blastdb_aliastool, which generated a .nal file. So I copied these files (steps 1 and 2) to my nt database directory. Previously I pointed BLAST to my nt database by defining an environment variable in Windows; should I do the same for this new update?
Looks like you did not do this right.
blastdbcmd is used to pull sequences out of a database. If they are already in nt then there is no point. If you are getting them from some other blast database then fine. Otherwise please follow the two commands detailed by @Michael after you prepare/obtain a file containing your sequences of interest in multi-FASTA format. Make sure they don't have identifiers that are already present in nt.
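A quick way to check the identifiers before building anything is sketched below. It assumes plain headers of the form ">ID description", so the ID is the first word after ">"; the function names and my_ids.txt are hypothetical.

```shell
# Print the sequence ID (first word of each header line) of a FASTA file.
fasta_ids() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1"
}

# Print every ID that occurs more than once within the same file.
duplicate_ids() {
    fasta_ids "$1" | sort | uniq -d
}

# Usage (not run automatically):
#   duplicate_ids mysequences.fna          # should print nothing
#   fasta_ids mysequences.fna > my_ids.txt
#   # Any accession printed here already exists in nt:
#   blastdbcmd -db nt -entry_batch my_ids.txt -outfmt "%a"
```

blastdbcmd will complain about IDs it cannot find in nt, which in this case is the outcome you want.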
Gives you NA for taxonomy?
Make sure you create a custom taxonomy file as described on this page and then use the -taxid_map option with that file name.
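As a minimal illustration of what such a file looks like: each line holds a sequence ID and a taxid separated by whitespace. The sketch below assumes simple ">ID description" headers and that all sequences in the file belong to one organism; make_taxid_map is a made-up helper and 12345 is only a placeholder, so look up the real taxid in NCBI Taxonomy.

```shell
# Write one "seqid taxid" line per FASTA header.
# Assumes every sequence in the file gets the same taxid (second argument).
make_taxid_map() {
    awk -v taxid="$2" '/^>/ { sub(/^>/, "", $1); print $1, taxid }' "$1"
}

# Usage (not run automatically; 12345 is a placeholder taxid):
#   make_taxid_map mysequences.fna 12345 > taxid_map.txt
#   makeblastdb -in mysequences.fna -dbtype nucl -parse_seqids \
#       -taxid_map taxid_map.txt -out mysequences
```

As far as I know, taxonomy names in the output are looked up in the taxdb files, so those also need to be present where BLAST can find them (for example in the BLASTDB directory).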
OK, then I will continue... So I still get NA as the result for taxonomy... Now I have changed my FASTA headers like this: ref|NC_2345|Pleoticus robustu cytochrome c oxidase subunit I (COI) gene, parcial cds; mitochondrial. The blast results give NC_2345 as the ID; however, the tax name still returns NA...
I posted a detailed example in other thread: A: Create a costum taxonomy file
So I ran this command and I got 6 files, so now I should add them to the directory where I have the nt database, right?
I am not sure how many files you get,
ls mysequences.* ntandmore.*
should list all files that make up the blast database. Those need to be in the BLASTDB directory, together with the nt database, except mysequences.fna, the input FASTA file. Make sure to also download and extract the nt database again, because it looks as if your previous actions might have overwritten its files. The best way of doing this is to simply change directory to $BLASTDB and generate everything there. Otherwise, doing a `cp mysequences.* ntandmore.* $BLASTDB` should also do it.