How to convert a database from protein to nucleotide
7.1 years ago

Hi! I'm fairly new to UNIX and bioinformatics. I am taking a class that focuses on using UNIX and BLAST on nucleotide sequences, and with the databases we made there we have been using blastn constantly. I wanted to apply some of what I learned to my own research and realized that my databases are now protein databases, even though the fasta file is ALL nucleotides. Is there a way to convert the database from protein to nucleotide?

blastn unix blastx blast • 2.9k views

Because the genetic code is degenerate (most amino acids are encoded by more than one codon), a protein sequence does not uniquely identify a nucleotide sequence.
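To get a feel for how ambiguous back-translation is, here is a small sketch that counts how many distinct nucleotide sequences could encode a given peptide under the standard genetic code. The function name `count_backtranslations` and the example peptide are made up for illustration, and stop codons are ignored.

```shell
# Count the nucleotide sequences that translate to a peptide
# (standard genetic code, upper-case one-letter amino acid codes).
count_backtranslations() {
    pep=$1
    total=1
    while [ -n "$pep" ]; do
        rest=${pep#?}          # peptide without its first residue
        aa=${pep%"$rest"}      # the first residue itself
        case $aa in
            M|W) n=1 ;;                    # one codon each
            F|Y|H|Q|N|K|D|E|C) n=2 ;;      # two codons each
            I) n=3 ;;
            V|P|T|A|G) n=4 ;;
            L|S|R) n=6 ;;
            *) echo "unknown residue: $aa" >&2; return 1 ;;
        esac
        total=$((total * n))
        pep=$rest
    done
    echo "$total"
}

count_backtranslations MLS   # prints 36 (1 x 6 x 6)
```

Even a three-residue peptide has dozens of possible coding sequences, which is why back-translation can only ever give one plausible guess.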


If you only have the index files (i.e. no protein fasta file), then you would first need to use the blastdbcmd utility to recover the fasta sequences.
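A minimal sketch of that recovery step, assuming BLAST+ is installed. The wrapper name `recover_fasta` and the file names in the example call are placeholders; `-db`, `-entry all` and `-out` are standard blastdbcmd options.

```shell
# Dump every sequence stored in a BLAST database back out as FASTA.
recover_fasta() {
    db=$1    # database name/prefix, as passed to -out when it was built
    out=$2   # FASTA file to write
    if ! command -v blastdbcmd >/dev/null 2>&1; then
        echo "blastdbcmd not found on PATH (is BLAST+ installed?)" >&2
        return 127
    fi
    blastdbcmd -db "$db" -entry all -out "$out"
}

# Example call (names are placeholders):
# recover_fasta LM_R8_5081_contig10 recovered.fasta
```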

Then you can use back-translation tools like backtranseq from EMBOSS (http://www.ebi.ac.uk/Tools/st/). Ideally, if you know which genome those proteins are from, you could get the DNA sequence from the source instead.


The genome is from Listeria monocytogenes, and I have the fasta file for a contig of this genome, but for some reason it recognizes it as a protein and not nucleotide file. Even though the first couple lines are:

AGATTCCTTGCGTCAAATTGACTTCGCTAGCAATTAAATTACTAGTTTGTTTTGTTGAAAACAGCTTTCT
GTTTTCTGCCCTGCGATTACCAGTGAGACTTTACGTCTCATTGCTTTTCGTCTTCTTCTTTGTTCAGTTT
TCAAAGGTCAGTTGCTTTGTTAACGCAACTTTTAAATCTTACCATAAAGTTGAAATCACGTCAACAACTA


That is definitely not protein, and if those lines are from the very top of the file, it is not in fasta format either: a fasta record has to start with a header line beginning with ">".
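Both points can be checked quickly from the command line. The sketch below is only a heuristic, and the function name `guess_fasta_type` is made up: it verifies that the first non-blank line is a ">" header and then looks for any residue outside the IUPAC nucleotide alphabet. Several IUPAC nucleotide codes are also amino acid letters, so a very short or unusual protein could be misclassified.

```shell
# Rough check: is this a FASTA file, and does it look like nucleotide or protein?
guess_fasta_type() {
    file=$1
    # The first non-blank line of a FASTA file must be a '>' header.
    first=$(awk 'NF { print; exit }' "$file")
    case $first in
        '>'*) ;;
        *) echo "not FASTA: first line does not start with '>'" >&2
           return 1 ;;
    esac
    # Any character outside the IUPAC nucleotide codes suggests protein.
    if grep -v '^>' "$file" | grep -qi '[^ACGTUNRYSWKMBDHV[:space:]-]'; then
        echo protein
    else
        echo nucleotide
    fi
}

# Example call (file name taken from this thread):
# guess_fasta_type LM_R8_5081_contig10.fasta
```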

"but for some reason it recognizes it as a protein and not nucleotide file."

What is doing that? Blast? Have you created indexes for this dataset already?


So when I made the databases by

makeblastdb -dbtype prot -in LM_R8_5081_contig10.fasta -out LM_R8_5081_contig10 -parse_seqids

Then the report after says that it is a protein:

New DB title: LM_R8_5081_contig10.fasta
Sequence type: Protein
Deleted existing Protein BLAST database named /home/ajt3/Listeria_Work/LM_R8_5081_contig10_test
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B


Nope, I'm dumb, I realized what it is...


Glad you figured the problem out yourself :)


"realized that my databases are now protein databases, even though the fasta file is ALL nucleotides."

It sounds like the file you have is a nucleotide fasta, but when you made the db it was built as a protein database, so the index files have the protein extensions rather than the nucleotide ones.

Do your db files end with the following?

.phr, .pin, .psq

If so, remake your database with this command:

makeblastdb -in yourfastafile.fasta -dbtype nucl

and your db will be made with the correct extensions for blastn. Your files will now end with

.nhr, .nin, .nsq
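If you would rather check the extensions with a command than by eye, something like the following works as a rough sketch. The function name `blast_db_type` is made up, and it assumes a single-volume database whose index sits next to the given prefix.

```shell
# Report whether a BLAST database prefix has nucleotide or protein index files.
blast_db_type() {
    prefix=$1   # database name without extension, e.g. LM_R8_5081_contig10
    if [ -e "$prefix.nin" ]; then
        echo nucleotide
    elif [ -e "$prefix.pin" ]; then
        echo protein
    else
        echo "no .nin or .pin index found for $prefix" >&2
        return 1
    fi
}

# Example call (prefix taken from this thread):
# blast_db_type LM_R8_5081_contig10
```

If both a `.nin` and a `.pin` file are left over from earlier attempts, this reports nucleotide, so delete the stale protein files to avoid confusion.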
