Question: How to convert a database from protein to nucleotide
9 months ago by
AJTrunkskun940 wrote:

Hi! I'm fairly new to UNIX and bioinformatics work. I am taking a class that focuses on using UNIX and BLAST on nucleotide sequences, and since the databases were made we have been using blastn constantly. I wanted to apply some of what I learned to my own research and realized that my databases are protein databases, even though the FASTA file is ALL nucleotides. Is there a way to convert the database from protein to nucleotide?

ADD COMMENTlink modified 9 months ago by WouterDeCoster24k • written 9 months ago by AJTrunkskun940

Because the genetic code is redundant (most amino acids are encoded by more than one codon), a protein sequence does not uniquely identify a nucleotide sequence.
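To make the redundancy point concrete, here is a minimal sketch that counts how many different nucleotide sequences could encode a short peptide. It assumes the standard genetic code, ignores stop codons, and the function names and the example peptide `MKL` are made up for illustration.

```shell
#!/bin/sh
# Number of synonymous codons per amino acid (standard genetic code).
codons_for() {
  case "$1" in
    M|W) echo 1 ;;
    C|D|E|F|H|K|N|Q|Y) echo 2 ;;
    I) echo 3 ;;
    A|G|P|T|V) echo 4 ;;
    L|R|S) echo 6 ;;
    *) echo 0 ;;   # unknown residue: no valid back-translation
  esac
}

# Multiply the codon counts of every residue in a peptide.
count_back_translations() {
  rest=$1
  total=1
  while [ -n "$rest" ]; do
    aa=${rest%"${rest#?}"}   # first character of the remaining peptide
    rest=${rest#?}           # drop that character
    total=$((total * $(codons_for "$aa")))
  done
  echo "$total"
}

# Hypothetical three-residue peptide: 1 (M) * 2 (K) * 6 (L)
count_back_translations MKL
```

Even three residues give several candidate DNA sequences, so a back-translated sequence is only one guess among many.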

ADD REPLYlink written 9 months ago by WouterDeCoster24k

If you only have the index files (i.e. no protein FASTA file), then you would first need to use the blastdbcmd utility to recover the FASTA sequences.

Then you can use a back-translation tool such as backtranseq from EMBOSS (http://www.ebi.ac.uk/Tools/st/). Ideally, if you know which genome those proteins are from, you could get the DNA sequence from the source instead.

ADD REPLYlink modified 9 months ago • written 9 months ago by genomax39k

The genome is from Listeria monocytogenes, and I have the FASTA file for a contig of this genome, but for some reason it is recognized as a protein file and not a nucleotide file, even though the first couple of lines are:

AGATTCCTTGCGTCAAATTGACTTCGCTAGCAATTAAATTACTAGTTTGTTTTGTTGAAAACAGCTTTCT
GTTTTCTGCCCTGCGATTACCAGTGAGACTTTACGTCTCATTGCTTTTCGTCTTCTTCTTTGTTCAGTTT
TCAAAGGTCAGTTGCTTTGTTAACGCAACTTTTAAATCTTACCATAAAGTTGAAATCACGTCAACAACTA

ADD REPLYlink written 9 months ago by AJTrunkskun940

That is definitely not protein, and if those lines are from the very top of the file, it is not in FASTA format either (a FASTA record starts with a ">" header line).

but for some reason it recognizes it as a protein and not nucleotide file.

What is doing that? Blast? Have you created indexes for this dataset already?
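The two checks raised above (is the file really FASTA, and is it nucleotide or protein) can be sketched in plain shell. This is only a heuristic under stated assumptions: the function name `guess_dbtype`, the 90% cutoff, and the example header are invented here, and BLAST itself does not guess, since makeblastdb builds whatever `-dbtype` asks for. The letters A, C, G and T are also valid amino acid codes, which is why a nucleotide file can be indexed as protein without complaint.

```shell
#!/bin/sh
# Print "nucl", "prot", or "not-fasta" for a file (rough heuristic only).
guess_dbtype() {
  # A FASTA file must begin with a '>' header line.
  first=$(head -n 1 "$1")
  case "$first" in
    '>'*) ;;
    *) echo "not-fasta"; return 1 ;;
  esac
  # Over all sequence lines, measure the share of A/C/G/T/U/N letters.
  grep -v '^>' "$1" | awk '
    {
      gsub(/[ \t\r]/, "")                 # ignore stray whitespace
      total += length($0)
      nuc += gsub(/[ACGTUNacgtun]/, "")   # gsub returns the match count
    }
    END {
      # Assumed cutoff: 90% or more nucleotide letters means DNA/RNA.
      if (total > 0 && nuc / total >= 0.9) print "nucl"; else print "prot"
    }'
}

# Demo with a made-up header and the start of the sequence quoted above.
tmp=$(mktemp)
printf '>example_contig\nAGATTCCTTGCGTCAAATTGACTTCGCTAGCAATTAAATT\n' > "$tmp"
guess_dbtype "$tmp"
rm -f "$tmp"
```

The printed value could then be passed to `-dbtype` when running makeblastdb, after a manual look at the file.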

ADD REPLYlink modified 9 months ago • written 9 months ago by genomax39k

I made the database with:

makeblastdb -dbtype prot -in LM_R8_5081_contig10.fasta -out LM_R8_5081_contig10 -parse_seqids

The report afterwards says that it is a protein database:

New DB title: LM_R8_5081_contig10.fasta
Sequence type: Protein
Deleted existing Protein BLAST database named /home/ajt3/Listeria_Work/LM_R8_5081_contig10_test
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B

ADD REPLYlink written 9 months ago by AJTrunkskun940

Nope, I'm dumb, I realized what it is...

ADD REPLYlink written 9 months ago by AJTrunkskun940

Glad you figured the problem out yourself :)

ADD REPLYlink written 9 months ago by genomax39k

realized that my databases are now protein databases, even though the fasta file is ALL nucleotides.

It sounds like your input is a nucleotide FASTA file, but the database was built as a protein database, so the index files have protein extensions and blastn cannot use them.

Do your db files end with the following?

.phr, .pin, .psq

If so, remake your database with this command:

makeblastdb -in yourfastafile.fasta -dbtype nucl

and your db will be made with the correct extensions for blastn. Your files will now end with

.nhr, .nin, .nsq
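The extension check described above can be scripted. The sketch below is a rough illustration: the function name `blastdb_type` is made up, it only looks at the `.pin` / `.nin` index files, and it assumes multi-volume databases follow the `prefix.00.nin` naming pattern. Options such as `-parse_seqids` and newer BLAST+ versions write additional files that are ignored here.

```shell
#!/bin/sh
# Report which kind of BLAST index files exist for a database prefix:
# prints "nucl", "prot", "both", or "none".
blastdb_type() {
  prefix=$1
  has_nucl=0
  has_prot=0
  # Single-volume: prefix.nin ; multi-volume: prefix.00.nin, prefix.01.nin, ...
  for f in "$prefix".nin "$prefix".[0-9][0-9].nin; do
    if [ -e "$f" ]; then has_nucl=1; fi
  done
  for f in "$prefix".pin "$prefix".[0-9][0-9].pin; do
    if [ -e "$f" ]; then has_prot=1; fi
  done
  if [ "$has_nucl" -eq 1 ] && [ "$has_prot" -eq 1 ]; then
    echo "both"
  elif [ "$has_nucl" -eq 1 ]; then
    echo "nucl"
  elif [ "$has_prot" -eq 1 ]; then
    echo "prot"
  else
    echo "none"
  fi
}

# Database prefix taken from the thread; prints "none" if no index files
# with that prefix exist in the current directory.
blastdb_type LM_R8_5081_contig10
```

If this prints "prot" for a nucleotide FASTA, rebuilding with `-dbtype nucl` as described above is the fix; "both" would suggest stale protein index files are still sitting next to the new ones.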

ADD REPLYlink modified 9 months ago • written 9 months ago by st.ph.n1.9k