7.1 years ago
AJTrunkskun94
Hi! I'm fairly new to UNIX and bioinformatics work. I'm taking a class right now that focuses on using UNIX and BLAST on nucleotide sequences. Once the databases were made, we have been using blastn constantly. I wanted to apply some of what I learned to my own research and realized that my databases are protein databases, even though the FASTA file is ALL nucleotides. Is there a way to convert the database from protein to nucleotide?
Because most amino acids are encoded by more than one codon, a protein sequence does not uniquely identify a nucleotide sequence.
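To make that redundancy concrete, here is a small Python sketch that counts how many distinct DNA sequences could encode a given peptide under the standard genetic code (NCBI translation table 1). The codon counts are the standard code's; the function name is only for illustration, and stop codons are left out.

```python
# Number of sense codons for each amino acid in the standard genetic code
# (NCBI translation table 1), keyed by one-letter code. They sum to 61;
# the remaining 3 of the 64 codons are stops.
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2,
    "G": 4, "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2,
    "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_back_translations(protein: str) -> int:
    """Return how many distinct DNA sequences encode `protein` (no stop codon)."""
    total = 1
    for aa in protein.upper():
        # Each position multiplies the number of possible coding sequences.
        total *= CODONS_PER_AA[aa]
    return total


print(count_back_translations("MKL"))  # 1 * 2 * 6 = 12
print(count_back_translations("LSR"))  # 6 * 6 * 6 = 216
```

Even a three-residue peptide can have hundreds of possible coding sequences, so any back-translation (e.g. backtranseq) has to pick codons, typically guided by a codon usage table, rather than recover the original DNA.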
If you only have the index files (i.e. no FASTA protein sequence), then you would first need to use the blastdbcmd utility to recover the FASTA sequence. Then you can use back-translation tools like backtranseq from EMBOSS (http://www.ebi.ac.uk/Tools/st/). Ideally, if you know which genome those proteins are from, you could get the DNA sequence from the source instead.
The genome is from Listeria monocytogenes, and I have the FASTA file for a contig of this genome, but for some reason it is recognized as a protein file and not a nucleotide file, even though the first couple of lines are:
AGATTCCTTGCGTCAAATTGACTTCGCTAGCAATTAAATTACTAGTTTGTTTTGTTGAAAACAGCTTTCT
GTTTTCTGCCCTGCGATTACCAGTGAGACTTTACGTCTCATTGCTTTTCGTCTTCTTCTTTGTTCAGTTT
TCAAAGGTCAGTTGCTTTGTTAACGCAACTTTTAAATCTTACCATAAAGTTGAAATCACGTCAACAACTA
That is definitely not protein, and if those lines are the top of the file, it is not in FASTA format either: a FASTA record must start with a header line beginning with ">".
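A quick sanity check along those lines, as a hypothetical Python sketch: it reports whether the text starts with a ">" header line and guesses the alphabet from the fraction of A/C/G/T/U/N characters. The 90% threshold is an arbitrary assumption, not something BLAST does; the database type makeblastdb builds comes from the -dbtype option you pass.

```python
from __future__ import annotations

# Characters treated as nucleotide symbols (A, C, G, T/U and the ambiguity code N).
NUCLEOTIDE_CHARS = set("ACGTUN")


def inspect_fasta(text: str, threshold: float = 0.9) -> tuple[bool, str]:
    """Return (has_header, guessed_type) for FASTA-like text.

    guessed_type is "nucleotide" when at least `threshold` of the residue
    characters are in NUCLEOTIDE_CHARS, "protein" otherwise, and "empty"
    if there are no residues. The threshold is an arbitrary heuristic.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    # A FASTA record must start with a header line beginning with '>'.
    has_header = bool(lines) and lines[0].startswith(">")
    residues = "".join(line for line in lines if not line.startswith(">")).upper()
    if not residues:
        return has_header, "empty"
    nucleotide_count = sum(1 for ch in residues if ch in NUCLEOTIDE_CHARS)
    kind = "nucleotide" if nucleotide_count / len(residues) >= threshold else "protein"
    return has_header, kind


# Sequence from the question, no header line: (False, 'nucleotide')
print(inspect_fasta("AGATTCCTTGCGTCAAATTGACTTCGCTAGCAATTAAATTACTAGTTTG"))
# Same sequence with a header line added: (True, 'nucleotide')
print(inspect_fasta(">contig10\nAGATTCCTTGCGTCAAATTGACTTCGCTAGCAATTAAATTACTAGTTTG"))
```

If has_header comes back False, adding a line such as ">contig10" (any identifier you like) at the top of the file makes it valid FASTA.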
What is recognizing it as protein? BLAST? Have you already created indexes for this dataset?
So I made the databases with:
makeblastdb -dbtype prot -in LM_R8_5081_contig10.fasta -out LM_R8_5081_contig10 -parse_seqids
Then the report afterwards says that it is a protein:
New DB title: LM_R8_5081_contig10.fasta
Sequence type: Protein
Deleted existing Protein BLAST database named /home/ajt3/Listeria_Work/LM_R8_5081_contig10_test
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Nope, I'm dumb, I realized what it is...
Glad you figured the problem out yourself :)
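It sounds like the file you have is a nucleotide FASTA, but the db was built as a protein database (-dbtype prot), so the indexed files have protein extensions instead of the nucleotide ones blastn expects.
Do your db files end with .phr, .pin and .psq?
If so, remake your database with -dbtype nucl:
makeblastdb -dbtype nucl -in LM_R8_5081_contig10.fasta -out LM_R8_5081_contig10 -parse_seqids
and your db will be made with the correct extensions for blastn. Your files will then end with .nhr, .nin and .nsq (among others) instead of .phr, .pin and .psq.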
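If you want to check this from a script, here is a minimal Python sketch. It assumes a single-volume database whose core index files are <name>.phr/.pin/.psq (protein) or <name>.nhr/.nin/.nsq (nucleotide); multi-volume databases (numbered volumes plus an alias file) are not handled. The helper names db_type_from_filenames, db_type_on_disk and makeblastdb_args are hypothetical, and the command it builds simply mirrors the makeblastdb flags used in this thread.

```python
from __future__ import annotations

from pathlib import Path

# Core index files of a single-volume BLAST+ database.
PROTEIN_EXTS = {".phr", ".pin", ".psq"}
NUCLEOTIDE_EXTS = {".nhr", ".nin", ".nsq"}


def db_type_from_filenames(filenames: list[str], db_name: str) -> str:
    """Classify a database as "nucl", "prot" or "missing" from file names.

    Only files named <db_name>.<ext> are considered; multi-volume
    databases are not handled by this sketch.
    """
    prefix = db_name + "."
    suffixes = {name[len(db_name):] for name in filenames if name.startswith(prefix)}
    if NUCLEOTIDE_EXTS <= suffixes:
        return "nucl"
    if PROTEIN_EXTS <= suffixes:
        return "prot"
    return "missing"


def db_type_on_disk(db_path: str) -> str:
    """Same check, listing the directory that holds the database files."""
    path = Path(db_path)
    folder = path.parent
    names = [p.name for p in folder.iterdir()] if folder.is_dir() else []
    return db_type_from_filenames(names, path.name)


def makeblastdb_args(fasta: str, out: str, dbtype: str = "nucl") -> list[str]:
    """Build the makeblastdb command line; dbtype must be "nucl" or "prot"."""
    if dbtype not in ("nucl", "prot"):
        raise ValueError(f"unsupported dbtype: {dbtype}")
    return ["makeblastdb", "-dbtype", dbtype, "-in", fasta,
            "-out", out, "-parse_seqids"]
```

For example, db_type_on_disk("LM_R8_5081_contig10") should report "prot" for a database built with -dbtype prot, and, assuming BLAST+ is on your PATH, subprocess.run(makeblastdb_args("LM_R8_5081_contig10.fasta", "LM_R8_5081_contig10"), check=True) would rebuild it as a nucleotide database. Returning an argument list rather than a shell string avoids quoting problems with unusual file names.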