6.6 years ago
sukesh1411 ▴ 30
This might be a very simple question, but I could not convert a text file that has sequences in the format below into a .fasta file.
>gi|4|emb|X17276.1| Giant Panda satellite 1 DNA GATCCTCCCCAGGCCCCTACACCCAATGTGGAACCGGGGTCCCGAATGAAAATGCTGCTGTTCCCTGGAGGTGTTTTCCT GGACGCTCTGCTTTGTTACCAATGAGAAGGGCGCTGAATCCTCGAAAATCCTGACCCTTTTAATTCATGCTCCCTTACTC ACGAGAGATGATGATCGTTGATATTTCCCTGGACTGTGTGGGGTCTCAGAGACCACTATGGGGCACTCTCGTCAGGCTTC CGCGACCACGTTCCCTCATGTTTCCCTATTAACGAAGGGTGATGATAGTGCTAAGACGGTCCCTGTACGGTGTTGTTTCT GACAGACGTGTTTTGGGCCTTTTCGTTCCATTGCCGCCAGCAGTTTTGACAGGATTTCCCCAGGGAGCAAACTTTTCGAT GGAAACGGGTTTTGGCCGAATTGTCTTTCTCAGTGCTGTGTTCGTCGTGTTTCACTCACGGTACCAAAACACCTTGATTA TTGTTCCACCCTCCATAAGGCCGTCGTGACTTCAAGGGCTTTCCCCTCAAACTTTGTTTCTTGGTTCTACGGGCTG >gi|7|emb|X51700.1| Bos taurus mRNA for bone Gla protein GTCCACGCAGCCGCTGACAGACACACCATGAGAACCCCCATGCTGCTCGCCCTGCTGGCCCTGGCCACACTCTGCCTCGC TGGCCGGGCAGATGCAAAGCCTGGTGATGCAGAGTCGGGCAAAGGCGCAGCCTTCGTGTCCAAGCAGGAGGGCAGCGAGG TGGTGAAGAGACTCAGGCGCTACCTGGACCACTGGCTGGGAGCCCCAGCCCCCTACCCAGATCCGCTGGAGCCCAAGAGG GAGGTGTGTGAGCTCAACCCTGACTGTGACGAGCTAGCTGACCACATCGGCTTCCAGGAAGCCTATCGGCGCTTCTACGG CCCAGTCTAGAGCTTGCAGCCCTGCCCACCTGGCTGGCAGCCCCCAGCTCTGGCTTCTCTCCAGGACCCCTCCCCTCCCC GTCATCCCCGCTGCTCTAGAATAAACTCCAGAAGAGG
You need to add ">" before each identifier, in this case "gi|4|emb|X17276.1| Giant Panda satellite 1 DNA" and "gi|7|emb|X51700.1| Bos taurus mRNA for bone Gla protein". The header and the sequence should be on separate lines.
Thank you. The ">" is already there in the file. Can I still use the above command? Will it just replace the ">" symbol, or will it add another one to the existing file?
If it's already there, then it's already a FASTA file. What's the problem, then? You can just rename your file from text to .fasta.
The above command does not work. If you want to replace space with "_",
Thank you. I used the above command on the complete nucleotide database, but it suddenly stops with a segmentation fault (core dumped). Is there any other way I can do it?
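For a file as large as the complete nucleotide database, one option is to stream it line by line so that memory use stays flat. The following is only a minimal Python sketch of the space-to-underscore replacement discussed above, not the command from the earlier reply; the function names and the file paths are made up for illustration.

```python
def underscore_headers(lines):
    """Yield FASTA lines with spaces in header lines replaced by underscores."""
    for line in lines:
        if line.startswith(">"):
            # Header line: drop trailing whitespace, then replace inner spaces.
            yield line.rstrip().replace(" ", "_") + "\n"
        else:
            # Sequence line: pass through unchanged.
            yield line


def convert_file(src_path, dst_path):
    """Stream src_path to dst_path one line at a time (paths are hypothetical)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(underscore_headers(src))
```

Calling something like convert_file("nt.txt", "nt.fasta") would write a new file and leave the original untouched. Only lines that start with ">" are modified.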
The '>' gets auto-formatted on Biostars. So OP probably posted it.
The file you posted looks like a fasta already. How did you try to convert and what makes you think it didn't work?
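A quick way to test whether a text file is already laid out like FASTA is to check that it starts with a ">" header and that every header is followed by at least one sequence line. This is a rough sketch under that assumption only; it does not validate the sequence alphabet, and the function name is made up for illustration.

```python
def looks_like_fasta(lines):
    """Return True if every '>' header is followed by at least one sequence line."""
    seen_header = False
    pending = False  # True while the latest header has no sequence line yet
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if pending:
                return False  # two headers in a row, no sequence in between
            seen_header = True
            pending = True
        else:
            if not seen_header:
                return False  # sequence data before any header
            pending = False
    return seen_header and not pending
```

A file object can be passed directly, for example looks_like_fasta(open("nt.txt")) with a hypothetical file name. A record whose header and sequence sit on the same line is reported as not FASTA, because the whole line is read as a header.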
I need to generate an index for the BLAST nt file so that I can run a BLAST search. I used the makeblastdb command to generate the index and got an error saying there are duplicate seqIds. To remove the duplicate sequences from the nt file I tried uclust, but I could not run it because the nt file is in text format, not FASTA format.
Does nt mean a nucleotide file, or nt as in the NCBI NT file?
If it is the first then do this
That should show you which IDs are duplicated. Edit the file to remove those duplicates.
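As an illustration of the idea (not the command from the reply above), duplicated IDs can be listed by counting the first whitespace-delimited token of every header line. The sketch below assumes the ID is everything between ">" and the first whitespace; the function name is made up.

```python
from collections import Counter


def duplicate_ids(lines):
    """Return {sequence_id: count} for IDs that occur in more than one header."""
    counts = Counter(
        line[1:].split(None, 1)[0]  # first token after '>'
        for line in lines
        if line.startswith(">") and line[1:].strip()  # skip empty headers
    )
    return {seq_id: n for seq_id, n in counts.items() if n > 1}
```

For a very large file this keeps one counter entry per distinct ID in memory, so it may need a machine with enough RAM.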
It is the NCBI NT file. It's a big file. If I can split it into at least two files, I think I can remove the duplicates.
Why are you creating your own indexes when you can download the pre-formatted ones from NCBI directly?
I downloaded all 39 NT files and extracted them. When I run the blast command below
blastn -query contigs.fa -db ntdb -outfmt 6 > known_sequences.blastx.nt.hits.txt
I got the error below.
BLAST Database error: No alias or index file found for nucleotide database [ntdb/] in search path
How can I solve this?
You can't make up your own BLAST db name. Use -db nt (with the full path if needed).
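The error message refers to an alias or index file, so a quick sanity check is to look for those files next to the database name you pass to -db. The sketch below assumes the usual nucleotide extensions (.nal for the alias file, .nin for the index file); the function name and the directory names are made up for illustration.

```python
import os


def find_blast_nucl_db(db_name, search_dirs):
    """Return the first 'dir/db_name' prefix that has a .nal or .nin file, else None."""
    for directory in search_dirs:
        prefix = os.path.join(directory, db_name)
        # A nucleotide database is addressed by this prefix, without extension.
        if os.path.exists(prefix + ".nal") or os.path.exists(prefix + ".nin"):
            return prefix
    return None
```

If find_blast_nucl_db("nt", ["ntdb"]) returns "ntdb/nt", then that prefix is the value to give to -db, rather than the directory name on its own.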
thank you :) it worked