Hi Biostars community! I searched here for this blastdbcmd error and did not find anything. I am using blastdbcmd to retrieve a set of proteins returned by an earlier BLASTp search. I take all the IDs BLAST gives me, put them in a plain-text file with a .file extension (I think it works the same as a .txt file; someone correct me if I am wrong), and pass that file to the -entry_batch parameter of blastdbcmd. The program runs, but I get FASTA records like this:
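A minimal sketch of the workflow described above, assuming Python 3 and a local protein database named `nr` (the database name and file names are placeholders). `-entry_batch` reads one identifier per line, so the extension of the file does not matter. The `-target_only` flag, which asks blastdbcmd to print only the defline of the requested entry, is present in recent BLAST+ releases, but confirm it with `blastdbcmd -help` for your version. The helper names `write_id_file` and `blastdbcmd_command` are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List


def write_id_file(ids: Iterable[str], path: Path) -> Path:
    """Write one accession per line, the plain-text layout -entry_batch reads."""
    seen: List[str] = []
    for acc in ids:
        acc = acc.strip()
        # Skip blanks and duplicates while keeping the original hit order.
        if acc and acc not in seen:
            seen.append(acc)
    path.write_text("".join(f"{acc}\n" for acc in seen))
    return path


def blastdbcmd_command(db: str, id_file: Path, out_fasta: Path) -> List[str]:
    """Build (but do not run) a blastdbcmd call for a batch of accessions."""
    return [
        "blastdbcmd",
        "-db", db,
        "-entry_batch", str(id_file),
        # Assumption: supported by your BLAST+ version; check `blastdbcmd -help`.
        "-target_only",
        "-out", str(out_fasta),
    ]
```

To run it, pass the returned list to `subprocess.run(cmd, check=True)`. Building the argument list as a Python list avoids shell-quoting problems with file paths.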
>XP_009183870.1 U4/U6.U5 tri-snRNP-associated protein 1 isoform X2 [Papio anubis] >XP_011719200.1 U4/U6.U5 tri-snRNP-associated protein 1 isoform X2 [Macaca nemestrina] >XP_011820612.1 PREDICTED: U4/U6.U5 tri-snRNP-associated protein 1 [Mandrillus leucophaeus] >XP_015289917.1 PREDICTED: U4/U6.U5 tri-snRNP-associated protein 1 isoform X2 [Macaca fascicularis]
MALRQREELREKLAAAKEKRLLNQKLGKIKTLGEDDPWLDDTAAWIERSRQLQKEKDLAEKRAK...
(I will not type the rest of the sequence, for readability.)
There is a clear problem in the FASTA header: it contains FOUR '>' characters on a single header line, which a FASTA header should not have. It also contains FOUR deflines with names from different species. The last one is the one I actually want (it has the ID that BLAST gave me), but I do not know where the others came from. This is breaking the downstream analysis I am trying to do.
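The extra deflines most likely come from the way non-redundant protein databases such as nr store identical sequences: records from different species that share exactly the same sequence are merged into one entry with several deflines. If re-extracting is not an option, the headers can be cleaned after the fact. The sketch below keeps, for each record, only the defline whose accession is in the set you requested. It assumes the merged deflines are separated either by the Ctrl-A character (`\x01`) or by ` >` as in the output shown above, and that the accession is the first word of each defline. The function names are hypothetical.

```python
from typing import List, Set


def split_merged_defline(header: str) -> List[str]:
    """Split a merged FASTA header into its individual deflines.

    Assumption: deflines are separated by '\\x01' or by ' >'.
    """
    body = header[1:] if header.startswith(">") else header
    body = body.replace("\x01", " >")
    return [part.strip() for part in body.split(" >") if part.strip()]


def keep_requested_defline(fasta: str, wanted: Set[str]) -> str:
    """Rewrite each header so it keeps only the defline whose accession is wanted."""
    out: List[str] = []
    for line in fasta.splitlines():
        if line.startswith(">"):
            parts = split_merged_defline(line)
            if not parts:
                out.append(line)  # empty header: leave it untouched
                continue
            # Accession = first whitespace-separated token of each defline;
            # fall back to the first defline if none of them was requested.
            match = next((p for p in parts if p.split()[0] in wanted), parts[0])
            out.append(">" + match)
        else:
            out.append(line)  # sequence lines pass through unchanged
    return "\n".join(out) + "\n"
```

Here `wanted` would be the same set of IDs written to the `-entry_batch` file, so each record ends up with the header of the hit BLAST actually reported.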
If you know how to fix this, please let me know. Thanks!
Hello! A little late, but I tested your suggestion and it works... just not the way I want. As far as I know and have tested, efetch only works via the web, and I really do not want to rely on the web for this task. I am open to any ideas or suggestions. Thank you!
I linked the Unix command-line version of Entrez Utilities in my answer above. The output I posted there came from that utility.