I am a beginner with BLAST+ and I am using Windows. My aim for now is to download the nr protein sequences in FASTA format, format them with makeblastdb, then extract the first 1000 characters from the nr file into a separate file (say qa.fasta) and query that file against the whole database.
I have now downloaded the nr database in FASTA format from this link
Hi. First, I'm not sure "original" is the right term, but if you mean "do these FASTA files correspond exactly to the official nr db sequences?", the answer is yes. Second, the fact that the db files are split is normal behavior. Nevertheless, I doubt the db building process ran to completion: personally, I've never tried it on nr, but NCBI provides a ready-to-go nr blastdb whose volumes are numbered up to nr.05. Was the alias file (nr.pal) created?
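One way to check this without browsing the folder by hand is a small script. This is only a sketch, assuming the database base name is nr and that the protein volumes sit in one folder with the usual .psq sequence files; the function name `inspect_blastdb` is made up for illustration.

```python
from pathlib import Path


def inspect_blastdb(db_dir, name="nr"):
    """Report whether <name>.pal exists and which numbered volumes are present.

    Assumes a protein database, whose sequence files end in .psq.
    """
    db_dir = Path(db_dir)
    alias = db_dir / f"{name}.pal"
    # Volume files look like nr.00.psq, nr.01.psq, ...
    volumes = sorted(
        p.name[: -len(".psq")] for p in db_dir.glob(f"{name}.*.psq")
    )
    return {"alias_exists": alias.is_file(), "volumes": volumes}


# Example (the path is hypothetical):
# print(inspect_blastdb(r"C:\blast\db"))
```

If the volumes are listed but `alias_exists` is false, the build most likely stopped before the alias file was written.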
Finally, as Geparada told you, FASTA files are text files. So open the file with any text editor (better than a word processor, BTW: you don't want any grammar correction, or a Times New Roman font for ids and Arial Italic for sequences, and more importantly, you want to save your first 1000 aa as text, not doc, rtf...). The difficulty is actually not the type of file, but its size. I've never tried on Windows, but a former coworker used Notepad++ and seemed to be happy with it.
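If the file turns out to be too large for an editor, a short script can do the same job. Below is a minimal Python sketch, assuming the download is an uncompressed FASTA text file; the function name `write_fasta_head` and the file names are made up for illustration. It reads the file line by line, so the whole of nr is never loaded into memory, and it stops once 1000 sequence characters have been written (the last sequence is truncated if needed).

```python
def write_fasta_head(src_path, dst_path, max_residues=1000):
    """Copy FASTA records from src_path to dst_path until max_residues
    sequence characters have been written. Returns the number written."""
    written = 0
    with open(src_path, "r", encoding="utf-8", errors="replace") as src, \
         open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        for line in src:
            line = line.rstrip("\r\n")
            if line.startswith(">"):
                # Start a new record only if there is still room left.
                if written >= max_residues:
                    break
                dst.write(line + "\n")
            elif line:
                remaining = max_residues - written
                if remaining <= 0:
                    break
                chunk = line[:remaining]  # truncate the last sequence line
                dst.write(chunk + "\n")
                written += len(chunk)
    return written


# Example (file names are hypothetical):
# write_fasta_head("nr.fasta", "qa.fasta", max_residues=1000)
```

The output is plain text with Unix line endings, and qa.fasta can then be passed to blastp with its `-query` option.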
If you want to stick with Windows, use gvim or something like it. It's more powerful than Notepad, it has no problem handling very large text files, and I think it's easier on the eyes than Notepad.
I don't get why you didn't download the preformatted databases from NCBI directly in the first place. You can BLAST against them directly and get literally any info from them using the provided utilities. Even on Windows.
Ideally, use an editor that can handle line-ending conversion. Line endings differ between Windows and Unix, and some tools will fail on the wrong ones. Not all Windows-to-Unix converters handle this accurately. I personally prefer Notepad++, where you can also convert line endings in either direction.
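For a file too big to open comfortably, the conversion can also be scripted. Here is a rough Python sketch, assuming you want Windows (CRLF) endings turned into Unix (LF) ones; the function name `crlf_to_lf` and the file names are hypothetical. It works on raw bytes, one line at a time, so nothing else in the file is touched.

```python
def crlf_to_lf(src_path, dst_path):
    """Rewrite src_path to dst_path, turning CRLF line endings into LF.

    Returns the number of lines that were converted.
    """
    converted = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:  # binary mode splits on b"\n" only
            if line.endswith(b"\r\n"):
                line = line[:-2] + b"\n"
                converted += 1
            dst.write(line)
    return converted


# Example (file names are hypothetical):
# crlf_to_lf("qa_windows.fasta", "qa.fasta")
```

A return value of 0 means the file already had Unix line endings.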
Why do you need the first 1000 characters? Why did you put bioperl in the tags?
I've removed the bioperl tag.