Question: Opening A Fasta File In Windows
0
Vivek (0) wrote:

Hi all,

I am a beginner with BLAST+. I am using Windows. My aim for now is to download the nr protein sequences in FASTA format and format them using makeblastdb, then extract the first 1000 characters from the nr file as a separate file (say qa.fasta) and query it against the whole database.

I downloaded the nr database in FASTA format from this link:

ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz (are these the original FASTA files?)

Then I used the makeblastdb command like this:

makeblastdb -in nr -dbtype prot -out outnr

This resulted in the nr file being split into different parts, nr.00 to nr.03. (Is this normal?)

Now I need help extracting the first 1000 characters from the nr file. But how do I open a FASTA file in Windows? How do I proceed?

fasta makeblastdb blast • 24k views
modified 7.0 years ago by Vivek (0) • written 7.0 years ago by Vivek (0)

Why do you need the first 1000 char? Why did you put bioperl in the tags?

written 7.0 years ago by Manu Prestat (3.9k)

I've removed the bioperl tag.

written 7.0 years ago by Neilfws (48k)
2
Geparada (1.4k), Cambridge, wrote:

FASTA files are plain text files; you can open them with Notepad or even Word.

If you'll be doing this kind of thing often, you should use Unix. Life is too short to use Windows.

written 7.0 years ago by Geparada (1.4k)
1

In the long term, switching to a UNIX-style system may make sense. However, there is a learning curve to take into account. As a starting point, I suggest trying a biology-targeted Linux distribution (see http://en.wikipedia.org/wiki/BioLinux) in a virtual machine, for example using VirtualBox (https://www.virtualbox.org/).

written 7.0 years ago by Hamish (3.1k)
2
Manu Prestat (3.9k), Marseille, France, wrote:

Hi. First, I'm not sure "original" is the right term, but if you mean "do these FASTA files correspond exactly to the official nr database sequences?", the answer is yes.

Second, it is normal for the database files to be split. Nevertheless, I am not sure the database build ran to completion: I have never tried it on nr myself, but the ready-to-use nr BLAST database that NCBI provides goes up to nr.05. Was the alias file (nr.pal) created?

Finally, as Geparada told you, FASTA files are text files, so open the file with any text editor. A text editor is better than a word processor here: you don't want grammar correction, or a Times New Roman font for IDs and Arial Italic for sequences, and, more importantly, you want to save your first 1000 aa as plain text, not doc, rtf, etc. The difficulty is actually not the type of file but its size. I've never tried this on Windows, but a former coworker used Notepad++ and seemed happy with it.
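
Because the size is the real obstacle, one way around the editor entirely is a short script that reads only the start of the file. The following is a minimal Python sketch, not part of BLAST+; the function name `head_chars` is made up for illustration, and the file names in the comments are the ones from the question. It assumes the file is plain ASCII text, optionally gzip-compressed.

```python
import gzip


def head_chars(path, n=1000):
    """Return the first n characters of a plain or gzip-compressed text file."""
    # Hypothetical helper: choose the opener from the file extension.
    opener = gzip.open if path.endswith(".gz") else open
    # Text mode; read(n) pulls in only the first n characters,
    # so the rest of the (very large) file is never loaded.
    with opener(path, "rt", encoding="ascii", errors="replace") as handle:
        return handle.read(n)


# Possible usage with the file names from the question:
#
#     text = head_chars("nr", 1000)
#     with open("qa.fasta", "w", newline="\n") as out:
#         out.write(text)
```

Note that a fixed 1000-character cut will usually end in the middle of a sequence, so the last record in qa.fasta is likely to be truncated.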

written 7.0 years ago by Manu Prestat (3.9k)

The 'nr' BLAST database from NCBI contains additional information that is not present in the FASTA sequence data, since it is generated from the ASN.1. NCBI also likely uses a smaller part size to ensure maximum compatibility, which avoids problems with some filesystems. So it isn't surprising that a manual build gives fewer parts.

written 7.0 years ago by Hamish (3.1k)

See http://en.wikipedia.org/wiki/List_of_text_editors for a list of text editors, many of which are available for MS Windows. You may find reading http://en.wikipedia.org/wiki/Text_editor helpful since it contains a definition of a text editor.

written 7.0 years ago by Hamish (3.1k)
1
Swbarnes2 (1.4k) wrote:

If you want to stick with Windows, use gvim, or something like it for Windows. It's more powerful than Notepad, and it has no problem handling very large text files (and I think it's easier on the eyes than Notepad).

written 7.0 years ago by Swbarnes2 (1.4k)

+1. Also, native text editors on Windows and OS X treat some (whitespace) characters a bit differently. For example, a line break is '\n' on Unix and OS X, '\r\n' on Windows, and '\r' on classic Mac OS.

written 7.0 years ago by Damian Kao (15k)
0
ALchEmiXt (1.9k), The Netherlands, wrote:

I don't understand why you didn't download the preformatted databases from NCBI directly in the first place. You can BLAST against them directly and get literally any information out of them using the provided utilities. Even on Windows.

Ideally, use an editor that can handle line-ending conversion: line endings differ between Windows and Unix, and some tools will fail on incorrect ones. Not all Windows-to-Unix converters handle these accurately. I personally prefer Notepad++, where you can interconvert line endings as well.
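
For files too large for an editor, the line-ending conversion can also be scripted. This is only a rough Python sketch under stated assumptions: the function name `crlf_to_lf` is invented for illustration, and it handles Windows CRLF endings only (not lone CR characters). It copies the file line by line, so memory use stays small.

```python
def crlf_to_lf(src, dst):
    """Copy src to dst, rewriting Windows CRLF line endings as Unix LF."""
    # Binary mode so that Python does not translate newlines on its own.
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for line in fin:  # binary iteration splits on b"\n" only
            if line.endswith(b"\r\n"):
                line = line[:-2] + b"\n"
            fout.write(line)


# Possible usage (hypothetical output file name):
#
#     crlf_to_lf("qa.fasta", "qa_unix.fasta")
```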

written 7.0 years ago by ALchEmiXt (1.9k)
0
Biomonika (Noolean) (3.0k), State College, PA, USA, wrote:

When opening large fasta files, I have been more than satisfied with JWrite. All other editors used to crash from time to time, especially when handling really large datasets.

written 7.0 years ago by Biomonika (Noolean) (3.0k)
0
Vivek (0) wrote:

Hi all,

Thanks for the replies. Apologies for being late to get back.

I am working on a research project with my professor. That's why I downloaded the FASTA files, as I was asked to do so :)

The file is too big to be opened in Windows (by any editor), and hence I need to extract the first 1000 characters, just to take one sequence so that I can run a BLAST with a test query.

Manu Prestat - Yes, I have the nr.pal file created.
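
Since the goal is a single sequence for a test query, stopping at the second header line may fit better than a fixed character count. The sketch below is one possible way to do that in Python; `first_fasta_record` is a hypothetical helper name, and it assumes a standard FASTA layout in which every record starts with a line beginning with ">".

```python
import gzip


def first_fasta_record(path):
    """Return the first complete FASTA record (header plus sequence lines)."""
    # Hypothetical helper: works on plain text or on a .gz file directly.
    opener = gzip.open if path.endswith(".gz") else open
    lines = []
    with opener(path, "rt", encoding="ascii", errors="replace") as handle:
        for line in handle:
            if line.startswith(">"):
                if lines:
                    break  # second header reached: the first record is complete
                lines.append(line)
            elif lines:
                lines.append(line)  # sequence line of the first record
    return "".join(lines)


# Possible usage with the file names from the question:
#
#     with open("qa.fasta", "w", newline="\n") as out:
#         out.write(first_fasta_record("nr"))
```

Only the lines up to the second header are read, so this should return quickly even on the full nr file.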

written 7.0 years ago by Vivek (0)