Question: FASTA file into excel file format?
Azhar (China) wrote 3.5 years ago:

Hi, I have an NCBI FASTA sequence file like

    >mrna1
    gctatatagactgatagctag
    >mrna2
    acgaggctagcggattg

for the whole human genome, and I want to convert it into Excel format for convenience in analysis. Alternatively, how can it be used in SQL format to extract data?

Please suggest any method or tool.


and i want to convert it into excel file format for convenience in analysis

Munch

written 3.5 years ago by Pierre Lindenbaum

I cannot improve this answer by one bit :D

written 3.5 years ago by Macspider

Don't do it!

written 3.5 years ago by Devon Ryan

Why would you want to do such a thing? This is a bad idea; there must be another way. What's the bit about SQL? Do you mean you want to put the content of the FASTA file into a relational database? Any scripting language can read and parse a FASTA file and write it into a database. There's no need for Excel in this. Also, I am not aware of sequence analysis tools that are more convenient to use when the data is in Excel, and even if such tools existed, I would think twice before using them because they were most likely developed by people who are not in the business of analyzing sequences. Excel is not designed to deal with sequences. It's the wrong tool for the task.
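
To illustrate the "parse it and put it in a database" route, here is a minimal sketch in Python using only the standard library (SQLite via `sqlite3`). The function names, the `sequences` table and its columns are made up for this example, and the parser assumes a plain FASTA file where the first word of each header is a unique identifier.

```python
import sqlite3


def read_fasta(lines):
    """Yield (identifier, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the identifier.
            name = (line[1:].split() or [""])[0]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def load_fasta(conn, lines):
    """Insert every record into a 'sequences' table and return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequences ("
        "id TEXT PRIMARY KEY, length INTEGER, seq TEXT)"
    )
    rows = ((name, len(seq), seq) for name, seq in read_fasta(lines))
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO sequences VALUES (?, ?, ?)", rows)
    return conn.execute("SELECT COUNT(*) FROM sequences").fetchone()[0]


# The two records from the question, as a list of lines.
example = [">mrna1", "gctatatagactgatagctag", ">mrna2", "acgaggctagcggattg"]

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
load_fasta(conn, example)

# Find sequences containing a given motif.
query = "SELECT id, length FROM sequences WHERE seq LIKE '%ggctag%'"
for row in conn.execute(query):
    print(row)
```

For a real file, pass an open file handle instead of the list, e.g. `with open("test.fa") as fh: load_fasta(conn, fh)`. Plain SQL pattern matching is no substitute for a proper sequence search tool, but it shows that no spreadsheet is needed to get FASTA records into a database.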

written 3.5 years ago by Jean-Karim Heriche

You want to convert a FASTA file to an Excel file? So you need to do the analysis in Windows? If you use Python, it is very easy to write an Excel file with the openpyxl package, for example with the ID in column 1 and the sequence in column 2. But if the file is huge, I don't think it is a good idea to save the sequences to an Excel file.
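
A minimal sketch of that idea, assuming openpyxl is installed. The function names and the output file name are hypothetical. It refuses any sequence longer than 32,767 characters, which is Excel's limit for the text in a single cell, because such a sequence could not be stored intact.

```python
EXCEL_MAX_CELL_CHARS = 32767  # Excel's limit for text in one cell


def read_fasta(lines):
    """Yield (identifier, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name = (line[1:].split() or [""])[0]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fasta_to_rows(lines):
    """Build [id, sequence] rows, rejecting sequences too long for a cell."""
    rows = [["id", "sequence"]]
    for name, seq in read_fasta(lines):
        if len(seq) > EXCEL_MAX_CELL_CHARS:
            raise ValueError(
                f"{name}: {len(seq)} characters do not fit in one Excel cell"
            )
        rows.append([name, seq])
    return rows


def write_xlsx(rows, path):
    """Write the rows to an .xlsx file, one row per record."""
    # Imported here so the parsing code above works without openpyxl.
    from openpyxl import Workbook

    wb = Workbook(write_only=True)  # streaming mode, suited to many rows
    ws = wb.create_sheet("sequences")
    for row in rows:
        ws.append(row)
    wb.save(path)


example = [">mrna1", "gctatatagactgatagctag", ">mrna2", "acgaggctagcggattg"]
rows = fasta_to_rows(example)
print(rows)
# To create the spreadsheet: write_xlsx(rows, "sequences.xlsx")
```

This is fine for a handful of short sequences. For a whole genome, the length check alone will stop it.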

written 3.5 years ago by Sparrow_kop

But why?

written 3.5 years ago by Macspider
Daniel (Cardiff University) wrote 3.5 years ago:

There are lots of jokes above, but the honest answer is that you do not want to do this. Before any other problem arises, the converted file would be so massive that it would be impossible to open and use in Excel. Further, this is not an appropriate way to read the data: 99.9% of the time you will not want to just look at the raw sequence anyway, but to search or compare it using a specialised program.

You will be much happier keeping the data in FASTA format and searching for a bioinformatics program that does what you want and understands FASTA (all of them do).
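
To make the size problem concrete: an Excel worksheet holds at most 1,048,576 rows and a single cell at most 32,767 characters, while a human chromosome is tens to hundreds of millions of bases long. The sketch below (Python, standard library only, hypothetical function names) scans a FASTA file without keeping the sequences in memory and reports whether it could fit in one sheet at all, assuming one record per row.

```python
EXCEL_MAX_ROWS = 1_048_576      # rows per worksheet (.xlsx)
EXCEL_MAX_CELL_CHARS = 32_767   # characters per cell


def fasta_stats(lines):
    """Return (record_count, longest_sequence_length) for FASTA lines."""
    count = longest = current = 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record.
            longest = max(longest, current)
            count += 1
            current = 0
        else:
            current += len(line)
    return count, max(longest, current)


def fits_in_excel(lines):
    """True if every record fits in a cell and all records fit in a sheet."""
    count, longest = fasta_stats(lines)
    # One header row plus one row per record.
    return count + 1 <= EXCEL_MAX_ROWS and longest <= EXCEL_MAX_CELL_CHARS


example = [">mrna1", "gctatatagactgatagctag", ">mrna2", "acgaggctagcggattg"]
print(fasta_stats(example), fits_in_excel(example))

# A single 40,000-base record already exceeds the per-cell limit.
print(fits_in_excel([">long_record", "a" * 40000]))
```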

cpad0112 (Hyderabad, India) wrote 3.5 years ago:

Instead of an .xls file, convert it into a tab-separated file, which can be easily loaded into any RDBMS, including MS Access, MySQL, PostgreSQL, etc.

Download seqkit (Windows builds are available for both 32-bit and 64-bit) and run the following command on the FASTA file:

    $ seqkit fx2tab test.fa

The output would be:

    mrna1   gctatatagactgatagctag
    mrna2   acgaggctagcggattg

You can change the output extension to .xls even though it is a tab-separated file, something like this:

    $ seqkit fx2tab test.fa > output.xls
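
If installing seqkit is not an option, a similar table can be produced with a few lines of Python. This is only a sketch with hypothetical function names, not a reimplementation of seqkit: it writes two tab-separated columns (ID and sequence) using the standard `csv` module.

```python
import csv
import io


def read_fasta(lines):
    """Yield (identifier, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name = (line[1:].split() or [""])[0]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fasta_to_tsv(lines, out):
    """Write one 'id<TAB>sequence' line per record; return the record count."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    n = 0
    for name, seq in read_fasta(lines):
        writer.writerow([name, seq])
        n += 1
    return n


example = [">mrna1", "gctatatagactgatagctag", ">mrna2", "acgaggctagcggattg"]
buf = io.StringIO()
fasta_to_tsv(example, buf)
print(buf.getvalue(), end="")
```

With real files, open the input and output yourself, for example `with open("test.fa") as fin, open("output.tsv", "w", newline="") as fout: fasta_to_tsv(fin, fout)`. The resulting file can then be bulk-loaded by whichever database you use.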

Not sure you can just change the extension to .xls, since .xls is some kind of binary file with more than just ASCII text.

You can also use pyfaidx to make a "transposed" FASTA file:

    faidx --transform transposed tests/data/genes.fasta
    AB821309.1  1       3510    ATGGTCAGCTGGGGTCGTTTCATC...
    KF435150.1  1       481     ATGACATCATTTTCCACCTCTGCT...
    KF435149.1  1       642     ATGACATCATTTTCCACCTCTGCT...
    NR_104216.1 1       4573    CCCCGCCCCTCTGGCGGCCCGCCG...

This incorporates some of the information from the FASTA .fai index file, such as the sequence start and end coordinates (the end coordinate is just the sequence length).

written 3.5 years ago by Matt Shirley

The .xls format itself is binary and proprietary. The purpose of appending the .xls extension is to make the tab-separated file open in the default application for .xls files on Windows, which is Excel. Since Excel can parse tab-separated files (and CSV as well), it will not report any errors. The extension could be .tsv (or .tab), but then the user has to launch Excel and import the file, which gives the same result as opening a tab-separated file with an .xls extension.

written 3.5 years ago by cpad0112