FASTA file into excel file format?
6.6 years ago
Azhar ▴ 50

Hi, I have an NCBI FASTA sequence file like

    >mrna1
    gctatatagactgatagctag
    >mrna2
    acgaggctagcggattg

for the whole human genome, and I want to convert it into Excel file format for convenience in analysis. Alternatively, how can it be put into SQL format so that I can extract data from it?

Please suggest any method or tool.

RNA-Seq • 15k views

and i want to convert it into excel file format for convenience in analysis

Munch


I cannot improve this answer by one bit :D


Don't do it!


Why do you want to do such a thing? This is a bad idea. There must be another way. What's the bit about SQL? Do you mean you want to put the content of the FASTA file into a relational database? Any scripting language can read and parse a FASTA file and write it into a database. There's no need for Excel in this.

Also, I am not aware of any sequence analysis tools that are more convenient to use when the data is in Excel. Even if such tools existed, I would think twice before using them, because they were most likely developed by people who are not in the business of analyzing sequences. Excel is not designed to deal with sequences. It's the wrong tool for the task.
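
To illustrate the "parse it and put it in a database" route, here is a minimal sketch using only the Python standard library. The table name `sequences`, its two columns, and the helper names `parse_fasta` and `load_fasta_into_sqlite` are made up for this example, and SQLite is just one convenient choice of database. It assumes record IDs are unique and keeps only the first word of each header as the ID.

```python
import sqlite3


def parse_fasta(lines):
    """Yield (record_id, sequence) pairs from an iterable of FASTA lines."""
    record_id = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if record_id is not None:
                yield record_id, "".join(chunks)
            # Assumption: the ID is the first whitespace-separated word.
            words = line[1:].split()
            record_id = words[0] if words else ""
            chunks = []
        elif record_id is not None:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def load_fasta_into_sqlite(lines, conn):
    """Create a hypothetical 'sequences' table and fill it from FASTA lines."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequences "
        "(id TEXT PRIMARY KEY, seq TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO sequences (id, seq) VALUES (?, ?)", parse_fasta(lines)
    )
    conn.commit()


# The two records from the question, used as in-memory test data.
example = [
    ">mrna1",
    "gctatatagactgatagctag",
    ">mrna2",
    "acgaggctagcggattg",
]

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
load_fasta_into_sqlite(example, conn)
rows = conn.execute(
    "SELECT id, LENGTH(seq) FROM sequences ORDER BY id"
).fetchall()
print(rows)
```

For a real file, pass an open file handle instead of the list, for example `with open("test.fa") as handle: load_fasta_into_sqlite(handle, conn)`. Once loaded, the data can be queried with ordinary SQL, such as `SELECT id FROM sequences WHERE seq LIKE '%gctag%'`.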


You want to convert a FASTA file to an Excel file? So you need to do the analysis on Windows? If you use Python, it is very easy to write an Excel file with the openpyxl package, for example with the ID in column 1 and the sequence in column 2. But if the file is huge, I don't think it is a good idea to save the sequences to an Excel file.


But why.

6.6 years ago

Instead of an .xls file, convert it into a tab-separated file, which can be easily loaded into any RDBMS, including MS Access, MySQL, PostgreSQL, etc.

Download seqkit for Windows (both 32-bit and 64-bit builds are available) and run the following command on the FASTA file:

    $ seqkit fx2tab test.fa

The output would be:

    mrna1   gctatatagactgatagctag
    mrna2   acgaggctagcggattg

You can give the output file an .xls extension even though it is a tab-separated file, something like this:

    $ seqkit fx2tab test.fa > output.xls
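
If installing seqkit is not an option, the same kind of two-column, tab-separated table can be produced with a few lines of Python. This is only a rough sketch of the idea, not a reimplementation of seqkit: the function name `fasta_to_tsv` is made up here, and it assumes the whole header line (everything after `>`) should go into the first column.

```python
import csv
import io


def fasta_to_tsv(fasta_lines, out_handle):
    """Write one 'header<TAB>sequence' row per record; return the row count."""
    writer = csv.writer(out_handle, delimiter="\t", lineterminator="\n")
    count = 0
    header = None
    chunks = []
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Flush the previous record before starting a new one.
            if header is not None:
                writer.writerow([header, "".join(chunks)])
                count += 1
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequences may be wrapped over several lines; join them later.
            chunks.append(line)
    if header is not None:
        writer.writerow([header, "".join(chunks)])
        count += 1
    return count


# In-memory demonstration with the records from the question.
fasta_text = ">mrna1\ngctatatagactgatagctag\n>mrna2\nacgaggctagcggattg\n"
out = io.StringIO()
n_rows = fasta_to_tsv(io.StringIO(fasta_text), out)
print(out.getvalue(), end="")
```

With real files, open the input for reading and the output with `open("output.tsv", "w", newline="")`, then pass both handles to `fasta_to_tsv`. The resulting file can be imported into most databases or spreadsheets.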

Not sure you can just change the extension to .xls, since .xls is some kind of binary file with more than just ASCII text.

You can also use pyfaidx to make a "transposed" FASTA file:

    $ faidx --transform transposed tests/data/genes.fasta
    AB821309.1  1       3510    ATGGTCAGCTGGGGTCGTTTCATC...
    KF435150.1  1       481     ATGACATCATTTTCCACCTCTGCT...
    KF435149.1  1       642     ATGACATCATTTTCCACCTCTGCT...
    NR_104216.1 1       4573    CCCCGCCCCTCTGGCGGCCCGCCG...

This incorporates some of the information from the FASTA .fai index file, such as the sequence start and end coordinates (the end coordinate is just the sequence length).


The .xls format per se is binary and proprietary. The purpose of appending the .xls extension is to make the tab-separated file open with the default application for .xls files on Windows, which is Excel. Since Excel can parse tab-separated files (and CSV as well), it will not report any errors. The extension could be .tsv (or .tab), but then the user has to launch Excel and import the file, which gives the same result as opening a tab-separated file with an .xls extension.

6.6 years ago
Daniel ★ 4.0k

There are lots of jokes above, but the honest answer is that you do not want to do this. Once you have converted the file, it would be so massive that it would be impossible to open and use in Excel, before you even get to any other problems. Further, this is not an appropriate way to read the data: 99.9% of the time you will not want to just look at the raw sequence anyway, but to search or compare it using a specialised program.

You will be much happier keeping the data in FASTA format and searching for a bioinformatics program that does what you want and understands FASTA (all of them will do this).
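
To put numbers on "so massive": an Excel worksheet holds at most 1,048,576 rows, and a single cell holds at most 32,767 characters, so a chromosome-scale sequence cannot fit in one cell at all. The sketch below counts records and sequence lengths in a FASTA file and compares them with those two limits. The function name `excel_fit_report` and the layout of the returned dictionary are made up for this example, and it assumes one record per row with the whole sequence in one cell.

```python
EXCEL_MAX_ROWS = 1_048_576     # maximum rows per worksheet
EXCEL_MAX_CELL_CHARS = 32_767  # maximum characters per cell


def excel_fit_report(fasta_lines):
    """Summarise whether a FASTA file could fit in one worksheet.

    Assumes one record per row with the full sequence in a single cell.
    """
    lengths = []
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            lengths.append(0)          # start a new record
        elif line and lengths:
            lengths[-1] += len(line)   # add this wrapped sequence line
    oversized = sum(1 for n in lengths if n > EXCEL_MAX_CELL_CHARS)
    return {
        "records": len(lengths),
        "longest": max(lengths, default=0),
        "oversized_records": oversized,
        "fits": len(lengths) <= EXCEL_MAX_ROWS and oversized == 0,
    }


# The toy records from the question fit easily.
small = [">mrna1", "gctatatagactgatagctag", ">mrna2", "acgaggctagcggattg"]
small_report = excel_fit_report(small)
print(small_report)

# A made-up 40,000-base record already exceeds the per-cell limit.
big = [">fake_long_record"] + ["A" * 1000] * 40
big_report = excel_fit_report(big)
print(big_report)
```

For a real file, pass an open file handle, for example `with open("test.fa") as handle: print(excel_fit_report(handle))`.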
