Is There An Easy Way To Manually (Or Batch) Convert Fasta To Csv?
6
13.4 years ago
Blunders ★ 1.1k

SOLVED: I'm reading FASTA files into a custom database and need to either write a converter myself or find one that exports the data to CSV. Doing the conversion manually with a free application would be okay; a batch conversion would be better. Ideally the options would run on Windows XP, or failing that OS X/Linux, but all options must be local, meaning I'm not going to upload the file to an external system. The process from install to output needs to be 5-10 to start, and then 10-20 seconds going forward.

Thanks, and if you have any questions, let me know.

fasta conversion • 14k views
2

What have you tried so far?

1

@Pierre Lindenbaum: Ended up just using Regex in TextPad. Problem is solved. Thanks!

12
13.4 years ago

One-line solution

ruby -e 'first_line = true; while line = STDIN.gets; line.chomp!; if line =~ /^>/; puts unless first_line; print line[1..-1]; print ","; else; print line; end; first_line = false; end; puts' < s001.fasta

Simple script

Here is a Ruby script that does this:

#!/usr/bin/ruby

first_line = true

while line = STDIN.gets
  line.chomp!

  if line =~ /^>/
    puts unless first_line
    print line[1..-1]
    print ","  # <-- Change this to "\t" and it's a convert-fasta-to-tab
  else
    print line
  end

  first_line = false
end
puts

  1. Save the script to a file.
  2. Name the file convert-fasta-to-csv.
  3. To make it executable, run chmod +x ./convert-fasta-to-csv

Usage

./convert-fasta-to-csv < f001.fasta > f001.fasta.csv

To convert every .fasta file in the current folder in one batch (the quotes keep file names with spaces intact):

for i in *.fasta; do ./convert-fasta-to-csv < "$i" > "$i.csv"; done

System Requirements

You probably already have Ruby, but it is not always installed by default.

  • To install it on Ubuntu or Debian run: sudo apt-get install ruby
  • On RedHat or CentOS, run: sudo yum install ruby
  • On Windows, install from http://rubyinstaller.org/
  • On Mac, you already have it (it's a part of the operating system).
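
One caveat with the script above: a FASTA header that itself contains a comma or a double quote will produce a malformed CSV row. Here is a minimal sketch of a variant that quotes such fields in the usual CSV way (wrap the field in double quotes, double any embedded quotes). The helper names csv_field, fasta_records and fasta_to_csv are made up for this illustration and are not part of the original script; it also assumes the leading ">" should be dropped, as the script above does.

```ruby
#!/usr/bin/ruby

# Quote a field only when it contains a comma, a double quote or a line break.
# Embedded double quotes are doubled, as CSV readers generally expect.
def csv_field(value)
  return value unless value =~ /[",\r\n]/
  '"' + value.gsub('"', '""') + '"'
end

# Collect [header, sequence] pairs from anything that responds to each_line
# (an open File, STDIN, or a plain String).
def fasta_records(source)
  records = []
  source.each_line do |raw|
    line = raw.chomp
    next if line.empty?
    if line.start_with?(">")
      records << [line[1..-1], []]      # new record, header without the ">"
    elsif !records.empty?
      records.last[1] << line           # continuation of the current sequence
    end
  end
  records.map { |header, parts| [header, parts.join] }
end

# Build the CSV text: one "header,sequence" row per FASTA record.
def fasta_to_csv(source)
  fasta_records(source).map { |h, s| "#{csv_field(h)},#{csv_field(s)}\n" }.join
end

# Only runs when a file name is given on the command line.
if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  File.open(ARGV[0]) { |f| print fasta_to_csv(f) }
end
```

If saved as, say, fasta-to-quoted-csv.rb (a hypothetical name), it could be run as: ruby fasta-to-quoted-csv.rb s001.fasta > s001.csv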
1

I use it for systems administration and sometimes for Bioinformatics research. It is very elegant and easy to read, and it has all the power of Perl and Python. Don't use Ruby (or Perl or Python, for that matter) for all of your Bioinformatics, because there is R. The R language has the best set of high-quality, high-performance Bioinformatics libraries (Bioconductor). R's vector-driven environment is a much better choice for Bioinformatics data analysis. Downside: R scripts that import libraries take a long time to start up, so converters like this one are best in Ruby.

0

Instead of comma-separated, I usually do tab-separated. To do that, simply replace print "," with print "\t". And rename the script to convert-fasta-to-tab :)

0

@Aleksandr Levchuk: Just wondering, what sort of tasks do you use Ruby for? People normally use Perl or Python; I was thinking about using Ruby, but there's so little BioXYZ source code.

5
13.4 years ago
Mary 11k

I always thought the Scriptome project was a really cool idea, but it doesn't seem to have been updated in a while.

However, there might be some nuggets you could glean from this about how to structure a little script:

http://sysbio.harvard.edu/csb/resources/computational/scriptome/Windows/Tools/Change.html

0

@Mary: Yes, that is cool, since the Perl is there too. I just used Regex in TextPad to do it, so I'm selecting yours as the answer, since I was also wondering if a web service like that was around and how it was doing. Thanks!

0

-1: the Perl code fails. It gives: syntax error at -e line 1, near "="

0

@Aleksandr Levchuk: Just wondering, since I may try to use that code at some point, what version of Perl are you running, or do you know why there's an error?

0

@blunders I tested on v5.8 and v5.10 (those are the defaults on Debian 4, 5, 6, and Ubuntu 10.10). Most likely, the Perl code was written for an earlier version and the interpreter is not backwards compatible.

0

@Aleksandr Levchuk: Wow, really? What's the point then? Does that mean all the BioXYZ code that's in use either needs an old Perl engine, or the code base needs to be updated?

0

@blunders Yes, I would not mark this as a good answer because people with the same question who visit BioStar (by Googling, etc...) will run into a trap.

5
13.4 years ago

Use your text editor's regexp search-and-replace feature.

Vim

In Vim, press Esc (normal mode) and paste the following sequence of commands:

:0,$s/>\(.*\)\n/>\1,/
:0,$s/\(.*\)\n\([^>]\)/\1\2/
:0,$s/^>//

And press Enter.

One-line Solution

input_file=001.fasta; vim -c '0,$s/>\(.*\)\n/>\1,/' -c '0,$s/\(.*\)\n\([^>]\)/\1\2/' -c '0,$s/^>//' -c 'w! alex-tmp.fasta.csv' -c 'q!' "$input_file"; mv alex-tmp.fasta.csv "$input_file.csv"
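
For anyone without Vim, the same three search-and-replace steps can be sketched in Ruby. This is only an illustration of the regex idea: the function name fasta_to_csv_regex is invented here, and it assumes every header line is followed by at least one sequence line and that the whole file fits in memory.

```ruby
# Apply the three substitutions from the Vim recipe to a FASTA string.
def fasta_to_csv_regex(text)
  text = text.gsub(/\r\n?/, "\n")            # normalise Windows/old Mac line endings
  text = text.gsub(/^>(.*)\n/, '>\1,')       # 1. header line: swap its newline for a comma
  text = text.gsub(/\n(?=[^>\n])/, "")       # 2. join sequence lines that continue a record
  text.gsub(/^>/, "")                        # 3. drop the leading ">"
end

# Only runs when an input file is given on the command line.
if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  print fasta_to_csv_regex(File.read(ARGV[0]))
end
```

Step 2 uses a lookahead instead of a second capture group, so the character after the newline is checked but not consumed; otherwise it mirrors the Vim pattern.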
4
13.4 years ago

Have a look at BioSQL and the BioPerl HOWTO on Open Biological Data Access, of which it is a subset. It does take a bit of figuring out, but once you do, the biogetseq.pl script is very useful and can load both local and remote data sources.

0

@Alastair Kerr: Thanks, I'll take a look at it, since Perl currently appears to be the target language of choice within the company I'm at.

2
13.3 years ago
Tim Rayner ▴ 20

In the spirit of other one-line solutions, here's the (almost) inevitable perl version:

perl -e '$/=">"; while(<STDIN>) { chomp; next unless length; ($h,@s)=split /\n/; print "$h,", join("",@s), "\n" }' < your_sequence_file.fasta

I imagine it could be shorter still...

0

Can any kind coder tweak this for me so the output is to a file? Thanks <3
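
For any of the one-liners in this thread, shell redirection (adding > output.csv at the end of the command, as in the Usage line of the Ruby answer) sends the output to a file. If you would rather have the script open the output file itself, here is a rough sketch in Ruby that follows the same logic as the Ruby script above. The function name convert_file and the two-argument command line are assumptions made for this example.

```ruby
#!/usr/bin/ruby

# Read FASTA from in_path and write "header,sequence" rows to out_path.
def convert_file(in_path, out_path)
  File.open(out_path, "w") do |out|
    started = false                      # true once the first header was written
    File.foreach(in_path) do |raw|
      line = raw.chomp
      if line.start_with?(">")
        out.puts if started              # finish the previous row
        out.print line[1..-1], ","       # header without ">", then the separator
        started = true
      else
        out.print line                   # sequence lines are appended to the row
      end
    end
    out.puts if started                  # finish the last row
  end
end

# Only runs when both paths are given: script.rb input.fasta output.csv
if __FILE__ == $PROGRAM_NAME && ARGV.length == 2
  convert_file(ARGV[0], ARGV[1])
end
```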

0
12.8 years ago
Radu ▴ 50

Late to the party but here is my implementation in C#:

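using System;
using System.IO;

namespace ConvertFASTA
{
    class fasta
    {
        static void Main(string[] args)
        {
            if (args.Length != 2)
            {
                Console.Error.WriteLine("Usage: fasta.exe input.fa output");
                return;
            }

            // The using blocks close both files even if an exception is thrown.
            using (StreamReader reader = new StreamReader(args[0]))
            using (StreamWriter writer = new StreamWriter(args[1]))
            {
                string line;
                bool firstLine = true;

                while ((line = reader.ReadLine()) != null)
                {
                    // Blank lines carry no data and would break the '>' check below.
                    if (line.Length == 0) continue;

                    // Turn '|' separators into commas so header fields land in separate columns.
                    line = line.Replace("|", ",");

                    if (line[0] == '>')
                    {
                        // Header: start a new row. The leading '>' is kept in the output.
                        if (!firstLine) writer.Write("\n");
                        writer.Write(line + ",");
                    }
                    else
                    {
                        // Sequence line: append to the current row.
                        writer.Write(line);
                    }

                    firstLine = false;
                }
            }
        }
    }
}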

Compile and use like this:

fasta.exe input.fa output