Question: Is There An Easy Way To Manually (Or Batch) Convert Fasta To Csv?
Blunders wrote (8.0 years ago):

SOLVED: I'm reading FASTA files into a custom database, and I either need to write a converter myself or find one that exports the data to CSV. Converting manually with a free application would be okay; a batch conversion would be better. It would be much better if the option runs on Windows XP, but OS X or Linux is fine if not. All options must be local, meaning I'm not going to upload the file to an external system. The whole process, from install to output, needs to take 5-10 minutes the first time and then 10-20 seconds per conversion going forward.

Thanks, and if you have any questions, let me know.


What have you tried so far?

(comment by Pierre Lindenbaum, 8.0 years ago)

@Pierre Lindenbaum: Ended up just using Regex in TextPad. Problem is solved. Thanks!

(comment by Blunders, 8.0 years ago)

Aleksandr Levchuk wrote (8.0 years ago, United States):

One-line solution

ruby -e 'first_line = true; while line = STDIN.gets; line.chomp!; if line =~ /^>/; puts unless first_line; print line[1..-1]; print ","; else; print line; end; first_line = false; end; puts' < s001.fasta

Simple script

Here is a Ruby script that does this:

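#!/usr/bin/env ruby
# Reads FASTA on STDIN and writes one "header,sequence" row per record to STDOUT.

first_line = true

while line = STDIN.gets
  line.chomp!                # strip the trailing newline (\n or \r\n)

  if line =~ /^>/            # header line: starts a new record
    puts unless first_line   # end the previous record's row
    print line[1..-1]        # header text without the leading '>'
    print ","  # <-- Change this to "\t" and it's a convert-fasta-to-tab
  else
    print line               # sequence line: append it to the current row
  end

  first_line = false
end
puts                         # terminate the last row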

  1. Save the script to a file.
  2. Name the file convert-fasta-to-csv.
  3. Make it executable by running chmod +x ./convert-fasta-to-csv

Usage

./convert-fasta-to-csv < f001.fasta > f001.fasta.csv

To convert all .fasta files in the current folder in one batch:

for i in *.fasta; do ./convert-fasta-to-csv < "$i" > "$i.csv"; done
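The for loop above needs a Unix-style shell. Since the question asked about Windows XP, here is a minimal sketch that does the batch loop in Ruby itself, so it should also run from a Windows command prompt. The method names convert_file and batch_convert and the script name batch_convert.rb are hypothetical; like the loop, it writes NAME.fasta.csv next to each NAME.fasta.

```ruby
#!/usr/bin/env ruby
# Sketch: batch-convert every *.fasta file in a folder to NAME.fasta.csv
# without a shell loop, so it can also run from a Windows command prompt.
# Hypothetical usage: ruby batch_convert.rb C:/data/fasta

# Same transformation as the script above: header without '>', a comma,
# then all sequence lines joined onto the same row.
def convert_file(in_path, out_path)
  File.open(out_path, "w") do |out|
    first_line = true
    File.foreach(in_path) do |line|
      line = line.chomp
      next if line.empty?                # skip blank lines
      if line.start_with?(">")
        out.puts unless first_line       # end the previous row
        out.print line[1..-1], ","
      else
        out.print line
      end
      first_line = false
    end
    out.puts                             # terminate the last row
  end
end

# Converts each regular file ending in .fasta in dir; returns the paths written.
def batch_convert(dir)
  names = Dir.entries(dir).select do |name|
    File.extname(name) == ".fasta" && File.file?(File.join(dir, name))
  end
  names.sort.map do |name|
    in_path = File.join(dir, name)
    out_path = "#{in_path}.csv"
    convert_file(in_path, out_path)
    out_path
  end
end

if __FILE__ == $PROGRAM_NAME && ARGV.size == 1
  batch_convert(ARGV[0]).each { |path| puts "wrote #{path}" }
end
```

For example, running ruby batch_convert.rb . from the folder holding the files would convert every .fasta file in it.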

System Requirements

You probably already have Ruby, but it is not always installed by default.

  • To install it on Ubuntu or Debian, run: sudo apt-get install ruby
  • On Red Hat or CentOS, run: sudo yum install ruby
  • On Windows, install it from http://rubyinstaller.org/
  • On Mac, you already have it (it ships as part of the operating system).
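One caveat with the script above: FASTA headers often contain commas themselves (free-text descriptions, for example), and printing them verbatim produces rows with extra columns. A minimal sketch of the same conversion using Ruby's standard csv library, which quotes such fields, might look like this. The method names each_fasta_record and fasta_to_csv are made up for illustration, and it assumes the header is everything after the '>' and the sequence is all following lines joined together.

```ruby
#!/usr/bin/env ruby
# Sketch: FASTA -> CSV with proper quoting via Ruby's standard csv library.
# Hypothetical usage: ruby fasta_to_csv.rb input.fasta output.csv
require "csv"

# Yields [header, sequence] for each record. The header is the '>' line
# without the '>'; the sequence is all following lines concatenated.
def each_fasta_record(io)
  header = nil
  seq = String.new
  io.each_line do |line|
    line = line.chomp
    next if line.empty?                  # ignore blank lines
    if line.start_with?(">")
      yield [header, seq] if header      # emit the previous record
      header = line[1..-1]
      seq = String.new
    else
      seq << line
    end
  end
  yield [header, seq] if header          # emit the last record
end

# Writes one CSV row per record; fields containing commas or quotes get quoted.
def fasta_to_csv(input, output)
  csv = CSV.new(output, row_sep: "\n")
  each_fasta_record(input) { |record| csv << record }
end

if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  File.open(ARGV[0]) do |input|
    File.open(ARGV[1], "w") { |output| fasta_to_csv(input, output) }
  end
end
```

A header such as seq1 desc, with comma then comes out as "seq1 desc, with comma" in a single quoted column instead of being split in two.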

I use it for systems administration and sometimes for bioinformatics research. It is very elegant and easy to read, and it has all the power of Perl and Python. Don't use Ruby (or Perl or Python, for that matter) for all of your bioinformatics, because there is R. R has the best set of high-quality, high-performance bioinformatics libraries (Bioconductor), and its vector-driven environment is a much better choice for bioinformatics data analysis. The downside is that R scripts that import libraries take a long time to start up, so small converters like this one are best written in Ruby.

(comment by Aleksandr Levchuk, 8.0 years ago)

Instead of comma-separated, I usually do tab-separated. To do that, simply replace print "," with print "\t", and rename the script to convert-fasta-to-tab :)

(comment by Aleksandr Levchuk, 8.0 years ago)
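If you want both variants without keeping two copies of the script, one option is to make the separator a parameter. A sketch under that assumption (fasta_to_delimited is a hypothetical name; it works on a string, so the whole file is read into memory):

```ruby
# Sketch: one converter for both CSV (",") and tab-separated ("\t") output.
# Takes the whole FASTA text as a string and returns the converted text.
def fasta_to_delimited(text, sep = ",")
  rows = []                              # [header, sequence] pairs
  text.each_line do |line|
    line = line.chomp
    next if line.empty?
    if line.start_with?(">")
      rows << [line[1..-1], String.new]  # new record; header without '>'
    elsif rows.last
      rows.last[1] << line               # sequence line: append to current record
    end                                  # (lines before the first header are dropped)
  end
  rows.map { |header, seq| "#{header}#{sep}#{seq}\n" }.join
end
```

For example, print fasta_to_delimited($stdin.read, "\t") would give the convert-fasta-to-tab behaviour. Tab output also sidesteps the problem of commas inside headers, as long as the headers contain no tabs.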

@Aleksandr Levchuk: Just wondering, what sort of tasks do you use Ruby for? Most people seem to use Perl or Python; I was thinking about using Ruby, but there's so little BioXYZ source code for it.

(comment by Blunders, 8.0 years ago)

Mary wrote (8.0 years ago, Boston MA area):

I always thought the Scriptome project was a really cool idea, but it doesn't seem to have been updated in a while.

However, there might be some nuggets you could glean from this about how to structure a little script:

http://sysbio.harvard.edu/csb/resources/computational/scriptome/Windows/Tools/Change.html


@Mary: Yes, that is cool, since the Perl is there too. I just used a regex in TextPad to do it, so I'm selecting you as the answer, since I was also wondering whether a web service like that existed and how it was doing. Thanks!

(comment by Blunders, 8.0 years ago)

-1: the Perl code fails with: syntax error at -e line 1, near "="

(comment by Aleksandr Levchuk, 8.0 years ago)

@Aleksandr Levchuk: Just wondering, since I may try to use that code at some point, what version of Perl are you running, or do you know why there's an error?

(comment by Blunders, 8.0 years ago)

@blunders I tested on v5.8 and v5.10 (the defaults on Debian 4, 5, and 6, and on Ubuntu 10.10). Most likely, the Perl code was written for an earlier version and the interpreter is not backwards compatible.

(comment by Aleksandr Levchuk, 8.0 years ago)

@Aleksandr Levchuk: Wow, really? What's the point then? Does that mean all the BioXYZ code in use either needs to run on an old Perl engine, or the code base needs to be updated?

(comment by Blunders, 8.0 years ago)

@blunders Yes, I would not mark this as a good answer because people with the same question who visit BioStar (by Googling, etc...) will run into a trap.

(comment by Aleksandr Levchuk, 8.0 years ago)

Aleksandr Levchuk wrote (8.0 years ago, United States):

Use your text editor's regexp search-and-replace feature.

Vim

In Vim, press Esc (normal mode) and paste the following sequence of commands:

:0,$s/>\(.*\)\n/>\1,/
:0,$s/\(.*\)\n\([^>]\)/\1\2/
:0,$s/^>//

And press Enter.

One-line Solution

input_file=001.fasta; vim -c '0,$s/>\(.*\)\n/>\1,/' -c '0,$s/\(.*\)\n\([^>]\)/\1\2/' -c '0,$s/^>//' -c 'w! alex-tmp.fasta.csv' -c 'q!' "$input_file"; mv alex-tmp.fasta.csv "$input_file.csv"
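For anyone who wants the regex route outside an editor, the three Vim substitutions above translate roughly into Ruby regexes applied to the whole file. This is a sketch, not a port of any editor's exact behaviour: the method name fasta_to_csv_regex is hypothetical, step 1 is anchored to the start of a line, and it assumes the file fits comfortably in memory.

```ruby
# Sketch: the three Vim substitutions above, done with Ruby regexes on the
# whole file contents at once.
def fasta_to_csv_regex(text)
  text = text.gsub(/\r\n?/, "\n")        # normalise Windows/old Mac line endings
  text = text.gsub(/^>(.*)\n/, '>\1,')   # 1. header + newline -> header + comma
  text = text.gsub(/\n(?=[^>])/, "")     # 2. drop newlines not followed by a header
  text.gsub(/^>/, "")                    # 3. strip the leading '>' from each row
end
```

Usage would be something like File.write("001.fasta.csv", fasta_to_csv_regex(File.read("001.fasta"))).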
Alastair Kerr wrote (8.0 years ago, The University of Edinburgh, UK):

Have a look at BioSQL and the BioPerl HOWTO on Open Biological Data Access, of which it is a part. It does take a bit of figuring out, but once you do, the biogetseq.pl script is very useful and can load both local and remote data sources.


@Alastair Kerr: Thanks, I'll take a look at it, since Perl currently appears to be the target language of choice within the company I'm at.

(comment by Blunders, 8.0 years ago)

Tim Rayner wrote (7.8 years ago, Cambridge):

In the spirit of the other one-line solutions, here's the (almost) inevitable Perl version:

perl -e '$/ = ">"; while (<STDIN>) { chomp; next unless length; my ($header, @seq) = split /\n/; print "$header,", join("", @seq), "\n" }' < your_sequence_file.fasta

I imagine it could be shorter still...

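The trick in the Perl one-liner is setting the input record separator $/ to ">", so each read returns a whole record rather than a single line. Ruby's String#each_line (and IO#each_line) accepts a separator argument, so a sketch of the same idea, joining multi-line sequences, could look like this. fasta_records_to_csv is a hypothetical name, and like the Perl version it assumes no '>' appears inside a header.

```ruby
# Sketch: Ruby version of the Perl $/ = ">" trick. Splitting the input on
# ">" yields one whole FASTA record per iteration.
def fasta_records_to_csv(text)
  text.each_line(">").map do |record|
    record = record.chomp(">")           # drop the '>' that opens the next record
    next if record.strip.empty?          # the first chunk is just the leading '>'
    header, *seq = record.split(/\r?\n/) # first line is the header, the rest is sequence
    "#{header},#{seq.join}\n"
  end.compact.join
end
```

To write the result to a file instead of the screen, something like File.write("out.csv", fasta_records_to_csv(File.read("your_sequence_file.fasta"))) should do.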

Can any kind coder tweak this for me so the output is to a file? Thanks <3

(comment by poobearspam, 6.5 years ago)

Radu wrote (7.3 years ago):

Late to the party, but here is my implementation in C#:

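using System;
using System.IO;

namespace ConvertFASTA
{
    class fasta
    {
        static void Main(string[] args)
        {
            if (args.Length < 2)
            {
                Console.Error.WriteLine("Usage: fasta.exe input.fa output.csv");
                return;
            }

            // "using" flushes and closes both files, even if an exception is thrown.
            using (StreamReader reader = new StreamReader(args[0]))
            using (StreamWriter writer = new StreamWriter(args[1]))
            {
                string line;
                bool firstLine = true;

                while ((line = reader.ReadLine()) != null)
                {
                    if (line.Length == 0) continue;   // skip blank lines

                    line = line.Replace("|", ",");    // pipe-delimited header fields become CSV columns

                    if (line[0] == '>')
                    {
                        // A header starts a new row: end the previous one first.
                        if (!firstLine) writer.WriteLine();
                        writer.Write(line + ",");
                    }
                    else
                    {
                        // Sequence lines are appended to the current row.
                        writer.Write(line);
                    }

                    firstLine = false;
                }

                if (!firstLine) writer.WriteLine();   // terminate the last row
            }
        }
    }
}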

Compile it, then run it like this: fasta.exe input.fa output.csv
