Is There An Easy Way To Manually (Or Batch) Convert Fasta To Csv?
6
6
10.9 years ago
Blunders ★ 1.1k

SOLVED: I'm reading FASTA files into a custom database and either need to write a converter myself or find one that exports the data to CSV. Doing the conversion manually with a free application would be okay; a batch conversion would be better. Ideally the tool runs on Windows XP, or failing that on OS X/Linux. All options must be local, meaning I'm not going to upload the file to an external system. The process from install to output needs to be 5-10 to start, and then 10-20 seconds going forward.

Thanks, and if you have any questions, let me know.

fasta conversion • 10.0k views
2

What have you tried so far?

1

@Pierre Lindenbaum: Ended up just using Regex in TextPad. Problem is solved. Thanks!

12
10.9 years ago

### One-line solution

ruby -e 'first_line = true; while line = STDIN.gets; line.chomp!; if line =~ /^>/; puts unless first_line; print line[1..-1]; print ","; else; print line; end; first_line = false; end; puts' < s001.fasta


### Simple script

Here is a Ruby script that does this:

#!/usr/bin/ruby

first_line = true

while line = STDIN.gets
  line.chomp!

  if line =~ /^>/
    # Header line: finish the previous row, then start a new one
    puts unless first_line
    print line[1..-1]
    print ","  # <-- Change this to "\t" and it's a convert-fasta-to-tab
  else
    # Sequence line: append it to the current row
    print line
  end

  first_line = false
end
puts

1. Save it to a file.
2. Name the file convert-fasta-to-csv.
3. To make it executable, run chmod +x ./convert-fasta-to-csv

### Usage

./convert-fasta-to-csv < f001.fasta > f001.fasta.csv


To convert all .fasta files in the current folder in one batch:

for i in *.fasta; do ./convert-fasta-to-csv < "$i" > "$i.csv"; done
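
The shell loop above needs a Unix-style shell, which a stock Windows XP box does not have. As a rough alternative, here is a minimal sketch that does the batch step in Ruby itself. It is not part of the original script: `fasta_to_csv` and `convert_folder` are names made up for this example, and it assumes each file fits comfortably in memory.

```ruby
# Hypothetical helper: turn FASTA text into "header,sequence" lines,
# joining sequences that are wrapped over several lines.
def fasta_to_csv(text)
  records = []
  text.each_line do |line|
    line = line.chomp
    if line.start_with?(">")
      records << [line[1..-1], "".dup]   # new record: header without ">"
    elsif !records.empty?
      records.last[1] << line.strip      # sequence line: append to current record
    end
  end
  records.map { |header, seq| "#{header},#{seq}\n" }.join
end

# Hypothetical helper: write <name>.fasta.csv next to every <name>.fasta
# in `dir` and return the list of files written.
def convert_folder(dir)
  Dir.glob(File.join(dir, "*.fasta")).sort.map do |path|
    out = "#{path}.csv"
    File.write(out, fasta_to_csv(File.read(path)))
    out
  end
end

# Only runs when a folder is given on the command line.
if __FILE__ == $PROGRAM_NAME && !ARGV.empty?
  convert_folder(ARGV[0]).each { |out| puts out }
end
```

Saved as, say, convert-all.rb (the file name is arbitrary), it would be run as `ruby convert-all.rb .` to convert the current folder.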


### System Requirements

You probably already have Ruby, but it is not always installed by default.

• To install it on Ubuntu or Debian run: sudo apt-get install ruby
• On RedHat or CentOS, run: sudo yum install ruby
• On Windows, install from http://rubyinstaller.org/
• On Mac, you already have it (it's a part of the operating system).
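
One caveat with printing a bare "," between the columns: FASTA header lines often contain commas or double quotes in their descriptions, and those would shift the columns in the CSV. A minimal sketch of one way around that, using the `csv` library that normally ships with Ruby so that such fields get quoted. The names `fasta_rows` and `fasta_to_delimited` are invented for this example and are not part of the script above.

```ruby
require "csv"
require "stringio"

# Hypothetical helper: parse FASTA from anything that responds to each_line
# into [header, sequence] pairs, joining wrapped sequence lines.
def fasta_rows(io)
  rows = []
  io.each_line do |line|
    line = line.chomp
    if line.start_with?(">")
      rows << [line[1..-1], "".dup]
    elsif !rows.empty?
      rows.last[1] << line.strip
    end
  end
  rows
end

# Hypothetical helper: col_sep "," gives CSV, "\t" gives tab-separated output.
# The csv library quotes any field that contains the separator or a quote.
def fasta_to_delimited(io, col_sep: ",")
  CSV.generate(col_sep: col_sep) do |csv|
    fasta_rows(io).each { |row| csv << row }
  end
end

sample = StringIO.new(">seq1 kinase, putative\nACGT\nTTGA\n")
puts fasta_to_delimited(sample)
# prints: "seq1 kinase, putative",ACGTTTGA
```

Replacing the sample lines with `print fasta_to_delimited($stdin)` should let it slot into the same `< f001.fasta > f001.fasta.csv` usage as before.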
1

I use it for systems administration and sometimes for bioinformatics research. It is very elegant and easy to read, and it has all the power of Perl and Python. Don't use Ruby (or Perl or Python) for all of your bioinformatics, because there is R. The R language has the best set of high-quality, high-performance bioinformatics libraries (Bioconductor). R's vector-driven environment is a much better choice for bioinformatics data analysis. Downside: R scripts that import libraries take a long time to start up, so converters like this one are best written in Ruby.

0

Instead of comma-separated, I usually do tab-separated. To do that, simply replace print "," with print "\t". And rename the script to convert-fasta-to-tab :)

0

@Aleksandr Levchuk: Just wondering, what sort of tasks do you use Ruby for? Normally people use Perl or Python. I was thinking about using Ruby, but there's so little BioXYZ source code for it.

5
10.9 years ago
Mary 11k

I always thought the Scriptome project was a really cool idea, but it doesn't seem to have been updated in a while.

However, there might be some nuggets you could glean from this about how to structure a little script:

http://sysbio.harvard.edu/csb/resources/computational/scriptome/Windows/Tools/Change.html

0

@Mary: Yes, that is cool, since the Perl is there too. I just used regex in TextPad to do it, so I'm selecting yours as the answer, since I was also wondering whether a web service like that was around and how it was doing. Thanks!

0

-1 the perl code fails, it gives syntax error at -e line 1, near "="

0

@Aleksandr Levchuk: Just wondering, since I may try to use that code at some point, what version of Perl are you running, or do you know why there's an error?

0

@blunders I tested on v5.8 and v5.10 (those are the defaults on Debian 4, 5, 6, and Ubuntu 10.10). Most likely, the Perl code was written for an earlier version and the interpreter is not backwards compatible.

0

@Aleksandr Levchuk: Wow, really? What's the point then? Does that mean all the BioXYZ code that's in use either has to run on an old Perl engine, or the code base needs to be updated?

0

@blunders Yes, I would not mark this as a good answer because people with the same question who visit BioStar (by Googling, etc...) will run into a trap.

5
10.9 years ago

Use your text editor's regexp search-and-replace feature.
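
For anyone who wants to try the same search-and-replace idea outside an editor, here is a minimal Ruby sketch using plain regex substitutions. The function name `fasta_to_csv_regex` is made up for this example, and it assumes every header line is followed by at least one sequence line (a header with no sequence would be merged into the next row).

```ruby
# Hypothetical helper: FASTA text to "header,sequence" lines via regex only.
def fasta_to_csv_regex(text)
  text
    .gsub(/\r\n?/, "\n")          # normalise Windows-style line endings
    .gsub(/^(>.*)\n/, "\\1,")     # ">header<newline>" becomes ">header,"
    .gsub(/\n(?!>|\z)/, "")       # drop newlines that are inside a sequence
    .gsub(/^>/, "")               # finally strip the leading ">"
end

puts fasta_to_csv_regex(">s1 demo\nACGT\nTTGA\n>s2\nGG\n")
```

The ">" marker is kept until the last step on purpose, since the third substitution relies on it to tell where the next record starts.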

### Vim

In Vim, press Esc (normal mode) and paste the following sequence of commands:

:0,$s/>\(.*\)\n/>\1,/
:0,$s/\(.*\)\n\([^>]\)/\1\2/
:0,$s/^>//

And press Enter.

### One-line Solution

input_file=001.fasta; vim -c '0,$s/>\(.*\)\n/>\1,/' -c '0,$s/\(.*\)\n\([^>]\)/\1\2/' -c 'w! alex-tmp.fasta.csv' -c 'q!' "$input_file"; mv alex-tmp.fasta.csv "$input_file.csv"

4
10.9 years ago

Have a look at BioSQL and the BioPerl HOWTO for Open Biological Data Access, of which it is a subset. It takes a bit of figuring out, but once you have, the biogetseq.pl script is very useful and can load both local and remote data sources.

0

@Alastair Kerr: Thanks, I'll take a look at it, since Perl currently appears to be the target language of choice within the company I'm at.

2
10.8 years ago
Tim Rayner ▴ 20

In the spirit of other one-line solutions, here's the (almost) inevitable perl version:

perl -e '$/=">"; while(<STDIN>) { next if length == 1; @x=split /\n/; printf "$x[0],$x[1]\n" } ' < your_sequence_file.fasta


I imagine it could be shorter still...
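
One thing to watch with the Perl one-liner: it prints only `$x[1]`, so when a sequence is wrapped over several lines only the first line ends up in the output. For comparison, a rough Ruby sketch of the same record-separator trick that joins all the sequence lines. The name `fasta_records` is invented for this example, and like the Perl version it assumes ">" never appears inside a header.

```ruby
require "stringio"

# Hypothetical helper: read ">"-delimited records, much like $/ = ">" in Perl,
# and return one "header,sequence" string per record.
def fasta_records(io)
  rows = []
  io.each_line(">") do |record|
    # Drop the trailing ">" separator, then split into header + sequence lines.
    header, *seq_lines = record.chomp(">").split(/\r?\n/)
    next if header.nil? || header.empty?   # skips the empty record before the first ">"
    rows << "#{header},#{seq_lines.join}"
  end
  rows
end

puts fasta_records(StringIO.new(">id1 demo\nACGT\nTTGA\n>id2\nGG\n"))
```

Passing `$stdin` instead of the StringIO sample should make it work with `< your_sequence_file.fasta` in the same way.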

0

Can any kind coder tweak this for me so the output is to a file? Thanks <3

0
10.3 years ago

Late to the party but here is my implementation in C#:

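using System;
using System.Collections.Generic;
using System.Text;
using System.IO;

namespace ConvertFASTA
{
    class fasta
    {
        // args[0] is the input FASTA file, args[1] is the output file
        static void Main(string[] args)
        {
            using (StreamReader reader = new StreamReader(args[0]))
            using (StreamWriter writer = new StreamWriter(args[1]))
            {
                string temp = "", line = "";
                bool firstLine = true;

                while ((line = reader.ReadLine()) != null)
                {
                    // "|"-separated header fields become separate columns
                    line = line.Replace("|", ",");
                    if (!line.StartsWith(">"))
                    {
                        temp = line;  // sequence line: append as-is
                    }
                    else
                    {
                        // header line: start a new row (the leading ">" is kept)
                        if (firstLine) temp = line + ",";
                        else temp = "\n" + line + ",";
                    }

                    firstLine = false;
                    writer.Write(temp);
                }
            }
        }
    }
}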


Compile and use like this

fasta.exe input.fa output