Question: Using Ruby to Convert a CSV File to FASTA
User 7433 wrote (7.9 years ago):

Okay, so I posted on here earlier about converting my Excel file full of DNA sequences into a FASTA file for analysis with DnaSP.

I currently have columns containing 1) the chromosome number and 2) the DNA sequence, and I want to get:

>chromosomenumber
AGTAGAGATAGAGAGA....
>chromosome number
AGTCGCTCGAGAGTC...

So, I got a couple of responses which basically told me off for using Excel and not exploring other options to do this!

I have now downloaded Ruby (?!) and I am trying to get to grips with it using the tutorial. I am aware of a script to convert CSV files to FASTA, e.g. see the link below:

http://biorelated.com/2011/01/26/converting-sequence-data-from-csv-to-fasta-format/#comment-328

As I am a newbie to Ruby I am confused as to where I put my file name in this script, and how I get Ruby to find my file!

Can any kind person out there please look at this script and perhaps highlight the bits that I need to edit in order to get it to work for me?

Please!

Thanks very much x


I wasn't really telling you off specifically :) I was just illustrating how a bioinformatician with some coding expertise would look at the problem.

(Reply by Neilfws, 7.9 years ago)
Neilfws wrote (7.9 years ago):

The script to which you link assumes that you are using a Linux-like operating system. So it has the lines:

csv_file    = "#{ENV['HOME']}/path_to_csv_file.csv"
fasta_file  = "#{ENV['HOME']}/path_to_fasta_file.fasta"

ENV['HOME'] looks up the HOME environment variable, which holds the path to your home directory. If your user name was "bob", for example, then Ruby would expand the two paths above to:

/home/bob/path_to_csv_file.csv
/home/bob/path_to_fasta_file.fasta

But the person who wrote that script is using a shorthand. They do not mean that the files should start with "path_to_". They mean that you must specify the full path to the input CSV file and the output FASTA file. So if you want those files, for example, to be in /home/bob/projects/conversion, then you would write:

csv_file    = "#{ENV['HOME']}/projects/conversion/input.csv"
fasta_file  = "#{ENV['HOME']}/projects/conversion/output.fasta"
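If you are not sure whether Ruby can actually see the file at the path you wrote, a quick check along these lines may help. This is only a sketch: report_path is a made-up helper name, and the path is the example one from above.

```ruby
# Hypothetical helper: says whether a file exists at the given path.
def report_path(path)
  status = File.exist?(path) ? "found" : "NOT found"
  "#{path} (#{status})"
end

# Same example path as above; change it to wherever your CSV really lives.
csv_file = "#{ENV['HOME']}/projects/conversion/input.csv"

puts "HOME is #{ENV['HOME'].inspect}"
puts report_path(csv_file)
```

If it prints "NOT found", the path in the script and the real location of the file do not match.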

To make the script work, you would use a plain text editor to write the code, save it with a sensible name such as csv2fasta.rb and then run (in the same directory where you saved the Ruby script):

ruby csv2fasta.rb

And the file output.fasta should appear in /home/bob/projects/conversion.
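The linked script is not reproduced in this thread, so the following is only a minimal sketch of what a csv2fasta.rb along those lines could look like; it is not the blog author's code. It assumes a two-column CSV (name, sequence) with no header row, and the helper name csv_to_fasta is made up for illustration.

```ruby
require "csv"

# Hypothetical helper: reads a two-column CSV (name, sequence)
# and writes one FASTA record per row.
def csv_to_fasta(csv_path, fasta_path)
  File.open(fasta_path, "w") do |out|
    CSV.foreach(csv_path) do |row|
      name, sequence = row
      next if name.nil? || sequence.nil? # skip blank or incomplete rows
      out.puts ">#{name.strip}"
      out.puts sequence.strip
    end
  end
end

# Example paths from the answer above; edit these two lines for your files.
csv_file   = "#{ENV['HOME']}/projects/conversion/input.csv"
fasta_file = "#{ENV['HOME']}/projects/conversion/output.fasta"

if File.exist?(csv_file)
  csv_to_fasta(csv_file, fasta_file)
else
  warn "Input file not found: #{csv_file}"
end
```

The two path lines near the bottom are the only parts that should need editing.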

If you are using Windows, the paths above will not work as written: Windows paths look different (for example C:\Users\bob), and HOME may not point where you expect. So it is probably simplest to try:

csv_file    = "input.csv"
fasta_file  = "output.fasta"

Then save the Ruby script in the same directory as input.csv and make sure that you run the script from that same directory.
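A bare name like "input.csv" is looked up in the current working directory, which is why the directory you run the script from matters. As a sketch of one way around that (script_relative is a made-up helper name), the paths can be anchored to the folder that holds the script itself:

```ruby
# Hypothetical helper: builds a path next to this script file,
# regardless of which directory the terminal is currently in.
def script_relative(name)
  File.join(File.expand_path(File.dirname(__FILE__)), name)
end

csv_file   = script_relative("input.csv")
fasta_file = script_relative("output.fasta")

puts "Working directory: #{Dir.pwd}"
puts "Input will be read from: #{csv_file}"
puts "Output will be written to: #{fasta_file}"
```

With paths built this way, the script should find input.csv as long as the two files sit in the same folder.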

But remember that Ruby is only one possible solution and that code you find on the web is not always the best code.


Thanks Neil for explaining the code! :)

(Reply by hadasa, 7.9 years ago)
Rob Syme wrote (7.9 years ago):

Neil's comments are absolutely true and worth keeping in mind for the future.

However, if your CSV file is just two columns [name,sequence], you probably don't need a whole script. If your input file looks like this:

chrom_1,CATCGTAGCTAGTCGACTATGCTAGCTAGC
chrom_2,CTGATGCTAGCTACTGACTGACTGATCGATCTAGCTA
chrom_3,ATGCTGACTGATCGTACTGATCGTGACTGCTGAC

Then all you need to do (replace seqs.csv with your csv filename) is start a terminal session (Windows -> run and type "cmd"). Change into the directory that contains your sequences (use the "cd" command) and run:

ruby -ne 'puts ">" + $_.split(",").first(2).join("\n")' seqs.csv

This will give the output:

>chrom_1
CATCGTAGCTAGTCGACTATGCTAGCTAGC
>chrom_2
CTGATGCTAGCTACTGACTGACTGATCGATCTAGCTA
>chrom_3
ATGCTGACTGATCGTACTGATCGTGACTGCTGAC
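
One thing to watch for: the one-liner is quoted for a Unix-style shell, and the Windows cmd prompt does not treat single quotes the same way, so it may fail there. A way to sidestep shell quoting altogether is to save the same logic in a small script. This is just a sketch; the file name csv_to_fasta.rb and the helper fasta_record are made up for illustration.

```ruby
# csv_to_fasta.rb (hypothetical file name)
# Turns one "name,sequence" line into a two-line FASTA record.
def fasta_record(line)
  name, sequence = line.chomp.split(",").first(2)
  ">#{name}\n#{sequence}"
end

# Convert every file named on the command line; blank lines are skipped.
ARGV.each do |path|
  File.foreach(path) do |line|
    puts fasta_record(line) unless line.strip.empty?
  end
end
```

It would be run as "ruby csv_to_fasta.rb seqs.csv", and adding "> seqs.fasta" to the end of that command sends the output to a file instead of the screen.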

You should be opening a terminal, navigating to wherever your CSV file is and running Rob's command from there. Also, if you run the command "ruby -v" in a terminal, it should display the version. Tell us what messages/errors you see.

(Reply by Neilfws, 7.9 years ago)

Thank you both for posting.

I am operating on Windows, I'm afraid!

Rob - yep, my input files are literally like that. I just have 4000-odd chromosomes.

I have tried typing into Ruby exactly what you suggested, but replaced the file name with my own. It doesn't work. Does it matter where the input CSV file is saved?

Any other tips much appreciated..sorry if these are ridiculous questions!

xx

(Reply by User 7433, 7.9 years ago)

Hi Rob, I will add this solution to the original blog post. Thanks!

(Reply by hadasa, 7.9 years ago)