Question

FASTA FILE to TABLE R/python

1

4.6 years ago

fagambaro3 ▴ 30

Hi all,

Does anyone know how to convert a DNA sequence from a fasta file into a one-column table? It can be in R or Python!

I tried this fasta-to-table converter, but it is not working for me: https://rstudio-pubs-static.s3.amazonaws.com/518943_a6bb21f87f594e6fb2aaa9ca2ef79cc0.html

Then I also tried to convert my fasta file into a csv (using https://birdlet.github.io/2017/12/13/fasta2csv/ ), but that is not working either, because I end up with multiple columns rather than the single one I need:

1 >DENV4_(consensus)
2 A G T T G T T A G T C T G T G T G G A C C G A C A A G G A C A G T T C C A A A
3 T T C T A A C A G T T T G T T T A G A T A G A G A G C A G A T C T C T G G A A

Can anyone help me?

Thanks a lot!

Fabiana

R fasta • 3.5k views

updated 4.5 years ago by gayachit ▴ 200 • written 4.6 years ago by fagambaro3 ▴ 30

2

If you linearize the fasta file then it should become what you are looking for. Try this code from @Pierre.

4.6 years ago by GenoMax 145k
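For anyone who would rather do the linearization step in Python, it can be sketched roughly as follows. This is only an illustration and not the code referred to above; the function name `linearize_fasta` is made up, and it assumes a plain-text fasta where every record starts with a `>` header line.

```python
def linearize_fasta(lines):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    # Emit the last record
    if header is not None:
        yield header, "".join(chunks)


# Small in-memory example; a real file could be passed as open("my.fa")
example = [">DENV4_(consensus)", "AGTTGTTAGT", "CTGTGTGGAC"]
for header, seq in linearize_fasta(example):
    print(f"{header}\t{seq}")
```

Each record comes out as one tab-separated line (header, then the full sequence), which is the same shape the awk one-liners produce.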

0

Hey! Thanks for the help!!

So, I first linearized my fasta as you suggested:

awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}'  < ipc214_S8_DENV4_consensus.fa

Then I converted my linearized fasta into csv:

fasta2csv.py ipc214_S8_DENV4_linearized.fasta ipc214_S8_linearized.csv

And then in R I tried to open my csv file:

read.csv(file = 'ipc214_S8_linearized.csv', header = FALSE, sep = ",", quote = "\"",
     dec = ".", fill = TRUE)

And I get the following:

1 >DENV4_(consensus) AGTTGTTAGTCTGTGTGGACCGACAAGGACAGTTCCAAATCGGAAGCTTGCTTAACACAGTTCTAACAGTTTGTTTAGATAGAGAGCAGATCTCTGGAAAAATGAACCAACGAAAGAAGGTGGCTAGACCACCTTTCAATATGCTGAAACGCGAGAGAAACCGCGTATCAACCCCTCAAGGGTTGGTGAAGAGATTCTCGACTGGACTTTTTTCCGGGAAAGGACCCTTACGGATGATGTTGGCATTCATTACGTTTTTGAGAGTTCTTTCCATCCCACCAACAGCAGGGATTCTAAAAAGATGGGGACAGTTAAAGAAAAACAAGGCCGTGAAG.. <truncated>

Which is not exactly what I need. I want to have a table like this:

1 A
2 G
3 T
4 T
etc..

Maybe my approach is not the best! What do you think?

Thanks a lot again!

4.5 years ago by fagambaro3 ▴ 30

1

Here are some other options to linearize fasta: Linearize fasta files

4.6 years ago by GenoMax 145k

score 1 · Answer 1 · 2020-02-17

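You could try this simple code in Python 3:

import csv

# Example sequence pasted by hand (letters only, no spaces or row numbers)
dna_seq = "AGTTGTTAGTCTGTGTGGACCGACAAGGACAGTTCCAAATTCTAACAGTTTGTTTAGAT"

# Pair each base with its 1-based position: (1, "A"), (2, "G"), ...
rows = enumerate(dna_seq, 1)

# newline="" keeps the csv module from writing blank lines on Windows;
# the with-block closes the file, so no explicit f.close() is needed
with open("letter.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(rows)

This will generate a letter.csv file with one row per base (position, base).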
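To avoid pasting the sequence by hand, the same idea can probably be applied directly to the fasta file. The sketch below rests on a few assumptions: the file holds a single record (only the first one is read), the helper names `read_single_fasta` and `write_base_table` are invented for illustration, and the output file name is arbitrary.

```python
import csv


def read_single_fasta(path):
    """Return the sequence of the first record, with line wraps and spaces removed."""
    chunks = []
    seen_header = False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if seen_header:
                    break  # stop at the start of a second record
                seen_header = True
            elif seen_header and line:
                chunks.append(line.replace(" ", ""))
    return "".join(chunks)


def write_base_table(sequence, out_path):
    """Write one csv row per base: 1-based position, base."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerows(enumerate(sequence, 1))


# Example call (file names are placeholders):
# write_base_table(read_single_fasta("my_consensus.fa"), "bases.csv")
```

The resulting file should open in R with read.csv("bases.csv", header = FALSE), giving the position in the first column and the base in the second.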