Question: FASTA FILE to TABLE R/python
fagambaro3 wrote:

Hi all,

Does anyone know how to convert a DNA sequence from a FASTA file into a one-column table? It can be in R or Python!

I tried this FASTA-to-table converter, but it is not working for me: https://rstudio-pubs-static.s3.amazonaws.com/518943_a6bb21f87f594e6fb2aaa9ca2ef79cc0.html

Then I also tried to convert my FASTA file into a CSV (using https://birdlet.github.io/2017/12/13/fasta2csv/ ), but that is not working either, because I end up with multiple columns instead of the single column I need:

1 >DENV4_(consensus)
2 A G T T G T T A G T C T G T G T G G A C C G A C A A G G A C A G T T C C A A A
3 T T C T A A C A G T T T G T T T A G A T A G A G A G C A G A T C T C T G G A A

Can anyone help me?

Thanks a lot!

Fabiana

written 9 months ago by fagambaro3

If you linearize the fasta file then it should become what you are looking for. Try this code from @Pierre.

written 9 months ago by genomax
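
For anyone who prefers to stay in Python, linearizing can be sketched roughly as below. This is only a minimal sketch, not the code linked above: the function name `linearize_fasta` is made up for illustration, it drops the leading `>` from each header, and it assumes a plain-text FASTA with no comment lines.

```python
def linearize_fasta(lines):
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    # Emit the last record
    if header is not None:
        yield header, "".join(chunks)


# Small in-memory example; an open file handle works the same way
example = [">DENV4_(consensus)", "AGTTGTTAGT", "CTGTGTGGAC"]
for name, seq in linearize_fasta(example):
    print(name, seq, sep="\t")
```

Each record then sits on one tab-separated line, which is the same layout the linearizing one-liners produce.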

Hey! Thanks for the help!!

So, I first linearized my fasta as you suggested:

awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}'  < ipc214_S8_DENV4_consensus.fa > ipc214_S8_DENV4_linearized.fasta

Then I converted my FASTA into a CSV:

fasta2csv.py ipc214_S8_DENV4_linearized.fasta ipc214_S8_linearized.csv

And then in R I tried to open my CSV file:

read.csv(file = 'ipc214_S8_linearized.csv', header = FALSE, sep = ",", quote = "\"",
     dec = ".", fill = TRUE)

And I get the following:

1 >DENV4_(consensus) AGTTGTTAGTCTGTGTGGACCGACAAGGACAGTTCCAAATCGGAAGCTTGCTTAACACAGTTCTAACAGTTTGTTTAGATAGAGAGCAGATCTCTGGAAAAATGAACCAACGAAAGAAGGTGGCTAGACCACCTTTCAATATGCTGAAACGCGAGAGAAACCGCGTATCAACCCCTCAAGGGTTGGTGAAGAGATTCTCGACTGGACTTTTTTCCGGGAAAGGACCCTTACGGATGATGTTGGCATTCATTACGTTTTTGAGAGTTCTTTCCATCCCACCAACAGCAGGGATTCTAAAAAGATGGGGACAGTTAAAGAAAAACAAGGCCGTGAAG.. <truncated>

Which is not exactly what I need. I want to have a table like this:

1 A
2 G
3 T
4 T
etc.

Maybe my approach is not the best! What do you think?

Thanks a lot again!

written 9 months ago by fagambaro3

Here are some other options to linearize fasta: Linearize fasta files

written 9 months ago by genomax

gayachit wrote:

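You could try this simple code in Python 3:

import csv

# Sequence to tabulate (bases only)
dna_seq = "AGTTGTTAGTCTGTGTGGACCGACAAGGACAGTTCCAAATTCTAACAGTTTGTTTAGAT"

# Pair each base with its 1-based position
rows = list(enumerate(dna_seq, 1))

# newline="" lets the csv module control the line endings
with open("letter.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(row)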

This will generate a letter.csv file

written 9 months ago by gayachit
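
To avoid pasting the sequence into the script by hand, the same idea can read the bases straight from the FASTA file. This is a rough sketch under a few assumptions: the helper name `fasta_to_rows` is hypothetical, only the first record in the file is used, and the output has two columns (position, base) like the code above.

```python
import csv


def fasta_to_rows(fasta_path, csv_path):
    """Write one (position, base) row per nucleotide of the first record.

    Returns the number of bases written.
    """
    bases = []
    seen_header = False
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if seen_header:
                    break  # assumption: ignore any records after the first
                seen_header = True
                continue
            bases.extend(line)  # adds each character of the line separately

    # newline="" lets the csv module control the line endings
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerows(enumerate(bases, 1))
    return len(bases)
```

Calling `fasta_to_rows("ipc214_S8_DENV4_consensus.fa", "letter.csv")` should work on the original, non-linearized file, since wrapped sequence lines are joined while reading. The result can then be loaded in R with `read.csv("letter.csv", header = FALSE)`.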