Question: FASTA FILE to TABLE R/python
fagambaro30 wrote:

Hi all,

Does anyone know how to convert a DNA sequence from a FASTA file into a one-column table? It can be in R or Python!

I tried this FASTA-to-table converter, but it is not working for me: https://rstudio-pubs-static.s3.amazonaws.com/518943_a6bb21f87f594e6fb2aaa9ca2ef79cc0.html

Then I also tried converting my FASTA file to a CSV (using https://birdlet.github.io/2017/12/13/fasta2csv/), but that is not working either, because I end up with multiple columns instead of the single one I need:

1 >DENV4_(consensus)
2 A G T T G T T A G T C T G T G T G G A C C G A C A A G G A C A G T T C C A A A
3 T T C T A A C A G T T T G T T T A G A T A G A G A G C A G A T C T C T G G A A

Can anyone help me?

Thanks a lot!

Fabiana

R fasta • 110 views
modified 23 hours ago by gayachit10 • written 4 days ago by fagambaro30

If you linearize the fasta file then it should become what you are looking for. Try this code from @Pierre.
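In case it helps to see what "linearizing" means, here is a minimal Python sketch (not the code linked above): it joins the wrapped sequence lines of each record into a single string, so every record becomes one header plus one sequence. The function name `linearize_fasta` and the two-line example record are made up for illustration.

```python
def linearize_fasta(lines):
    """Join wrapped sequence lines so each record becomes (header, sequence)."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    # Close the last record
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical two-line record, only to show the call
example = [">DENV4_(consensus)", "AGTTGTTAGT", "CTGTGTGGAC"]
for header, seq in linearize_fasta(example):
    print(header, seq, sep="\t")
```

Once the sequence is a single string, splitting it into one base per row is a simple loop over its characters.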

written 4 days ago by genomax78k

Hey! Thanks for the help!!

So, I first linearized my fasta as you suggested:

awk '/^>/ {printf("%s%s\t",(N>0?"\n":""),$0);N++;next;} {printf("%s",$0);} END {printf("\n");}' < ipc214_S8_DENV4_consensus.fa > ipc214_S8_DENV4_linearized.fasta

Then I converted my FASTA file to CSV:

fasta2csv.py ipc214_S8_DENV4_linearized.fasta ipc214_S8_linearized.csv

And then in R I tried to open my CSV file:

read.csv(file = 'ipc214_S8_linearized.csv', header = FALSE, sep = ",", quote = "\"",
     dec = ".", fill = TRUE)

And I get the following:

1 >DENV4_(consensus) AGTTGTTAGTCTGTGTGGACCGACAAGGACAGTTCCAAATCGGAAGCTTGCTTAACACAGTTCTAACAGTTTGTTTAGATAGAGAGCAGATCTCTGGAAAAATGAACCAACGAAAGAAGGTGGCTAGACCACCTTTCAATATGCTGAAACGCGAGAGAAACCGCGTATCAACCCCTCAAGGGTTGGTGAAGAGATTCTCGACTGGACTTTTTTCCGGGAAAGGACCCTTACGGATGATGTTGGCATTCATTACGTTTTTGAGAGTTCTTTCCATCCCACCAACAGCAGGGATTCTAAAAAGATGGGGACAGTTAAAGAAAAACAAGGCCGTGAAG.. <truncated>

Which is not exactly what I need. I want to have a table like this:

1 A
2 G
3 T
4 T
etc.

Maybe my approach is not the best! What do you think?

Thanks a lot again!

modified 3 days ago • written 3 days ago by fagambaro30

Here are some other options to linearize fasta: Linearize fasta files

written 4 days ago by genomax78k

gayachit10 wrote:

You could try this simple code in Python 3

import csv

# Example sequence taken from the question
dna_seq = "AGTTGTTAGTCTGTGTGGACCGACAAGGACAGTTCCAAATTCTAACAGTTTGTTTAGAT"

# Write one row per base: position (starting at 1), base
with open("letter.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for row in enumerate(dna_seq, 1):
        writer.writerow(row)

This will generate a letter.csv file
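The snippet above hard-codes the sequence. If you would rather read it straight from the FASTA file, something along these lines might work. It is only a sketch under a few assumptions: the file is plain-text FASTA, only the first record is wanted, and the function name `fasta_to_base_table` is my own invention.

```python
import csv


def fasta_to_base_table(fasta_path, csv_path):
    """Write one row per base (position, base) for the first FASTA record."""
    bases = []
    seen_header = False
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if seen_header:
                    break  # assumption: stop at the second record
                seen_header = True
            elif line:
                bases.extend(line)  # add each character as a separate base
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        # Rows look like: 1,A / 2,G / 3,T ...
        writer.writerows(enumerate(bases, 1))
    return len(bases)
```

For the file in the question, the call would be `fasta_to_base_table("ipc214_S8_DENV4_consensus.fa", "letter.csv")`, and the result can then be opened in R with `read.csv("letter.csv", header = FALSE)`. No linearizing step is needed because the wrapped lines are joined while reading.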

modified 23 hours ago • written 23 hours ago by gayachit10