Question: FASTA file to tab-delimited file
nakanomasayuki265 wrote:

I want to change the format of a FASTA file like this:

>Name
AAAAAAAAAAAAAAAAAAAAAAAAA
>Fasta
BBBBBBBBBBBBBBBBBBBBBBBBBB
·
·
·

Each sequence is on a single line; the only line breaks are the ones after the > header lines.

I would like to convert it to a tab-delimited file like this:

#Name AAAAAAAAAAAAAAAAAAAAAAAAA
#Fasta BBBBBBBBBBBBBBBBBBBBBBBBB
#·
#·
#·

What commands or scripts can I use for this?

Modified 2 days ago by SmallChess; written 17 months ago by nakanomasayuki265.

This sounds like an XY problem. Can you explain what you are trying to accomplish?

Written 17 months ago by Brian Bushnell.
Alex Reynolds (Seattle, WA USA) wrote:

Sure, just use awk:

$ awk 'BEGIN{RS=">"}{print "#"$1"\t"$2;}' in.fa | tail -n+2 > out.txt
Modified 17 months ago; written 17 months ago by Alex Reynolds.

Alternative, which skips the empty first record with NR>1 instead of piping through tail: awk 'BEGIN{RS=">";OFS="\t"}NR>1{print "#"$1,$2}' inFile > outFile

Modified 17 months ago; written 17 months ago by 5heikki.
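For comparison, here is a minimal line-based sketch of the same conversion, written as a shell function around awk. `fasta_to_tab` is a hypothetical helper name, not part of any tool mentioned in this thread. It assumes plain FASTA where every header line starts with `>`. Instead of `RS=">"` plus `tail`, it buffers sequence lines until the next header, so wrapped (multi-line) sequences are joined as well. The whole header line, including any description after a space, becomes the first column.

```shell
# Sketch: FASTA -> "#name<TAB>sequence", one record per output line.
# fasta_to_tab is a hypothetical helper; it reads the files given as
# arguments, or stdin when no arguments are given.
fasta_to_tab() {
    awk '
        /^>/ {
            # New header: print the previous record, if there is one.
            if (name != "") print "#" name "\t" seq
            name = substr($0, 2)    # header text without the leading ">"
            seq = ""
            next
        }
        # Sequence line: append it, so wrapped sequences are joined.
        { seq = seq $0 }
        END {
            # Print the last record.
            if (name != "") print "#" name "\t" seq
        }
    ' "$@"
}
```

Usage: `fasta_to_tab in.fa > out.txt`, or `cat in.fa | fasta_to_tab > out.txt`.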

Hey, do you know how to convert a tab-delimited file back to FASTA format?

Written 13 months ago by yangzituo.

For example:

seq1  AAAATTTT
seq2 CCCCGGGG

convert it back to:

>seq1
AAAATTTT
>seq2
CCCCGGGG

Thanks~

Modified 13 months ago; written 13 months ago by yangzituo.

Use seqkit:

seqkit tab2fx xxx.tab > xxx.fasta
Written 13 months ago by shenwei356.
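If seqkit is not available, here is a minimal awk sketch for the reverse direction. `tab_to_fasta` is a hypothetical helper name. It assumes the columns are separated by a real tab character (name in column 1, sequence in column 2) and strips an optional leading `#`, such as the one added by the awk commands in this thread. Each sequence is written on a single line, without wrapping.

```shell
# Sketch: "name<TAB>sequence" lines -> FASTA.
# tab_to_fasta is a hypothetical helper; lines with fewer than two
# tab-separated columns are skipped.
tab_to_fasta() {
    awk '
        BEGIN { FS = "\t" }
        NF >= 2 {
            name = $1
            sub(/^#/, "", name)     # drop an optional "#" prefix
            print ">" name
            print $2
        }
    ' "$@"
}
```

Usage: `tab_to_fasta xxx.tab > xxx.fasta`.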
SmallChess (Australia) wrote:

Please use seqkit. The accepted answer doesn't work for multi-line FASTA (sequences wrapped over several lines), so it should be ignored.

seqkit fx2tab myFASTA > myTAB
Modified 2 days ago by finswimmer; written 2 days ago by SmallChess.

Regarding "will not work for multiple lines in FASTQ":

  • FASTQ has only one sequence line (of significance at least)
  • OP asked FASTA to TSV, not FASTQ to TSV
Written 2 days ago by Ram.

My sample command does indeed convert FASTA to TSV.

Written 2 days ago by SmallChess.

Yes, but the accepted answer does work on multiple lines, unless I'm missing something. RS=">" should take care of splitting records at ">" rather than at every newline.

Written 2 days ago by Ram.
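To see what the original one-liner actually receives, here is a small diagnostic sketch; `show_fields` is a hypothetical helper name. With `RS=">"` each record does span several lines, but awk's default field splitting still treats runs of spaces, tabs and newlines as separators, so a wrapped sequence is spread over `$2`, `$3`, ... and printing `$2` alone keeps only its first line.

```shell
# Sketch: print every field of each ">"-separated record.
# show_fields is a hypothetical helper for inspecting awk field splitting.
show_fields() {
    awk '
        BEGIN { RS = ">" }
        NR > 1 {                    # record 1 is the empty text before the first ">"
            printf "record %d: NF=%d", NR - 1, NF
            for (i = 1; i <= NF; i++) printf " [%s]", $i
            printf "\n"
        }
    ' "$@"
}
```

For the `test.fa` from the next reply, the `Name` record would show three fields (`Name` and two sequence chunks), not two.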

The accepted answer uses "tail -n+2"; it wouldn't work for multiple lines.

Written 2 days ago by SmallChess.

How so? Can you explain please?

Written 2 days ago by Ram.

$ cat test.fa 
>Name
AAAAAAAAAAA
AAAAAA

>Fasta
BBBBBBBBBBBBBB
BBBBB
B
BBBBBB

$ awk 'BEGIN{RS=">"}{print "#"$1"\t"$2;}' test.fa | tail -n+2
#Name   AAAAAAAAAAA
#Fasta  BBBBBBBBBBBBBB

$ seqkit fx2tab test.fa
Name    AAAAAAAAAAAAAAAAA   
Fasta   BBBBBBBBBBBBBBBBBBBBBBBBBB

Or a simpler case:

$ cat test.fa
>Name
AAAAA A

>Fasta
B
BBBBBB

$ awk 'BEGIN{RS=">"}{print "#"$1"\t"$2;}' test.fa | tail -n+2
#Name   AAAAA
#Fasta  B
Modified 1 day ago; written 1 day ago by cpad0112.

This should work for multi-line FASTA:

$ cat test.fa
>Name
AAAAAAAAAAA
AAAAAA

>Fasta
BBBBBBBBBBBBBB
BBBBB
B
BBBBBB

$ awk -v RS=">" -v ORS="\n" -v OFS="" '{$1="#"$1"\t"}1' test.fa | tail -n+2
#Name   AAAAAAAAAAAAAAAAA
#Fasta  BBBBBBBBBBBBBBBBBBBBBBBBBB
Modified 1 day ago; written 1 day ago by finswimmer.

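@SmallChess: tail -n+2 only removes the unwanted empty first line. However, as you mentioned, the code doesn't work for multi-line FASTA or for FASTA with gaps in the sequence, because $2 holds only the first whitespace-separated chunk of the sequence.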

Written 1 day ago by cpad0112.
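One way to handle both problems while keeping the `RS=">"` idea, as a sketch: split each record into lines, treat the first line as the header, join the remaining lines, and delete spaces and tabs from the result. `fasta_to_tab_rs` is a hypothetical helper name; it assumes `>` only occurs at the start of header lines. Because the header is taken as a whole line rather than as `$1`, a header such as `>Name desc` also stays in one piece.

```shell
# Sketch: RS=">"-based FASTA -> "#header<TAB>sequence".
# fasta_to_tab_rs is a hypothetical helper; the header is the first line of
# each record, and all remaining lines are joined into the sequence.
fasta_to_tab_rs() {
    awk '
        BEGIN { RS = ">" }
        NR > 1 {                            # skip the empty record before the first ">"
            n = split($0, lines, "\n")
            seq = ""
            for (i = 2; i <= n; i++) seq = seq lines[i]
            gsub(/[ \t]/, "", seq)          # remove spaces/tabs (gaps) inside the sequence
            print "#" lines[1] "\t" seq
        }
    ' "$@"
}
```

Usage: `fasta_to_tab_rs test.fa > out.txt` (no `tail` step needed).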
Powered by Biostar version 2.3.0