Question: fasta file to tab delimited file
3.1 years ago by
nakanomasayuki26570 wrote:

I want to convert a FASTA file to a different format.

>Name
AAAAAAAAAAAAAAAAAAAAAAAAA
>Fasta
BBBBBBBBBBBBBBBBBBBBBBBBBB
·
·
·

The sequences in the FASTA file contain no line breaks: each record is a > header line followed by a single sequence line.

I would like to convert it to a tab-delimited file like this:

#Name AAAAAAAAAAAAAAAAAAAAAAAAA
#Fasta BBBBBBBBBBBBBBBBBBBBBBBBB
#·
#·
#·

What commands or scripts can do this? Could you please tell me?

sequence • 5.8k views
ADD COMMENTlink modified 19 months ago by SmallChess500 • written 3.1 years ago by nakanomasayuki26570

This sounds like an XY problem. Can you explain what you are trying to accomplish?

ADD REPLYlink written 3.1 years ago by Brian Bushnell17k
3.1 years ago by
Alex Reynolds29k wrote:

Sure, just use awk:

$ awk 'BEGIN{RS=">"}{print "#"$1"\t"$2;}' in.fa | tail -n+2 > out.txt
ADD COMMENTlink modified 3.1 years ago • written 3.1 years ago by Alex Reynolds29k
1

Alternative: awk 'BEGIN{RS=">";OFS="\t"}NR>1{print "#"$1,$2}' inFile > outFile

ADD REPLYlink modified 3.1 years ago • written 3.1 years ago by 5heikki8.6k

Hey, do you know how to convert a tab-delimited file back to FASTA format?

ADD REPLYlink written 2.7 years ago by yangzituo0

like:

seq1  AAAATTTT
seq2 CCCCGGGG

convert it back to:

>seq1
AAAATTTT
>seq2
CCCCGGGG

Thanks~

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by yangzituo0
1

seqkit

seqkit tab2fx xxx.tab > xxx.fasta
ADD REPLYlink written 2.7 years ago by shenwei3565.0k
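If seqkit is not installed, the same conversion can be sketched in plain awk. This is only a minimal sketch: it assumes each input line has exactly two tab-separated columns (name, then sequence), and the function name tab_to_fasta is made up for illustration.

```shell
# Hypothetical helper (name is illustrative, not from the thread):
# convert "name<TAB>sequence" lines back to FASTA.
# Assumption: columns are separated by a tab; lines with fewer
# than two columns are skipped.
tab_to_fasta() {
  awk -F'\t' 'NF >= 2 { print ">" $1; print $2 }' "$1"
}
```

Usage: tab_to_fasta xxx.tab > xxx.fasta. If the columns are separated by spaces rather than tabs, dropping the -F'\t' option makes awk split on any whitespace instead.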
19 months ago by
SmallChess500 wrote:

Please use the seqkit tool. The accepted solution doesn't handle sequences that span multiple lines, so it should be ignored.

seqkit fx2tab myFASTA >  myTAB
ADD COMMENTlink modified 19 months ago by finswimmer13k • written 19 months ago by SmallChess500

will not work for multiple lines in FASTQ

  • FASTQ has only one sequence line (of significance at least)
  • OP asked FASTA to TSV, not FASTQ to TSV
ADD REPLYlink written 19 months ago by RamRS25k

My sample command did indeed convert FASTA to TSV.

ADD REPLYlink written 19 months ago by SmallChess500

Yes, but the accepted answer does work on multiple lines, unless I'm missing something. RS=">" should take care of not separating records by \n.

ADD REPLYlink written 19 months ago by RamRS25k

The accepted answer used "tail -n+2", and it wouldn't work for sequences that span multiple lines.

ADD REPLYlink written 19 months ago by SmallChess500

How so? Can you explain please?

ADD REPLYlink written 19 months ago by RamRS25k
$ cat test.fa 
>Name
AAAAAAAAAAA
AAAAAA

>Fasta
BBBBBBBBBBBBBB
BBBBB
B
BBBBBB

$ awk 'BEGIN{RS=">"}{print "#"$1"\t"$2;}' test.fa | tail -n+2
#Name   AAAAAAAAAAA
#Fasta  BBBBBBBBBBBBBB

$ seqkit fx2tab test.fa
Name    AAAAAAAAAAAAAAAAA   
Fasta   BBBBBBBBBBBBBBBBBBBBBBBBBB

or a simple case:

$ awk 'BEGIN{RS=">"}{print "#"$1"\t"$2;}' test.fa | tail -n+2 
#Name   AAAAA
#Fasta  B

$ cat test.fa 
>Name
AAAAA A

>Fasta
B
BBBBBB
ADD REPLYlink modified 19 months ago • written 19 months ago by cpad011212k
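The truncation shown above comes from awk's field splitting rather than from tail: with RS=">" each FASTA record is one awk record, but the default field separator splits on newlines and spaces as well, so $2 holds only the first chunk of the sequence. A small sketch that prints every field makes this visible (the function name show_fields is made up for illustration):

```shell
# Hypothetical helper: list the fields awk sees in each ">"-separated record.
# NR > 1 skips the empty record that precedes the first ">".
show_fields() {
  awk 'BEGIN { RS = ">" }
       NR > 1 {
         for (i = 1; i <= NF; i++)
           printf "record %d field %d: %s\n", NR - 1, i, $i
       }' "$1"
}
```

For a record whose sequence spans two lines, this lists three fields (the name plus two sequence chunks), while the accepted command only prints the first two.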

This should work for multiline fasta:

$ awk -v RS=">" -v ORS="\n" -v OFS="" '{$1="#"$1"\t"}1' test.fa|tail -n+2
#Name   AAAAAAAAAAAAAAAAA
#Fasta  BBBBBBBBBBBBBBBBBBBBBBBBBB

$ cat test.fa   
>Name
AAAAAAAAAAA
AAAAAA

>Fasta
BBBBBBBBBBBBBB
BBBBB
B
BBBBBB
ADD REPLYlink modified 19 months ago • written 19 months ago by finswimmer13k

Thank you! This is great!

ADD REPLYlink written 4 months ago by lagartija60

@SmallChess: tail -n+2 only removes the unwanted first line. However, as you mentioned, the code doesn't work for multi-line FASTA or for FASTA with gaps (spaces) in the sequence.

ADD REPLYlink written 19 months ago by cpad011212k
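For the multi-line and gap cases discussed in this thread, another option is to read the file line by line instead of changing RS, which also removes the need for tail. The following is a rough sketch rather than a tested replacement for seqkit: the function name fasta_to_tab is made up, it keeps the whole header line (description included) as the name, it drops spaces, tabs and carriage returns inside sequences, and it does not print the leading # used in the answers above.

```shell
# Hypothetical helper (name is illustrative): FASTA -> "name<TAB>sequence".
# Assumptions: header lines start with ">"; blank lines are ignored;
# whitespace inside sequence lines is removed.
fasta_to_tab() {
  awk '
    /^>/ {                                  # header: flush the previous record
      if (name != "") print name "\t" seq
      name = substr($0, 2)                  # header text without the ">"
      seq = ""
      next
    }
    { gsub(/[ \t\r]/, ""); seq = seq $0 }   # append the cleaned sequence line
    END { if (name != "") print name "\t" seq }
  ' "$1"
}
```

Usage: fasta_to_tab test.fa > test.tab. Changing print name to print "#" name in both places would give the #-prefixed layout from the original question.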