FASTA file to tab-delimited file
5.7 years ago

I want to change the format of the fasta file.

>Name
AAAAAAAAAAAAAAAAAAAAAAAAA
>Fasta
BBBBBBBBBBBBBBBBBBBBBBBBBB
·
·
·

The sequences in my FASTA file have no line breaks; the only line breaks are around the > header lines.

I would like to convert it to a tab-delimited format:

#Name AAAAAAAAAAAAAAAAAAAAAAAAA
#Fasta BBBBBBBBBBBBBBBBBBBBBBBBB
#·
#·
#·

Which commands or scripts can do this?

sequence • 12k views

This sounds like an XY problem. Can you explain what you are trying to accomplish?

5.7 years ago

Sure, just use awk:

$ awk 'BEGIN{RS=">"}{print "#"$1"\t"$2;}' in.fa | tail -n+2 > out.txt

Alternative: awk 'BEGIN{RS=">";OFS="\t"}NR>1{print "#"$1,$2}' inFile > outFile


Hey, do you know how to change tab delimited back to fasta format?


like:

seq1  AAAATTTT
seq2 CCCCGGGG

convert it back to:

>seq1
AAAATTTT
>seq2
CCCCGGGG

Thanks~


seqkit

seqkit tab2fx xxx.tab > xxx.fasta
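
If seqkit is not available, the same conversion can probably be done with plain awk. The sketch below assumes the input has exactly two columns separated by a real tab character (name first, sequence second); rows separated by spaces, like the ones typed in the question above, would need their separator fixed first. The function name tab_to_fasta is only illustrative.

```shell
# Hypothetical helper: convert a two-column tab-delimited file to FASTA.
# Assumes column 1 is the sequence name and column 2 is the sequence.
tab_to_fasta() {
    # -F'\t' splits on tabs only, so names containing spaces stay intact.
    # Rows with fewer than two columns (e.g. blank lines) are skipped.
    awk -F'\t' 'NF >= 2 { print ">" $1; print $2 }' "$1"
}
```

Usage would look like: tab_to_fasta xxx.tab > xxx.fasta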
4.2 years ago
SmallChess ▴ 600

Please use the seqkit tool. The accepted solution does not work when a sequence spans multiple lines, so it should be ignored.

seqkit fx2tab myFASTA > myTAB

will not work for multiple lines in FASTQ

  • FASTQ has only one sequence line (of significance at least)
  • OP asked FASTA to TSV, not FASTQ to TSV

My sample command did indeed convert FASTA to TSV.


Yes, but the accepted answer does work on multiple lines, unless I'm missing something. RS=">" should take care of not separating records by \n.
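
One way to check this is to print what awk actually sees for each record. With RS=">" a record does span several lines, but with the default field separator the fields are still split on any whitespace, newlines included, so $2 holds only the first sequence line. A small sketch, where demo.fa and show_fields are made-up names used only for illustration:

```shell
# Build a small multi-line FASTA file (demo.fa is a hypothetical name).
printf '>Name\nAAAA\nAA\n>Fasta\nBBB\nB\n' > demo.fa

# Print the number of fields and every field of each record.
# NR > 1 skips the empty record that precedes the first ">".
show_fields() {
    awk 'BEGIN { RS = ">" } NR > 1 {
        printf "record %d has %d fields:", NR - 1, NF
        for (i = 1; i <= NF; i++) printf " [%s]", $i
        print ""
    }' "$1"
}

show_fields demo.fa
```

Each record here should report three fields (the name plus two sequence lines), which suggests that printing only $1 and $2 drops everything after the first sequence line.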


The accepted answer had "tail -n+2", so it wouldn't work for multiple lines.


How so? Can you explain please?

$ cat test.fa 
>Name
AAAAAAAAAAA
AAAAAA

>Fasta
BBBBBBBBBBBBBB
BBBBB
B
BBBBBB

$ awk 'BEGIN{RS=">"}{print "#"$1"\t"$2;}' test.fa | tail -n+2
#Name   AAAAAAAAAAA
#Fasta  BBBBBBBBBBBBBB

$ seqkit fx2tab test.fa
Name    AAAAAAAAAAAAAAAAA   
Fasta   BBBBBBBBBBBBBBBBBBBBBBBBBB

or a simple case:

$ awk 'BEGIN{RS=">"}{print "#"$1"\t"$2;}' test.fa | tail -n+2 
#Name   AAAAA
#Fasta  B

$ cat test.fa 
>Name
AAAAA A

>Fasta
B
BBBBBB

This should work for multiline fasta:

$ awk -v RS=">" -v ORS="\n" -v OFS="" '{$1="#"$1"\t"}1' test.fa|tail -n+2
#Name   AAAAAAAAAAAAAAAAA
#Fasta  BBBBBBBBBBBBBBBBBBBBBBBBBB

$ cat test.fa   
>Name
AAAAAAAAAAA
AAAAAA

>Fasta
BBBBBBBBBBBBBB
BBBBB
B
BBBBBB
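
As a possible variation, the same result can be produced by reading the file line by line instead of setting RS=">". This is only a sketch under a few assumptions: header lines start with ">", everything else belongs to the preceding sequence, and spaces, tabs, or carriage returns inside sequence lines should be dropped. The function name fasta_to_tab is made up for illustration. Unlike the versions based on $1, it keeps the whole header line, including any description after the name.

```shell
# Hypothetical helper: FASTA -> "#name<TAB>sequence", one record per line.
# Works line by line, so it needs neither RS=">" nor tail -n+2.
fasta_to_tab() {
    awk '
        /^>/ {
            # A header line starts a new record; close the previous one first.
            if (seen) printf "\n"
            printf "#%s\t", substr($0, 2)
            seen = 1
            next
        }
        seen {
            # Sequence line: strip gaps and stray whitespace, then append
            # it to the current output line without a trailing newline.
            gsub(/[ \t\r]/, "")
            printf "%s", $0
        }
        END { if (seen) printf "\n" }
    ' "$1"
}
```

Usage would look like: fasta_to_tab test.fa > out.txt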

Thank you! This is great!


@SmallChess tail -n+2 only removes the unwanted empty first line. However, as you mentioned, the code in the accepted answer doesn't work for multi-line FASTA or for FASTA with gaps in the sequence.
