Question

Reverse step | From abundance matrix into "original" fasta file

0

8.2 years ago

fibar ▴ 90

Are there tools available to convert an abundance matrix back into a kind of "original" FASTA file that preserves the same information? The file looks like this:

sequence   sample1   sample2   sample3   ...
actgg...   43        89        23        ...
actga...   03        53        19        ...

I also have identifiers for each sequence. The output would look like:

>sample1_readIDx
actgg...
>sample1_readIDx
actgg...
...
>sample1_readIDy
actga...

The first sequence should appear 43 times with a sample1 header, 89 times with a sample2 header, and so on.

next-gen amplicon-sequencing data • 1.4k views

updated 8.2 years ago by Pierre Lindenbaum 166k • written 8.2 years ago by fibar ▴ 90

score 0 · Answer 1 · 2017-04-24

0

8.2 years ago

Pierre Lindenbaum 166k

Using awk:

 awk '/^sequence/ {split($0,header);next;} {for(i=2;i<=NF;++i) {N=int($i);for(x=0;x<N;++x) {printf(">%s_%d\n%s\n",header[i],NR,$1);}}} ' input.txt

0

Thanks Pierre. It ran. However, it didn't print the headers as I described it in my post: I only see an underscore followed by a number. Were you thinking of an additional step afterwards?

8.2 years ago by fibar ▴ 90

0

"it didn't print the headers as I described it in my post"

Yes, because I did not understand the nature of this header. Feel free to modify this simple awk script.

ADD REPLY • link 8.2 years ago by Pierre Lindenbaum 166k
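
One plausible reason for seeing only an underscore followed by a number is that the real header line does not start with the literal word `sequence`, so the `header` array is never filled. The sketch below is one possible modification, not the original answer's script: it treats the first line as the header whatever it contains, and uses the data row number as a stand-in for the read ID. The toy `input.txt`, its sample names, and the `<sample>_<row>` header format are assumptions made for illustration; real read identifiers would have to come from wherever they are stored.

```shell
# Toy input (tab-separated); sequences, counts and sample names are made up.
printf 'sequence\tsample1\tsample2\nactgg\t2\t1\nactga\t0\t3\n' > input.txt

# NR==1: remember the column names, whatever the first field is called.
# Other rows: print the sequence int($i) times for each sample column i,
# with ">" + sample name + "_" + data row number as the FASTA header.
awk 'NR == 1 { for (i = 2; i <= NF; ++i) sample[i] = $i; next }
     {
       for (i = 2; i <= NF; ++i) {
         n = int($i)
         for (x = 0; x < n; ++x)
           printf(">%s_%d\n%s\n", sample[i], NR - 1, $1)
       }
     }' input.txt > output.fasta
```

With the toy input this writes six records to `output.fasta`: `actgg` twice under `>sample1_1` and once under `>sample2_1`, then `actga` three times under `>sample2_2`. If the identifiers live in their own column, the `NR - 1` argument could be swapped for that field.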