Reverse step | From abundance matrix into "original" fasta file
5.6 years ago
fibar ▴ 90

Are there tools that can convert an abundance matrix back into something like the original FASTA file, preserving the same information? The file looks like this:

sequence   sample1   sample2   sample3   ...
actgg...   43        89        23        ...
actga...   03        53        19        ...


I also have identifiers for each sequence. The output would look like:

>sample1_readIDx
actgg...
actgg...
...
actga...


The first sequence should appear 43 times with a sample1 header, 89 times with a sample2 header, and so on.

next-gen amplicon-sequencing data • 1.0k views
5.6 years ago

Using awk:

 awk '/^sequence/ {split($0,header);next;} {for(i=2;i<=NF;++i) {N=int($i);for(x=0;x<N;++x) {printf(">%s_%d\n%s\n",header[i],NR,$1);}}} ' input.txt


Thanks Pierre. It ran. However, it didn't print the headers as I described it in my post. I only see an underscore followed by a number. Were you thinking of an additional step afterwards?


it didn't print the headers as I described it in my post

Yes, because I did not understand the nature of this header. Feel free to modify this simple awk script.
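
One possible modification, as a rough sketch only. It assumes the wanted header is the sample name followed by a read counter that runs separately per sample (e.g. >sample1_read1, >sample1_read2), which is a guess at what "readIDx" means. It also takes the sample names from the first line of the file, whatever the first column is called, rather than matching the literal word "sequence"; if the real header line starts with something else, that might explain empty sample names. The function name matrix_to_fasta is made up for this example.

```shell
# Hypothetical helper: expand an abundance matrix (file argument or stdin)
# into one FASTA record per counted read.
matrix_to_fasta() {
  awk '
    # First line: remember the sample names found in columns 2..NF.
    NR == 1 { for (i = 2; i <= NF; ++i) sample[i] = $i; next }
    # Skip blank lines.
    NF == 0 { next }
    {
      for (i = 2; i <= NF; ++i) {
        n = int($i)                      # a count written as "03" is read as 3
        for (x = 0; x < n; ++x) {
          # Assumed header format: >sampleName_readN, with N counted per sample.
          printf(">%s_read%d\n%s\n", sample[i], ++count[i], $1)
        }
      }
    }
  ' "$@"
}
```

It can be called as `matrix_to_fasta input.txt > reads.fasta`. Records come out in matrix order (row by row, then sample by sample), not grouped by sample; if grouping matters, the output would need to be split or sorted by sample name afterwards.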