Question: cut a fasta file into two columns of pos and single base
16 months ago, mittjohns (United States) wrote:

What is the most efficient way (fast, with simple code) to convert a FASTA file of a chromosome, either part of the sequence or all of it, into a two-column, tab-delimited file of position and base? For example:

fasta file

>chr2
ATGCATTC...

converted pos-base file

1 A
2 T
3 G
4 C

I know we can write a script to do so. But this seems to be a task for a one-liner or some existing tools. Thanks!

position sequence base fasta • 395 views
modified 16 months ago by Pierre Lindenbaum • written 16 months ago by mittjohns
16 months ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:
grep -v '^>' in.fasta | tr -d '\n' | grep -o .  | cat -n
written 16 months ago by Pierre Lindenbaum
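
One detail about the one-liner above: GNU `cat -n` right-aligns the line number with leading spaces before the tab, so the first column comes out padded. If strictly two clean fields are wanted, a minimal variant is sketched below. It swaps `cat -n` for `awk`, assumes a grep that supports `-o` (GNU and BSD grep do), and uses a made-up function name, `fasta_to_pos_base`, purely for illustration.

```shell
# Hypothetical helper, not part of any existing tool.
# Prints "position<TAB>base" for every base in the file given as $1.
fasta_to_pos_base() {
  grep -v '^>' "$1" |            # drop FASTA header lines
    tr -d '\r\n' |               # join sequence lines (also strips CR from Windows files)
    grep -o . |                  # emit one character per line
    awk '{ print NR "\t" $0 }'   # prefix each base with its 1-based position
}
```

Usage would be something like `fasta_to_pos_base in.fasta > pos_base.tsv`. For only part of the sequence, the output can be filtered by line range, for example `fasta_to_pos_base in.fasta | sed -n '1000,2000p'`.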

With a multi-FASTA file, the bases will be numbered consecutively across all records, correct? The OP should keep that in mind.

written 16 months ago by genomax
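
If per-record numbering is wanted for a multi-FASTA file, one possible approach is to do everything in `awk` and reset the counter at each header line. This is only a sketch under that assumption, not the method from the answer above, and the function name `fasta_to_pos_base_per_record` is hypothetical.

```shell
# Hypothetical helper: number bases per record, restarting at 1 after each header.
fasta_to_pos_base_per_record() {
  awk '
    /^>/ { pos = 0; next }   # new record: reset the counter and skip the header
    {
      sub(/\r$/, "")         # tolerate Windows line endings
      n = length($0)
      for (i = 1; i <= n; i++)
        printf "%d\t%s\n", ++pos, substr($0, i, 1)
    }
  ' "$1"
}
```

The output keeps the two-column layout, so records are distinguishable only by the position dropping back to 1.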

Thanks Pierre, a brilliant use of grep -o and cat -n.

written 16 months ago by mittjohns