Question: cut a fasta file into two columns of pos and single base
7 months ago by
mittjohns30
United States
mittjohns30 wrote:

What is the most efficient way (fast, with simple code) to convert a FASTA file (all or part of a chromosome's sequence) into a two-column, tab-delimited format of position and base? For example:

fasta file

>chr2
ATGCATTC...

converted pos-base file

1 A
2 T
3 G
4 C

I know we could write a script for this, but it seems like a task for a one-liner or an existing tool. Thanks!

position sequence base fasta • 260 views
modified 7 months ago by Pierre Lindenbaum124k • written 7 months ago by mittjohns30
7 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum124k wrote:
grep -v '^>' in.fasta | tr -d '\n' | grep -o .  | cat -n
written 7 months ago by Pierre Lindenbaum124k

A multi-fasta file will be numbered consecutively, correct? OP should keep that in mind.

written 7 months ago by genomax74k

Thanks, Pierre, a brilliant use of grep -o and cat -n.

ADD REPLYlink written 7 months ago by mittjohns30
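
Regarding the multi-FASTA caveat raised above, here is a minimal sketch of an awk variant that restarts numbering at each header line. The function name fasta_to_pos_base is hypothetical, used only for illustration, and the sketch assumes a plain-text FASTA file with Unix line endings. Unlike cat -n, it prints the position without leading padding, so each output line is exactly pos, a tab, and the base.

```shell
# Hypothetical helper (name chosen for illustration only).
# Usage: fasta_to_pos_base in.fasta
# Prints one line per base: position, a tab, then the base.
fasta_to_pos_base() {
  awk '
    /^>/ { pos = 0; next }   # header line: restart numbering, print nothing
    {
      n = length($0)
      for (i = 1; i <= n; i++)
        printf "%d\t%s\n", ++pos, substr($0, i, 1)
    }
  ' "$1"
}
```

To keep only part of the sequence, the output can be filtered on the first column, for example: fasta_to_pos_base in.fasta | awk -F'\t' '$1 >= 3 && $1 <= 6'. Because numbering restarts per record, a multi-FASTA input would need an extra column carrying the sequence name to keep records distinguishable; that is left out here to match the two-column format asked for.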