Question: Convert a FASTA file into two columns of position and single base
mittjohns (United States) wrote, 21 months ago:

What’s the most efficient way (fast, with simple code) to convert a FASTA file of a chromosome, either part of the sequence or the whole thing, into a two-column, tab-delimited format of position and base? For example:

fasta file

>chr2
ATGCATTC...

converted pos-base file

1 A
2 T
3 G
4 C

I know we could write a script for this, but it seems like a job for a one-liner or an existing tool. Thanks!

position sequence base fasta • 462 views
modified 21 months ago by Pierre Lindenbaum • written 21 months ago by mittjohns
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 21 months ago:
grep -v '^>' in.fasta | tr -d '\n' | grep -o . | cat -n
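
A possible variant, not part of the original answer: cat -n right-aligns the line number with leading spaces before the tab, so if strictly pos<TAB>base output is needed, awk's NR can do the numbering instead. The sketch below wraps the same pipeline in a hypothetical helper, pos_base, whose optional start and end arguments (an assumption added for illustration) restrict the output to part of the sequence. It assumes a single-record FASTA file with Unix line endings and a grep that supports -o.

```shell
# pos_base FILE [START [END]]
# Hypothetical helper: print "position<TAB>base" for every base in FILE,
# optionally limited to positions START..END (1-based, inclusive).
pos_base() {
  file=$1
  start=${2:-1}   # default: begin at the first base
  end=${3:-0}     # 0 means "no upper limit"
  grep -v '^>' "$file" |   # drop header lines
    tr -d '\n' |           # join the sequence lines into one
    grep -o . |            # emit one character per line
    awk -v OFS='\t' -v s="$start" -v e="$end" \
      'NR >= s && (e == 0 || NR <= e) { print NR, $0 }'
}
```

For example, pos_base in.fasta 1000 2000 would print only positions 1000 through 2000, still numbered relative to the start of the sequence.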

With a multi-FASTA file, the positions will be numbered consecutively across all records, correct? The OP should keep that in mind.

written 21 months ago by GenoMax
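
To illustrate the multi-FASTA point: the one-liner strips every header before numbering, so positions run on from one record into the next. A minimal sketch of one way around that, assuming three-column output (record name, position, base) is acceptable, is to let awk restart the counter at each header. The function name fasta_to_pos_base is hypothetical, and the record name is taken to be the first word after ">".

```shell
# Hypothetical helper: read FASTA from the given files (or stdin) and print
# "name<TAB>position<TAB>base", restarting the position at each record.
fasta_to_pos_base() {
  awk '
    /^>/ {                      # header line: remember the record name
      name = substr($1, 2)      # first word, without the leading ">"
      pos = 0                   # restart numbering for this record
      next
    }
    {
      n = length($0)
      for (i = 1; i <= n; i++) {
        pos++
        printf "%s\t%d\t%s\n", name, pos, substr($0, i, 1)
      }
    }
  ' "$@"
}
```

Called as fasta_to_pos_base in.fasta, it yields one line per base; dropping the first column with cut -f2,3 gives the two-column layout from the question, at the cost of losing the record boundary.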

Thanks Pierre, a brilliant use of grep -o and cat -n.

written 21 months ago by mittjohns