Append the sequence length to the sequence name
7.4 years ago
Alex ▴ 50

Hi everyone, this may be a boring problem, but it is very important to me. My sequence is:

c1/f1p99
GGAGAGGATGGCTTTGGAGCTGGTGGTACCGGTGTGACAGGTGGAGGAGATGGCCTTGGCGCCGGCGCCACGGACGGGGATGG

What I want is to append the sequence length to the FASTA name, like this:

c1/f1p99/56
GGAGAGGATGGCTTTGGAGCTGGTGGTACCGGTGTGACAGGTGGAGGAGATGGCCTTGGCGCCGGCGCCACGGACGGGGATGG

Does anyone have a good idea how to do this?

Thanks,
Jerry

RNA-Seq • 1.1k views
7.4 years ago
venu 7.1k

Assuming you have a FASTA file in which each sequence is on a single line:

cat file.fa | paste - - | awk '{print $1"/"length($2)"\n"$2}' > new_file.fa
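To see what this pipeline does, here is a small sketch run on a made-up two-record file (the file names and sequences are hypothetical, not from the question). It is a slight variation on the command above: it reads the file directly instead of going through cat, and it splits on the tab that paste inserts, so a header containing spaces would be kept whole. It still assumes every sequence occupies exactly one line.

```shell
# Hypothetical two-record FASTA file; each sequence sits on exactly one line.
tmpdir=$(mktemp -d)
printf '>seq1\nACGTACGT\n>seq2\nGGCC\n' > "$tmpdir/file.fa"

# paste joins each header/sequence pair with a tab;
# awk splits on that tab and appends the sequence length to the header.
paste - - < "$tmpdir/file.fa" \
  | awk -F'\t' '{ print $1 "/" length($2) "\n" $2 }' > "$tmpdir/new_file.fa"

cat "$tmpdir/new_file.fa"
```

If the sequences are wrapped over several lines, the header/sequence pairing breaks and this approach should not be used as-is.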

BTW, delete your other question, "About sequence problem for help".

Small suggestion: don't post the same thread multiple times, and use proper tags. Also check this:

How to Use Biostars, Part-I: Questions, Answers, Comments and Replies


Thanks, it helps me a lot.

7.4 years ago
iraun 6.2k

I guess you mean appending the sequence length to the FASTA header. This can be done in several different ways, and I'm pretty sure a quick Google search would turn up a solution, which is always the best way of addressing a problem. After trying to solve the problem yourself, you can always come here to ask if you are struggling, and show your previous attempts. Here is one possible solution to your task:

awk '/^>/ {if (seqlen){print id"|"seqlen"\n"seq};id=$1;seq="";seqlen=0;next; } { seqlen += length($0);seq=seq""$0}END{print id"|"seqlen"\n"seq}' file.fa

Let me know if you need help understanding the code.

Hope it helps ;)
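For anyone who wants the one-liner above spelled out, here is a commented sketch of the same idea, run on a made-up file with a wrapped sequence (the file name and sequences are hypothetical). One small difference, which is my assumption rather than part of the original command: it checks whether a header has been seen instead of checking for a non-zero length, so a record with an empty sequence would still be printed.

```shell
# Hypothetical FASTA file; seq1 is wrapped over two lines.
tmpdir=$(mktemp -d)
printf '>seq1\nACGT\nAC\n>seq2\nGGCC\n' > "$tmpdir/wrapped.fa"

result=$(awk '
  /^>/ {
      # New header: first print the previous record, if there was one.
      if (id != "") print id "|" seqlen "\n" seq
      # Start a new record; $1 keeps the header up to the first whitespace.
      id = $1; seq = ""; seqlen = 0
      next
  }
  {
      # Sequence line: accumulate both the length and the sequence itself.
      seqlen += length($0)
      seq = seq $0
  }
  END {
      # Print the last record, which no following header triggers.
      if (id != "") print id "|" seqlen "\n" seq
  }
' "$tmpdir/wrapped.fa")

printf '%s\n' "$result"
```

Note that the output puts each sequence on a single line, and that the separator is "|" as in the command above; change it to "/" in both print statements to match the format asked for in the question.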


Thanks, it's very useful.
