Sequence filtration based on length
8.0 years ago
Sathish ▴ 60

Hello Everyone,

I want to filter sequences by length, keeping those between 15 and 30 nt, from a large file of RNA sequences (FASTA format).

How can I do this?

I appreciate any help.
Thanks.

RNA-Seq sequence perl grep Forum • 1.6k views
8.0 years ago

Assuming that each sequence is on a single line:

$ awk '{ \
    if ($0 ~ /^>/) { \
        header = $0; \
    } \
    else { \
        l = length($0); \
        if ((l >= 15) && (l <= 30)) { \
            printf("%s\n%s\n", header, $0); \
        } \
    } \
}' foo.fa

Look at this Biostars question if you need to preprocess your FASTA file to put its sequences on one line.
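
If the sequences are wrapped over several lines, an alternative to preprocessing is to accumulate each record before checking its length. The following is only a minimal sketch under that assumption: the function name `filter_fasta_by_length` and its argument order (minimum, maximum, file) are made up for illustration, and it assumes Unix line endings and an awk that supports user-defined functions (gawk, mawk and most modern awks do).

```shell
# Hypothetical helper: filter_fasta_by_length MIN MAX FILE
# Joins the lines of each FASTA record, then prints the header and the
# joined sequence when the total length is within [MIN, MAX].
filter_fasta_by_length() {
    awk -v min="$1" -v max="$2" '
        # Print the buffered record if its total length is in range.
        function emit() {
            if (header != "" && length(seq) >= min && length(seq) <= max)
                printf("%s\n%s\n", header, seq)
        }
        # A header line closes the previous record and starts a new one.
        /^>/ { emit(); header = $0; seq = ""; next }
        # Any other line is sequence data: append it to the current record.
        { seq = seq $0 }
        # The last record has no following header, so emit it at the end.
        END { emit() }
    ' "$3"
}
```

It could then be called as `filter_fasta_by_length 15 30 foo.fa > filtered.fa`, where `filtered.fa` is just an example output name. Each kept sequence is written on one line, so the output also works as input to the one-line command above.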


Thanks for your response. How can I get those sequences along with their corresponding headers?

The input file is in FASTA format and contains many sequences with different headers.


I made some changes to the answer. I hope it helps.


It's working. Thanks a lot.
