Question

Sequence extraction

2.3 years ago

zhichusun ▴ 10

Hello, I have a FASTA file that contains sequences of different lengths. I want to extract the sequences that are longer than 500 bp and shorter than 10,000 bp and write them to a new FASTA file. How can I do this? Thanks in advance for any help.

extraction Sequence • 1.0k views

score 3 · Answer 1 · 2022-01-10

2.3 years ago

Mensur Dlakic ★ 27k

One of the ways to do it is with seqkit:

seqkit seq -M 10000 -m 500 file.fas > new_file.fas

Wow, that's a good way to do it.

Reply by zhichusun ▴ 10, 2.3 years ago

score 1 · Answer 2 · 2022-01-11

2.3 years ago

cpad0112 21k

$ bioawk -c fastx 'length($seq) > 500 && length($seq) < 10000 { print ">"$name"\n"$seq }' test.fna
$ cutadapt --quiet -m 500 -M 10000 test.fna
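
If neither tool is installed, the same filter as the bioawk command above can be approximated with plain awk. This is only a minimal sketch: the function name `filter_fasta_by_length` is made up for illustration, the bounds are exclusive (strictly longer than MIN and strictly shorter than MAX, as the question asks), and each kept sequence is written on a single line even if the input was wrapped.

```shell
# Hypothetical helper (not part of any tool mentioned in this thread):
# print FASTA records whose sequence length is strictly between MIN and MAX.
# Usage: filter_fasta_by_length MIN MAX input.fasta > output.fasta
filter_fasta_by_length() {
    awk -v min="$1" -v max="$2" '
        # Print the buffered record if its length is within the bounds.
        function emit() {
            if (header != "" && len > min && len < max)
                printf "%s\n%s\n", header, seq
        }
        # A header line closes the previous record and starts a new one.
        /^>/ { emit(); header = $0; seq = ""; len = 0; next }
        # Any other line is sequence: append it and add to the length.
        { seq = seq $0; len += length($0) }
        # The last record has no following header, so emit it at the end.
        END { emit() }
    ' "$3"
}
```

For the file in the question this would be called as `filter_fasta_by_length 500 10000 file.fas > new_file.fas`. Note that the length options of the dedicated tools may treat the bounds as inclusive (as far as I know, `seqkit seq -m/-M` does), so sequences of exactly 500 or 10000 bp can be handled differently from this sketch.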

score 0 · Answer 3 · 2022-01-11

2.3 years ago

GenoMax 142k

Using BBMap suite:

reformat.sh in=input.fa out=filtered.fa minlength=500 maxlength=10000
