extracting contigs
18 months ago
hollyannj7 • 0

Hiya,

I have sorted my contigs by sequence length and want to extract all contigs up to contig 310 (the L50), i.e. up to >000310F, into a separate file.

Can this be done with grep?

Thanks!

fasta grep • 685 views

I'm not sure whether you want the contigs shorter or longer than the one at L50.

Either way, provided your contigs are sorted by length as you wrote, you can find the line number of the last line of the contig at L50 and then head/tail the file, for example:

head -n $LINE_L50 <contig_all.fna >contig_above_L50.fna
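If the line number is not known in advance, one way around it is to count header lines instead of text lines. A minimal sketch, assuming the file is contig_all.fna as above and already sorted longest-first; the helper name first_n_contigs is made up for illustration:

```shell
# Hypothetical helper: print the first N FASTA records of a file.
# It counts '>' header lines, so wrapped (multi-line) sequences are handled.
# Usage: first_n_contigs FILE N
first_n_contigs() {
    awk -v n="$2" '
        /^>/ { count++ }      # each header starts a new record
        count > n { exit }    # stop as soon as record N+1 begins
        { print }
    ' "$1"
}

# Example (not run here): keep the first 310 records
# first_n_contigs contig_all.fna 310 > contig_above_L50.fna
```

Whether 310 or 311 is the right count depends on whether the contig numbering starts at 000000F or 000001F, so it is worth checking the last header in the output.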

Hi! Try reformat.sh with the maxlength option (see reformat.sh in the BBMap toolkit).

18 months ago
Mark ★ 1.5k

The amazing seqkit can do this very easily:

seqkit seq seqs.fasta -m 310 > seqs.310.fasta

You can install seqkit using conda:

conda install -c bioconda seqkit

The docs are very good: https://bioinf.shenwei.me/seqkit/usage/#seq
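One thing to watch: -m sets a minimum sequence length, so the seqkit seq command above keeps contigs of at least 310 bp rather than the first 310 contigs. If the goal is the first 310 records of a length-sorted file, seqkit head may be closer to what was asked. A sketch, assuming seqkit is installed and the input is seqs.fasta as above; the function name and output file name are made up for illustration:

```shell
# Hypothetical wrapper: keep the first N records using seqkit head.
# Assumes seqkit is on PATH. Usage: first_n_seqkit FILE N
first_n_seqkit() {
    seqkit head -n "$2" "$1"
}

# Example (not run here):
# first_n_seqkit seqs.fasta 310 > seqs.first310.fasta
```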
