How To Grep Largest Contig From A Multi Fasta File
2
0
Entering edit mode
7.7 years ago
HG ★ 1.1k

Hi everybody, can any one tell me please how to extract a largest contig from a multi-fasta file ?? using awk or grep ??

awk • 9.3k views
ADD COMMENT
6
Entering edit mode
7.7 years ago

You can use Heng Li's bioawk and samtools:

Then the commands would be

# sort the sequences by length
$ cat w.fasta | bioawk -c fastx '{ print length($seq), $name }' | sort -k1,1rn | head -1
989    HR5V3UP02C00KT

# extract the sequence from the file 
$ samtools faidx w.fasta HR5V3UP02C00KT
>HR5V3UP02C00KT
TCGTACTCGTACGTAGAGGTTCGATCCTAGGGTCCTACGACGGAAGTAAAAACGGCCGGT
CCGGGCCCCGGTTCGACGTCGGACCGTAACCAACGAAAATTGGCCGGTAAAGGGGGTTCC
...
ADD COMMENT
0
Entering edit mode

Than you so much but could you please check the error

$ cat list4a.fasta | awk -c fastx '{ print length($seq), $name }'| sort -k1, 1rn | head -1
sort: invalid number after `,': invalid count at start of `'
ADD REPLY
2
Entering edit mode

don't type in the whole thing at first, build it one step at a time, and pipe it through a pager, that way you will notice potential errors

ADD REPLY
0
Entering edit mode

Dear Istvan, I am doing according to you suggestion:

cat list4a.fasta           #working fine 
awk -c fastx '{ print length($seq), $name }'

Here is the problem

Usage: awk [POSIX or GNU style options] -f progfile [--] file ...
Usage: awk [POSIX or GNU style options] [--] 'program' file ...

could you please tell me waht is the error ??

ADD REPLY
1
Entering edit mode

did you install bioawk? that's is the version of awk you will need to use.

ADD REPLY
0
Entering edit mode

Hello Istvan, I have downloaded bioawk but when i am trying to install its showing

hiren@FB11-10207:~/Desktop/spades/bioawk-master $ make 
make: `awk' is up to date.

But once i am checking bioawk like

hiren@FB11-10207:~/Desktop/spades/bioawk-master $ bioawk
bioawk: command not found

Could you please let me know any solution ????

ADD REPLY
1
Entering edit mode

There is no space after -k1,

ADD REPLY
1
Entering edit mode
7.1 years ago
Prakki Rama ★ 2.4k
cat seqs_oneline.fasta | perl -e 'while (<>) {$h=$_; $s=<>; $seqs{$h}=$s;} foreach $header (reverse sort {length($seqs{$a}) <=> length($seqs{$b})} keys %seqs) {print $header.$seqs{$header}}' | head -2

Source: sort fasta

ADD COMMENT

Login before adding your answer.

Traffic: 2566 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6