Question: How To Grep Largest Contig From A Multi Fasta File
0
5.3 years ago by
HG1.1k
Germany
HG1.1k wrote:

Hi everybody, can anyone please tell me how to extract the largest contig from a multi-FASTA file, using awk or grep?

awk • 7.0k views
modified 4.8 years ago by Prakki Rama2.2k • written 5.3 years ago by HG1.1k
5
5.3 years ago by
Istvan Albert ♦♦ 78k
University Park, USA
Istvan Albert ♦♦ 78k wrote:

You can use Heng Li's bioawk together with samtools.

The commands would then be:

# sort the sequences by length
$ cat w.fasta | bioawk -c fastx '{ print length($seq), $name }' | sort -k1,1rn | head -1
989    HR5V3UP02C00KT

# extract the sequence from the file 
$ samtools faidx w.fasta HR5V3UP02C00KT
>HR5V3UP02C00KT
TCGTACTCGTACGTAGAGGTTCGATCCTAGGGTCCTACGACGGAAGTAAAAACGGCCGGT
CCGGGCCCCGGTTCGACGTCGGACCGTAACCAACGAAAATTGGCCGGTAAAGGGGGTTCC
...
modified 5.3 years ago • written 5.3 years ago by Istvan Albert ♦♦ 78k
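
Since the question asks for awk specifically, here is a minimal sketch of the same idea in plain awk, for cases where bioawk or samtools is not installed. It assumes a standard FASTA layout (header lines starting with '>', sequences possibly wrapped over several lines). The file name demo.fasta, the function names longest_contig and keep_if_longer, and the three toy contigs are all made up for illustration; this is not the method from the answer above.

```shell
# Hypothetical demo input; substitute your own multi-FASTA file.
cat > demo.fasta <<'EOF'
>contig1
ACGT
ACG
>contig2
ACGTACGTAC
GT
>contig3
AC
EOF

# Print the longest record as its header line plus the unwrapped sequence.
# On ties, the first record of that length is kept.
longest_contig() {
  awk '
    # Remember the current record if it is longer than the best so far.
    function keep_if_longer() {
      if (name != "" && length(seq) > best_len) {
        best_len = length(seq); best_name = name; best_seq = seq
      }
    }
    /^>/ { keep_if_longer(); name = $0; seq = ""; next }  # new header
         { seq = seq $0 }                                 # join wrapped lines
    END  { keep_if_longer(); if (best_name != "") print best_name "\n" best_seq }
  ' "$1"
}

longest_contig demo.fasta
```

The sequence is printed on a single line; pipe the output through fold -w 60 if wrapped lines are preferred.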

Thank you so much, but could you please check this error?

$ cat list4a.fasta | awk -c fastx '{ print length($seq), $name }'| sort -k1, 1rn | head -1
sort: invalid number after `,': invalid count at start of `'
modified 5.3 years ago • written 5.3 years ago by HG1.1k
2

Don't type in the whole thing at first. Build it one step at a time and pipe it through a pager; that way you will notice potential errors.

written 5.3 years ago by Istvan Albert ♦♦ 78k

Dear Istvan, I am following your suggestion. This step works fine:

$ cat list4a.fasta

The problem is here:

$ awk -c fastx '{ print length($seq), $name }'
Usage: awk [POSIX or GNU style options] -f progfile [--] file ...
Usage: awk [POSIX or GNU style options] [--] 'program' file ...

Could you please tell me what the error is?

written 5.3 years ago by HG1.1k
1

Did you install bioawk? That is the version of awk you will need to use.

written 5.3 years ago by Istvan Albert ♦♦ 78k

Hello Istvan, I have downloaded bioawk, but when I try to install it, it shows:

hiren@FB11-10207:~/Desktop/spades/bioawk-master $ make
make: `awk' is up to date.

But when I then check for bioawk:

hiren@FB11-10207:~/Desktop/spades/bioawk-master $ bioawk
bioawk: command not found

Could you please suggest a solution?

written 5.3 years ago by HG1.1k
1

There should be no space after '-k1,'; the sort step should read 'sort -k1,1rn'.

written 5.3 years ago by Pierre Lindenbaum115k
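
To illustrate the point about the sort key: with a space after the comma, the shell passes '-k1,' and '1rn' to sort as two separate arguments, so sort sees an incomplete key definition. A small sketch of the corrected form, using made-up length/name pairs (ctgA, ctgB, ctgC are hypothetical) in place of the bioawk output:

```shell
# Hypothetical tab-separated "length<TAB>name" lines standing in for
# the output of the bioawk step.
# -k1,1rn: sort on field 1 only, numerically (n), largest first (r).
printf '120\tctgA\n989\tctgB\n45\tctgC\n' | sort -k1,1rn | head -1
```

The line with the largest value in the first column comes out on top, which is what head -1 then keeps.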
0
4.8 years ago by
Prakki Rama2.2k
Singapore
Prakki Rama2.2k wrote:

cat seqs_oneline.fasta | perl -e 'while (<>) {$h=$_; $s=<>; $seqs{$h}=$s;} foreach $header (reverse sort {length($seqs{$a}) <=> length($seqs{$b})} keys %seqs) {print $header.$seqs{$header}}' | head -2

source: sort fasta

written 4.8 years ago by Prakki Rama2.2k