Question: Perl one liner to extract sequences by their ID from a FASTA file - help
22 months ago by
pendragon20 wrote:

Hi, I'm EXTREMELY new to Perl and to just about all serious bioinformatics work. I gratefully found some Perl one-liners from the Edwards Lab, one of which works, while the other (the one most useful to me) does not. The line below DOES work: it extracts and prints particular sequences (id1 and id2 by name):

perl -ne 'if(/^>(\S+)/){$c=grep{/^$1$/}qw(id1 id2)}print if $c' fasta.file

However, I have a text file with all the names of the sequence identifiers (a set of receptor gene sequences), and I'd like to extract those sequences from a large FASTA file (in this case the mouse RefSeq database). This is another line (below) from the same lab that should be able to do this, but I've had no luck.

perl -ne 'if(/^>(\S+)/){$c=$i{$1}}$c?print:chomp;$i{$_}=1 if @ARGV' ids.file fasta.file

Does anyone have any suggestions about what may be going wrong? Or other ways to efficiently do this as a complete beginner? I recognize this is quite a vague post, and there are related posts about performing similar tasks but I've been struggling to find efficient ways to do this and not get totally mired in possibilities.


Check similar posts: Retrieve a subset of FASTA from large Illumina multi-FASTA file

written 22 months ago by shenwei3563.6k
21 months ago by
Asaf4.8k wrote:

It's a very easy task and good practice. You'll spend much less time writing a real script, which will be documented and version controlled, than fixing a magic one-liner which you'll then have to dig out of your emails to re-run. The script you write can then be used for other manipulations of FASTA files, and you'll already have the platform for implementing them quickly.
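
As a rough starting point, here is a minimal sketch of such a script in Perl. It is not the Edwards Lab code: the script name extract_by_id.pl and the sub names read_ids and extract_records are made up for illustration. It assumes one ID per line in the ID file, and that each ID must equal the first whitespace-delimited token of a FASTA header (the text right after the ">"), which is also what the one-liners match on. It additionally strips Windows line endings and a stray leading ">" from the ID list, since either of those can make the one-liner print nothing without any error.

```perl
#!/usr/bin/env perl
# extract_by_id.pl (hypothetical name): print the FASTA records whose ID
# appears in a list of IDs, one ID per line.
use strict;
use warnings;

# Read IDs from a filehandle into a hash for fast lookup.
sub read_ids {
    my ($fh) = @_;
    my %wanted;
    while (my $line = <$fh>) {
        $line =~ s/^\s*>?//;   # drop leading spaces and an optional ">"
        $line =~ s/\s.*//s;    # keep the first token; also removes \r and \n
        $wanted{$line} = 1 if length $line;
    }
    return \%wanted;
}

# Copy every record whose header ID is wanted from $in to $out.
# Returns the number of records written.
sub extract_records {
    my ($wanted, $in, $out) = @_;
    my ($keep, $count) = (0, 0);
    while (my $line = <$in>) {
        if ($line =~ /^>(\S+)/) {          # header line: decide keep or skip
            $keep = exists $wanted->{$1};
            $count++ if $keep;
        }
        print {$out} $line if $keep;       # header and its sequence lines
    }
    return $count;
}

# Run as a script only when exactly two file names are given.
if (@ARGV == 2) {
    my ($ids_file, $fasta_file) = @ARGV;
    open my $ids_fh, '<', $ids_file   or die "Cannot open $ids_file: $!";
    open my $fa_fh,  '<', $fasta_file or die "Cannot open $fasta_file: $!";
    my $count = extract_records(read_ids($ids_fh), $fa_fh, \*STDOUT);
    warn "Wrote $count records\n";         # report goes to STDERR
}
```

With the file names from the question it would be run as: perl extract_by_id.pl ids.file fasta.file > subset.fasta. If it reports 0 records, compare one header line from the FASTA file with one line of the ID list; any difference in the first token (prefixes, version suffixes, extra characters) means no match.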


Yeah, I agree with this. I tend to see a lot of command-line warriors these days who spend 5 minutes perfecting a one-liner when they could easily have written the script in a minute. Work your data, not your tools.

modified 21 months ago • written 21 months ago by Damian Kao14k