Question: How To Extract Only Geneid From Fasta Header?
Abdul Rawoof60 wrote:

In a multi-FASTA file, the FASTA headers carry full details, as follows:

">ENSMUSG0000005892|ENSMUST00000004524351|xclkvsldjldjkfklasdfjalsjk

">ENSMUSG0000001537|ENSMUST00000017451|dfasfasdfghfhgjhktytg

">ENSMUSG00000002234237|ENSMUST000000097869|pasdfasdfsadf

I want to extract only the GeneID from the above, like:

">ENSMUSG0000005892

">ENSMUSG0000001537

">ENSMUSG00000002234237

How can I extract only the GeneID using a Perl program?

Thanks.

fasta perl extraction sequence • 2.2k views
modified 5.8 years ago by Kenosis (1.2k) • written 5.8 years ago by Abdul Rawoof (60)

Did you try searching this site before asking your question?

written 5.8 years ago by Pierre Lindenbaum (119k)

This is a first-semester student's question. Try something like split, a regex, bash, or whatever, but I recommend trying to come up with an idea yourself before asking. Otherwise you'll never learn anything.

written 5.8 years ago by SimonD (0)
Kenosis1.2k wrote:

Here are two options. As a script:

use strict;
use warnings;

while (<>) {
    print "$1\n" if /(>.+?)\|/;
}

Usage: perl script.pl inFile [>outFile]

The last, optional parameter directs output to a file.

As a one liner:

perl -lne 'print $1 if /(>.+?)\|/' inFile [>outFile]

Output from both on your dataset:

>ENSMUSG0000005892
>ENSMUSG0000001537
>ENSMUSG00000002234237

In both cases, the regex captures all the characters starting with ">" up to the first "|", and then the results are printed.

Hope this helps!

modified 5.8 years ago • written 5.8 years ago by Kenosis (1.2k)
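For comparison, the same capture idea (everything from ">" up to the first "|") can be sketched with sed where Perl is not at hand. This is only an illustration, not part of the answer above: the function name `gene_ids` is hypothetical, and it assumes headers shaped like `>GENE|TRANSCRIPT|...` as in the question. Unlike the Perl regex, the pattern here is anchored to the start of the line.

```shell
#!/bin/sh
# Sketch: print only the gene ID part of each FASTA header line.
# Assumption: headers look like ">GENE|TRANSCRIPT|..." (as in the question).
# gene_ids is a hypothetical helper name used for illustration.
gene_ids() {
    # -n suppresses sed's default printing; the trailing "p" prints only
    # lines where the substitution succeeded, i.e. lines that start with
    # ">" and contain a "|". Sequence lines are therefore skipped.
    sed -n 's/^\(>[^|][^|]*\)|.*/\1/p' "$@"
}
```

Usage: gene_ids inFile [>outFile]. With no file argument the function reads standard input.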
always_learning960 wrote:
while(<stdin>){
if ($_ =~/>/){
@arr=split ($_, "I")
print $arr[0]
}
}

I left on you to make this code run-able!! :):)

This will also work on Unix:

grep ">" file.txt | cut -d "|" -f 1

But always try to learn, friend!! :)

modified 5.8 years ago by Matt Shirley (8.9k) • written 5.8 years ago by always_learning (960)

I like the Unix solution but can't see why in the Perl code, you split on upper-case "I".

written 5.8 years ago by Neilfws (48k)

That is why I wrote "I left on you to make this code run-able!! :):)". I did not want to give the complete solution in this case! The delimiter was meant to be "|", not upper-case "I".

written 5.8 years ago by always_learning (960)
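With the delimiter corrected to "|", the split idea from this answer can be sketched in a single awk call, which does the job of the grep and cut pipeline in one step. This is an illustrative sketch only: the function name `header_gene_ids` is hypothetical, and it assumes header lines begin with ">". For the Perl version to run, note that split takes the separator first, that a "|" in a pattern must be escaped (for example `split /\|/, $_`), and that each statement needs a terminating semicolon.

```shell
#!/bin/sh
# Sketch: split each FASTA header on "|" and keep the first field.
# header_gene_ids is a hypothetical helper name used for illustration.
header_gene_ids() {
    # -F'|' sets the field separator to a literal "|".
    # /^>/ selects header lines only; $1 is the text before the first "|".
    # A header with no "|" is printed whole, as cut would do.
    awk -F'|' '/^>/ { print $1 }' "$@"
}
```

Usage: header_gene_ids file.txt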