Question: Related to species classification information retrieval
3.9 years ago by
student0 wrote:

Hi, I have a question related to species classification. I have a FASTA sequence file in which I need to edit the sequence headers. For example, if I have the FASTA sequence header


I need to edit this header to include the full species classification, like:

>ENSMICP00000000194_Species Common Name_Species 
and the region (from where that species originated)

(The line above has been wrapped for readability)

Can anyone guide me on how to solve this problem? How can I get all of this information at once, and how can I edit the headers of the FASTA sequence file? My main problem is getting the continent and region information; I have been unable to find any authoritative source for it.

Thank you

sequence R • 985 views
ADD COMMENTlink modified 3.9 years ago by RamRS27k • written 3.9 years ago by student0
3.9 years ago by
Seattle,WA, USA
ivivek_ngs4.9k wrote:

There is no need to write another answer just to ask for help. This seems to be an assignment question, so you should do it on your own. For pointers, here are links to posts that handle similar tasks:

Replace fasta headers with another name in a text file

Renaming Entries In A Fasta File

You should be able to figure it out from the above links.

P.S: Moderators please change the tag.

ADD COMMENTlink modified 3.9 years ago • written 3.9 years ago by ivivek_ngs4.9k

Thank you for your guidance.

ADD REPLYlink written 3.9 years ago by student0

Thank you, vchris - I have edited the question.

BTW, replacing the header is a minor task here, I'm curious how one would get this annotation - it does seem like an odd set, looks like it will need more than GenBank parsing to get to some of the data points. What do you think?

ADD REPLYlink written 3.9 years ago by RamRS27k

Thanks a lot @Ram,

Below is my view of how to do it, let me know what you think

The OP should have a separate master file whose fields hold the required information. For any ENSG ID in a header, the script should query the master file, match the ID there, retrieve the information from that row, and write it into the header of the input FASTA. This could be one way (from my previous experience on a retail project during my IT development days ;) ). Any other ways you can think of, @Ram?
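A minimal sketch of the master-file lookup described above, assuming a tab-separated master table with an `id` column. The column names (`common_name`, `species`, `region`), the function names, and the example row are all hypothetical; the values in a real table would have to come from a curated, verified source.

```python
import csv
import io


def load_master_table(handle):
    """Read a tab-separated master table into a dict keyed by the 'id' column."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["id"]: row for row in reader}


def annotate_headers(fasta_lines, table, fields):
    """Yield FASTA lines, appending the chosen fields to each matching header.

    Headers whose ID is not in the table are passed through unchanged.
    """
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            row = table.get(seq_id)
            if row is not None:
                # Replace spaces inside values so the new header is one token.
                extra = [row[f].replace(" ", "_") for f in fields]
                line = ">" + "_".join([seq_id] + extra)
        yield line


# Illustrative master table (hypothetical layout; verify values before use).
master = io.StringIO(
    "id\tcommon_name\tspecies\tregion\n"
    "ENSMICP00000000194\tGray mouse lemur\tMicrocebus murinus\tMadagascar\n"
)
fasta = [">ENSMICP00000000194\n", "MKT\n"]

table = load_master_table(master)
for out_line in annotate_headers(fasta, table, ["common_name", "species", "region"]):
    print(out_line)
```

Joining the fields with underscores is a design choice here: many FASTA tools treat everything up to the first whitespace as the sequence ID, so keeping the header as a single token avoids losing the annotation downstream.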

ADD REPLYlink modified 3.9 years ago • written 3.9 years ago by ivivek_ngs4.9k

I have a feeling that @student wants to look up that information and then add it to the header. If this is the case, that should be clearly mentioned in the original question. More often than not we are left trying to figure out what the OP wants in these queries. The region part would need some other source (like Wikipedia), since that information is not likely to be present in a taxonomic record.

ADD REPLYlink written 3.9 years ago by genomax83k

Indeed, the question needs more detail and clarity. I just responded because Ram was curious about how to achieve the task, so I posted my thoughts on how it could be done. I second your view, @genomax2.

ADD REPLYlink written 3.9 years ago by ivivek_ngs4.9k

Both of you are correct - genomax2 has the right idea on the subtext of my question, in that the data OP needs has to come from multiple sources. I am not convinced Wikipedia is a valid source unless the underlying source is verified, in which case the master table you speak of has to be created as a manually curated resource.

This does not seem like a programming assignment - if anything, it looks like an elaborately designed and contrived BLAST assignment, or an incredibly blown up intermediate step to some kind of population genetics experiment.

ADD REPLYlink written 3.9 years ago by RamRS27k

Thanks a lot @genomax2, you are right: I should have made clear that I want to look up the classification information and then add it to the header. The problem I am facing is with the region information, because it is not present in the taxonomic data. Thank you.

ADD REPLYlink modified 3.9 years ago • written 3.9 years ago by student0