Question

Related to species classification information retrieval

0


7.8 years ago

student • 0

Hi, I have a question related to species classification. I have a FASTA file and I need to edit the sequence headers. For example, if I have the FASTA header

>ENSMICP00000000194_Mmur/1-1728

I need to replace it with a header carrying the full species classification, like:

>ENSMICP00000000194_Species Common Name_Species 
Name_Genus_Family_Order_Class_Phylum_Kingdom_Continent 
and the region (from where that species originated)

(The line above has been wrapped for readability)

Can anyone guide me on how to get all of this information at once, and how to edit the headers of the FASTA file? My main problem is getting the continent and region information; I have not been able to find any authoritative source for it.

Thank you

sequence R • 1.4k views

ADD COMMENT • link updated 7.8 years ago by Ram 43k • written 7.8 years ago by student • 0

score 1 · Answer 1 · 2016-06-22

1


7.8 years ago

ivivek_ngs ★ 5.2k

No need to write another answer just to ask for help. This seems to be an assignment question, so you should do it on your own. For pointers, see these links, which cover similar tasks:

Replace fasta headers with another name in a text file

Renaming Entries In A Fasta File

You should be able to figure it out from the above links.

P.S: Moderators please change the tag.

ADD COMMENT • link 7.8 years ago by ivivek_ngs ★ 5.2k

0


Thank you for your guidance.

ADD REPLY • link 7.8 years ago by student • 0

0


Thank you, vchris - I have edited the question.

BTW, replacing the header is a minor task here, I'm curious how one would get this annotation - it does seem like an odd set, looks like it will need more than GenBank parsing to get to some of the data points. What do you think?

ADD REPLY • link 7.8 years ago by Ram 43k

0


Thanks a lot @Ram,

Below is my view of how to do it, let me know what you think

The OP should keep a separate master file whose rows hold the required fields for each ID. For every Ensembl ID in a FASTA header, the script looks up the matching row in the master file, retrieves the fields from that row, and writes them into the header of the input FASTA in the required order. This is one possible approach (from my previous experience on a retail project during my IT development days ;) ). Any other ways you can think of, @Ram?
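A minimal Python sketch of this lookup idea, under some assumptions that are not in the thread: the master table is tab-separated, it is keyed by the species code at the end of the header (`Mmur`) rather than by the full ID, and the column names and the single example row are only illustrative. Headers whose code is not in the table are left unchanged.

```python
import csv
import io

# Field order follows the header layout requested in the question.
FIELDS = ["common_name", "species", "genus", "family", "order",
          "class", "phylum", "kingdom", "continent", "region"]

# Hypothetical master table (tab-separated), one row per species code.
MASTER_TSV = "\n".join([
    "\t".join(["code"] + FIELDS),
    "\t".join(["Mmur", "Gray mouse lemur", "Microcebus murinus", "Microcebus",
               "Cheirogaleidae", "Primates", "Mammalia", "Chordata",
               "Animalia", "Africa", "Madagascar"]),
])


def load_master(handle):
    """Read the master table into a dict keyed by species code."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["code"]: row for row in reader}


def rewrite_header(line, master):
    """Turn '>ID_Code/start-end' into '>ID_<annotation fields>'."""
    name = line[1:].strip().split("/")[0]    # drop '>' and the '/1-1728' range
    seq_id, _, code = name.rpartition("_")   # e.g. 'ENSMICP00000000194', 'Mmur'
    row = master.get(code)
    if row is None:
        return line.rstrip("\n")             # unknown code: keep the header as-is
    return ">" + "_".join([seq_id] + [row[f] for f in FIELDS])


def rewrite_fasta(lines, master):
    """Yield FASTA lines with every header rewritten; sequence lines pass through."""
    for line in lines:
        if line.startswith(">"):
            yield rewrite_header(line, master)
        else:
            yield line.rstrip("\n")


if __name__ == "__main__":
    master = load_master(io.StringIO(MASTER_TSV))
    fasta = [">ENSMICP00000000194_Mmur/1-1728\n", "MKT\n"]
    for out_line in rewrite_fasta(fasta, master):
        print(out_line)
```

For a real file, `load_master` and `rewrite_fasta` can be given open file handles instead of the in-memory examples. Note that some fields contain spaces, which not every downstream tool accepts in a FASTA identifier.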

ADD REPLY • link 7.8 years ago by ivivek_ngs ★ 5.2k

1


I have a feeling that @student wants to look up that information and then add it to the header. If this is the case, that should be clearly mentioned in the original question. More often than not, we are left trying to figure out what the OP wants in these queries. The region part would need some other source (like Wikipedia), since that information is not likely to be present in a taxonomic record.

ADD REPLY • link 7.8 years ago by GenoMax 141k

0


Indeed, the question needs more detail and clarity. I just responded because Ram was curious about how to achieve the task, so I posted my thoughts on how it can be done. I second your view, @genomax2.

ADD REPLY • link 7.8 years ago by ivivek_ngs ★ 5.2k

0


Both of you are correct - genomax2 has the right idea on the subtext of my question, in that the data OP needs has to come from multiple sources. I am not convinced Wikipedia is a valid source unless the actual source is verified, in which case the master table you speak of has to be created as a manually curated source.
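As a rough illustration of what such a curated master table could look like, here is a small Python sketch that merges taxonomy records from one source with manually curated region records from another. Everything in it is an assumption for the sake of the example: the function name `build_master`, the dictionary layout, and the `source` field used to record where each region entry was verified.

```python
def build_master(taxonomy, curated_regions):
    """Merge per-species taxonomy with curated region data.

    Returns (master, missing): master holds only species that have a
    curated region entry; missing lists the codes still to be curated.
    """
    master, missing = {}, []
    for code, lineage in taxonomy.items():
        region = curated_regions.get(code)
        if region is None:
            missing.append(code)              # nothing verified yet for this species
            continue
        master[code] = {**lineage, **region}  # region fields extend the lineage
    return master, missing


# Illustrative inputs; in practice the taxonomy would come from a taxonomic
# database and the regions from sources that were checked by hand.
taxonomy = {
    "Mmur": {"species": "Microcebus murinus", "order": "Primates"},
    "Hsap": {"species": "Homo sapiens", "order": "Primates"},
}
curated_regions = {
    "Mmur": {"continent": "Africa", "region": "Madagascar",
             "source": "placeholder for the verified reference"},
}

master, missing = build_master(taxonomy, curated_regions)
```

Keeping the list of missing codes separate makes it obvious which species still need a verified region before their headers are rewritten.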

This does not seem like a programming assignment - if anything, it looks like an elaborately designed and contrived BLAST assignment, or an incredibly blown up intermediate step to some kind of population genetics experiment.

ADD REPLY • link 7.8 years ago by Ram 43k

0


Thanks a lot, @genomax2. You are right: I should have made clear that I want to look up the classification information and then add it to the header. My problem is the region information, because it is not present in the taxonomic records. Thank you.

ADD REPLY • link 7.8 years ago by student • 0