Hi, I have a question related to species classification. I have a FASTA file in which I need to edit the sequence headers. For example, I have this FASTA header:
>ENSMICP00000000194_Mmur/1-1728
and I need to replace it with the full species classification, like this:
>ENSMICP00000000194_Species Common Name_Species Name_Genus_Family_Order_Class_Phylum_Kingdom_Continent
plus the region where that species originated.
Can anyone guide me on how to get all of this information at once, and how to edit the headers in a FASTA file? My main problem is getting the continent and region information; I have not been able to find any authoritative source for it.
Thank you
Thank you for the guidelines.
Thank you, vchris - I have edited the question.
BTW, replacing the header is a minor task here. I'm curious how one would get this annotation: it does seem like an odd set, and it looks like it will need more than GenBank parsing to get some of the data points. What do you think?
Thanks a lot @Ram,
Below is my view of how to do it; let me know what you think.
The OP should keep a separate master file with one row per entry and one column (field) for each piece of required information. For every Ensembl ID in a header, the script looks up that ID in the master file, retrieves the information from the matching row, and writes it into the header of the input FASTA. This is one possible approach (from my experience on a retail project during my IT development days ;) ). Can you think of any other ways, @Ram?
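A minimal Python sketch of the master-file approach, assuming a tab-separated master table with a hypothetical `tag` column plus one column per header field, and headers shaped like `>ENSMICP00000000194_Mmur/1-1728`. It keys the table on the species tag (`Mmur`) rather than on the full ID, since every protein from one species shares the same lineage; keying on the full ID would work the same way. The column names, `load_master`, and `rewrite_fasta` are illustrative, not an existing tool.

```python
import csv
import re

# Assumed header shape: ">" + ID + "_" + species tag + optional "/start-end" range.
HEADER_RE = re.compile(r"^>(?P<id>[^_\s]+)_(?P<tag>[^/\s]+)")

# Hypothetical master-table columns, in the order they should appear in the new header.
FIELDS = ["common_name", "species", "genus", "family", "order",
          "class", "phylum", "kingdom", "continent", "region"]


def load_master(handle):
    """Read a tab-separated master table and index its rows by the 'tag' column."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["tag"]: row for row in reader}


def rewrite_fasta(lines, master):
    """Yield FASTA lines with annotated headers; sequence lines pass through unchanged."""
    for line in lines:
        line = line.rstrip("\n")
        match = HEADER_RE.match(line)
        row = master.get(match.group("tag")) if match else None
        if row is None:
            # Not a header, or species tag missing from the table: keep the line as-is.
            yield line
            continue
        annotation = "_".join(row[field] for field in FIELDS)
        yield f">{match.group('id')}_{annotation}"
```

Headers that do not match the pattern, or whose tag is missing from the table, are left unchanged so gaps are easy to spot. Many tools treat everything after the first space in a header as a description, so you may want to replace the spaces in common names with underscores or hyphens in the master table itself.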
I have a feeling that @student wants to look up that information and then add it to the header. If that is the case, it should be clearly stated in the original question. More often than not, we are left trying to figure out what the OP wants in these queries. The region part would need other sources (like Wikipedia), since that information is not likely to be present in a taxonomic record.
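For the taxonomic ranks, NCBI Taxonomy is one structured source: the E-utilities `efetch` endpoint with `db=taxonomy` returns an XML record whose `LineageEx` element lists each ancestor with its `Rank`. The sketch below pulls the standard ranks out of such a record, assuming the XML has already been downloaded. The embedded sample is hand-written and trimmed for illustration (real records contain many more `clade` and `no rank` entries), and `lineage_from_efetch` is a hypothetical helper name. Note that NCBI names the animal kingdom `Metazoa`, and that continent and region are not part of this record.

```python
import xml.etree.ElementTree as ET

# Ranks to keep; everything else in LineageEx (clades, suborders, ...) is skipped.
WANTED_RANKS = ("kingdom", "phylum", "class", "order", "family", "genus")

# Hand-trimmed, illustrative record in the shape of efetch db=taxonomy XML output.
SAMPLE_XML = """<TaxaSet><Taxon>
  <ScientificName>Microcebus murinus</ScientificName>
  <Rank>species</Rank>
  <LineageEx>
    <Taxon><ScientificName>Metazoa</ScientificName><Rank>kingdom</Rank></Taxon>
    <Taxon><ScientificName>Chordata</ScientificName><Rank>phylum</Rank></Taxon>
    <Taxon><ScientificName>Mammalia</ScientificName><Rank>class</Rank></Taxon>
    <Taxon><ScientificName>Primates</ScientificName><Rank>order</Rank></Taxon>
    <Taxon><ScientificName>Strepsirrhini</ScientificName><Rank>suborder</Rank></Taxon>
    <Taxon><ScientificName>Cheirogaleidae</ScientificName><Rank>family</Rank></Taxon>
    <Taxon><ScientificName>Microcebus</ScientificName><Rank>genus</Rank></Taxon>
  </LineageEx>
</Taxon></TaxaSet>"""


def lineage_from_efetch(xml_text):
    """Return a rank -> scientific name mapping for the first Taxon in the record."""
    root = ET.fromstring(xml_text)
    taxon = root.find("Taxon")
    if taxon is None:
        return {}
    ranks = {}
    for node in taxon.findall("LineageEx/Taxon"):
        rank = node.findtext("Rank")
        if rank in WANTED_RANKS:
            ranks[rank] = node.findtext("ScientificName")
    # The queried taxon itself sits outside LineageEx.
    if taxon.findtext("Rank") == "species":
        ranks["species"] = taxon.findtext("ScientificName")
    return ranks
```

The resulting mapping could be used to fill most columns of a master table, leaving only continent and region to be curated by hand.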
Indeed, the question needs more detail and clarity. I only responded because Ram was curious about how the task could be achieved, so I posted my thoughts on how it might be done. I second your view, @genomax2.
Both of you are correct. genomax2 has the right idea about the subtext of my question, in that the data the OP needs has to come from multiple sources. I am not convinced Wikipedia is a valid source unless the underlying source is verified, in which case the master table you describe has to be created as a manually curated resource.
This does not seem like a programming assignment; if anything, it looks like an elaborately designed, contrived BLAST assignment, or an incredibly blown-up intermediate step in some kind of population genetics experiment.
Thanks a lot, @genomax2, and you are right: I should have made it clear that I want to look up the classification information and then add it to the header. My problem is with the region information, because it is not present in the taxonomic records. Thank you.