Hi,
I have a file in the following format:
1 tag 0.108 1 11 B7LTF9 P05100
2 alkA 0.046 2 11 B7LV32 P04395
3 gnd 0.011 2 11 B7LUG0 P00350
4 pgl 0.048 1 11 B7LJZ2 P52697
5 aaeA 0.061 3 11 B7LRL6 P46482
6 aaeB 0.069 3 11 B7LRL5 P46481
...
The last two columns are the UniProt IDs from two different species (Escherichia fergusonii ATCC 35469 and Escherichia coli K-12, respectively). Using those UniProt IDs, I need the nucleotide CDS for each protein. I have code to parse the file and write the UniProt IDs of each species to separate files. However, I can't figure out how to get the CDS. I have tried BioMart to retrieve the sequences from EMBL bacteria, but it does not have a complete mapping of UniProt IDs to EMBL bacteria IDs. Please suggest any other way I can accomplish this.
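For reference, the parsing step described above could look roughly like the sketch below. It assumes the file is whitespace-delimited with seven columns, as in the sample; the function name `split_uniprot_ids` is only illustrative and is not taken from my actual script.

```python
def split_uniprot_ids(lines):
    """Return two lists of UniProt IDs taken from the last two columns."""
    fergusonii_ids, k12_ids = [], []
    for line in lines:
        fields = line.split()
        # Skip blank lines and anything without the seven expected columns
        if len(fields) < 7:
            continue
        fergusonii_ids.append(fields[-2])  # E. fergusonii ATCC 35469 column
        k12_ids.append(fields[-1])         # E. coli K-12 column
    return fergusonii_ids, k12_ids


# Two rows copied from the sample above
sample = [
    "1 tag 0.108 1 11 B7LTF9 P05100",
    "2 alkA 0.046 2 11 B7LV32 P04395",
]
fergusonii_ids, k12_ids = split_uniprot_ids(sample)
print(fergusonii_ids)
print(k12_ids)
```

Each list can then be written to its own file, one ID per line, for use with an ID mapping service.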
Thank you very much.
Thank you for your answer. It is very helpful. I tried the UniProt ID mapping before asking this question, and I am getting multiple IDs for one UniProt ID, just as you are. How do I know which one is the correct one?
It's not really a case of which is "correct". Any of the EMBL sequences could be relevant. You would have to do some further analysis of the returned sequences, for example checking how similar they are to one another.
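As a rough illustration of that kind of check, here is a minimal sketch that scores every pair of returned sequences using only the Python standard library. The record names and the short sequences are made-up placeholders, not real EMBL entries, and `difflib` is only a crude stand-in for a proper alignment tool, so treat the scores as a first look rather than a real comparison.

```python
from difflib import SequenceMatcher
from itertools import combinations


def pairwise_similarity(sequences):
    """Map each pair of record names to a similarity ratio between 0 and 1."""
    scores = {}
    for (name_a, seq_a), (name_b, seq_b) in combinations(sorted(sequences.items()), 2):
        # autojunk=False keeps difflib from ignoring very frequent characters,
        # which matters for a four-letter nucleotide alphabet
        matcher = SequenceMatcher(None, seq_a.upper(), seq_b.upper(), autojunk=False)
        scores[(name_a, name_b)] = matcher.ratio()
    return scores


# Placeholder names and sequences, for illustration only
candidates = {
    "record_1": "ATGAAACGTCTGGCA",
    "record_2": "ATGAAACGTCTGGCA",
    "record_3": "ATGAAACGTTTGGCT",
}
for pair, score in pairwise_similarity(candidates).items():
    print(pair, round(score, 2))
```

If all the sequences returned for one UniProt ID score as identical or nearly so, it probably does not matter much which one you take; if they differ noticeably, they are worth inspecting individually.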