Question: How can I add the associated species name to the Ensembl IDs in a FASTA file (CDS)?
17 days ago, dtejadamartinez (10) wrote:

Hi,

I downloaded the orthologues FASTA file (CDS) for one gene from Ensembl, but the file has only the Ensembl IDs and not the associated species names.

How can I add the associated species name to each Ensembl ID in a FASTA file?

(I need to do this for hundreds of genes.)

Thanks,

ensembl • 168 views
modified 15 days ago by finswimmer (3.6k) • written 17 days ago by dtejadamartinez (10)

You can tell a lot from an Ensembl gene ID, as explained on the Ensembl help page that @Eric Lim pointed you to.

The ID contains a three-letter species code, e.g. MUS for mouse (Latin name Mus musculus). The BRAF orthologue in mouse, for example, is ENSMUSG00000002413.

So if you know the three-letter code, you know the species for each entry in your FASTA file. It's that easy. If you don't know what a three-letter code means, check the Ensembl stable ID prefixes.
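As a rough illustration of how such an ID can be taken apart, here is a minimal shell sketch. It assumes the ID has the layout ENS + species code + a one-letter feature type + digits, with no version suffix such as ".1". The variable names are only for illustration.

```shell
# Sketch: split an Ensembl stable ID into species prefix and feature type.
# Assumption: ENS + optional species code + one-letter feature type + digits,
# and no version suffix such as ".1".
id="ENSMUSG00000002413"            # mouse BRAF orthologue mentioned above

stem=$(printf '%s\n' "$id" | sed 's/[0-9]*$//')   # drop trailing digits -> ENSMUSG
prefix=${stem%?}                   # drop the feature-type letter   -> ENSMUS
feature=${stem#"$prefix"}          # the letter that was dropped    -> G
species_code=${prefix#ENS}         # species code (empty for human) -> MUS

printf '%s\t%s\t%s\n' "$prefix" "$feature" "$species_code"
```

The prefix (here ENSMUS) is what you would look up in the list of stable ID prefixes.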

written 16 days ago by Denise - Open Targets (4.5k)

https://useast.ensembl.org/Help/Faq?id=488

By "species associated name", I assume you mean the gene symbol? You can parse the GTF yourself or use a service like BioMart.

modified 17 days ago • written 17 days ago by Eric Lim (710)

I think what dtejadamartinez wants is to get the species names into the orthologues file that one can download from the Ensembl comparative genomics page. Here is one example. Click on the "Download orthologues" button and then select FASTA format.

dtejadamartinez: If you use one of the other formats, you should be able to get the species names.

modified 17 days ago • written 17 days ago by genomax (51k)

An example CDS and the expected output would help.

written 17 days ago by cpad0112 (7.5k)

I am reasonably sure that this is a pre-formatted file, pre-computed by Ensembl. I have posted an example in my comment above.

modified 17 days ago • written 17 days ago by genomax (51k)

Thanks,

If I select another format (not FASTA) in Ensembl, I no longer get the option to download the CDS.

written 17 days ago by dtejadamartinez (10)

In that case you will need to extract the Ensembl IDs from your FASTA file. Use the biomaRt package in R (or BioMart on the web) to look up the species names. These then need to be added back to the FASTA headers.
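The first step, pulling the IDs out of the FASTA file, could look like the sketch below. The file names and the two records are placeholders made up for illustration, not a real Ensembl download, and the sketch assumes the ID is the first word of each header line.

```shell
# Sketch: list the unique Ensembl IDs in a FASTA file, e.g. as input for BioMart.
# The file names and both records are placeholders, not a real Ensembl download.
workdir=$(mktemp -d)
cat > "$workdir/ortho.fa" <<'EOF'
>ENSTNIP00000017949
ATGGCCAAA
>ENSMUSP00000000001
ATGGCTAAA
EOF

# Keep header lines, drop ">", keep only the first word, and de-duplicate.
grep '^>' "$workdir/ortho.fa" | sed 's/^>//' | cut -d ' ' -f 1 | sort -u > "$workdir/ids.txt"
cat "$workdir/ids.txt"
```

The resulting ids.txt has one ID per line, which is a convenient shape for pasting into a lookup tool or reading into R.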

What are you planning to do with this file BTW?

written 17 days ago by genomax (51k)

Thanks, then I will use biomaRt in R as you suggest.

I'm going to do a positive selection analysis (dN/dS).

written 17 days ago by dtejadamartinez (10)
15 days ago, finswimmer (3.6k, Germany) wrote:

Time to summarize :)

Eric Lim showed what the Ensembl ID tells us. Denise - Open Targets found the correct link to the list of ID prefixes (for some reason the link on the help page is wrong; I've contacted Emily_Ensembl about this). And finally, genomax gave us a link to an example file we can use.

First, we need to create a tab-separated file (prefixes.txt in the command below) with the species prefix in the first column and the species name in the second.

Next, we look at the IDs in the downloaded orthologues FASTA file to see which feature type they have. In the linked example an ID looks like this:

>ENSTNIP00000017949

The last character before the digits is always a P, for protein.

Now we can modify the headers: first read in prefixes.txt, then iterate over the FASTA file, extract the species prefix from every header line (everything between > and the P), look up the prefix in our list, and append the species name to the line:

$ awk -F "\t" -v OFS="\t" 'FNR==NR {species[$1]=$2; next} {match($0, />(.+)P/, id); if (id[1] in species) {print $0, species[id[1]]} else {print}}' prefixes.txt ortho.fa > output.fa

In the output the header line now looks like this:

>ENSTNIP00000017949     Tetraodon nigroviridis (Tetraodon)
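One caveat: the three-argument form of match() used in the one-liner is a GNU awk (gawk) extension, so it may fail where awk is mawk or BSD awk. Below is a sketch of the same lookup using only portable awk functions. The two-line prefixes.txt, the placeholder sequences and the unknown ENSXXX prefix are made up for illustration. Unlike the original, it cuts at the first capital letter that is directly followed by digits, so it is not limited to P; that is my assumption about which feature letters may turn up, not something confirmed for the CDS download.

```shell
# Sketch: append species names to FASTA headers with portable awk.
# prefixes.txt is assumed to be tab-separated: prefix, then species name.
workdir=$(mktemp -d)
printf 'ENSTNI\tTetraodon nigroviridis (Tetraodon)\nENSMUS\tMus musculus\n' > "$workdir/prefixes.txt"
cat > "$workdir/ortho.fa" <<'EOF'
>ENSTNIP00000017949
ATGGCCAAA
>ENSXXXP00000000001
ATGGCTAAA
EOF

awk -F '\t' -v OFS='\t' '
    FNR == NR { species[$1] = $2; next }   # first file: prefix -> species name
    /^>/ {
        prefix = substr($0, 2)             # header without the leading ">"
        sub(/[A-Z][0-9]+.*$/, "", prefix)  # cut from the feature letter onwards
        if (prefix in species) { print $0, species[prefix]; next }
    }
    { print }                              # sequence lines and unknown prefixes
' "$workdir/prefixes.txt" "$workdir/ortho.fa" > "$workdir/output.fa"

cat "$workdir/output.fa"
```

Headers whose prefix is not in prefixes.txt are passed through unchanged, as in the original command.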

Good team play!

fin swimmer

modified 15 days ago • written 15 days ago by finswimmer (3.6k)