Question: How can I add the species name to the Ensembl IDs in a FASTA file (CDS)?
2.0 years ago by
dtejadamartinez20 wrote:

Hi,

I downloaded the orthologue FASTA file (CDS) for one gene from Ensembl, but the FASTA headers contain only the Ensembl IDs and not the associated species names.

How can I add the species name to each Ensembl ID in the FASTA file?

(I need to do this for hundreds of genes.)

Thanks,

ensembl • 804 views
ADD COMMENTlink modified 2.0 years ago by finswimmer13k • written 2.0 years ago by dtejadamartinez20

You can tell a lot from an Ensembl gene ID, as the Ensembl help page that @Eric Lim pointed you to explains.

The ID contains a three-letter species code, e.g. MUS for mouse (Latin name Mus musculus), as in the mouse BRAF orthologue: ENSMUSG00000002413.

So if you know the 3 letter code, you know the species name in your FASTA file. It's that easy. If you don't know what the 3 letter code means, check Ensembl stable ID prefixes.
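As a rough sketch of that lookup, the species prefix can be pulled out of an ID by stripping the feature letter and the digits that follow it. The function name `ensembl_prefix` is made up for illustration, and the pattern assumes the usual layout of prefix, one feature letter (G, T, P, ...), then digits with an optional version suffix:

```shell
# Hypothetical helper: print the species prefix of an Ensembl stable ID.
# Assumes the layout <prefix><one feature letter><digits>[.version];
# an ID that does not fit this layout is printed unchanged.
ensembl_prefix() {
    printf '%s\n' "$1" | sed -E 's/^([A-Z]+)[A-Z][0-9]+(\.[0-9]+)?$/\1/'
}

ensembl_prefix ENSMUSG00000002413   # prints ENSMUS
```

The printed value (here ENSMUS) is what you would then look up in the list of stable ID prefixes.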

ADD REPLYlink written 2.0 years ago by Denise - Open Targets5.1k

https://useast.ensembl.org/Help/Faq?id=488

By species associated name, I assume you meant gene symbol? You can parse the GTF yourself or use services like BioMart.

ADD REPLYlink modified 2.0 years ago • written 2.0 years ago by Eric Lim1.7k

I think what dtejadamartinez wants is to get the species names into the orthologues file that one can download from the Ensembl comparative genomics page. Here is one example. Click on the Download orthologues button and then select FASTA format.

dtejadamartinez : If you use one of the other formats you should be able to get species names.

ADD REPLYlink modified 2.0 years ago • written 2.0 years ago by genomax85k

An example CDS and the expected output would help.

ADD REPLYlink written 2.0 years ago by cpad011213k

I am reasonably sure that this is a pre-formatted file, pre-computed by Ensembl. I have an example posted in my comment above.

ADD REPLYlink modified 2.0 years ago • written 2.0 years ago by genomax85k

Thanks,

If I select another format in Ensembl (not FASTA), it doesn't offer the option to download the CDS.

ADD REPLYlink written 2.0 years ago by dtejadamartinez20

In that case you will need to get the Ensembl IDs out of your FASTA file. Use the biomaRt package in R (or BioMart on the web) to get the species names. They will then need to be added back to the FASTA file.
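For the first step, a minimal sketch of pulling the IDs out of the headers. The function name `fasta_ids` is hypothetical, and it assumes the ID is the first whitespace-delimited token after the ">":

```shell
# Hypothetical helper: list the unique IDs found in the header lines of a FASTA file.
# Assumes the ID is the first whitespace-delimited token after ">".
fasta_ids() {
    awk '/^>/ {sub(/^>/, "", $1); print $1}' "$1" | sort -u
}
```

Running `fasta_ids ortho.fa > ids.txt` gives one ID per line, which is a convenient form for pasting into a lookup tool.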

What are you planning to do with this file BTW?

ADD REPLYlink written 2.0 years ago by genomax85k

Thanks, then I will use biomart in R as you suggest.

I'm going to do positive selection analysis (dN/dS)

ADD REPLYlink written 2.0 years ago by dtejadamartinez20
2.0 years ago by
finswimmer13k
Germany
finswimmer13k wrote:

Time to summarize :)

Eric Lim showed what the Ensembl ID tells us. Denise - Open Targets found the correct link to the list of ID prefixes (for some reason the link on that help page is wrong; I've contacted Emily_Ensembl about it). And finally genomax gave us a link to an example file we can use.

First we need to create a tab-separated file, prefixes.txt, with one line per species: the species prefix in the first column and the species name in the second.

In the downloaded orthologue FASTA file we then look at the IDs to see which feature type they have. In the linked example an ID looks like this:

>ENSTNIP00000017949

The last character before the digits is always a P, for protein.

Now we can modify the headers: first read in prefixes.txt, then iterate over the FASTA file, extract the species prefix from every header line (everything between > and the P in front of the digits), look up the prefix in our list and append the species name to the line. The three-argument form of match() is a GNU awk extension, so this needs gawk:

$ gawk -F "\t" -v OFS="\t" 'FNR==NR {species[$1]=$2; next} /^>/ && match($0, /^>([^ \t]+)P[0-9]+/, id) && (id[1] in species) {print $0, species[id[1]]; next} {print}' prefixes.txt ortho.fa > output.fa

In the output the header line now looks like this:

>ENSTNIP00000017949     Tetraodon nigroviridis (Tetraodon)
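Since the question mentions hundreds of genes, the same idea can be wrapped in a loop. This is only a sketch under a few assumptions: the function names `annotate_fasta` and `annotate_dir` are made up, each gene's orthologues are assumed to sit in their own `*.fa` file in one directory, and the prefix is extracted with plain `sub()` calls so that it does not depend on gawk:

```shell
# Hypothetical helper: annotate the headers of one FASTA file.
# $1 = tab-separated prefix table (prefix<TAB>species name), $2 = FASTA file.
annotate_fasta() {
    awk -F "\t" -v OFS="\t" '
        FNR == NR { species[$1] = $2; next }        # first file: load the prefix table
        /^>/ {
            id = $0
            sub(/^>/, "", id)                       # drop the leading ">"
            sub(/[ \t].*$/, "", id)                 # keep only the first token
            sub(/[A-Z][0-9]+(\.[0-9]+)?$/, "", id)  # strip feature letter, digits, version
            if (id in species) { print $0, species[id]; next }
        }
        { print }                                   # sequence lines and unknown prefixes
    ' "$1" "$2"
}

# Assumed layout: $2 holds one .fa file per gene; results are written to $3.
annotate_dir() {
    mkdir -p "$3"
    for fa in "$2"/*.fa; do
        [ -e "$fa" ] || continue                    # no .fa files: nothing to do
        annotate_fasta "$1" "$fa" > "$3/$(basename "$fa")"
    done
}
```

A call such as `annotate_dir prefixes.txt orthologues annotated` would then write one annotated copy per input file. Headers whose prefix is not in the table are passed through unchanged, so missing species are easy to spot afterwards.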

Good team play!

fin swimmer

ADD COMMENTlink modified 2.0 years ago • written 2.0 years ago by finswimmer13k