Where Are The New Lamprey ORFs?
11.2 years ago
cdsouthan ★ 1.9k

It's good to see the new lamprey paper: http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.2568.html

However, the availability and searchability of the ORFs are unclear:

1) What is the relationship between the Ensembl Pmarinus_7.0 build of Jan 2011 (10,402 genes) and the assembly studied in the paper (the paper does not mention Ensembl), and is the authors' latest assembly destined for the Ensembl pipeline?

2) Where is their final set of 26,046 ORFs instantiated (as MAKER output?), and how can I search them?

3) The Ensembl ORF I tried had no TBLASTN matches against WGA

4) Because the RNA-seq transcript data went into SRA, I can't do TBLASTN for ORFs. Wouldn't it be better to get it into TSA?

This is a similar set of problems to those I encountered with the oyster genome (Searching for proteins in the new Oyster genome).

It's great to see these new genomes, but I wish authors and journal editors would ensure the data are released into major portals and sequence divisions that we can actually query.

genome annotation protein ensembl ucsc • 2.3k views
11.2 years ago
Emily 23k

At Ensembl we did our own genebuild on lamprey, which resulted in 11,429 translations (compared to 26,720 from Maker). There's documentation on how we carry out our genebuild on our website: http://www.ensembl.org/info/docs/genebuild/index.html

The Maker predictions are available via our Ensembl Core API. They're listed as "otherfeatures".

This raises the questions of how the pipelines can differ 2.4-fold and why your stats are (slightly) different. It still seems odd that the ORF set on which some key comparisons in the paper were made is tucked away.

My stats are the number of translations. The stats above are the number of genes. Lamprey's not the most popular species to study so it's not really surprising that so few splice variants are known.
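To make the comparison concrete, here is a small arithmetic sketch using only the counts quoted in this thread. It assumes, as the replies suggest, that 10,402 and 26,046 are gene counts while 11,429 and 26,720 are translation counts; the variable names are illustrative only.

```python
# Counts as quoted in this thread; they are not re-derived from any database.
ENSEMBL_GENES = 10_402          # Ensembl Pmarinus_7.0 build, per the question
MAKER_GENES = 26_046            # ORF/gene set reported in the paper, per the question
ENSEMBL_TRANSLATIONS = 11_429   # per the Ensembl answer
MAKER_TRANSLATIONS = 26_720     # per the Ensembl answer

# Fold difference between the two pipelines, computed like-for-like.
gene_fold = MAKER_GENES / ENSEMBL_GENES
translation_fold = MAKER_TRANSLATIONS / ENSEMBL_TRANSLATIONS

print(f"genes:        {gene_fold:.2f}-fold")
print(f"translations: {translation_fold:.2f}-fold")
```

Either way the Maker set is roughly 2.3 to 2.5 times larger, so the "2.4-fold" figure holds whichever pair of counts is compared.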

Our pipeline is completely open source (see link to documentation above). I'm afraid I don't know how the Maker pipeline works, but it would certainly be interesting to examine why they differ so much.

And in my experience, all Nature papers have the most useful information hidden away. The paper itself is all pretty pictures and conclusions, and you have to check the supplemental to get the meat.

Thanks. Your last sentence is exactly what I am getting at in this posting (and that protein set is not in the SD). So we have concordance in opinion about discordance in the ORF pipes. (Assuming you are ED, I think we met when I was on ELIXIR.)

11.2 years ago

I would say that UCSC and Ensembl are major portals:

http://hgdownload.soe.ucsc.edu/goldenPath/petMar2/database/

This should contain the Maker set (from the guy who annotated the genome with Maker: MS Campbell). The same genome assembly was used by the Ensembl and Maker groups.

Ensembl link:

http://uswest.ensembl.org/Petromyzon_marinus/Info/Index

Thanks, it's now clear that the data in the paper came from the Sep. 2010 (WUGSC 7.0/petMar2) assembly, but they don't say this. I'd still like to just search the 26,046 ORFs rather than download them, but I can't see them as UCSC tracks, only the Genscans.

In my limited experience with annotation datasets, I have found there aren't easy ways to do theoretically simple tasks. I often use Perl APIs for tasks like querying databases and subsetting data.
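As a rough illustration of the kind of subsetting meant here, the sketch below filters a downloaded protein FASTA for records containing a short peptide. It is written in Python rather than Perl, the record names and sequences are made up, and `parse_fasta` and `subset_by_motif` are hypothetical helpers, not part of any Ensembl or UCSC API. Exact substring matching is only a crude stand-in for a real similarity search such as BLAST.

```python
from typing import Dict, Iterable, Iterator, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def subset_by_motif(records: Iterable[Tuple[str, str]],
                    motif: str,
                    min_length: int = 0) -> Dict[str, str]:
    """Keep records at least min_length long that contain motif exactly."""
    motif = motif.upper()
    return {
        header: seq
        for header, seq in records
        if len(seq) >= min_length and motif in seq.upper()
    }


# Made-up records standing in for a downloaded ORF/protein set.
EXAMPLE = """>orf_1 hypothetical
MKTAYIAKQR
QISFVKSHFS
>orf_2 hypothetical
MSSHEGGKKK
>orf_3 hypothetical
MKTAYQ
"""

hits = subset_by_motif(parse_fasta(EXAMPLE), "MKTAY", min_length=10)
for header, seq in hits.items():
    print(header, len(seq))
```

For a real file, the text could be read with `open(path).read()` and passed to `parse_fasta` in the same way.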
