Question: Reference Protein-Coding Sequence
Not Durrett wrote (8.5 years ago):

Hello,

Where can I find a (or the canonical) collection of "reference protein-coding sequences" for mouse (and/or S. pombe)?

For context, I am trying to build the Oracle Set referred to in the recent Nature Biotechnology paper on Trinity (http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.1883.html). From the paper:

We next estimated the upper sensitivity limit for which annotated transcripts can possibly be perfectly reconstructed given a particular data set of sequences. Any assembly approach based on a particular k-length oligomer is limited to those sequences that are represented by the exact k-mer composition of the RNA-Seq read set. To determine this empirical upper sensitivity limit, we built a k-mer dictionary from all the reads and identified all known reference protein-coding sequences that are reconstructable to full length given the read set, as those sequences that can be populated by adjacent and overlapping k-mers across their entire length. We call this set of sequences the 'Oracle Set'. Because this set also contains transcript sequences that are covered by k-mers, but not entire reads, some transcripts will appear reconstructable but are not. Conversely, the Oracle Set reflects only annotated known genes and known isoforms, which are likely an underestimate, especially in mammals [16]. Nevertheless, the Oracle Set provides a useful sensitivity benchmark.
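
A minimal sketch of the procedure described in that excerpt, assuming the reads and the reference sequences are already loaded as plain strings. The function names, the toy data, and the default k = 25 are illustrative assumptions, not taken from the paper, and reverse-complement k-mers are ignored for simplicity.

```python
def kmers(seq, k):
    """Yield every overlapping k-mer of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_kmer_set(reads, k):
    """Collect all k-mers observed in the reads (the 'k-mer dictionary')."""
    seen = set()
    for read in reads:
        seen.update(kmers(read.upper(), k))
    return seen


def oracle_set(references, reads, k=25):
    """Return names of references whose every k-mer occurs in the reads.

    references: dict mapping a sequence name to its sequence (hypothetical layout).
    reads: iterable of read sequences.
    k: k-mer length; 25 is an assumed default, not a value from the excerpt.
    """
    seen = build_kmer_set(reads, k)
    selected = []
    for name, seq in references.items():
        seq = seq.upper()
        # A reference shorter than k has no k-mers, so it is skipped.
        if len(seq) >= k and all(km in seen for km in kmers(seq, k)):
            selected.append(name)
    return selected


# Toy data, invented for illustration only.
toy_reads = ["ACGTAC", "GTACGG"]
toy_refs = {"tx1": "ACGTACGG", "tx2": "ACGTTT"}
print(oracle_set(toy_refs, toy_reads, k=4))  # prints ['tx1']
```

In the toy example, tx1 is covered by k-mers from two different reads but by no single read, which is the situation the excerpt warns about when it says some transcripts will appear reconstructable but are not.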

Thanks in advance for any help! (And I apologize if this is a stupid question - I am just starting out in bioinformatics research.)

reference sequence protein rna • 1.6k views

Michael Schubert (Cambridge, UK) wrote (8.5 years ago):

2 Links for you:


Hey, why a negative vote on this one? This is a perfectly valid answer!

written 8.5 years ago by Lyco

indeed a bit peculiar ;)

written 8.5 years ago by Michael Schubert
Lyco (Germany) wrote (8.5 years ago):

I haven't read this particular paper, but most people mean the NCBI RefSeq database when they use the term 'reference sequence', especially when they talk about mammalian sequences. So RefSeq would be the best bet for the mouse reference sequences. An alternative reading is the 'reference sequence' as published by the associated genome project; this is often what people mean when they talk about bacteria or simple eukaryotes. In the case of S. pombe, this would probably be the version at the Sanger centre.
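
If you go the RefSeq route, here is a rough sketch of pulling only the protein-coding transcripts out of a RefSeq RNA FASTA file. It relies on the RefSeq accession prefixes (NM_ for curated mRNAs, NR_ for non-coding RNAs) and assumes each header line starts with the accession. The function names and the toy records are made up for illustration; older files with "gi|...|ref|..." style headers would need different parsing.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def protein_coding_refseq(lines, prefixes=("NM_",)):
    """Keep records whose accession starts with one of the given prefixes.

    Assumes the accession is the first whitespace-separated token of the header.
    Add "XM_" to prefixes to also keep predicted mRNA models.
    """
    kept = {}
    for header, seq in read_fasta(lines):
        accession = header.split()[0]
        if accession.startswith(prefixes):
            kept[accession] = seq
    return kept


# Toy records with invented accessions, for illustration only.
toy_fasta = """>NM_000001.1 toy coding transcript
ACGT
ACGG
>NR_000002.1 toy non-coding RNA
TTTT
""".splitlines()

print(protein_coding_refseq(toy_fasta))  # prints {'NM_000001.1': 'ACGTACGG'}
```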

Not Durrett wrote (8.5 years ago):

Thanks (very late, I know) to both of you for the answers - just what I was looking for.

[I know that it is inappropriate to post this as an answer. I asked my question on a public lab computer without logging in, and I am not able to comment as a new user.]

[argh, can't delete it now.]

-OP


If this answers your question, you can 'close' this thread by accepting one of the answers.

written 8.5 years ago by Lyco