Problem with different release of Ensembl
9.0 years ago
Lalla ▴ 40

Hi all!

I am having some problems with Ensembl. I have some exons and their coordinates annotated with Ensembl 61 (mouse). As I couldn't find the permanent link to Ensembl 61, I used release 66 to get the sequences, then used EMBOSS sixpack and InterProScan 5 to look for the corresponding protein domains, but I couldn't find any match. I also tried submitting the aa sequence given by BioMart in Ensembl 66 and didn't get any result. When I tried the same approach with Ensembl 67, I obtained different nucleotide sequences, and the protein domains found with InterProScan 5 were correct (or at least made sense for the gene studied). I also tried to translate my coordinates to the latest release, Ensembl 79, and again both the nucleotide and the aa sequences gave InterProScan 5 results identical to those obtained with Ensembl 67. Why are there these differences between Ensembl 66 and Ensembl 67, given that they both refer to mm9? Can I be confident that I am retrieving the right sequence, since the annotation was made with Ensembl 61? Where can I find Ensembl 61, given that what should be its permanent link (http://feb2012.archive.ensembl.org) leads to Ensembl 66?

Thanks


Hi,

Just FYI - it's EnsEMBL, not Ensemble - the last 'e' is not part of the name :)

I corrected the instances in your post.


Thanks! and sorry for the mistake :)


That's OK, we are known for our quirky anagrams!


For what it's worth, you can find Ensembl release 61 via FTP here (just click on anything you want under release 66 and then change 66 to 61 in the URL). Releases 61, 66 and 67 all refer to mm9, so the only difference should be the presence/absence of some patches and the genes on them. To give you more specific help, we would need an example of the sequence you got, what you got with release 67, and how you obtained each of them.
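If you need to do the 66-to-61 swap for many files, it can be scripted. Below is a minimal sketch; the example URL is only an assumption about what a typical Ensembl FTP path looks like (the release number usually appears both in the directory and in the file name), and `swap_release` is a hypothetical helper, not part of any Ensembl tool.

```python
import re


def swap_release(url: str, old: int, new: int) -> str:
    """Replace every standalone occurrence of the release number in a URL.

    The lookarounds keep the substitution from touching longer numbers
    that merely contain the same digits (e.g. 166 or 661).
    """
    return re.sub(rf"(?<!\d){old}(?!\d)", str(new), url)


# Illustrative path only (assumed layout); check it against the FTP listing.
url_66 = (
    "ftp://ftp.ensembl.org/pub/release-66/fasta/mus_musculus/dna/"
    "Mus_musculus.NCBIM37.66.dna.chromosome.9.fa.gz"
)
print(swap_release(url_66, 66, 61))
```

The assembly tag (NCBIM37, i.e. mm9) is left alone because 37 is not the release number being replaced.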


Hi!

Thank you very much for your reply and for posting the link to release 61. I have to apologize: I was using BioMart with my regions of interest and I didn't realize that when I select exon sequences, it returns the sequences of all the exons of the genes that contain these regions, not only the sequences of the regions themselves. When using Region Report, the sequences are the same in Ensembl 66 and 67. I still have one problem: I can't find any match for my regions, even though they are from known genes. Some of these regions are really small, less than 20 aa, but I can't find matches even for bigger regions of more than 400 aa.

For example I have the following region on the sense strand:

chr9    109954574    109956026    +

The sequence from Region Report in Ensembl 66 and 67 is:

ATGTCGCTTCCAGAGAAGCAGCCAGCCGCGCTCACTGCTGCTCTCGCTGCCGAGGACGAG
CAGCTCTCTAAAGGCAACCCTCCAGAGTGCGGGATGGACTCCCGGAAAGAAATCGGTCAG
GATGGATTTGAATGGCAGAGGACAGAGGGTAAACTGAATGAAATCGGGCTGAATGTCAGC
ATGGATGGACAGCTGAAAGACAGGCTTGTGAAGAATTCCAGCTTCCTGGAACAGAATAAG
CTCGGCTTTTTTGAGGGGAAGCTAGACAAAGAACTTAGCATCGAAAAGCCTAACAAGGCG
TATCAAGAAACCTCCGGTCACCTTGAGAGCGGGTATGTGATTTCAGGAACCTGCCAGCCC
TCGGAGGGGAACTTAGTTCACCAGAAGGCAGCAGAGTTCCACCCGGGACTCACCGAGGGG
AAAGACAAAGCAGCTACAGTTCAGGGGAAGGTGGCTGGGAAGAGTGGACTAGAGATCAAG
AGCCAGCCAGATCTAAATTTTCCCGGGGCTGCTGACACTCTCACTCAACATGGTGAGGAA
CAAGAAACCAGTGCTTGGAACGCCAACTTTTACTCAGTAACTCAGAGCCCTCAGGCTGCA
ACTCCAGGAAAAGAGAAGAATGGCCTCGTCTCCAGCTGCTCTGTGACTGGGGTGATGAGT
GATAACTCTGGGCAGCTTAATAATAAGTCCCCATTACTGGTGGCTATCACCCACCCAGAC
CCAACTAGTGAGCATTTGCCCACCACTAGTCCACCAATCACTATGGTGGAATTCACTCAG
GAAAATTTGAATGCAGGCCAGGATAAAGAGTTGGAGAAATTAAGGTCTTCTGAGGAGGGG
CCTATGCTTGACCAAGTACCCCAGCAGAAAAAGGCAATCCGAAGAGCCCTGTCTGAGTGT
TACCATCTCTCAGTGCCCCCAGCTGTGAATCTTGTCGATAAGTACCCTGAGCTGCCTGCC
CGAGAGGAGCTTCCTTCTGACCTGCTGCCTCCCACCAGTAGCCCAATGCCAAGCCCTATG
CCCAGGAAGCTGGGGGTGCCTGCCATGAGGCGCTCCATGACTGTGGCTGAGGACCAGTCA
GCTAGCTGCAGATTGAGTGCTGGGGAACTGGCCAGCCTGTCTGCTTCTCAGGTACCAACT
GCTTTAACCTTCGAGGAACCAGTGGCCAAGGAGAGAGAAGAACAGATCCACTTCAGCAAT
GATAGCAACAGCTCTGGGAAGAAGGAACTGGGCATCGCTGGGCTGTATCTCCACAGTAAG
CTGGAACAGATTCCCGAAGGAAGCCACAAGGGAAAGGGGCAGGAAAATACTGGTGAGACA
AGAGTTGATTCATGTCCCTTCATCTGCCTGGGAGGCGAGAAACAGCTGATGGCACTGGCA
GGGAAGAAAGAAATTGAGGTCACTGCAACCCAGAGCATTCCATCATTGCTGTTGGAAGAG
ACCCCACGTGATG

I tried different ORFs from the EMBOSS sixpack output, both forward and reverse, but I couldn't find any match in InterProScan 5. Am I doing something wrong?

I guess one approach would be to find the exons that contain these regions (for example, by looking at the exon sequences in BioMart and then locating the region sequence obtained with Region Report within them). I can do it manually as I don't have many regions now, but is there a faster approach?

Thanks again and sorry for the previous post, I am new to this field.
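On the "faster approach" question: if the exon coordinates are exported as a table, finding the exons that contain each region is an interval-containment check rather than a sequence search. Here is a rough sketch, assuming the exon table has already been loaded; the exon IDs and exon coordinates below are made up for illustration, and only the query region comes from the post above.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    strand: str


def norm_chrom(chrom: str) -> str:
    """Drop a leading 'chr' so 'chr9' (UCSC style) matches '9' (Ensembl style)."""
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def exons_containing(region: Interval, exons: List[Interval]) -> List[Interval]:
    """Return the exons on the same chromosome and strand that fully contain the region."""
    return [
        e for e in exons
        if norm_chrom(e.chrom) == norm_chrom(region.chrom)
        and e.strand == region.strand
        and e.start <= region.start
        and region.end <= e.end
    ]


region = Interval("my_region", "chr9", 109954574, 109956026, "+")

# Hypothetical exons: placeholder IDs and coordinates, not real annotation.
exons = [
    Interval("EXON_A", "9", 109954000, 109956500, "+"),
    Interval("EXON_B", "9", 109960000, 109960200, "+"),
    Interval("EXON_C", "9", 109954000, 109956500, "-"),
]

for hit in exons_containing(region, exons):
    print(hit.name, hit.start, hit.end)
```

For regions that may only partly overlap an exon, the two coordinate comparisons would need to be relaxed to an overlap test (`e.start <= region.end and region.start <= e.end`).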


What sort of translations are you getting? The proper one, which I get with a simple online tool, is:

MSLPEKQPAALTAALAAEDEQLSKGNPPECGMDSRKEIGQDGFEWQRTEGKLNEIGLNVS
MDGQLKDRLVKNSSFLEQNKLGFFEGKLDKELSIEKPNKAYQETSGHLESGYVISGTCQP
SEGNLVHQKAAEFHPGLTEGKDKAATVQGKVAGKSGLEIKSQPDLNFPGAADTLTQHGEE
QETSAWNANFYSVTQSPQAATPGKEKNGLVSSCSVTGVMSDNSGQLNNKSPLLVAITHPD
PTSEHLPTTSPPITMVEFTQENLNAGQDKELEKLRSSEEGPMLDQVPQQKKAIRRALSEC
YHLSVPPAVNLVDKYPELPAREELPSDLLPPTSSPMPSPMPRKLGVPAMRRSMTVAEDQS
ASCRLSAGELASLSASQVPTALTFEEPVAKEREEQIHFSNDSNSSGKKELGIAGLYLHSK
LEQIPEGSHKGKGQENTGETRVDSCPFICLGGEKQLMALAGKKEIEVTATQSIPSLLLEE
TPRD

Note that this will only work with unspliced genes.
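For reference, the same kind of direct translation can be reproduced offline. This is a minimal sketch using the standard genetic code, not the online tool used above; `translate` and `six_frames` are names chosen here for illustration. The region 109954574-109956026 spans 1453 bases, i.e. 484 complete codons plus one leftover base, which seems consistent with the 484-residue translation above.

```python
BASES = "TCAG"
# Standard genetic code, ordered by first, second, third base over "TCAG".
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence (A<->T, C<->G)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str, frame: int = 0) -> str:
    """Translate complete codons starting at offset `frame` (0, 1 or 2).

    Stop codons become '*', codons with ambiguous bases become 'X',
    and a trailing partial codon is ignored.
    """
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(seq: str) -> dict:
    """Translate all three forward and all three reverse frames."""
    rc = reverse_complement(seq)
    frames = {f"+{f + 1}": translate(seq, f) for f in range(3)}
    frames.update({f"-{f + 1}": translate(rc, f) for f in range(3)})
    return frames


# First 18 bases of the region sequence posted above.
print(translate("ATGTCGCTTCCAGAGAAG"))  # MSLPEK

# Region length check: number of full codons and leftover bases.
print(divmod(109956026 - 109954574 + 1, 3))  # (484, 1)
```

As noted, translating genomic sequence directly like this only makes sense when the region lies within a single coding exon and the frame is known; for spliced transcripts the introns have to be removed first.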


This is the output with EMBOSS sixpack:

>EMBOSS_001_1_ORF1  Translation of EMBOSS_001 in frame 1, ORF 1, threshold 1, 485aa
MSLPEKQPAALTAALAAEDEQLSKGNPPECGMDSRKEIGQDGFEWQRTEGKLNEIGLNVS
MDGQLKDRLVKNSSFLEQNKLGFFEGKLDKELSIEKPNKAYQETSGHLESGYVISGTCQP
SEGNLVHQKAAEFHPGLTEGKDKAATVQGKVAGKSGLEIKSQPDLNFPGAADTLTQHGEE
QETSAWNANFYSVTQSPQAATPGKEKNGLVSSCSVTGVMSDNSGQLNNKSPLLVAITHPD
PTSEHLPTTSPPITMVEFTQENLNAGQDKELEKLRSSEEGPMLDQVPQQKKAIRRALSEC
YHLSVPPAVNLVDKYPELPAREELPSDLLPPTSSPMPSPMPRKLGVPAMRRSMTVAEDQS
ASCRLSAGELASLSASQVPTALTFEEPVAKEREEQIHFSNDSNSSGKKELGIAGLYLHSK
LEQIPEGSHKGKGQENTGETRVDSCPFICLGGEKQLMALAGKKEIEVTATQSIPSLLLEE
TPRDX
>EMBOSS_001_2_ORF1  Translation of EMBOSS_001 in frame 2, ORF 1, threshold 1, 51aa
CRFQRSSQPRSLLLSLPRTSSSLKATLQSAGWTPGKKSVRMDLNGRGQRVN
>EMBOSS_001_2_ORF2  Translation of EMBOSS_001 in frame 2, ORF 2, threshold 1, 4aa
MKSG
>EMBOSS_001_2_ORF3  Translation of EMBOSS_001 in frame 2, ORF 3, threshold 1, 7aa
MSAWMDS
>EMBOSS_001_2_ORF4  Translation of EMBOSS_001 in frame 2, ORF 4, threshold 1, 4aa
KTGL
>EMBOSS_001_2_ORF5  Translation of EMBOSS_001 in frame 2, ORF 5, threshold 1, 17aa
RIPASWNRISSAFLRGS
>EMBOSS_001_2_ORF6  Translation of EMBOSS_001 in frame 2, ORF 6, threshold 1, 24aa
TKNLASKSLTRRIKKPPVTLRAGM
>EMBOSS_001_2_ORF7  Translation of EMBOSS_001 in frame 2, ORF 7, threshold 1, 11aa
FQEPASPRRGT
>EMBOSS_001_2_ORF8  Translation of EMBOSS_001 in frame 2, ORF 8, threshold 1, 31aa
FTRRQQSSTRDSPRGKTKQLQFRGRWLGRVD
>EMBOSS_001_2_ORF9  Translation of EMBOSS_001 in frame 2, ORF 9, threshold 1, 7aa
RSRASQI
>EMBOSS_001_2_ORF10  Translation of EMBOSS_001 in frame 2, ORF 10, threshold 1, 27aa
IFPGLLTLSLNMVRNKKPVLGTPTFTQ
>EMBOSS_001_2_ORF11  Translation of EMBOSS_001 in frame 2, ORF 11, threshold 1, 21aa
LRALRLQLQEKRRMASSPAAL
>EMBOSS_001_2_ORF12  Translation of EMBOSS_001 in frame 2, ORF 12, threshold 1, 2aa
LG
>EMBOSS_001_2_ORF13  Translation of EMBOSS_001 in frame 2, ORF 13, threshold 1, 43aa
VITLGSLIISPHYWWLSPTQTQLVSICPPLVHQSLWWNSLRKI
>EMBOSS_001_2_ORF14  Translation of EMBOSS_001 in frame 2, ORF 14, threshold 1, 10aa
MQARIKSWRN
>EMBOSS_001_2_ORF15  Translation of EMBOSS_001 in frame 2, ORF 15, threshold 1, 34aa
GLLRRGLCLTKYPSRKRQSEEPCLSVTISQCPQL
>EMBOSS_001_2_ORF16  Translation of EMBOSS_001 in frame 2, ORF 16, threshold 1, 39aa
ILSISTLSCLPERSFLLTCCLPPVAQCQALCPGSWGCLP
>EMBOSS_001_2_ORF17  Translation of EMBOSS_001 in frame 2, ORF 17, threshold 1, 3aa
GAP
>EMBOSS_001_2_ORF18  Translation of EMBOSS_001 in frame 2, ORF 18, threshold 1, 11aa
LWLRTSQLAAD
>EMBOSS_001_2_ORF19  Translation of EMBOSS_001 in frame 2, ORF 19, threshold 1, 16aa
VLGNWPACLLLRYQLL
>EMBOSS_001_2_ORF20  Translation of EMBOSS_001 in frame 2, ORF 20, threshold 1, 73aa
PSRNQWPRREKNRSTSAMIATALGRRNWASLGCISTVSWNRFPKEATRERGRKILVRQEL
IHVPSSAWEARNS
>EMBOSS_001_2_ORF21  Translation of EMBOSS_001 in frame 2, ORF 21, threshold 1, 28aa
WHWQGRKKLRSLQPRAFHHCCWKRPHVM
>EMBOSS_001_3_ORF1  Translation of EMBOSS_001 in frame 3, ORF 1, threshold 1, 22aa
VASREAASRAHCCSRCRGRAAL
>EMBOSS_001_3_ORF2  Translation of EMBOSS_001 in frame 3, ORF 2, threshold 1, 19aa
RQPSRVRDGLPERNRSGWI
>EMBOSS_001_3_ORF3  Translation of EMBOSS_001 in frame 3, ORF 3, threshold 1, 6aa
MAEDRG
>EMBOSS_001_3_ORF4  Translation of EMBOSS_001 in frame 3, ORF 4, threshold 1, 2aa
TE
>EMBOSS_001_3_ORF5  Translation of EMBOSS_001 in frame 3, ORF 5, threshold 1, 25aa
NRAECQHGWTAERQACEEFQLPGTE
>EMBOSS_001_3_ORF6  Translation of EMBOSS_001 in frame 3, ORF 6, threshold 1, 4aa
ARLF
>EMBOSS_001_3_ORF7  Translation of EMBOSS_001 in frame 3, ORF 7, threshold 1, 7aa
GEARQRT
>EMBOSS_001_3_ORF8  Translation of EMBOSS_001 in frame 3, ORF 8, threshold 1, 4aa
HRKA
>EMBOSS_001_3_ORF9  Translation of EMBOSS_001 in frame 3, ORF 9, threshold 1, 10aa
QGVSRNLRSP
>EMBOSS_001_3_ORF10  Translation of EMBOSS_001 in frame 3, ORF 10, threshold 1, 62aa
ERVCDFRNLPALGGELSSPEGSRVPPGTHRGERQSSYSSGEGGWEEWTRDQEPARSKFSR
GC
>EMBOSS_001_3_ORF11  Translation of EMBOSS_001 in frame 3, ORF 11, threshold 1, 6aa
HSHSTW
>EMBOSS_001_3_ORF12  Translation of EMBOSS_001 in frame 3, ORF 12, threshold 1, 41aa
GTRNQCLERQLLLSNSEPSGCNSRKREEWPRLQLLCDWGDE
>EMBOSS_001_3_ORF13  Translation of EMBOSS_001 in frame 3, ORF 13, threshold 1, 4aa
LWAA
>EMBOSS_001_3_ORF14  Translation of EMBOSS_001 in frame 3, ORF 14, threshold 1, 13aa
VPITGGYHPPRPN
>EMBOSS_001_3_ORF15  Translation of EMBOSS_001 in frame 3, ORF 15, threshold 1, 5aa
AFAHH
>EMBOSS_001_3_ORF16  Translation of EMBOSS_001 in frame 3, ORF 16, threshold 1, 18aa
STNHYGGIHSGKFECRPG
>EMBOSS_001_3_ORF17  Translation of EMBOSS_001 in frame 3, ORF 17, threshold 1, 8aa
RVGEIKVF
>EMBOSS_001_3_ORF18  Translation of EMBOSS_001 in frame 3, ORF 18, threshold 1, 5aa
GGAYA
>EMBOSS_001_3_ORF19  Translation of EMBOSS_001 in frame 3, ORF 19, threshold 1, 14aa
PSTPAEKGNPKSPV
>EMBOSS_001_3_ORF20  Translation of EMBOSS_001 in frame 3, ORF 20, threshold 1, 14aa
VLPSLSAPSCESCR
>EMBOSS_001_3_ORF21  Translation of EMBOSS_001 in frame 3, ORF 21, threshold 1, 2aa
VP
>EMBOSS_001_3_ORF22  Translation of EMBOSS_001 in frame 3, ORF 22, threshold 1, 9aa
AACPRGASF
>EMBOSS_001_3_ORF23  Translation of EMBOSS_001 in frame 3, ORF 23, threshold 1, 6aa
PAASHQ
>EMBOSS_001_3_ORF24  Translation of EMBOSS_001 in frame 3, ORF 24, threshold 1, 22aa
PNAKPYAQEAGGACHEALHDCG
>EMBOSS_001_3_ORF25  Translation of EMBOSS_001 in frame 3, ORF 25, threshold 1, 4aa
GPVS
>EMBOSS_001_3_ORF26  Translation of EMBOSS_001 in frame 3, ORF 26, threshold 1, 38aa
LQIECWGTGQPVCFSGTNCFNLRGTSGQGERRTDPLQQ
>EMBOSS_001_3_ORF27  Translation of EMBOSS_001 in frame 3, ORF 27, threshold 1, 17aa
QQLWEEGTGHRWAVSPQ
>EMBOSS_001_3_ORF28  Translation of EMBOSS_001 in frame 3, ORF 28, threshold 1, 18aa
AGTDSRRKPQGKGAGKYW
>EMBOSS_001_3_ORF29  Translation of EMBOSS_001 in frame 3, ORF 29, threshold 1, 3aa
DKS
>EMBOSS_001_3_ORF30  Translation of EMBOSS_001 in frame 3, ORF 30, threshold 1, 22aa
FMSLHLPGRRETADGTGREERN
>EMBOSS_001_3_ORF31  Translation of EMBOSS_001 in frame 3, ORF 31, threshold 1, 17aa
GHCNPEHSIIAVGRDPT
>EMBOSS_001_3_ORF32  Translation of EMBOSS_001 in frame 3, ORF 32, threshold 1, 1aa
X

I also tried your aa sequence with InterProScan 5 and didn't find any match. This sequence is an exon, or part of an exon, which is differentially spliced and is a coding sequence. Why can't I find any protein domain encoded by this sequence?


The first sequence (i.e., what I posted as well) is the correct one. InterProScan scans for known signatures, so it won't always find anything. Why not read some reviews on this family of proteins? Perhaps this is an intrinsically disordered region.
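With a threshold of 1, sixpack reports many very short ORFs, so it can help to sort them and keep only the long ones before submitting anything. A small sketch, assuming the sixpack output has been saved as FASTA text; the 20-residue cutoff is an arbitrary choice for illustration, and `parse_fasta` / `longest_orfs` are hypothetical helper names.

```python
def parse_fasta(text: str) -> dict:
    """Parse FASTA text into {record_id: sequence}, joining wrapped lines."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]  # ID is the first word of the header
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {rid: "".join(parts) for rid, parts in records.items()}


def longest_orfs(records: dict, min_len: int = 20) -> list:
    """Return (id, sequence) pairs with at least min_len residues, longest first.

    Trailing 'X' or '*' characters are not counted towards the length.
    """
    kept = [
        (rid, seq) for rid, seq in records.items()
        if len(seq.rstrip("X*")) >= min_len
    ]
    return sorted(kept, key=lambda pair: len(pair[1]), reverse=True)


# Two records copied from the sixpack output above.
sample = """\
>EMBOSS_001_2_ORF1  Translation of EMBOSS_001 in frame 2, ORF 1, threshold 1, 51aa
CRFQRSSQPRSLLLSLPRTSSSLKATLQSAGWTPGKKSVRMDLNGRGQRVN
>EMBOSS_001_2_ORF2  Translation of EMBOSS_001 in frame 2, ORF 2, threshold 1, 4aa
MKSG
"""

for rid, seq in longest_orfs(parse_fasta(sample)):
    print(rid, len(seq))
```

Run on the full output above, the frame 1 ORF would come out on top by a wide margin, which matches the translation identified as the correct one.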

9.0 years ago
Lalla ▴ 40

Thanks. It just seemed too odd that all the regions that I looked at were disordered, but that might be the case. Thanks for all the help!
