200Bp Long Query Returning 201Bp
11.1 years ago
secretjess ▴ 210

In Ensembl, when I use this link to return the sequence on the sense strand of the X chromosome between positions 200000 and 200200, I get the following output:

>X dna:chromosome chromosome:GRCh37:X:200000:200200:1
CCAAACCCCAGGCAGGAGACCAGCCCGTGTTATACGGTGCCTGGAGGAGGCGTGACTCAT
TTGCATAGCGCTGAGGGGATTGGTCTGACCAGGCCTGTCATTCACGTAGCCCGCGAAAAA
CCTGGCCCGCCCACCCCAGTTCCGTAATATGCAAATGTAGGGCGCCATGATGTTCCACAC
GCCTGAGGGTAGTGGGGGCGG

This contains 201 nucleotides, but from my query I was expecting 200. Where has this extra nucleotide come from? Which position is it at? Is my query wrong?

ensembl sequence • 3.2k views

This is absolutely fine. If you specify your range as 200000:200001, you get two nucleotides: (1) C at position 200000 and (2) C at position 200001. So, length = end - start + 1. For your query that gives 200200 - 200000 + 1 = 201.
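
As a quick sanity check, here is a minimal Python sketch of that arithmetic. The function names `closed_interval_length` and `end_for_length` are made up for illustration and are not part of any Ensembl API; the second one just solves the same formula for the end coordinate, assuming you want exactly 200 bases starting at 200000.

```python
def closed_interval_length(start: int, end: int) -> int:
    """Number of positions in an inclusive interval [start, end]."""
    if end < start:
        raise ValueError("end must be >= start")
    return end - start + 1


def end_for_length(start: int, length: int) -> int:
    """End coordinate that gives exactly `length` positions from `start`."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return start + length - 1


print(closed_interval_length(200000, 200001))  # 2
print(closed_interval_length(200000, 200200))  # 201
print(end_for_length(200000, 200))             # 200199
```

So a query of 200000:200199 (or 200001:200200) would return 200 nucleotides.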

To complete the answer: this is because Ensembl uses closed intervals, so both the start and the end coordinates are inclusive (i.e. your end coordinate is treated as the last position of the interval).
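
If you then slice the sequence in a language with 0-based, half-open indexing (Python slices, for example), the closed 1-based coordinates need converting first. A rough sketch, assuming 1-based inclusive input; the helper names and the toy sequence are hypothetical, not taken from Ensembl:

```python
def closed_to_half_open(start: int, end: int) -> tuple[int, int]:
    """Convert a 1-based closed interval [start, end] to 0-based half-open [start0, end0)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return start - 1, end


def fetch_closed(sequence: str, start: int, end: int) -> str:
    """Return the bases at 1-based positions start..end, both inclusive."""
    start0, end0 = closed_to_half_open(start, end)
    return sequence[start0:end0]


toy = "ACGTACGTAC"  # made-up 10-base sequence, for illustration only
print(fetch_closed(toy, 2, 4))               # CGT (positions 2, 3 and 4)
print(closed_to_half_open(200000, 200200))   # (199999, 200200)
```

In the half-open form the length is simply end0 - start0, which again gives 200200 - 199999 = 201.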

That's a really good way of explaining it - makes complete sense now, thank you!

11.1 years ago

This has been answered above (C: 200bp long query returning 201bp) by a.zielezinski; I am just adding it here to mark the question as answered.

The choice of interval representation (zero-based or one-based) has advantages and disadvantages that have long intrigued people. It is a non-trivial matter with many implications, as demonstrated by numerous posts here and elsewhere.
