Question: 200 bp Long Query Returning 201 bp
secretjess • 180 wrote:
In Ensembl, when I use this link to return the sequence on the sense strand of the X chromosome between 200000 and 200200, I get the following output:
>X dna:chromosome chromosome:GRCh37:X:200000:200200:1
CCAAACCCCAGGCAGGAGACCAGCCCGTGTTATACGGTGCCTGGAGGAGGCGTGACTCAT
TTGCATAGCGCTGAGGGGATTGGTCTGACCAGGCCTGTCATTCACGTAGCCCGCGAAAAA
CCTGGCCCGCCCACCCCAGTTCCGTAATATGCAAATGTAGGGCGCCATGATGTTCCACAC
GCCTGAGGGTAGTGGGGGCGG
This contains 201 nucleotides, but from my query I was expecting 200. Where has this extra nucleotide come from? Which position is it at? Is my query wrong?
modified 8.0 years ago by Istvan Albert ♦♦ 86k • written 8.0 years ago by secretjess • 180
This is absolutely fine. Look, if you specify your range as 200000:200001 you will get two nucleotides: (1) C at position 200000 and (2) C at position 200001. So length = end - start + 1, and your range 200000:200200 gives 200200 - 200000 + 1 = 201 nucleotides.
To complete the answer: this is because Ensembl uses closed intervals, inclusive at both the start and the end coordinate (i.e. your end coordinate is the last position included in the interval).
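The arithmetic above can be sketched in a few lines of Python. This is only an illustration, assuming 1-based closed coordinates as described in this thread; the function names are hypothetical and not part of any Ensembl API.

```python
def closed_length(start: int, end: int) -> int:
    """Number of positions in a closed interval [start, end]."""
    if end < start:
        raise ValueError("end must be >= start")
    return end - start + 1


def end_for_length(start: int, length: int) -> int:
    """End coordinate of a closed interval that begins at start and spans length positions."""
    if length < 1:
        raise ValueError("length must be >= 1")
    return start + length - 1


def closed_to_half_open(start: int, end: int) -> tuple[int, int]:
    """Convert a 1-based closed [start, end] to a 0-based half-open [start - 1, end)."""
    return start - 1, end


print(closed_length(200000, 200200))        # 201, the length seen in the question
print(end_for_length(200000, 200))          # 200199, the end that yields exactly 200 nt
print(closed_to_half_open(200000, 200200))  # (199999, 200200), as a Python slice would use
```

So to get exactly 200 nucleotides starting at 200000, the range would be 200000:200199. The half-open form is handy when slicing a sequence string in code, because there length is simply end - start.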
That's a really good way of explaining it - makes complete sense now, thank you!