My name is Maria. I'm a master's student from Estonia. I want to ask some basic (okay, maybe a little bit silly) questions. I'm starting to work with 1000Genomes data and I've never worked with genome sequences before.
I want to download sub-sequences of a genome. The instructions say to specify the region like 1:1-50000. I understand that the 1 before the colon refers to the chromosome number, is that correct? And would 1-50000 be the first 50000 nucleotides of that chromosome?
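To make my reading of that notation concrete, here is a minimal Python sketch of how I am interpreting it. It assumes the coordinates are 1-based and inclusive (as in samtools/tabix-style region strings), which is exactly the part I would like confirmed. The function names parse_region and region_length are my own, hypothetical helpers, not part of any 1000Genomes tool.

```python
import re

def parse_region(region):
    """Split a 'chrom:start-end' string into (chrom, start, end)."""
    match = re.fullmatch(r"([^:]+):(\d+)-(\d+)", region)
    if match is None:
        raise ValueError(f"not a chrom:start-end region: {region!r}")
    chrom = match.group(1)
    start, end = int(match.group(2)), int(match.group(3))
    # Assumption: positions are 1-based, so 0 is not a valid start.
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in region: {region!r}")
    return chrom, start, end

def region_length(start, end):
    """Number of bases covered, assuming both ends are inclusive."""
    return end - start + 1

chrom, start, end = parse_region("1:1-50000")
print(chrom, start, end, region_length(start, end))  # prints: 1 1 50000 50000
```

Under that assumption, 1:1-50000 would cover exactly 50000 bases on chromosome 1.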
Are the genomes of different people of different lengths due to copy number variations? Does sequencing against a reference genome take these differences into account, or would all the genomes in 1000Genomes be the same length because they are aligned to a reference genome?
What if I wanted to obtain specific genes from the sequences? Is there any tool to do that?