Question: Basic understanding of genome sequences
maria.kesa (Estonia) wrote, 4.1 years ago:

Hello,

 

My name is Maria. I'm a master's student from Estonia. I want to ask some basic (okay, maybe a little bit silly) questions. I'm starting to work with 1000 Genomes data and I've never worked with genome sequences before.

I want to download sub-sequences of a genome. The instruction says to indicate it like 1:1-50000. I understand that 1 in front of : refers to chromosome number, is that correct? And 1-50000 would be the first 50000 nucleotides? 

Are the genomes of different people of different lengths due to copy number variations? Does the sequencing according to a reference genome take account of these differences or would all the genomes in 1000Genomes be of the same length as they are aligned to a reference genome?

What if I wanted to obtain specific genes from the sequences? Is there any tool to do that?

Thank you!

alignment genome • 1.1k views
modified 4.1 years ago by Ashutosh Pandey • written 4.1 years ago by maria.kesa
Ashutosh Pandey (Philadelphia) wrote, 4.1 years ago:

I want to download sub-sequences of a genome. The instruction says to indicate it like 1:1-50000. I understand that 1 in front of : refers to chromosome number, is that correct? And 1-50000 would be the first 50000 nucleotides? 

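-Yes, you are correct: the part before the colon is the chromosome name, and 1-50000 is the range of positions on it, i.e. the first 50,000 nucleotides of chromosome 1.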
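To make the region notation concrete, here is a minimal Python sketch that splits a string such as 1:1-50000 into its parts. It assumes the samtools/tabix convention, in which both coordinates are 1-based and inclusive; the function names parse_region and region_length are made up for this illustration and do not come from any library.

```python
import re

# "chrom:start-end"; commas inside the numbers (e.g. 50,000) are tolerated.
_REGION = re.compile(r"^([^:]+):([\d,]+)-([\d,]+)$")


def parse_region(region):
    """Split 'chrom:start-end' into (chrom, start, end).

    Assumption: coordinates are 1-based and inclusive (samtools/tabix style).
    """
    match = _REGION.match(region)
    if match is None:
        raise ValueError(f"not a chrom:start-end region: {region!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in region: {region!r}")
    return chrom, start, end


def region_length(region):
    """Number of bases covered by the region (both ends are included)."""
    _, start, end = parse_region(region)
    return end - start + 1


print(parse_region("1:1-50000"))   # ('1', 1, 50000)
print(region_length("1:1-50000"))  # 50000
```

Because both ends are included, the length is end - start + 1, which is why 1:1-50000 covers exactly 50,000 bases.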

Are the genomes of different people of different lengths due to copy number variations? 

-Yes. Indels will also contribute to differences in genome lengths.

Does the sequencing according to a reference genome take account of these differences or would all the genomes in 1000Genomes be of the same length as they are aligned to a reference genome?

-Reads from different samples are aligned to the same reference genome so that multiple genomes can be compared easily for the presence or absence of a variant; otherwise such comparisons would be difficult. In short, all the genomes in 1000 Genomes share the same coordinate system: the position of a variant or gene is always given relative to the reference, so in that sense they all have the same length. The BAM files nevertheless contain enough information to predict copy number variants and to identify insertions and deletions that differ between individuals.
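One place where a BAM file keeps this information is the CIGAR string of each aligned read, which records where the read has extra bases (I) or missing bases (D) relative to the reference. The sketch below is only an illustration of that idea, not a variant caller: it counts how many reference bases and how many read bases a CIGAR string consumes, following the operation table in the SAM specification. The function name cigar_lengths is hypothetical.

```python
import re

# Operations that advance along the reference / along the read (SAM spec).
REF_OPS = set("MDN=X")
QUERY_OPS = set("MIS=X")


def cigar_lengths(cigar):
    """Return (reference_bases, read_bases) consumed by a CIGAR string."""
    ref_len = 0
    query_len = 0
    for count, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(count)
        if op in REF_OPS:
            ref_len += n
        if op in QUERY_OPS:
            query_len += n
    return ref_len, query_len


# A plain match, a 2-base insertion, and a 3-base deletion.
for cigar in ("100M", "50M2I48M", "50M3D50M"):
    print(cigar, cigar_lengths(cigar))
```

All three reads are 100 bases long, yet they span 100, 98 and 103 reference positions respectively: the sample's own sequence differs in length while the coordinates stay those of the reference.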

What if I wanted to obtain specific genes from the sequences? Is there any tool to do that?

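-You can download the coordinates of your gene of interest from Ensembl (GTF file) or the UCSC Genome Browser (GTF/BED) and then use those coordinates (e.g. chr2:100000-1020000) to fetch the reads overlapping that region from the BAM file with samtools view. The BAM file needs to be indexed for region queries, and the chromosome name has to match the naming used in the BAM header (for example 2 versus chr2).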
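As a rough sketch of that workflow, the Python below looks up a gene in GTF-formatted text and builds the matching region string for samtools view. The GTF lines, the gene names GENE_A and GENE_B, the file name sample.bam and the helper gene_region are all invented for illustration; a real Ensembl GTF has the same nine tab-separated columns but many more records and attributes. The command is only printed, not executed.

```python
import re

# Hypothetical GTF content: seqname, source, feature, start, end,
# score, strand, frame, attributes (tab-separated, 1-based inclusive).
GTF_TEXT = """\
2\texample\tgene\t100000\t102000\t.\t+\t.\tgene_id "G0001"; gene_name "GENE_A";
2\texample\texon\t100000\t100500\t.\t+\t.\tgene_id "G0001"; gene_name "GENE_A";
7\texample\tgene\t5000\t9000\t.\t-\t.\tgene_id "G0002"; gene_name "GENE_B";
"""


def gene_region(gtf_text, gene_name):
    """Return 'chrom:start-end' for the first 'gene' record with this name."""
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only whole-gene records, not exons or transcripts
        match = re.search(r'gene_name "([^"]+)"', fields[8])
        if match and match.group(1) == gene_name:
            return f"{fields[0]}:{fields[3]}-{fields[4]}"
    return None


region = gene_region(GTF_TEXT, "GENE_A")
# -b asks samtools view for BAM output; sample.bam is a placeholder
# and must be coordinate-sorted and indexed for a region query to work.
command = ["samtools", "view", "-b", "sample.bam", region]
print(" ".join(command))
```

GTF and samtools region strings both use 1-based inclusive coordinates, so the start and end columns can be copied over unchanged; BED files from UCSC use 0-based starts and would need the start shifted by one.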

modified 4.1 years ago • written 4.1 years ago by Ashutosh Pandey

Thank you! I am very grateful :-)

written 4.1 years ago by maria.kesa

I am glad that I could help :-)

written 4.1 years ago by Ashutosh Pandey