No Sequencing Data at Low Positions
4.7 years ago
ccnn ▴ 20

Opening a BAM file for just chr22, I was surprised to see that no reads were aligned until around position 16,050,000. In the UCSC Genome Browser, looking at a window of chr22:16,049,420-16,050,420, I can see that there's "nothing," and then the different tracks start. In chr1, I also seem to remember that alignments started only at position 10,000.

Why is there nothing earlier in the chromosome? Do those positions not correspond to DNA? I've downloaded data on the length/end position of each chromosome; can I find a list of these "start" positions?

dna sequencing ngs next-gen • 896 views

Are you looking at the right genome build in UCSC? Also, are you sure the data is aligned against a UCSC genome build (which uses a chr prefix for chromosome names, as opposed to other builds, which may use only numbers)?
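
One way to check the naming convention is to look at the @SQ lines of the BAM header (for example, the text printed by `samtools view -H`). Below is a minimal sketch, assuming the header has already been captured as a string; the function names `sq_names` and `naming_style` and the demo header are made up for illustration.

```python
def sq_names(header_text):
    """Return the reference names (SN: fields) found on @SQ header lines."""
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            # Fields after the record type are tab-separated TAG:VALUE pairs.
            for field in line.split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def naming_style(names):
    """Classify reference names by whether they carry a 'chr' prefix."""
    if not names:
        return "no @SQ lines"
    with_chr = sum(name.startswith("chr") for name in names)
    if with_chr == len(names):
        return "chr-prefixed"
    if with_chr == 0:
        return "no chr prefix"
    return "mixed"


# Made-up header text for illustration only.
demo_header = "@HD\tVN:1.6\tSO:coordinate\n@SQ\tSN:chr22\tLN:51304566\n"
demo_names = sq_names(demo_header)
print(demo_names, naming_style(demo_names))
```

The @SQ lines also carry the sequence lengths (LN:), which can help confirm that the build in the browser matches the one used for alignment.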

Beginnings of the chromosome sequence may only have N's since the ends of chromosomes are hard to sequence.


Chr22 is only 51,304,566 bp long, so the first ~16M bases are almost a third of it.


Ah, it does indeed look like everything before that is "N" when I zoom in to "base" level in the browser.


So those N nucleotides mean "we know there are nucleotides there, we are just not sure what they are."


Got it. Thank you! So is there somewhere I can find out how many of the first bases of each chromosome are N?


Hello,

you could use your language of choice to find the first position in each reference sequence which is not an N.
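
For example, here is a minimal Python sketch of that idea, assuming the reference is a plain-text FASTA file. The function name `first_non_n_positions` and the tiny demo sequences are made up for illustration. It reads line by line, so a whole chromosome never has to be held in memory, and it treats lowercase `n` the same as `N`.

```python
import io


def first_non_n_positions(fasta_lines):
    """Map each sequence name to the 1-based position of its first non-N base.

    A sequence that consists only of N's (or is empty) maps to None.
    """
    first = {}
    name = None
    offset = 0  # number of bases of the current sequence read so far
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            first[name] = None
            offset = 0
        elif name is not None and first[name] is None:
            stripped = line.lstrip("Nn")
            if stripped:
                leading_n = len(line) - len(stripped)
                first[name] = offset + leading_n + 1
            offset += len(line)
    return first


# Made-up FASTA content for illustration; for a real reference, pass an open
# file handle instead, e.g. open("reference.fa") (hypothetical file name).
demo = io.StringIO(">chrA\nNNNN\nNNAC\nGTNN\n>chrB\nACGT\n>chrC\nNNNN\n")
result = first_non_n_positions(demo)
print(result)
```

The number of leading N's for a sequence is then the reported position minus one.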

fin swimmer