Question: No Sequencing Data at Low Positions
15 months ago, ccnn wrote:

Opening a BAM file for just chr22, I was surprised to see that there were no reads aligned until around position 16,050,000. In the UCSC Genome Browser, looking at a window of chr22:16,049,420-16,050,420, I can see that there's "nothing," but then different tracks start. In chr1, I also think I remember seeing that alignments started only at 10,000.

Why is there nothing earlier in the chromosome? Do those positions not correspond to DNA? I've downloaded data on the length/end position of each chromosome; can I find a list of these "start" positions?

sequencing next-gen ngs dna • 389 views

Are you looking at the right genome build in UCSC? Also, are you sure the data is aligned against a UCSC genome build (which uses a chr prefix for chromosome names, as opposed to other builds, which may use only numbers)?

The beginning of a chromosome sequence may contain only N's, since the ends of chromosomes are hard to sequence.

written 15 months ago by genomax

I think so. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr22%3A16049522-16050522&hgsid=662132401_e7mUocboTcIMiTPiqCEGpqCVAIpq

chr22 is only 51,304,566 bp long, so the first ~16M bases are almost a third of it.

modified 15 months ago • written 15 months ago by ccnn

Ah, it does indeed look like everything before that is "N" when I zoom in to the "base" level in the browser.

written 15 months ago by ccnn

So those N nucleotides mean "we know there are nucleotides there, we are just not sure what they are."

written 15 months ago by WouterDeCoster

Got it. Thank you! So is there somewhere I can find out how many of the first bases of each chromosome are N?

written 15 months ago by ccnn

Hello,

you could use your language of choice to find the first position in each reference sequence which is not an N.

fin swimmer

written 15 months ago by finswimmer
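
For what it's worth, here is a minimal Python sketch of that suggestion: scan a FASTA file and report, for each sequence, the first position that is not an N. It is only an illustration under some assumptions: the function name `first_non_n_positions` is made up, positions are reported 1-based (as the browser shows them), and both `N` and soft-masked `n` are treated as unknown bases.

```python
import io


def first_non_n_positions(fasta_lines):
    """Return {sequence_name: 1-based position of the first non-N base}.

    A sequence that consists only of N's (or is empty) maps to None.
    `fasta_lines` is any iterable of FASTA text lines, e.g. an open file.
    """
    result = {}
    name = None
    offset = 0  # number of bases already read in the current sequence
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: the name is the first word after '>'.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            result[name] = None
            offset = 0
            continue
        if name is None or result[name] is not None:
            continue  # no header yet, or answer already found for this sequence
        stripped = line.lstrip("Nn")
        if stripped:
            # Leading N's on this line = len(line) - len(stripped).
            result[name] = offset + (len(line) - len(stripped)) + 1
        offset += len(line)
    return result


# Tiny in-memory FASTA, for demonstration only.
demo = io.StringIO(">chrA\nNNNN\nNNAC\nGTNN\n>chrB\nACGT\n>chrC\nNNNN\n")
print(first_non_n_positions(demo))
# {'chrA': 7, 'chrB': 1, 'chrC': None}
```

For a real reference you would pass an open file handle instead, e.g. `with open("hg19.fa") as fh: first_non_n_positions(fh)`, where `hg19.fa` is a placeholder for whatever FASTA your reads were aligned against. The file is read line by line, so whole chromosomes are never held in memory.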