7.4 years ago
jinkuozhang
When I was fetching human genomic DNA sequence by coordinates, I found that the nucleotide sequences at each end of the human chromosomes (except chr17) are "NNNNN...". Could anyone tell me why they are "NNNN..." rather than A, T, C, G, etc.? Thanks!
> library(BSgenome.Hsapiens.UCSC.hg19)
> chrs = seqnames(Hsapiens)[1:24]
> chrs
[1] "chr1" "chr2" "chr3" "chr4" "chr5" "chr6" "chr7" "chr8" "chr9" "chr10" "chr11"
[12] "chr12" "chr13" "chr14" "chr15" "chr16" "chr17" "chr18" "chr19" "chr20" "chr21" "chr22"
[23] "chrX" "chrY"
> first_10000_nt = getSeq(Hsapiens, names=chrs, start=rep(1, 24), end=rep(10000, 24))
> names(first_10000_nt) = chrs
> first_10000_nt
A DNAStringSet instance of length 24
width seq names
[1] 10000 NNNNNNNNNNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNNNNNNNNN chr1
... ... ...
[24] 10000 NNNNNNNNNNNNNNNNNNNNNNNNNNNNN...NNNNNNNNNNNNNNNNNNNNNNNNNNNN chrY
> first_10000_nt["chr17"]
A DNAStringSet instance of length 1
width seq names
[1] 10000 AAGCTTCTCACCCTGTTCCTGCATAGATA...CGCCATGTTGGCCAGGCTCTCTCGAACTC chr17
More specifically: these are unresolved telomeric sequences, which the assembly represents as runs of N.
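If it helps to see how far the N padding extends, a rough sketch in base R is below. The helper name `leading_n_run` and the toy strings are made up for illustration and are not part of BSgenome or Biostrings. I would expect the same function to work on `as.character(first_10000_nt)` from the question, but I have not run it against hg19 here.

```r
# Hypothetical helper: length of the run of "N" at the start of each sequence.
# Returns 0 for sequences that begin with a real base.
leading_n_run <- function(seqs) {
  # match.length is -1 where "^N+" does not match, so clamp it to 0
  len <- attr(regexpr("^N+", seqs), "match.length")
  setNames(pmax(len, 0L), names(seqs))
}

# Toy strings standing in for real chromosome starts (not genome data)
toy <- c(chrA = "NNNNNNACGT", chrB = "AAGCTTCTCA")
print(leading_n_run(toy))
```

A chromosome whose first 10000 bases are all N would report 10000 here, and one that starts with resolved sequence, as chr17 does in the output above, would report 0.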
Thank you, Medhat! Sorry for not searching for the already resolved question first.