7.0 years ago
alirhag
Hello all, I am doing a research project on the Viterbi and Baum-Welch algorithms as used to identify CpG islands in a DNA sequence. I am trying to find and download a DNA sequence in which the CpG islands have already been marked, so that I can compare the regions the algorithm identifies as CpG islands with the known, annotated ones. I am looking for a long strand (>1000 nucleotides), preferably in a format I can load into MATLAB as a 1 x m vector. Could someone please help? Best, A.
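For context, here is a minimal sketch of Viterbi decoding with a two-state CpG HMM, where "+" means inside an island and "-" means background. The two-state design and every probability in it are made-up placeholders, not trained values. Baum-Welch would be the way to estimate them from data, and a fuller model would use separate A/C/G/T states for each region.

```python
import math

# Hypothetical two-state HMM: "+" = inside a CpG island, "-" = background.
# All probabilities are illustrative placeholders, not trained values.
STATES = ("+", "-")
START = {"+": 0.5, "-": 0.5}
TRANS = {
    "+": {"+": 0.99, "-": 0.01},
    "-": {"+": 0.01, "-": 0.99},
}
EMIT = {
    "+": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},  # GC-rich
    "-": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},  # AT-rich
}


def viterbi(seq):
    """Return the most probable state path for seq, computed in log space."""
    seq = seq.upper()
    # v[s] = best log-probability of any path ending in state s at this position
    v = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backptrs = []
    for base in seq[1:]:
        ptr, new_v = {}, {}
        for s in STATES:
            # pick the predecessor state that maximises the path score into s
            prev = max(STATES, key=lambda p: v[p] + math.log(TRANS[p][s]))
            ptr[s] = prev
            new_v[s] = v[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][base])
        backptrs.append(ptr)
        v = new_v
    # trace back from the best final state
    state = max(STATES, key=lambda s: v[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))
```

Working in log space avoids numerical underflow on sequences of thousands of bases. `[1 if s == "+" else 0 for s in viterbi(seq)]` turns the path into a 0/1 vector that can be compared with a known annotation.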
You can find the CpG islands file for the human genome (hg38) here: http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/cpgIslandExt.txt.gz
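A minimal sketch for reading that file. It assumes the usual UCSC table-dump layout: tab-separated, with columns bin, chrom, chromStart, chromEnd, name, length, cpgNum, gcNum, perCpg, perGc and obsExp, and with 0-based, half-open coordinates. Check the matching `cpgIslandExt.sql` schema in the same directory before relying on the column order. The function names are hypothetical.

```python
import gzip

# Assumed column order of cpgIslandExt.txt (UCSC table dump, tab-separated):
# bin, chrom, chromStart, chromEnd, name, length, cpgNum, gcNum, perCpg, perGc, obsExp
# Coordinates are assumed to be 0-based, half-open (UCSC database convention).


def parse_cpg_islands(lines, chrom):
    """Return sorted (start, end) island intervals for one chromosome."""
    intervals = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # skip malformed lines and other chromosomes
        if len(fields) < 4 or fields[1] != chrom:
            continue
        intervals.append((int(fields[2]), int(fields[3])))
    return sorted(intervals)


def load_cpg_islands(path, chrom):
    """Read the gzipped table directly, e.g. path='cpgIslandExt.txt.gz'."""
    with gzip.open(path, "rt") as handle:
        return parse_cpg_islands(handle, chrom)
```

For example, `load_cpg_islands("cpgIslandExt.txt.gz", "chr1")` would give the chr1 islands as a sorted list of intervals.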
The GRCh38 genome sequence can be downloaded from here. (hg38 is UCSC's name for GRCh38, so the island coordinates refer to this assembly.)
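To compare predictions with the known islands, one option is to turn the island intervals into a per-base 0/1 vector for the stretch of sequence you extracted. The sketch below assumes you already have that stretch as a string starting at a known 0-based genomic position (`offset`), plus 0-based, half-open island intervals on the same chromosome. All names are hypothetical, and plain per-base agreement is just one simple choice of metric.

```python
# Hypothetical helpers. `offset` is the 0-based genomic position of the first
# base of the extracted sequence; `islands` holds 0-based, half-open
# (start, end) intervals on the same chromosome.


def island_labels(length, offset, islands):
    """Build a 1 x m vector: 1 where a base lies inside a known island, else 0."""
    labels = [0] * length
    for start, end in islands:
        # clip each interval to the extracted window
        lo, hi = max(start - offset, 0), min(end - offset, length)
        for i in range(lo, hi):
            labels[i] = 1
    return labels


def agreement(predicted, known):
    """Fraction of positions where predicted and known labels match."""
    if len(predicted) != len(known):
        raise ValueError("label vectors must have the same length")
    return sum(p == k for p, k in zip(predicted, known)) / len(known)


def write_vector(labels, path):
    """Write one comma-separated row; MATLAB's readmatrix (or csvread) can load it."""
    with open(path, "w") as handle:
        handle.write(",".join(str(x) for x in labels) + "\n")
```

Because the file holds a single row, MATLAB reads it as a 1 x m vector. Since CpG islands cover only a small fraction of the genome, per-base agreement can look high even for a poor predictor, so sensitivity and specificity on the island positions are worth reporting too.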