I would like to find the genomic coordinates of all the CCGG motifs across my reference genome.
The only approach I can think of is to grep for CCGG across my reference genome, export the matching sequences in FASTA format, and then align them back to the same genome to get the coordinates ("chromosome" and "position").
However, my genome is from a teleost with 2 or 3 duplication events, so I do not expect all of the sequences to align uniquely. Also, a CCGG in a FASTA file is sometimes split across a line break, in which case grep will not find it.
Do you know of any other way, or of specific software or a browser service (UCSC, NCBI, Ensembl), that can do this without aligning?
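
For what it is worth, the kind of alignment-free scan I am imagining could look like the minimal Python sketch below. It assumes a plain, uncompressed FASTA file; the function names `iter_fasta` and `find_motif` are just ones I made up for illustration. It joins the wrapped lines of each record before searching, so a CCGG split across a line break is still found, and it reports BED-style coordinates (0-based start, end exclusive). Since CCGG is its own reverse complement, scanning the forward strand should be enough.

```python
import re
import sys

MOTIF = "CCGG"  # palindromic: its reverse complement is also CCGG


def iter_fasta(handle):
    """Yield (name, sequence) per record, with wrapped lines joined."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            # Keep only the first word of the header as the sequence name.
            name, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_motif(handle, motif=MOTIF):
    """Yield (name, start, end) for every motif hit, 0-based, end exclusive."""
    # The lookahead lets overlapping hits be reported for other motifs;
    # IGNORECASE also catches soft-masked (lowercase) sequence.
    pattern = re.compile("(?=" + re.escape(motif) + ")", re.IGNORECASE)
    for name, seq in iter_fasta(handle):
        for match in pattern.finditer(seq):
            yield name, match.start(), match.start() + len(motif)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python find_ccgg.py genome.fa > ccgg.bed
    with open(sys.argv[1]) as fh:
        for chrom, start, end in find_motif(fh):
            print(chrom, start, end, sep="\t")
```

This holds one whole chromosome in memory at a time, which I assume is acceptable for a teleost genome but have not checked on mine.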