Question: Finding motifs across a genome
ioannis wrote (14 months ago):

Hi community,

I would like to find the genomic coordinates of all CCGG motifs across my reference genome. The only approach I can think of is to grep for CCGG in the reference, export the matching sequences in FASTA format, and then align them back to the same genome to get the coordinates ("chromosome" and "position"). However, my genome is from a teleost, which has gone through 2 or 3 duplication events, so I do not expect all of them to align uniquely. Also, a CCGG in a FASTA file is sometimes split across two lines, in which case grep will not find it.

Do you know of another way, or of specific software or a web service (UCSC, NCBI, Ensembl), that can do this without aligning?

Regards, Ioannis
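
The line-wrapping concern raised above is easy to reproduce. As a minimal sketch, using an invented two-line record (the name `chrToy` and the sequence are illustrative only, not taken from any real genome), the following compares a per-line search, which is roughly what a plain grep does, with a search over the joined sequence:

```python
# Toy FASTA record wrapped at 10 bases per line; the data is invented for illustration.
fasta_text = """>chrToy
AAAAAAAACC
GGTTTTCCGG
"""

motif = "CCGG"
lines = [ln for ln in fasta_text.splitlines() if not ln.startswith(">")]

# Per-line search: a motif that straddles a line break is missed.
per_line_hits = sum(ln.count(motif) for ln in lines)

# Search after joining the wrapped lines into one continuous sequence.
joined = "".join(lines)
joined_hits = joined.count(motif)

print(per_line_hits, joined_hits)
```

The per-line count comes out lower than the joined count here because one CCGG spans the line break, which is why the sequence lines need to be joined before searching.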


grep is certainly not a good way to go here.

Some answers here: Finding specific k-mer in human genome

written 14 months ago by Bastien Hervé
Bastien Hervé (Limoges, CBRS, France) wrote (14 months ago):

Use fuzznuc from EMBOSS Explorer.

Load your reference genome, set your pattern, write the results in a tab-delimited format, and parse them with Unix commands or any language you like.
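
For readers who would rather script the search directly, here is a rough Python sketch of the same idea. It is not how fuzznuc works internally and does not parse fuzznuc output; it is a plain regular-expression scan under a few assumptions: the genome is an ordinary FASTA file, each sequence fits in memory, and the helper names `read_fasta` and `find_motif` as well as the toy record are made up for illustration.

```python
import io
import re


def read_fasta(handle):
    """Yield (name, sequence) pairs, joining wrapped sequence lines."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the sequence name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_motif(handle, motif="CCGG"):
    """Yield (name, start, end) per match: 0-based start, exclusive end."""
    # A lookahead reports every start position, including overlapping ones
    # (relevant for other motifs); IGNORECASE covers soft-masked lowercase bases.
    pattern = re.compile("(?=" + re.escape(motif) + ")", re.IGNORECASE)
    for name, seq in read_fasta(handle):
        for match in pattern.finditer(seq):
            yield name, match.start(), match.start() + len(motif)


# Invented toy record; one CCGG spans the line break, one is lowercase.
demo = io.StringIO(">chrToy toy record\nAAAAAAAACC\nGGTTTTccgg\n")
hits = list(find_motif(demo))
for name, start, end in hits:
    print(name, start, end, sep="\t")
```

On a real genome, the `io.StringIO` demo would be replaced by an opened file handle (for example `open("genome.fa")`, a hypothetical filename). Because CCGG is its own reverse complement, scanning the forward strand already covers both strands; a non-palindromic motif would need a second scan with its reverse complement.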


It runs like a dream! Thanks a lot Bastien!

Cheers,

Ioannis

written 14 months ago by ioannis