Question

Comparing genomes against randomized genomes

0

3.0 years ago

schlogl ▴ 160

Hi there!

I would like some advice on comparing the composition of a real genome against a randomized genome.

My question is about the randomization (used as a background or null model). When I randomize a genome by shuffling it in order to compare k-mer composition, is it important to preserve the base frequencies, or the dinucleotide frequencies?

I have read some papers, but they used an expected value instead.

I would rather use a count-based method, similar to k-mer counting.

Any tips or papers using a similar approach would be appreciated!

Paulo

sequence genomes • 1.3k views

ADD COMMENT • link 3.0 years ago by schlogl ▴ 160

1

For transcription factor sequences or CpG islands, say, you cannot treat nucleotide frequencies as independent or shuffle bases. You might investigate hidden Markov model (HMM) approaches for generating simulated sequence, based upon categories of background regions.
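To illustrate why neighboring bases matter, here is a minimal sketch of a background generator that conditions each base on the previous one. Note that this is a plain first-order Markov chain, which is a much simpler technique than the HMM approach mentioned above (there are no hidden states or region categories). The function names `fit_transitions` and `simulate` are made up for this example.

```python
import random
from collections import Counter, defaultdict

def fit_transitions(seq):
    """Count how often each base is followed by each other base."""
    transitions = defaultdict(Counter)
    for a, b in zip(seq, seq[1:]):
        transitions[a][b] += 1
    return transitions

def simulate(seq, length, seed=0):
    """Generate a sequence whose dinucleotide statistics resemble seq.

    Assumption: a first-order Markov chain is an adequate background,
    which may not hold for regions such as CpG islands.
    """
    if length <= 0:
        return ""
    rng = random.Random(seed)
    transitions = fit_transitions(seq)
    current = rng.choice(seq)
    out = [current]
    while len(out) < length:
        followers = transitions.get(current)
        if not followers:
            # Dead end: this base was only seen at the very end of seq.
            current = rng.choice(seq)
        else:
            bases = list(followers)
            weights = [followers[b] for b in bases]
            current = rng.choices(bases, weights=weights)[0]
        out.append(current)
    return "".join(out)

print(simulate("ACGCGCGTATATATCGCGCGATATAT", 40))
```

Unlike a shuffle, this only matches the dinucleotide frequencies on average, so the simulated sequence will not have exactly the same composition as the input.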

ADD REPLY • link 3.0 years ago by Alex Reynolds 35k

0

Got it. Thank you @Alex Reynolds

ADD REPLY • link 3.0 years ago by schlogl ▴ 160

0

Why not compare actual genomes? DNA sequence in genomes is not random. What kind of significance do you expect to gain from unnatural comparisons?

ADD REPLY • link 3.0 years ago by 5heikki 11k

0

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0058038 I want to do a study similar to this one.

ADD REPLY • link 3.0 years ago by schlogl ▴ 160

score 1 · Answer 1 · 2021-05-04

1

3.0 years ago

Mensur Dlakic ★ 27k

This may be of interest:

https://github.com/guma44/ushuffle

I used it to shuffle metagenome assemblies at a tetranucleotide level, and it does a great job of preserving the overall composition while completely changing the sequence.
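For the count-based comparison asked about in the question, a rough sketch of the workflow could look like the one below. It does not call ushuffle. Instead it uses a plain base-level shuffle from the Python standard library, which preserves only the mononucleotide composition, so treat it as a stand-in for a k-let-preserving shuffle. The helper names (`count_kmers`, `shuffle_bases`, `background_counts`) and the toy sequence are made up for illustration.

```python
import random
from collections import Counter

def count_kmers(seq, k):
    """Count overlapping k-mers in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def shuffle_bases(seq, rng):
    """Mononucleotide shuffle: keeps base counts, nothing else."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)

def background_counts(seq, k, n_shuffles=100, seed=0):
    """Mean k-mer counts over n_shuffles shuffled copies of seq."""
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(n_shuffles):
        totals.update(count_kmers(shuffle_bases(seq, rng), k))
    return {kmer: total / n_shuffles for kmer, total in totals.items()}

# Toy sequence, not real genome data.
seq = "ACGCGCGTATATATCGCGCGATATAT" * 20
observed = count_kmers(seq, 2)
expected = background_counts(seq, 2)
for kmer in sorted(observed):
    print(kmer, observed[kmer], round(expected.get(kmer, 0.0), 1))
```

Swapping `shuffle_bases` for a shuffle that preserves dinucleotide or higher-order counts changes the null model: k-mers that look enriched against a base-level shuffle may no longer stand out once the shorter k-let composition is held fixed.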