Comparing genomes against randomized genomes
3.0 years ago
schlogl ▴ 160

Hi there!

I would like some advice on comparing the composition of a real genome with that of a randomized genome.

My question is about the randomization, which serves as the background (null) model. When I randomize a genome by shuffling it in order to compare k-mer composition, is it important to preserve the base frequencies, or the dinucleotide frequencies?

I have read some papers, but they used an expected value instead!

I am looking for a count-based method instead, similar to k-mer counting.

Any tips or papers using a similar approach would be appreciated!

Paulo
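For illustration, the count-based comparison asked about above could be sketched roughly as follows: count k-mers in the real sequence, count them again in many shuffled copies, and compare the observed count with the shuffled distribution. This is only a minimal sketch. The function names (`count_kmers`, `shuffle_bases`, `empirical_null`) are hypothetical, and the shuffle used here is a plain permutation of single bases, so it preserves base frequencies exactly but not dinucleotide or higher-order frequencies.

```python
import random
from collections import Counter


def count_kmers(seq, k):
    """Count overlapping k-mers in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shuffle_bases(seq, rng):
    """Permute single bases: base composition is preserved exactly,
    dinucleotide and higher-order composition is not (assumption of this sketch)."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def empirical_null(seq, k, n_shuffles=100, seed=0):
    """For each observed k-mer, return (observed count, mean count over
    shuffled copies, z-score against the shuffled distribution).
    n_shuffles must be at least 2."""
    rng = random.Random(seed)
    observed = count_kmers(seq, k)
    null_counts = [count_kmers(shuffle_bases(seq, rng), k)
                   for _ in range(n_shuffles)]
    result = {}
    for kmer, obs in observed.items():
        values = [counts[kmer] for counts in null_counts]  # missing k-mers count as 0
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
        z = (obs - mean) / var ** 0.5 if var > 0 else float("nan")
        result[kmer] = (obs, mean, z)
    return result


if __name__ == "__main__":
    toy = "ACGT" * 50  # toy sequence, not real data
    for kmer, (obs, mean, z) in sorted(empirical_null(toy, 2).items()):
        print(kmer, obs, round(mean, 2), round(z, 2))
```

Swapping `shuffle_bases` for a shuffle that preserves dinucleotide (or longer) counts would change the null model without changing the rest of the comparison.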

sequence genomes • 1.3k views

For transcription factor sequences or CpG islands, say, you cannot treat nucleotide frequencies as independent or simply shuffle bases. You might investigate hidden Markov model (HMM) approaches for generating simulated sequences, based on categories of background regions.
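As a much simpler stand-in for the HMM idea above, here is a rough sketch of a first-order Markov chain generator (not an HMM): it estimates the probability of each base given the previous base from the input sequence, then samples a new sequence from those transitions. Dinucleotide frequencies are therefore preserved only approximately, in expectation. The function names (`fit_first_order`, `simulate`) are hypothetical, and a real HMM would additionally model hidden states such as different categories of background region.

```python
import random
from collections import Counter, defaultdict


def fit_first_order(seq):
    """Count transitions current base -> next base from observed dinucleotides."""
    transitions = defaultdict(Counter)
    for a, b in zip(seq, seq[1:]):
        transitions[a][b] += 1
    return transitions


def simulate(seq, length, seed=0):
    """Sample a sequence of the given length from a first-order Markov chain
    fitted to seq. A plain Markov chain, used here as a simplified stand-in."""
    if length <= 0 or not seq:
        return ""
    rng = random.Random(seed)
    transitions = fit_first_order(seq)
    base_counts = Counter(seq)
    bases = list(base_counts)
    weights = list(base_counts.values())

    # Start from a base drawn according to overall base frequencies.
    current = rng.choices(bases, weights=weights)[0]
    out = [current]
    while len(out) < length:
        nxt = transitions.get(current)
        if nxt:
            current = rng.choices(list(nxt), weights=list(nxt.values()))[0]
        else:
            # Dead end: this base was only seen at the very end of seq,
            # so fall back to overall base frequencies.
            current = rng.choices(bases, weights=weights)[0]
        out.append(current)
    return "".join(out)


if __name__ == "__main__":
    toy = "ACGCGCGTTACGCGATATCGCG"  # toy sequence, not real data
    print(simulate(toy, 40))
```

Conditioning on the previous two or three bases instead of one would give a higher-order model, at the cost of needing more sequence to estimate the transitions.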


Got it. Thank you @Alex Reynolds


Why not compare actual genomes? DNA sequence in genomes is not random. What kind of significance do you expect to gain from unnatural comparisons?

3.0 years ago
Mensur Dlakic ★ 27k

This may be of interest:

https://github.com/guma44/ushuffle

I used it to shuffle metagenome assemblies at a tetranucleotide level, and it does a great job of preserving the overall composition while completely changing the sequence.


Thank you @Mensur Dlakic 8)
