Calculate tetranucleotide frequency deviation in Python
3.3 years ago
Chvatil ▴ 130

Hello everyone, I'm looking for a function in Python or Biopython that can calculate the tetranucleotide frequencies (TNFs) of given regions of a scaffold.

The idea is that I have several regions, and I want to identify possible changes in nucleotide composition that correspond to endogenized regions within my genome. For that, I need to calculate the TNFs across these regions of the contigs. I then need to calculate the Pearson correlation of these frequencies against the TNFs of a set of the largest contigs in these genome assemblies (these contigs most likely belong to the genome itself and were not endogenized).

Does anyone know of such a package in Python?

Thank you

biopython python TNFs • 1.3k views
3.3 years ago
Mensur Dlakic ★ 27k

CheckM can do what you need - see here:

checkm tetra seqs.fna tetra.tsv

You can use the frequencies to separate the sequences in a 2D plot using various dimensionality reduction methods such as PCA, t-SNE (shown below) or UMAP.

[Image: t-SNE plot of the sequences based on their tetranucleotide frequencies]
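If you would rather stay in plain Python, the calculation described in the question (TNFs per region, then a Pearson correlation against a reference set) can be sketched roughly as below. This is only a minimal illustration, not what CheckM does internally: the names `tnf` and `pearson` and the toy sequences are made up for the example, each strand is counted as given (reverse complements are not merged), and windows containing ambiguous bases such as N are simply ignored.

```python
from collections import Counter
from itertools import product
from math import sqrt

# All 256 tetranucleotides in a fixed order, so every vector is comparable.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tnf(seq):
    """Return the relative frequency of each tetranucleotide in seq.

    Assumption for this sketch: 4-mers containing anything other than
    A, C, G or T (e.g. N) are ignored and do not count towards the total.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    valid = [counts.get(kmer, 0) for kmer in KMERS]
    total = sum(valid)
    return [c / total if total else 0.0 for c in valid]


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (nan if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


# Toy sequences, purely hypothetical; replace with your own data.
reference = "ATGCGTACGTTAGCCGATAGCTAGGCTTAACGGATCCGATATGCGC"  # e.g. largest contigs joined
region = "ATATATTTAAATATTATAAATTTATATAATATTTAAATATAT"         # e.g. a candidate region

print(pearson(tnf(region), tnf(reference)))
```

For real data, the sequences could be read with Biopython's `Bio.SeqIO.parse(path, "fasta")` and sliced to the coordinates of each region before calling `tnf`. A low correlation with the reference profile would then flag a region whose composition differs from the bulk of the genome.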

