Question: Removing GC dinucleotides from a sequence
Asked 5 days ago by spiral01:

I have gene alignments between humans and chimpanzees, and I need to remove GC dinucleotides from both species' sequences. My question is about the best way to do this: should it be done at the codon level or at the sequence level?

For instance, the sequence ACTGCA can be split into the two codons ACT and GCA. I can therefore remove the second codon from both the human and chimpanzee sequences, and the alignment length remains a multiple of 3. The problem with this method is that it doesn't account for GC dinucleotides that span a codon boundary (e.g. TCGCAA, where splitting into codons gives TCG and CAA).
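To make the two cases above concrete, here is a minimal Python sketch that lists every GC in a sequence and flags whether it sits inside one codon or straddles a codon boundary. The function name `find_dinucleotides` is hypothetical, and the sketch assumes an ungapped sequence that starts in reading frame 0. The dinucleotide is a parameter because CpG sites, read 5' to 3', correspond to the string "CG" rather than "GC".

```python
def find_dinucleotides(seq, dinuc="GC"):
    """Return (start, spans_boundary) for each occurrence of dinuc in seq.

    start is a 0-based index. spans_boundary is True when the two bases
    fall in different codons, assuming the sequence starts in frame 0.
    """
    seq = seq.upper()
    hits = []
    pos = seq.find(dinuc)
    while pos != -1:
        # The bases at pos and pos + 1 lie in different codons exactly
        # when pos is the third position of a codon.
        hits.append((pos, pos % 3 == 2))
        pos = seq.find(dinuc, pos + 1)
    return hits


print(find_dinucleotides("ACTGCA"))  # [(3, False)]: inside the codon GCA
print(find_dinucleotides("TCGCAA"))  # [(2, True)]: spans TCG | CAA
```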

The alternative is simply to remove every GC dinucleotide from the sequence, but this may reduce the sequence to a length that isn't divisible by 3 (i.e. it can no longer be split neatly into codons). For example, removing all GC dinucleotides from the sequence TCAGCGCAT leaves TCAAT, which has length 5. As I am dealing with alignments between humans and chimpanzees (and will be running PAML, which requires sequence lengths divisible by 3), this could be problematic. This is likely quite an obvious problem, but I am unsure how to proceed. Any suggestions?
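One possible middle ground between the two approaches, sketched here as an assumption rather than an established method, is to drop whole codon columns: any codon that contains part of a GC in either species is removed from both sequences, so a GC that spans a boundary takes out both codons it touches. The alignment stays in frame and the length stays divisible by 3. The function name `drop_dinuc_codons` is hypothetical, and gap characters are treated as ordinary characters, so a GC interrupted by a gap is not detected.

```python
def drop_dinuc_codons(human, chimp, dinuc="GC"):
    """Remove every codon column that overlaps dinuc in either sequence.

    Both inputs must be aligned, in frame 0, and of equal length
    divisible by 3. Returns the two filtered sequences.
    """
    if len(human) != len(chimp) or len(human) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")

    drop = set()  # indices of codon columns to remove
    for seq in (human, chimp):
        upper = seq.upper()
        pos = upper.find(dinuc)
        while pos != -1:
            # A dinucleotide touches one codon, or two if it spans a boundary.
            drop.add(pos // 3)
            drop.add((pos + 1) // 3)
            pos = upper.find(dinuc, pos + 1)

    keep = [i for i in range(len(human) // 3) if i not in drop]
    new_human = "".join(human[3 * i:3 * i + 3] for i in keep)
    new_chimp = "".join(chimp[3 * i:3 * i + 3] for i in keep)
    return new_human, new_chimp


# The chimp sequence here is made up purely for illustration.
print(drop_dinuc_codons("TCAGCGCAT", "TCAGCACAT"))  # ('TCA', 'TCA')
```

The trade-off is that this discards more sites than deleting the two bases alone, which may matter if GC-containing codons are common in the alignment.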

EDIT: As per the comment below, the reason we wish to do this is that CpGs have much higher mutation rates than other dinucleotides in humans. The problem is that the density of CpGs differs between synonymous and non-synonymous sites. We are pooling sites to calculate rates of adaptive evolution for different amino acids.

Tags: snp sequence alignment gene

"I need to remove GC dinucleotides between humans and chimpanzees"

Why? Your entire question revolves around this need, yet you give no explanation for it.

Comment written 5 days ago by RamRS

Hi, please see the edit. Thanks.

Reply written 5 days ago by spiral01

You may be better off soft- or hard-masking the GCs and using an alignment tool that works well with masked sequences (most of them should).

Reply written 5 days ago by RamRS
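As a rough illustration of the masking suggestion, the Python sketch below replaces each GC with N (hard masking) or lowercases it (soft masking), so the sequence length and reading frame are unchanged. The function name `mask_dinucleotides` is hypothetical, soft masking assumes the input is uppercase to begin with, and how a downstream tool treats N or lowercase bases is something to check for that tool rather than something this sketch guarantees.

```python
def mask_dinucleotides(seq, dinuc="GC", mode="hard"):
    """Mask every occurrence of dinuc without changing sequence length.

    mode="hard" replaces the two bases with N; mode="soft" lowercases them.
    """
    if mode not in ("hard", "soft"):
        raise ValueError("mode must be 'hard' or 'soft'")

    upper = seq.upper()
    masked = list(seq)
    pos = upper.find(dinuc)
    while pos != -1:
        for i in (pos, pos + 1):
            masked[i] = "N" if mode == "hard" else seq[i].lower()
        pos = upper.find(dinuc, pos + 1)
    return "".join(masked)


print(mask_dinucleotides("TCAGCGCAT"))               # TCANNNNAT
print(mask_dinucleotides("TCAGCGCAT", mode="soft"))  # TCAgcgcAT
```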