10.0 years ago
soosus
I have a dataframe (trip) that contains a column (SNP). It looks like this (but longer, and it has 192 levels):
SNP
C[T->C]T
C[G->C]A
G[A->C]A
C[T->C]C
C[C->A]G
T[G->A]C
...
I want to pattern match and replace on the following criteria:
trip$SNP <- gsub("G->T", "C->A", trip$SNP)
trip$SNP <- gsub("G->C", "C->G", trip$SNP)
trip$SNP <- gsub("G->A", "C->T", trip$SNP)
trip$SNP <- gsub("A->T", "T->A", trip$SNP)
trip$SNP <- gsub("A->G", "T->C", trip$SNP)
trip$SNP <- gsub("A->C", "T->G", trip$SNP)
but ALSO, if one of the patterns listed above is found, I want the string that contains it to have additional substitutions applied. Namely (pseudocode):
if ((grep(G->T|G->C|G->A|A->T|A->G|A->C), trip$SNP)==TRUE){
substr(trip$SNP, 1,1) <- tr /ATCG/TAGC/; #incompatible perl syntax?
substr(trip$SNP, 8,8) <- tr /ATCG/TAGC/;
}
That is, if any of these patterns (G->T, G->C, G->A, A->T, A->G, or A->C) is found in a string in trip$SNP, replace the 1st and 8th characters of that string according to this Perl-style transliteration: tr/ATCG/TAGC/;
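For reference, base R's chartr() does character-by-character translation much like Perl's tr///. Below is a minimal sketch of just the flanking-base step, assuming SNP is a character vector (a factor would need as.character() first) and using a small made-up vector, snp, in place of trip$SNP. The match has to be computed before the gsub() substitutions run, since they rewrite the patterns being tested.

```r
# Toy stand-in for trip$SNP (hypothetical data copied from the example rows)
snp <- c("C[T->C]T", "C[G->C]A", "G[A->C]A",
         "C[T->C]C", "C[C->A]G", "T[G->A]C")

# TRUE where the reference base is G or A, i.e. any of the six patterns
hit <- grepl("[GA]->", snp)

# chartr() maps A<->T and C<->G, like tr/ATCG/TAGC/ in Perl;
# apply it to the 1st and 8th characters of the matching strings only
substr(snp[hit], 1, 1) <- chartr("ATCG", "TAGC", substr(snp[hit], 1, 1))
substr(snp[hit], 8, 8) <- chartr("ATCG", "TAGC", substr(snp[hit], 8, 8))

print(snp)
```

This covers only the two flanking bases; the bracketed reference->variant part still needs the six gsub() substitutions listed above.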
Desired output, with changes highlighted. From:
SNP
C[T->C]T
C[G->C]A
G[A->C]A
C[T->C]C
C[C->A]G
T[G->A]C
to:
SNP
C[T->C]T
G[C->G]T #<-- changed
C[T->G]T #<-- changed
C[T->C]C
C[C->A]G
A[C->T]G #<-- changed
Is there a more elegant way to do this?
So, you want to complement (but not reverse) the sequence in SNP if and only if the substitution is G->something or A->something, right? Then there is a much easier solution, in a single line of code. Is the format for SNP fixed (the substitution plus one base before and one after)?
Yes, it's fixed. It's simply the SNP, expressed as the reference->variant, flanked by its neighbors. And yes, complement but not reverse.
Then check out code below...
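One way such a one-liner might look (a sketch under the assumptions agreed above, not necessarily the code originally posted): because the format is fixed and "[", "-", ">" and "]" are untouched by the translation, complementing the whole string with chartr() handles both flanking bases and the substitution at once. The toy data frame below is made up from the example rows, and SNP is assumed to be character rather than factor.

```r
# Toy data frame (hypothetical; mirrors the example in the question)
trip <- data.frame(
  SNP = c("C[T->C]T", "C[G->C]A", "G[A->C]A",
          "C[T->C]C", "C[C->A]G", "T[G->A]C"),
  stringsAsFactors = FALSE
)

# The reference base is always the 3rd character in the fixed format.
# Complement the whole string only when that base is G or A.
trip$SNP <- ifelse(substr(trip$SNP, 3, 3) %in% c("G", "A"),
                   chartr("ATCG", "TAGC", trip$SNP),
                   trip$SNP)

print(trip$SNP)
```

On the six example rows this yields the desired output shown in the question; rows with a C or T reference base pass through unchanged.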