Regular expression to extract mutation type
2.8 years ago
Gene_MMP8 ▴ 210

I have a list of strings of the format "c.2455C>T".

How can I extract the mutation type or the "C>T" part from the string?

R regex • 689 views

As with most of your questions, you should start by showing some effort: what you've tried and what your data look like. Are there edge cases such as indels, or, more generally, cases where the change is not a one-to-one nucleotide substitution? Show an example of the data (more than n=1). See: Brief Reminder On How To Ask A Good Question


Sorry about that. I have read the link you shared and will keep its points in mind when asking questions in the future.

2.8 years ago
grep "[ATGC]>[ATGC]" file.txt > output.txt
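Note that grep without -o prints the whole matching line, so the command above returns the full string (e.g. c.2455C>T) rather than only the "C>T" part. A minimal sketch of extracting just the substitution, assuming one variant per line; the function name extract_substitutions and the sample input are made up for illustration:

```shell
# Hypothetical helper: print only the "REF>ALT" part of each line read from stdin.
# -o prints just the matched text; -E enables extended regex so "+" means "one or more".
extract_substitutions() {
  grep -oE '[ACGT]+>[ACGT]+'
}

# Made-up sample input; lines without a substitution (e.g. a deletion) print nothing.
printf 'c.2455C>T\nc.234A>G\nc.6345TTT>G\nc.4375_4379del\n' | extract_substitutions
```

The "+" quantifiers are a choice made here so that multi-base changes such as TTT>G are kept whole; with single-character classes only the last reference base would be reported.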
2.8 years ago
Prakash ★ 2.1k

Using a simple grep command, you can extract the lines that match a pattern, e.g.:

grep "C>T" file.txt > output.txt

grep can be used in R as well:

a <- c("c.2455C>T", "c.2455C>G", "c.2455C>A")
b <- grep(pattern = "C>[TG]", a, perl = TRUE, value = TRUE)  # as Pierre suggested
b
[1] "c.2455C>T" "c.2455C>G"

Apologies for not being clear. There can also be other mutation types, such as G>T, G>C, T>A, etc.

2.8 years ago

Assuming you want just the mutation type part, you can strip everything up to and including the last digit or underscore with sed -E 's|.+[0-9_]+||g':

$ echo -e "c.2455C>T\nc.234A>G\nc.6345TTT>G\nc.4375_4376insACCT\nc.4375_4379del"
c.2455C>T
c.234A>G
c.6345TTT>G
c.4375_4376insACCT
c.4375_4379del
$ echo -e "c.2455C>T\nc.234A>G\nc.6345TTT>G\nc.4375_4376insACCT\nc.4375_4379del" | sed -E 's|.+[0-9_]+||g'
C>T
A>G
TTT>G
insACCT
del

Note: If your input can be any HGVS cDNA mutation, you may get del, ins, delins, etc., and not just substitutions (e.g. A>T).
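Since the stripped strings can be a mix of event types, it may help to label each one before going further. A rough sketch, assuming one variant per line and covering only the forms shown in this thread plus dup; the function name classify_variants and the label names are hypothetical:

```shell
# Hypothetical helper: strip the position prefix with the sed pattern from above,
# then print each remaining change with a tab-separated label.
classify_variants() {
  sed -E 's|.+[0-9_]+||' |
  while IFS= read -r change; do
    case "$change" in
      delins*) label="delins" ;;        # must be tested before the plain "del" case
      del*)    label="deletion" ;;
      ins*)    label="insertion" ;;
      dup*)    label="duplication" ;;
      *">"*)   label="substitution" ;;
      *)       label="other" ;;
    esac
    printf '%s\t%s\n' "$change" "$label"
  done
}

# Made-up sample input.
printf 'c.2455C>T\nc.4375_4376insACCT\nc.4375_4379del\n' | classify_variants
```

To keep only the substitutions, the output could then be filtered on the second column.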

In R, just use gsub:

> gsub(pattern = ".+[0-9_]+", replacement = "", x = "c.4375_4376insACCT")
[1] "insACCT"
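The pattern .+[0-9_]+ relies on greedy matching and removes everything up to the last digit or underscore anywhere in the string. A more explicit alternative, sketched here under the assumption that every input starts with "c." followed by a position made of digits, underscores, "+", "-" or "*", is to anchor the match at the start of the line; the function name strip_position is hypothetical:

```shell
# Hypothetical helper: remove a leading "c." plus the position characters,
# leaving only the description of the change.
strip_position() {
  sed -E 's/^c\.[0-9_+*-]+//'
}

# Made-up sample input, including an intronic position (c.88+2).
printf 'c.2455C>T\nc.88+2T>G\nc.4375_4376insACCT\n' | strip_position
```

Lines that do not start with "c." are passed through unchanged, which makes unexpected input easier to spot.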