Question: Extract specific patter from string in MySQL:
0
gravatar for Renesh
4.5 years ago by
Renesh1.6k
United States
Renesh1.6k wrote:

I would like to extract specific pattern from string in MySQL.

The column contains specific string like xxx-atg168d and xxx-atg444-6x. From these string, I want to extract atg168 and atg444 only. How can I perform this in MySQL?

Input column
xxx-atg168d
xxx-atg444-6x
xxx-atg1689d
xxx-atg16507d

Output column
atg168
atg444
atg1689
atg16507

 

mysql • 16k views
ADD COMMENTlink modified 4.5 years ago • written 4.5 years ago by Renesh1.6k
1

http://dev.mysql.com/doc/refman/5.7/en/string-functions.html

ADD REPLYlink modified 4.5 years ago • written 4.5 years ago by Dan D6.7k

It's really unclear how this is related to bioinformatics.

ADD REPLYlink written 4.5 years ago by Devon Ryan88k

Yes this related to bioinformatics. This table mimics the miRNA data.

ADD REPLYlink written 4.5 years ago by Renesh1.6k

This is plain text processing and SQL. Not really related to Bioinformatics and like Devon says, can be found with the tiniest bit of googling.

ADD REPLYlink written 4.5 years ago by RamRS20k
3
gravatar for Pierre Lindenbaum
4.5 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum118k wrote:

If the pattern is always "-atg", then the following sql statement should work:

select SUBSTRING_INDEX(SUBSTRING(name FROM IF(locate("-atg",name)=0,1,1+LOCATE("-atg",name))),"-",1) from mydatabase order by name;
ADD COMMENTlink modified 4.5 years ago • written 4.5 years ago by Pierre Lindenbaum118k

Thanks for you reply. But, the all record does not have atg pattern. How can i generalize this for all different patterns?

ADD REPLYlink written 4.5 years ago by Renesh1.6k
2

It might be easiest to just write a special function to use substring() or substring_index() and also then run instr() to determine if this should be done a second time.

Alternatively, just use a different language that can interface with mysql (perl, python, etc.). mysql is great for storing and querying data and bad for processing it in a complex way.
 

ADD REPLYlink written 4.5 years ago by Devon Ryan88k
2

FYI, when you're asking questions on sites like these, you should specify exactly what you want. You're getting free help from the best bioinformaticians in the world. Pierre is in the top .1% in his field. He answered your question as stated brilliantly, but you were lazy and didn't properly state your needs.

Otherwise you waste the time of yourself and the busy brilliant minds who are trying to answer your question. If you goof up and end up needing to change the parameters of the question, the decent thing is to edit your original question and apologize for wasting others' time.

We're nice. At the StackExchange network they'll downvote you to the Earth's core.

How To Ask Good Questions On Technical And Scientific Forums

ADD REPLYlink modified 4.5 years ago • written 4.5 years ago by Dan D6.7k
1

If I could up-vote this more than once I would.

ADD REPLYlink written 4.5 years ago by Devon Ryan88k

I still wonder why we don't have a downvote option. Maybe mods should start adding "not-bioinformatics" tags so experts need not waste their time on such questions.

ADD REPLYlink written 4.5 years ago by RamRS20k

Yeah, we tend to just close the ones that are too off-topic. It's always a difficult to find the right threshold to close things though.

ADD REPLYlink written 4.5 years ago by Devon Ryan88k

Actually, Devon, I was thinking new folks like myself could take up these questions so experts need not waste their time on them. Do you think that makes sense?

ADD REPLYlink modified 4.5 years ago • written 4.5 years ago by RamRS20k

The really off-topic ones should just get closed. But otherwise please feel encouraged to field any questions you come upon :) I can't speak for the others but part of my personal impetus for answering questions here is that I often run into a similar problem later and will then already know the solution (not to mention that everyone else's problems always seem so much more tractable than mine!).

BTW, since you're on twitter, do follow @BiostarQuestion. I find it convenient to just look at the twitter feed when something is running or compiling.

ADD REPLYlink written 4.5 years ago by Devon Ryan88k

I already follow @BiostarQuestion :) I agree that the off topic questions should ideally get closed, but who decides where the line is? Case in point, OP insists this is relevant because the data happens to follow, roughly, the format of some random IDs. Pretty sure any bioinformatician that has any idea of RegEx (so, all of us) can solve said question with simpler tools. Or, if one is Excel savvy, one knows that FIND() and MID()/SUBSTR() can be used to achieve almost anything.

 

You know what, maybe we should ask for a RegEx captcha before anyone can post a question :-D

ADD REPLYlink written 4.5 years ago by RamRS20k

Yeah, there's no clear line and sometimes we disagree how off-topic things need to be before we close them.

ADD REPLYlink written 4.5 years ago by Devon Ryan88k
1

yes but that will be tricky. If you the owner of the database, you should create a new field containing the new name (dump the database, transform with sed, update the records)

ADD REPLYlink written 4.5 years ago by Pierre Lindenbaum118k
2
gravatar for Devon Ryan
4.5 years ago by
Devon Ryan88k
Freiburg, Germany
Devon Ryan88k wrote:

FYI, we shouldn't have to assume/figure out how questions are relevant (these could be any IDs and this is actually just a mysql question), that's your responsibility.

In mysql, you'd just use the substring command select substring(some_column, 4,) from some_table. If you want to remove suffixes too, then you'd need to add substring_index(). BTW, you could have found this with a small bit of googling.

ADD COMMENTlink written 4.5 years ago by Devon Ryan88k

Devon, can you please, stop replying here while I'm looking for the correct sql statement ? :-)
 

ADD REPLYlink written 4.5 years ago by Pierre Lindenbaum118k

I'll see what I can do :)

ADD REPLYlink written 4.5 years ago by Devon Ryan88k
1
gravatar for Renesh
4.5 years ago by
Renesh1.6k
United States
Renesh1.6k wrote:

Hi, I use following two steps to solve my problem and it works. Thanks for your help,

substring_index (substr(cn,instr(cn,'-')+1),'-',1);

CONCAT(LEFT(cn, CHAR_LENGTH(cn) - 1),IF(RIGHT(cn, 1) REGEXP '[a-z]' = 0, RIGHT(cn, 1), ''));

 

http://stackoverflow.com/questions/26045071/extract-specific-patter-from-string-in-mysql

http://stackoverflow.com/questions/15776957/mysql-remove-last-characters-is-character-if-its-number

ADD COMMENTlink modified 4.5 years ago • written 4.5 years ago by Renesh1.6k
1

FYI, it's not unheard of to see miRNAs with names like mmu-miR123-3-3p, which that method will miss (perhaps you don't have any of those in this dataset, but keep that in mind for the future). In general it's probably better to just use a regex with perl/python/etc or standard string splitting in a normal (i.e., not SQL) language.

ADD REPLYlink written 4.5 years ago by Devon Ryan88k

Yes you are right. I also missed data like cbn-miR-44. Therefore, the method i proposed is not generalized for all miRNAs 

ADD REPLYlink written 4.5 years ago by Renesh1.6k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1217 users visited in the last hour