Question: Bioinformatics questions sequence
4 weeks ago, anapaolavi wrote:
YKYRYLRHGKLRPFERDI
YKYRYLKHGKLRPFERDI
YKYRYLXHGKLRPFERDI
YKYRSLRHGKLRPFERDI
YKYRCLRHGKLRPFERDI
YKFRYLRHGKLRPFERDI
YKHRYLRHGKLRPFERDI
YKXRYLRHGKLRPFERDI
YLYRWVRRSKLNPYERDL
FYYRLFRHGKIKPYERDI
FFYRRFRHGKIKPYGRDL
FYYRLFRHGKIKPYGRDL
YYYRIWRSEKLRPFERDI
YYYRSHRKTKLKPFERDL
YFYRSHRSTKLKPFERDL
YFYRSHRSSKLKPFERDL
YYYRSSRKTKLKPFERDL
YYYRSYRKEKLKPFERDL

Q1. Write a regular expression that describes the alignment in the box.

Q2. Find 5 protein sequences from different organisms or strains that contain the pattern described by the regular expression from Q1. List the ID, name, size, source, and function of each protein.

Q3. Find 2 proteins with known structures that contain the pattern described by the regular expression from Q1. List the IDs of found protein structures.

Q4. Build a multiple sequence alignment for all protein sequences from Q2 and Q3.

Q5. Identify the conserved regions in the alignment from Q4 and explore their biological significance.

Q6. Evaluate statistical parameters of the regular expression from Q1 based on similar expressions in the Prosite database.


Please change your title, "Bioinformatics questions sequence". Of course it is a question about bioinformatics...

written 4 weeks ago by Pierre Lindenbaum

This looks like homework. What have you tried so far?

written 4 weeks ago by Pierre Lindenbaum

not sure where to start

written 4 weeks ago by anapaolavi

The first question asks you to write a regular expression that captures those sequences. Whatever language you are writing this in, there will be regex tutorials for it that you should go through.

written 4 weeks ago by rpolicastro

would this be correct? regex = ([A-Z])+

written 4 weeks ago by anapaolavi

Yes, but it looks like an amino-acid alphabet (not A to Z) with a specific length...

written 4 weeks ago by Pierre Lindenbaum

what would you recommend then?

written 4 weeks ago by anapaolavi

Quoting anapaolavi: "would this be correct? regex = ([A-Z])+"

Well, yes, but it also covers the sequences A and AA and AAA and every other sequence of alphabetical uppercase characters that is conceivable (including all sequences that contain non-amino acid letters).

You need to find one that covers (exactly) the given alignment. So best to look at the individual columns of the alignment and see what amino acids they're composed of. This should then give you an idea of how to build the regex.
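The column-by-column idea can be sketched in Python as below. This is only an illustration, not necessarily what the assignment expects: the helper name `column_regex` is made up for this example, and the letter X is taken literally even though it usually stands for an unknown residue, so you may want to treat those positions differently.

```python
import re

# The 18 aligned peptides from the question, one per row.
ALIGNMENT = """
YKYRYLRHGKLRPFERDI
YKYRYLKHGKLRPFERDI
YKYRYLXHGKLRPFERDI
YKYRSLRHGKLRPFERDI
YKYRCLRHGKLRPFERDI
YKFRYLRHGKLRPFERDI
YKHRYLRHGKLRPFERDI
YKXRYLRHGKLRPFERDI
YLYRWVRRSKLNPYERDL
FYYRLFRHGKIKPYERDI
FFYRRFRHGKIKPYGRDL
FYYRLFRHGKIKPYGRDL
YYYRIWRSEKLRPFERDI
YYYRSHRKTKLKPFERDL
YFYRSHRSTKLKPFERDL
YFYRSHRSSKLKPFERDL
YYYRSSRKTKLKPFERDL
YYYRSYRKEKLKPFERDL
""".split()


def column_regex(seqs):
    """Build one regex element per alignment column (hypothetical helper)."""
    parts = []
    for column in zip(*seqs):
        # dict.fromkeys keeps each residue once, in first-seen order.
        residues = "".join(dict.fromkeys(column))
        # A fully conserved column needs no brackets.
        parts.append(residues if len(residues) == 1 else f"[{residues}]")
    return "".join(parts)


pattern = column_regex(ALIGNMENT)
print(pattern)

# Sanity check: every row of the alignment must match in full.
assert all(re.fullmatch(pattern, seq) for seq in ALIGNMENT)
```

Unlike ([A-Z])+, a pattern built this way has a fixed length of 18 and only allows residues actually seen in each column.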

written 4 weeks ago by cschu181 (modified 4 weeks ago)
4 weeks ago, Mensur Dlakic (USA) wrote:

This is clearly a homework assignment, and you should ask your instructor for details. It defeats the educational goals of your instructor if we show you exactly how to do this. That said, here are a couple of hints.

I am guessing that a regular expression assignment is about individual columns in your alignment rather than a full set of sequences. For example, this is a regular expression of the last 4 columns in your alignment:

[EG]-R-D-[IL]

This means that the last column is either I or L, the next to last is always D, the one before it is always R, and the one before that is either E or G. You should check with your instructor, but I think that your assignment is to find this pattern across all columns, and then search the database for proteins that match the pattern you found.
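If you want to try a dash-separated pattern like this one with an ordinary regex engine, a rough conversion can be sketched as follows. The function name `prosite_to_regex` is invented for this example, and it only handles the common parts of the PROSITE pattern syntax (dashes, x for any residue, curly braces for excluded residues, parentheses for repeats, and the < and > anchors outside brackets), so treat it as a starting point rather than a full converter.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into Python regex syntax.

    Hypothetical helper covering only the common cases.
    """
    regex = pattern.rstrip(".")           # PROSITE patterns may end with a period
    regex = regex.replace("-", "")        # dashes only separate positions
    regex = regex.replace("x", ".")       # lowercase x means any residue
    regex = regex.replace("{", "[^").replace("}", "]")  # {P} = anything but P
    regex = regex.replace("(", "{").replace(")", "}")   # x(2,4) = 2 to 4 repeats
    regex = regex.replace("<", "^").replace(">", "$")   # terminal anchors
    return regex


tail = prosite_to_regex("[EG]-R-D-[IL]")
print(tail)

# The first row of the alignment ends with a match to the four-column pattern.
match = re.search(tail, "YKYRYLRHGKLRPFERDI")
print(match.group(), match.start() + 1)
```

The same function can be applied to a pattern covering all columns once you have written it out.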

For example, here is one protein that matches the whole pattern (the match, YKYRYLRHGKLRPFERDI, was highlighted in red in the original post):

MFIFLLFLTLTSGSDLDRCTTFDDVQAPNYTQHTSSMRGVYYPDEIFRSDTLYLTQDLFLPFYSNV TGFHTINHTFGNPVIPFKDGIYFAATEKSNVVRGWVFGSTMNNKSQSVIIINNSTNVVIRACNFEL CDNPFFAVSKPMGTQTHTMIFDNAFNCTFEYISDAFSLDVSEKSGNFKHLREFVFKNKDGFLYVYK GYQPIDVVRDLPSGFNTLKPIFKLPLGINITNFRAILTAFSPAQDIWGTSAAAYFVGYLKPTTFML KYDENGTITDAVDCSQNPLAELKCSVKSFEIDKGIYQTSNFRVVPSGDVVRFPNITNLCPFGEVFN ATKFPSVYAWERKKISNCVADYSVLYNSTFFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDVRQ IAPGQTGVIADYNYKLPDDFMGCVLAWNTRNIDATSTGNYNYKYRYLRHGKLRPFERDISNVPFSP DGKPCTPPALNCYWPLNDYGFYTTTGIGYQPYRVVVLSFELLNAPATVCGPKLSTDLIKNQCVNFN FNGLTGTGVLTPSSKRFQPFQQFGRDVSDFTDSVRDPKTSEILDISPCSFGGVSVITPGTNASSEV AVLYQDVNCTDVSTAIHADQLTPAWRIYSTGNNVFQTQAGCLIGAEHVDTSYECDIPIGAGICASY HTVSLLRSTSQKSIVAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVMPVSMAKTSVDCNMYICG DSTECANLLLQYGSFCTQLNRALSGIAAEQDRNTREVFAQVKQMYKTPTLKYFGGFNFSQILPDPL KPTKRSFIEDLLFNKVTLADAGFMKQYGECLGDINARDLICAQKFNGLTVLPPLLTDDMIAAYTAA LVSGTATAGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTS TALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYV TQQLIRAAEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERN FTTAPAICHEGKAYFPREGVFVFNGTSWFITQRNFFSPQIITTDNTFVSGNCDVVIGIINNTVYDP LQPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKY EQYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKGACSCGSCCKFDEDDSEPVLKGVKLHY T


Would this be correct for the regular expression for the whole alignment? [YF][KLYF][YFHX]R[YWSCRLI][LYWFVHS][RKX][HRKS][GSTE]K[LI][RNK][P][FY][EG]RD[LI]

written 4 weeks ago by anapaolavi

I only checked the first three brackets, but those look good.

written 4 weeks ago by rpolicastro

Thank you for checking! My next question is how to find 5 protein sequences from different organisms or strains that contain the pattern described by the regular expression I provided above. I have to list the ID, name, size, source, and function of each protein. How can I do that?

written 4 weeks ago by anapaolavi

If you are working on the Linux command line, you can use grep with the regex. Programming languages such as Python and R also have functions for searching strings with a regex; refer to their documentation for more information.
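As a rough Python sketch of that approach, the snippet below scans FASTA-formatted text for the regex proposed earlier in this thread. The two records and their headers are made up for the demonstration (the first reuses a short fragment of the protein quoted above); in practice you would read the text from a FASTA file downloaded from a protein database, and the function names here are just illustrative.

```python
import re

# Regex proposed earlier in the thread for the 18-column alignment.
PATTERN = re.compile(
    r"[YF][KLYF][YFHX]R[YWSCRLI][LYWFVHS][RKX][HRKS][GSTE]K[LI][RNK]P[FY][EG]RD[LI]"
)


def read_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_hits(fasta_text):
    """Return (header, 1-based start, matched text) for every match."""
    hits = []
    for header, seq in read_fasta(fasta_text):
        for m in PATTERN.finditer(seq):
            hits.append((header, m.start() + 1, m.group()))
    return hits


# Hypothetical demo records, not real database entries.
example = """>demo_hit hypothetical record
GNYNYKYRYLRHGKLRPFERDISNVPFSP
>demo_miss hypothetical record
MKTAYIAKQRQISFVKSHFSRQ
"""

hits = find_hits(example)
for header, start, matched in hits:
    print(header, start, matched)
```

The header of each hit gives you the ID to look up for the name, size, source, and function that the assignment asks for.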

written 4 weeks ago by rpolicastro