Question: Protein of unclear function
jaqx008 wrote:

Hello all, I am studying a group of small RNAs that I believe are generated from a particular spliced transcript. This transcript (below), as I saw in IGV, is duplicated three times, with the copies adjacent to each other. The sequence also contains a pattern of repeats. Finding the function of this protein is highly relevant to discussing my results, and I am stuck. I have tried to BLAST the sequence against NCBI, and what I am finding does not make much sense, probably due to poor annotation. Is there anything else I can do to find the function of this protein? My organism is Branchiostoma floridae. Thanks

CTGGCACCACTCTTGTCAGCTGAACGCTGGGCATCCCGATCGTCTGTAGACGGTGCGAAGGTTACCCTCTTCCTGGCACCGGTCTTGTTAGCTGGGCGCTGTGCATCCCGGCCGTCTGTAGACTGTGCGGGGGTAGGCACCAGAGAGCTTTGACGGGGCAGGTTGACCGGAGCAGGTCGACCTGTAAGGAATACAAAAAGAATGCAAAACATTTCAAGCATTAGTTCTCTTTAGCTATGAGATGTCCTAGAAAATCAGGACAAGCAAACGCATTTTCACCTTTTTTTAGAAAGGATATTGACATTGCTGCAGCTAGGATTAGGAAAGACTCGTTCTCTATCAAAAGTTTAACGTTTCATGTGTTGTAGTAATCTGTGTAAGCCCCTCCCAACTTAGAAGCCGAAATACGAAATGGTACAGTACTAGTAGATCCTTTACTTGCATATATACATATAATGAGTAGTTCTGGTTCAATATTGATATATAATTTCAAAACAAAAGACAAATATTACACACTTCTTTTTTTAATTTTATTTTTTCATTCTTGCAAATAACGACCAGAATTTCTTTGACCAAAACCATTCTCACCTACAACACCTGCCGGTGATGCGGACTTTCCGGCCCTCCTGGCTTGTGGTGCGTCACCCATAGGTGCGCATGCGCCTGGCCCATTCAGGCTCTCGCGACTCTCTGGCTTCTTGTCGTAGACTCCGACACTGGCCTTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACAGGGACACTGGCCGTGGTGTCGTCGCAGACAGGGACACTGGCCGTGGTGTCGTCGCAGACACCGACACTGGCCTTGGTGTCGTCGCAGACACGGACACTGGCCATGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCATGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACAGGGACACTGGCCTTGGTGTCGTCGCAGACAGGGACACTGGCCTTGGTGTCGTCGCAGACACCGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCATGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACAGGGACACTGGCCTTGGTGTCGTCGCAGACAGGGACACTGGCCTTGGTGTCGTCGCAGACACCGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCATGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTTGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCATGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCTTGGTGTCGTCGCAGACAGGGACACTGGCCTTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCATGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCATGGTGTCGTCGCAGACACGGACACTGGCCATGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCTTGGTGTCGTCGCAGACACGGACACTGGGCCATGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGAC
ACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACAGTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACAGGGACACTGGCCGTGGTGTCATCGCAGACACGGACACTGGCCATGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACAGTGGCCATGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACAGGGACACTGGCCGTGGTGTCATCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCATGGTGTCGTCGCAGACTCCGACACTGGCCATGGTGTCATCTGTGCCAGGGCCACTCTGGTTGTCTGCAAAATAATGCAAAACATTTAACGTTAAATCATCATTTCTCTTTAGGCCTGGGTCACATTTCCAAGCCGGGGCCCGATCGGGATGTTTTAAGAAACGAGAAATCAAATTGTATACCAAGAAAAATACACAAAGTATGCCCTTGAATCTTATTTTGACATCTTGTGTATTTTGATGTCTTTTCTATTATTTGCTTTTCTCCCGATAGCTGCCCGGCCGGGCCCCTTTTTTTTAAATGTGACCTAAGCCTTAGCTATGAGGTGTCCTACAAATCAGGACACATTGTCACTTTTTTTAGAAAGTATATCGACATTGCTGCAGGAGTTCTAAAACAGTTTGGCTTAGGAAAGACTCATTCCATATTAAAAGTTTCATGTTTTATGTGTTGTAGTAATCTGTGTAAGCCCCTCTTATGTTGGAAGGCGAAATACGAAACGGTACAGTACCAGTAGATCCCTTGTTTGCATATATATATATGATTAGTAATTCTCGGTCAATATCAATACATGTTTTGAAAAGAAAAGTCATGTATAGCACACTTCATTCTATTTGAAACCTTTGTTTAACTTATTGCAAATTCCCAATCGTTTATCCCCAGGGCCCTTGCTCTGTTGAATCACAGTTAAGGCACTTTCACATCAACTATCGTATGACTTGTGTCTTACTCATCTTTACCAATATTGTATATATATATTTAAAGTCTGCAATTTGTGT
written 7 months ago by jaqx008
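The repeat pattern mentioned in the question can be quantified before any database search. The following is a minimal sketch, not something used in the thread: it compares the sequence against shifted copies of itself and reports the shift with the highest fraction of matching bases, which approximates the repeat unit length. The function names and the default search range are hypothetical choices made for illustration.

```python
def match_fraction(seq: str, offset: int) -> float:
    """Fraction of positions i where seq[i] equals seq[i + offset]."""
    n = len(seq) - offset
    if n <= 0:
        return 0.0
    matches = sum(1 for i in range(n) if seq[i] == seq[i + offset])
    return matches / n


def best_period(seq: str, min_period: int = 2, max_period: int = 100) -> tuple[int, float]:
    """Return (period, score) for the shift with the highest self-match fraction.

    Ties are resolved in favour of the shortest period, so a perfect repeat
    reports its unit length rather than a multiple of it.
    """
    seq = seq.upper()
    upper = min(max_period, len(seq) - 1)
    if upper < min_period:
        raise ValueError("sequence is too short for the requested period range")
    scores = [(match_fraction(seq, p), p) for p in range(min_period, upper + 1)]
    score, period = max(scores, key=lambda item: (item[0], -item[1]))
    return period, score


if __name__ == "__main__":
    # Toy input: a 30-nt unit copied from the posted sequence, repeated 10 times.
    unit = "ACGGACACTGGCCGTGGTGTCGTCGCAGAC"
    print(best_period(unit * 10))
```

On the real transcript the score will be below 1.0 because the copies differ by a few substitutions, so the score is best read as a rough indicator of how regular the repeat is.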

Which BLAST did you use?

I'm not familiar with this reference genome, but I used blastn and got these hits:

XM_002613548.1 (Branchiostoma floridae hypothetical protein, mRNA)

XM_002613549.1 (Branchiostoma floridae hypothetical protein, mRNA)

XM_002613550.1 (Branchiostoma floridae hypothetical protein, mRNA)

which encode these proteins:

XP_002613594.1

XP_002613595.1

XP_002613596.1

Each contains 2 or 3 conserved domains:

https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?INPUT_TYPE=live&SEQUENCE=XP_002613594.1

https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?INPUT_TYPE=live&SEQUENCE=XP_002613595.1

https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?INPUT_TYPE=live&SEQUENCE=XP_002613596.1

(You can change the view from concise results to standard results to see more domains)

DNA polymerase III subunit gamma/tau

UV excision repair protein Rad23

SOG2: RAM signalling pathway protein

If you google each domain together with the name of the genome, you can find more information, for example:

Branchiostoma floridae + DNA polymerase III subunit gamma/tau

One of the results is

http://www.pantherdb.org/panther/family.do?clsAccession=PTHR11669

On that page, if you click on "Branchiostoma floridae: 4", you can see all the genes in this reference genome that have this domain.

......

modified 7 months ago • written 7 months ago by Fatima
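The CDD links in the answer above all follow one URL pattern, so they can be generated for any list of protein accessions. The sketch below rebuilds that pattern. It also builds an NCBI E-utilities `efetch` URL for downloading each protein as FASTA, which is an addition not mentioned in the thread. The helper names are hypothetical, and no request is sent.

```python
from urllib.parse import urlencode

# Base URL copied from the CDD links in the answer.
CDD_BASE = "https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi"
# NCBI E-utilities efetch endpoint (an addition, not used in the thread).
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def cdd_url(accession: str) -> str:
    """Build the live CDD search URL for one protein accession."""
    params = {"INPUT_TYPE": "live", "SEQUENCE": accession}
    return f"{CDD_BASE}?{urlencode(params)}"


def efetch_fasta_url(accession: str) -> str:
    """Build an efetch URL that returns the protein record as plain-text FASTA."""
    params = {"db": "protein", "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    for acc in ("XP_002613594.1", "XP_002613595.1", "XP_002613596.1"):
        print(cdd_url(acc))
        print(efetch_fasta_url(acc))
```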

The alignments with the CDD domains look like false positives to me, because only a partial (non-repeat) region is matched with the repeat region.

written 7 months ago by fishgolden

Thank you, @Fatima, for looking this up. I have also seen the DNA pol hit, but when I look in other organisms it is not found, which makes it really strange. I am looking through your searches now to see what I can make of them.

written 7 months ago by jaqx008

Have you performed wet-lab experiments? I think the standard way to investigate the function of a gene is to check expression, then translation, then localization, followed by knockdown analysis, etc.

written 7 months ago by fishgolden

Unfortunately, I am unable to perform any wet-lab experiments on this, as we do not have the animal models and the work is not specifically funded. We are just looking to use a bioinformatics approach.

written 7 months ago by jaqx008

Hmm, all the BLAST hits I can see are hypothetical or predicted proteins... It's quite dangerous to proceed without evidence that the gene is really expressed and translated.

written 7 months ago by fishgolden

I am not so much worried about the expression since I can quantify the number of transcripts and use that to ascertain whether or not they are translated. I just want to predict the function based on what they do in other organisms.

written 7 months ago by jaqx008

First, you have to make sure that the translated amino acid sequence is what you expect; then you can run a sensitive search with tools like HH-suite and HMMER.

written 7 months ago by fishgolden
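Checking the translation, as suggested in the last reply, can start with a six-frame translation of the nucleotide sequence. The sketch below is one way to do that in plain Python. It assumes the standard genetic code and an unambiguous A/C/G/T sequence, and its function names are hypothetical. It does not run HH-suite or HMMER; it only produces the candidate protein sequences you would inspect and then pass to those tools.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (an assumption:
# the standard table is used here).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq: str) -> dict[str, str]:
    """Translate all three frames on both strands, keyed as +1..+3 and -1..-3."""
    seq = seq.upper()
    frames = {}
    for strand, template in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(template[offset:])
    return frames


def longest_open_stretch(protein: str) -> str:
    """Longest run of residues without a stop codon (*)."""
    return max(protein.split("*"), key=len)


if __name__ == "__main__":
    for name, protein in six_frames("ATGGCCTAA").items():
        print(name, protein, longest_open_stretch(protein))
```

Comparing the longest stop-free stretch in each frame against the annotated XP_ proteins is a quick way to see whether the expected reading frame is the one being searched.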