22 months ago
Hello all, I am studying a group of small RNAs that I believe are generated from a particular spliced transcript. As I saw in IGV, this transcript (below) is duplicated three times, with the copies adjacent to each other. The sequence also contains a pattern of internal repeats. Finding the function of the encoded protein is highly relevant to discussing my results, and I am stuck. I have tried BLASTing the sequence against NCBI, but the hits do not make much sense, probably because of poor annotation. Is there anything else I can do to find the function of this protein? My organism is Branchiostoma floridae. Thanks

CTGGCACCACTCTTGTCAGCTGAACGCTGGGCATCCCGATCGTCTGTAGACGGTGCGAAGGTTACCCTCTTCCTGGCACCGGTCTTGTTAGCTGGGCGCTGTGCATCCCGGCCGTCTGTAGACTGTGCGGGGGTAGGCACCAGAGAGCTTTGACGGGGCAGGTTGACCGGAGCAGGTCGACCTGTAAGGAATACAAAAAGAATGCAAAACATTTCAAGCATTAGTTCTCTTTAGCTATGAGATGTCCTAGAAAATCAGGACAAGCAAACGCATTTTCACCTTTTTTTAGAAAGGATATTGACATTGCTGCAGCTAGGATTAGGAAAGACTCGTTCTCTATCAAAAGTTTAACGTTTCATGTGTTGTAGTAATCTGTGTAAGCCCCTCCCAACTTAGAAGCCGAAATACGAAATGGTACAGTACTAGTAGATCCTTTACTTGCATATATACATATAATGAGTAGTTCTGGTTCAATATTGATATATAATTTCAAAACAAAAGACAAATATTACACACTTCTTTTTTTAATTTTATTTTTTCATTCTTGCAAATAACGACCAGAATTTCTTTGACCAAAACCATTCTCACCTACAACACCTGCCGGTGATGCGGACTTTCCGGCCCTCCTGGCTTGTGGTGCGTCACCCATAGGTGCGCATGCGCCTGGCCCATTCAGGCTCTCGCGACTCTCTGGCTTCTTGTCGTAGACTCCGACACTGGCCTTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACAGGGACACTGGCCGTGGTGTCGTCGCAGACAGGGACACTGGCCGTGGTGTCGTCGCAGACACCGACACTGGCCTTGGTGTCGTCGCAGACACGGACACTGGCCATGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCATGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACAGGGACACTGGCCTTGGTGTCGTCGCAGACAGGGACACTGGCCTTGGTGTCGTCGCAGACACCGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCATGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACAGGGACACTGGCCTTGGTGTCGTCGCAGACAGGGACACTGGCCTTGGTGTCGTCGCAGACACCGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCATGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTTGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCATGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCTTGGTGTCGTCGCAGACAGGGACACTGGCCTTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCATGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCATGGTGTCGTCGCAGACACGGACACTGGCCATGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCTTGGTGTCGTCGCAGACACGGACACTGGGCCATGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGAC
ACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACAGTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACAGGGACACTGGCCGTGGTGTCATCGCAGACACGGACACTGGCCATGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACAGTGGCCATGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACAGGGACACTGGCCGTGGTGTCATCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCGTGGTGTCGTCGCAGACACGGACACTGGCCATGGTGTCGTCGCAGACTCCGACACTGGCCATGGTGTCATCTGTGCCAGGGCCACTCTGGTTGTCTGCAAAATAATGCAAAACATTTAACGTTAAATCATCATTTCTCTTTAGGCCTGGGTCACATTTCCAAGCCGGGGCCCGATCGGGATGTTTTAAGAAACGAGAAATCAAATTGTATACCAAGAAAAATACACAAAGTATGCCCTTGAATCTTATTTTGACATCTTGTGTATTTTGATGTCTTTTCTATTATTTGCTTTTCTCCCGATAGCTGCCCGGCCGGGCCCCTTTTTTTTAAATGTGACCTAAGCCTTAGCTATGAGGTGTCCTACAAATCAGGACACATTGTCACTTTTTTTAGAAAGTATATCGACATTGCTGCAGGAGTTCTAAAACAGTTTGGCTTAGGAAAGACTCATTCCATATTAAAAGTTTCATGTTTTATGTGTTGTAGTAATCTGTGTAAGCCCCTCTTATGTTGGAAGGCGAAATACGAAACGGTACAGTACCAGTAGATCCCTTGTTTGCATATATATATATGATTAGTAATTCTCGGTCAATATCAATACATGTTTTGAAAAGAAAAGTCATGTATAGCACACTTCATTCTATTTGAAACCTTTGTTTAACTTATTGCAAATTCCCAATCGTTTATCCCCAGGGCCCTTGCTCTGTTGAATCACAGTTAAGGCACTTTCACATCAACTATCGTATGACTTGTGTCTTACTCATCTTTACCAATATTGTATATATATATTTAAAGTCTGCAATTTGTGT
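One way to put a number on the "pattern of repeats" is to slide the sequence against itself and see which offset gives the most matching bases. The sketch below is only an illustration: the function names and the lag range are made up for this example, and a dedicated tandem-repeat finder would be more robust on imperfect repeats.

```python
def match_fraction(seq: str, lag: int) -> float:
    """Fraction of positions i where seq[i] equals seq[i + lag]."""
    pairs = len(seq) - lag
    if pairs <= 0:
        return 0.0
    hits = sum(1 for i in range(pairs) if seq[i] == seq[i + lag])
    return hits / pairs


def best_period(seq: str, min_lag: int = 2, max_lag: int = 100) -> tuple[int, float]:
    """Return the lag with the highest self-match fraction, and that fraction.

    min_lag and max_lag are arbitrary illustration values, not tuned settings.
    """
    seq = seq.upper()
    max_lag = min(max_lag, len(seq) - 1)
    scores = {lag: match_fraction(seq, lag) for lag in range(min_lag, max_lag + 1)}
    lag = max(scores, key=scores.get)  # ties resolve to the smallest lag
    return lag, scores[lag]


# Toy input: an 8-nt unit repeated six times (not the sequence from this thread).
toy = "ACGTTGCA" * 6
period, score = best_period(toy)
```

If the best lag is a multiple of 3, the repeat would stay in frame when translated, which is worth knowing before looking at the protein. Note that multiples of the true period also score highly, so the smallest high-scoring lag is the one to look at.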

Which BLAST program did you use?

I'm not familiar with this reference genome, but I used blastn and got these hits:

XM_002613548.1 (Branchiostoma floridae hypothetical protein, mRNA)

XM_002613549.1 (Branchiostoma floridae hypothetical protein, mRNA)

XM_002613550.1 (Branchiostoma floridae hypothetical protein, mRNA)

which encode these proteins:

XP_002613594.1

XP_002613595.1

XP_002613596.1
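To work with these proteins locally, their sequences can be pulled as FASTA through NCBI E-utilities. This is a minimal sketch that assumes the standard `efetch` endpoint and network access; the helper names are made up for this example, and the download function is defined but not called here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accessions, db="protein"):
    """Build an efetch URL that returns the given accessions as FASTA text."""
    params = {
        "db": db,
        "id": ",".join(accessions),
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EFETCH}?{urlencode(params)}"


def fetch_fasta(accessions, db="protein"):
    """Download the FASTA records (needs network access; not called here)."""
    with urlopen(efetch_fasta_url(accessions, db)) as response:
        return response.read().decode("utf-8")


# Accessions taken from the list above.
proteins = ["XP_002613594.1", "XP_002613595.1", "XP_002613596.1"]
url = efetch_fasta_url(proteins)
```

Passing `db="nuccore"` with the XM_ accessions should give the mRNA sequences in the same way.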

Each contains 2 or 3 conserved domains:

https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?INPUT_TYPE=live&SEQUENCE=XP_002613594.1

https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?INPUT_TYPE=live&SEQUENCE=XP_002613595.1

https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi?INPUT_TYPE=live&SEQUENCE=XP_002613596.1

(You can change the view from concise results to standard results to see more domains)

DNA polymerase III subunit gamma/tau

SOG2: RAM signalling pathway protein

Branchiostoma floridae + DNA polymerase III subunit gamma/tau

One of the results is

http://www.pantherdb.org/panther/family.do?clsAccession=PTHR11669

On that page, if you click on "Branchiostoma floridae : 4", you can see all the genes in this reference genome that have this domain.

The alignments with the CDD domains look like false positives to me, because only a partial (non-repeat) region of each domain is matched, and it is matched to the repeat region of the protein.
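That kind of check can be made explicit by computing how much of the domain model a hit covers and how much of the hit falls inside the repeat region of the query. The sketch below is only an illustration: all coordinates are invented, and in practice they would be read off the CDD result pages.

```python
def domain_coverage(model_from: int, model_to: int, model_len: int) -> float:
    """Fraction of the domain model covered by the hit (1-based, inclusive)."""
    return (model_to - model_from + 1) / model_len


def overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of positions shared by two 1-based inclusive intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def fraction_in_repeat(hit_start: int, hit_end: int,
                       repeat_start: int, repeat_end: int) -> float:
    """Fraction of the query-side hit that lies inside the repeat region."""
    hit_len = hit_end - hit_start + 1
    return overlap(hit_start, hit_end, repeat_start, repeat_end) / hit_len


# Hypothetical example: the hit covers positions 101-200 of a 400-residue
# model and aligns to query residues 150-260, with a repeat at 120-600.
cov = domain_coverage(101, 200, 400)
in_repeat = fraction_in_repeat(150, 260, 120, 600)
```

A hit with low model coverage that sits almost entirely inside a low-complexity or repeat region is a reasonable candidate for a spurious match, although there is no universal cutoff for either number.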

Thank you @Fatima for looking this up. I have also seen the DNA pol hit, but when I look in other organisms it is not found, which makes it really strange. I am looking through your searches now to see what I can make of them.

Have you performed any wet-lab experiments? I think the standard way to investigate the function of a gene is: check expression -> check translation -> check localization, knockdown analysis, etc.

Unfortunately, I am unable to perform any wet-lab experiments on this, as we do not have the animal models and the work is not specifically funded. We are just looking to use a bioinformatics approach.

Hmm, all the BLAST hits I can see are hypothetical or predicted proteins... It's quite dangerous to proceed without evidence that the gene is really expressed and translated.

I am not so much worried about the expression since I can quantify the number of transcripts and use that to ascertain whether or not they are translated. I just want to predict the function based on what they do in other organisms.

First, you have to make sure that the translated amino acid sequence is what you expect; then you can perform a sensitive search with tools like HH-suite and HMMER.
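A quick way to check the translation is to translate the transcript in all six frames and compare the longest stop-free stretch with the annotated protein. This is a minimal sketch in plain Python, assuming the standard genetic code; the function names are made up for this example, and codons containing anything other than A, C, G or T are shown as X.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate from position 0; unknown codons become X, stops become *."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> dict:
    """Return translations keyed by frame name: +1, +2, +3, -1, -2, -3."""
    seq = seq.upper()
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


def longest_open_stretch(protein: str) -> str:
    """Longest run of residues between stop codons."""
    return max(protein.split("*"), key=len)


# Toy input, not the transcript from this thread.
frames = six_frames("ATGGCCTAA")
```

The longest open stretch from the best frame can then be written to a FASTA file and compared with the XP_ records, or used as the query for an HMMER or HH-suite search.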