Question: bio-newbie / gene research / tools and methods
3 months ago, wizofe (London, U.K.) wrote:

hello, world!

Newbie in bioinformatics here, with an assigned project but not enough background or course information to move forward. I would be more than happy if somebody could guide me on the following (it's research on a prokaryotic gene/protein).

Conservation of this gene/protein across bacteria (both species that are phylogenetically close to Mtb and others). 
> Performed a protein BLAST analysis of the gene rpfB to find orthologs of the protein in other species. Any other tools?

Didn't get the expected results. I would like more phylogenetic information.

Information on the function of this protein based on sequence analysis.
> Using HHblits to identify homologues in other species?

Information on the likely cellular location of this protein, as evidenced by motifs in the sequence.

Which tools? Maybe UNIPROT?

Information on any operon that this gene is part of (including conservation of the operon across closely related species). What is the genomic context of this gene (what other genes surround it and are they likely to be part of the same network?)


Exploring how good the current annotation of the start codon is (could the protein start earlier or later?). In bacteria this annotation is not always perfect.
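One way to approach this is to list other in-frame start codons around the annotated one. A minimal Python sketch, assuming the coding-strand DNA is already in a string and that ATG, GTG and TTG are the start codons of interest. The function name `alternative_starts` and the size of the downstream window are made up for illustration:

```python
START_CODONS = {"ATG", "GTG", "TTG"}  # common bacterial start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}


def alternative_starts(seq, annotated_start, downstream_codons=30):
    """Return 0-based positions of in-frame candidate start codons.

    seq is the coding strand; annotated_start is the 0-based index of the
    first base of the annotated start codon.
    """
    seq = seq.upper()
    candidates = []

    # Upstream: step back one codon at a time. An in-frame stop codon ends
    # the search, because the ORF cannot extend past it.
    pos = annotated_start - 3
    while pos >= 0:
        codon = seq[pos:pos + 3]
        if codon in STOP_CODONS:
            break
        if codon in START_CODONS:
            candidates.append(pos)
        pos -= 3

    # Downstream: check a fixed number of codons after the annotated start.
    for i in range(1, downstream_codons + 1):
        pos = annotated_start + 3 * i
        codon = seq[pos:pos + 3]
        if len(codon) < 3 or codon in STOP_CODONS:
            break
        if codon in START_CODONS:
            candidates.append(pos)

    return sorted(candidates)


# Toy sequence (not real rpfB data): annotated ATG at index 9.
toy = "TAA" + "GTG" + "CCC" + "ATG" + "AAA" + "TTG" + "CCC" + "TAA"
print(alternative_starts(toy, 9))  # [3, 15]
```

Candidates found this way are only hypotheses. Whether one of them is a plausible real start still depends on things like a ribosome binding site just upstream and conservation of the extended N-terminus in orthologs.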

Can you identify likely regulatory elements in the region between tatD (the gene preceding rpfB) and rpfB? Examples are the Shine-Dalgarno sequence, the -10 and -35 elements, etc.
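A first pass at this can be a simple approximate motif scan over the intergenic sequence. The sketch below is only an illustration: `motif_hits` is a hypothetical helper, and the motifs used (AGGAGG for a Shine-Dalgarno-like site, TATAAT for a -10 element) are the classic E. coli consensus sequences, taken here as placeholders. Signals in Mtb may well differ from them.

```python
def motif_hits(seq, motif, max_mismatches=1):
    """Return (0-based offset, matched site, mismatch count) for every
    window of seq that is within max_mismatches of motif."""
    seq, motif = seq.upper(), motif.upper()
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        mismatches = sum(a != b for a, b in zip(window, motif))
        if mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits


# Toy upstream region (not the real tatD-rpfB intergenic sequence).
upstream = "TTTTAGGAGGTTTTTATG"
print(motif_hits(upstream, "AGGAGG", max_mismatches=0))  # [(4, 'AGGAGG', 0)]
```

Short motifs like these match by chance quite often, so the position of a hit matters as much as the match itself (for example, a Shine-Dalgarno-like site is expected a short distance upstream of the start codon).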

Can you identify binding sites of transcription factors in the promoter region of this gene? Predicting the binding sites using the algorithm of SignalP:

Additional information: Experiments probing the transcription start site (TSS) suggest the presence of at least two such sites, one at 1127876 and the other at 1127955 (note that the first TSS overlaps the end of the tatD gene – this is not a mistake!). This suggests that an RNA transcript may be expressed ahead of the coding region of the gene. Explore its sequence and structure, assuming that it starts at the first TSS and ends approximately at coordinate 1128002.
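To get the sequence of this putative transcript, the coordinates have to be turned into a slice of the genome sequence. A small Python sketch, assuming the coordinates are 1-based and inclusive (as in GenBank-style annotation; this is an assumption worth checking against the actual annotation) and that the genome is available as one string. `extract_region` is a made-up helper name:

```python
def extract_region(genome, start, end):
    """Return the forward-strand sequence from start to end,
    where both coordinates are 1-based and inclusive."""
    if start < 1 or end > len(genome) or start > end:
        raise ValueError("coordinates outside the sequence or start > end")
    return genome[start - 1:end]


# Coordinates quoted in the question.
TSS1, TSS2, TRANSCRIPT_END = 1127876, 1127955, 1128002

print(TRANSCRIPT_END - TSS1 + 1)  # 127 nt if the transcript starts at the first TSS
print(TSS2 - TSS1)                # 79 nt between the two TSSs

# Toy genome to show the indexing (not real data).
print(extract_region("AAACCCGGGTTT", 4, 9))  # CCCGGG
```

The extracted DNA string (with T replaced by U) is what would then go into an RNA secondary-structure prediction tool.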

Transcriptomic data suggest the expression of an antisense RNA with coordinates 1127876:1128036 (on the negative, or reverse, strand). Explore this transcript and its potential function.
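Because this transcript is on the reverse strand, its sequence is the reverse complement of the forward-strand region. A minimal Python sketch, again assuming 1-based inclusive coordinates and a genome held in a plain string; `reverse_complement` and `antisense_rna` are hypothetical helper names:

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def antisense_rna(genome, start, end):
    """Return the RNA (5'->3') transcribed from the reverse strand over
    forward-strand coordinates start..end (1-based, inclusive)."""
    region = genome[start - 1:end]
    return reverse_complement(region).upper().replace("T", "U")


print(1128036 - 1127876 + 1)  # 161 nt long

# Toy genome to show the orientation (not real data).
print(antisense_rna("AAATGCAAA", 3, 6))  # GCAU
```

Since the region overlaps the end of tatD and the region upstream of rpfB, the antisense sequence is complementary to whatever is transcribed there on the forward strand, which is a natural starting point when thinking about its function.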

modified 10 weeks ago by Biostar • written 3 months ago by wizofe

What do you mean by "didn't get expected results?" What did you get? What do you think you should have gotten?

Many of your questions can be answered with BLAST, and by looking at the sequence in its genomic context (download the GenBank file from NCBI and open it in a genome viewer like Artemis).

Tools you may want to look at, in no particular order (I'm sure you can work out what they're for!)

  • SignalP
  • InterProScan
  • Softberry/PSORTb
  • GO database
written 3 months ago by jrj.healey (3.8k)

(download the genbank from NCBI and open it in a genome viewer like ARTEMIS)


written 3 months ago by genomax (46k)

@jrj.healey: I appreciate your help. I am trying to set up a BLAST+, Artemis and UniProt workflow locally. I just feel I really lack the biology background; I am taking my baby steps in bioinformatics, and all the lectures and coursework are super stressful (mind you, I am doing a part-time MSc while working during the day :/).

It's just too overwhelming to go from not knowing what a protein is six months ago to a full studying mindset. As the comments were notes to myself, what I meant was that I got aligned results from BLAST but didn't know what they mean, how to present them, or what I can get out of this information.

Thanks for pointing me in the right direction!

written 3 months ago by wizofe

No problem, it’s a steep learning curve for sure.

There aren’t really any specific answers to your query though, as there are lots of ways you could go about it. My 2 suggestions to you would be:

  • to think about the question you’re actually asking, i.e. “where does the protein go” is the biological question, but the actual bioinformatics question is “how do I detect tell-tale sequence motifs/patterns”
  • to get help on interpreting the data you generate. There’s no harm in doing an analysis that turns out not to be ideal, so long as you identify that!

written 3 months ago by jrj.healey (3.8k)

That makes sense :)

I agree. The point is exactly that without assessing the biological question it doesn't make any sense to start using tools and doing alignments or BLAST'ing!

Happily enough, I just discovered in my Uni library the book Exploring Bioinformatics by St. Clair & Visick [2010], which takes a project-based approach and is so helpful in giving me the theoretical questions to support my "small research"!

written 3 months ago by wizofe