I am new to bioinformatics and have been assigned a project, but I don't have enough background or course material to move forward. I would be very grateful if somebody could guide me on the following points (the project is about a prokaryotic gene/protein).
Conservation of this gene/protein across bacteria (both species phylogenetically close to Mtb and more distant ones). What I did: a protein BLAST (BLASTp) search with RpfB to find orthologs in other species. I didn't get the results I expected, and I would like a more phylogenetic picture. Are there other tools?
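A BLASTp run can write tabular output (for example `blastp -query rpfB.faa -db <db> -outfmt 6 -evalue 1e-5 -out hits.tsv`, where the file and database names are placeholders). A minimal sketch, assuming that default `-outfmt 6` column order, of turning the raw hits into a cleaner candidate list: keep one best-scoring hit per subject and drop hits with a weak E-value or low query coverage. The cut-offs and the example rows are hypothetical. Best hits are only candidate orthologs; a reciprocal BLAST back against the Mtb proteome is a common extra check, and the retained sequences can then be aligned and used to build a tree for the phylogenetic view.

```python
# Column order of BLAST+ tabular output (-outfmt 6, default fields)
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
FLOAT_FIELDS = {"pident", "evalue", "bitscore"}


def parse_outfmt6(text):
    """Parse BLAST+ -outfmt 6 text into a list of dicts with numeric fields converted."""
    hits = []
    for line in text.strip().splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        hit = dict(zip(COLUMNS, line.split("\t")))
        for key in COLUMNS[2:]:
            hit[key] = float(hit[key]) if key in FLOAT_FIELDS else int(hit[key])
        hits.append(hit)
    return hits


def best_hits(hits, query_len, max_evalue=1e-5, min_coverage=0.5):
    """Keep the highest-bitscore HSP per subject passing E-value and query-coverage cut-offs."""
    best = {}
    for h in hits:
        coverage = (h["qend"] - h["qstart"] + 1) / query_len
        if h["evalue"] > max_evalue or coverage < min_coverage:
            continue
        current = best.get(h["sseqid"])
        if current is None or h["bitscore"] > current["bitscore"]:
            best[h["sseqid"]] = h
    return sorted(best.values(), key=lambda h: h["bitscore"], reverse=True)


# Hypothetical example rows (made-up subject IDs, not real results)
EXAMPLE = ("rpfB\tsubjA\t85.0\t90\t13\t0\t1\t90\t1\t90\t1e-40\t150.0\n"
           "rpfB\tsubjA\t40.0\t20\t12\t0\t80\t99\t5\t24\t1e-2\t20.0\n"
           "rpfB\tsubjB\t30.0\t30\t21\t0\t1\t30\t1\t30\t1e-3\t25.0\n")

if __name__ == "__main__":
    for h in best_hits(parse_outfmt6(EXAMPLE), query_len=100):
        print(h["sseqid"], h["pident"], h["evalue"])
```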
Information on the function of this protein based on sequence analysis. Should I use HHblits to identify homologues in other species?
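HHblits compares profile HMMs, so it can pick up remote homologues (and hence conserved domains that hint at function) that a plain BLASTp search misses. A small sketch of assembling an `hhblits` command from Python, assuming a local HH-suite install and a downloaded database; the file names and database path are hypothetical. The HHpred web server in the MPI Bioinformatics Toolkit runs the same kind of HH-suite search without a local install.

```python
import shlex
import shutil
import subprocess


def hhblits_command(query_fasta, database, out_prefix, iterations=3, evalue=1e-3, cpus=4):
    """Build an hhblits command line; `database` is the prefix of a local HH-suite database."""
    return [
        "hhblits",
        "-i", query_fasta,              # query sequence in FASTA format
        "-d", database,                 # database prefix (local path is an assumption)
        "-o", f"{out_prefix}.hhr",      # ranked hit list with probabilities and E-values
        "-oa3m", f"{out_prefix}.a3m",   # query-centred multiple sequence alignment
        "-n", str(iterations),          # number of search iterations
        "-e", str(evalue),              # E-value cut-off for including hits in the result alignment
        "-cpu", str(cpus),              # number of threads
    ]


def run_hhblits(cmd):
    """Run the command if hhblits is installed; raise a clear error otherwise."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("hhblits not found on PATH; install HH-suite or use a web server")
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names; print the command instead of running it
    print(shlex.join(hhblits_command("rpfB.fasta", "databases/UniRef30", "rpfB_hhblits")))
```

In the resulting `.hhr` file, hits with high probability that cover distinct parts of the query are the ones worth reading about.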
Information on the likely cellular location of this protein, as evidenced by motifs in the sequence (e.g. a signal peptide or transmembrane segments).
Which tools should I use? Maybe UniProt?
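Localisation evidence in bacterial proteins often sits at the N-terminus: a hydrophobic signal-peptide core, a lipobox whose Cys becomes the lipidated first residue of the mature lipoprotein, or a twin-arginine (Tat) motif. A rough sketch of scanning for these, assuming simplified regex consensus patterns and a Kyte-Doolittle hydropathy window; the patterns, window size and toy sequence are illustrative assumptions, and trained predictors (SignalP and similar) should be preferred for the real answer.

```python
import re

# Kyte-Doolittle hydropathy values (positive = hydrophobic)
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}

# Loose consensus patterns (simplifications; trained predictors are far more reliable)
LIPOBOX = re.compile(r"[LVI][ASTVI][GAS]C")   # lipobox; the Cys is lipidated after cleavage
TAT_MOTIF = re.compile(r"RR.[FL][LIVM]")      # rough twin-arginine (Tat) motif


def max_hydropathy(seq, window=11, region=40):
    """Highest mean Kyte-Doolittle score of any window within the first `region` residues."""
    head = seq[:region]
    scores = [sum(KD.get(a, 0.0) for a in head[i:i + window]) / window  # unknown residues count as 0
              for i in range(len(head) - window + 1)]
    return max(scores) if scores else float("nan")


def scan_n_terminus(seq, region=40):
    """Report N-terminal features that hint at export: lipobox, Tat motif, hydrophobic core."""
    seq = seq.upper()
    head = seq[:region]
    lipobox = LIPOBOX.search(head)
    tat = TAT_MOTIF.search(head)
    return {
        "lipobox_cys": lipobox.end() if lipobox else None,  # 1-based position of the Cys
        "tat_motif": tat.group() if tat else None,
        "max_hydropathy": round(max_hydropathy(seq, region=region), 2),
    }


# Hypothetical toy N-terminus, not the real RpfB sequence
TOY = "MKKLLLAVALLTLSLAGCGSDTPAAE"

if __name__ == "__main__":
    print(scan_n_terminus(TOY))
```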
Information on any operon that this gene is part of, including conservation of the operon across closely related species. What is the genomic context of this gene: which other genes surround it, and are they likely to be part of the same network?
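A common first-pass heuristic for operons is that neighbouring genes on the same strand separated by a short intergenic distance (or overlapping) tend to be co-transcribed. A minimal sketch of that rule, assuming a 50 bp cut-off and hypothetical gene names and coordinates; conserved gene order across related genomes is a separate, stronger line of evidence, and TSS data like those in this project can break a predicted operon into several transcription units.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based, leftmost coordinate
    end: int     # 1-based, rightmost coordinate
    strand: str  # "+" or "-"


def predict_operons(genes, max_gap=50):
    """Group adjacent same-strand genes whose intergenic distance is <= max_gap bp."""
    genes = sorted(genes, key=lambda g: g.start)
    operons = []
    current = [genes[0]] if genes else []
    for prev, gene in zip(genes, genes[1:]):
        gap = gene.start - prev.end - 1  # negative values mean the genes overlap
        if gene.strand == prev.strand and gap <= max_gap:
            current.append(gene)
        else:
            operons.append(current)
            current = [gene]
    if current:
        operons.append(current)
    return [[g.name for g in op] for op in operons]


# Hypothetical genes and coordinates for illustration only
EXAMPLE_GENES = [
    Gene("geneA", 100, 400, "+"),
    Gene("geneB", 420, 900, "+"),
    Gene("geneC", 1200, 1500, "+"),
    Gene("geneD", 1510, 1800, "-"),
]

if __name__ == "__main__":
    print(predict_operons(EXAMPLE_GENES))
```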
How good is the current annotation of the start codon? Could the protein start earlier or later? In bacteria, start codon annotation is not always accurate.
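One way to frame the start-codon question is to list every in-frame alternative start (ATG, GTG, TTG) upstream of the annotated one, back to the first in-frame stop codon, plus those shortly downstream. A minimal sketch, assuming a forward-strand sequence and a 0-based index for the annotated start; the toy sequence is hypothetical. Each candidate can then be weighed against an upstream Shine-Dalgarno-like motif, the TSS positions, and where ortholog N-termini begin in an alignment.

```python
STARTS = {"ATG", "GTG", "TTG"}   # common bacterial start codons
STOPS = {"TAA", "TAG", "TGA"}


def alternative_starts(seq, annotated_start, downstream_codons=30):
    """Find in-frame candidate start codons around an annotated start.

    `seq` is forward-strand DNA containing the gene; `annotated_start` is the
    0-based index of the first base of the annotated start codon.
    """
    seq = seq.upper()
    upstream = []
    i = annotated_start - 3
    while i >= 0:
        codon = seq[i:i + 3]
        if codon in STOPS:
            break  # the ORF cannot extend past an in-frame stop codon
        if codon in STARTS:
            upstream.append(i)
        i -= 3
    downstream = []
    limit = min(len(seq) - 2, annotated_start + 3 * (downstream_codons + 1))
    for j in range(annotated_start + 3, limit, 3):
        if seq[j:j + 3] in STARTS:
            downstream.append(j)
    return {"upstream": upstream, "downstream": downstream}


# Toy sequence: TAA GTG CCC TTG AAA [ATG annotated at index 15] GCT GTG AAA TAA
TOY = "TAAGTGCCCTTGAAAATGGCTGTGAAATAA"

if __name__ == "__main__":
    print(alternative_starts(TOY, 15))
```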
Can you identify likely regulatory elements in the region between tatD (the gene preceding rpfB) and rpfB? Examples are the Shine-Dalgarno sequence and the -10 and -35 promoter elements.
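A simple way to look for these elements is mismatch counting against consensus sequences: AGGAGG for a Shine-Dalgarno motif a few bases upstream of the start codon, and a -35 hexamer followed by a -10 hexamer separated by a spacer of roughly 15 to 19 bp. A sketch, assuming the E. coli sigma-70 consensus hexamers (TTGACA and TATAAT) as defaults, which may fit Mycobacterium promoters only loosely; the toy sequences are hypothetical, and the real input would be the tatD-rpfB intergenic region taken from the genome.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_match(seq, motif):
    """(mismatches, 0-based position) of the best ungapped match of `motif` in `seq`."""
    seq = seq.upper()
    return min((hamming(seq[i:i + len(motif)], motif), i)
               for i in range(len(seq) - len(motif) + 1))


def promoter_candidates(seq, minus35="TTGACA", minus10="TATAAT",
                        spacers=range(15, 20), max_mismatches=4):
    """Score -35/-10 hexamer pairs separated by an allowed spacer; fewest mismatches first."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        mm35 = hamming(seq[i:i + 6], minus35)
        for s in spacers:
            j = i + 6 + s  # start of the -10 hexamer
            if j + 6 > len(seq):
                break
            total = mm35 + hamming(seq[j:j + 6], minus10)
            if total <= max_mismatches:
                hits.append((total, i, j, s))
    return sorted(hits)


# Hypothetical toy sequences
PROMOTER_TOY = "GGGG" + "TTGACA" + "C" * 17 + "TATAAT" + "GGGG"
SD_TOY = "CCAGGAGGCCATG"

if __name__ == "__main__":
    print(promoter_candidates(PROMOTER_TOY)[:3])
    print(best_match(SD_TOY, "AGGAGG"))
```

Each hit is (total mismatches, -35 start, -10 start, spacer length), so the known TSS positions can be checked against where a candidate -10 element would place transcription initiation.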
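Can you identify binding sites of transcription factors in the promoter region of this gene? I tried predicting them with SignalP: http://www.cbs.dtu.dk/cgi-bin/webface2.fcgi?jobid=5A58DF6F00007FBF4248699A&wait=20 However, SignalP predicts signal peptides and their cleavage sites rather than transcription factor binding sites, so its result is probably more relevant to the cellular-location question.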
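Transcription factor binding sites are usually predicted by scanning the promoter with a position weight matrix (PWM) built from known sites of a regulator; tools such as FIMO in the MEME Suite do this at scale. A minimal sketch of the idea, assuming a handful of aligned example sites, a uniform 0.25 background, a 0.5 pseudocount and a score threshold chosen by hand; the sites and the sequence are hypothetical and not real Mtb regulator data.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Log2-odds position weight matrix from equal-length aligned binding sites."""
    pwm = []
    total = len(sites) + 4 * pseudocount
    for pos in range(len(sites[0])):
        column = [s[pos] for s in sites]
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def revcomp(seq):
    """Reverse complement of an A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def scan(seq, pwm, threshold):
    """(score, 0-based start, strand) for windows scoring >= threshold on either strand."""
    seq = seq.upper()
    w = len(pwm)
    hits = []
    for i in range(len(seq) - w + 1):
        window = seq[i:i + w]
        if set(window) - set(BASES):
            continue  # skip windows containing N or other ambiguity codes
        for strand, site in (("+", window), ("-", revcomp(window))):
            score = sum(col[b] for col, b in zip(pwm, site))
            if score >= threshold:
                hits.append((round(score, 3), i, strand))
    return sorted(hits, reverse=True)


# Hypothetical training sites for an imaginary regulator
SITES = ["TTGACA", "TTGACA", "TTGACT"]

if __name__ == "__main__":
    print(scan("GGGTTGACAGGG", build_pwm(SITES), threshold=5.0))
```

A fixed threshold is the crudest choice; FIMO instead reports a p-value per match, which is easier to compare across motifs of different lengths.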
Additional information: experiments mapping the transcription start site (TSS) suggest at least two TSSs, one at 1127876 and the other at 1127955 (note that the first TSS overlaps the end of the tatD gene; this is not a mistake!). This suggests that an RNA transcript may be expressed upstream of the coding region of the gene. Explore its sequence and structure, assuming that it starts at the first TSS and ends at approximately coordinate 1128002.
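With 1-based inclusive coordinates, the leader from the first TSS spans 1128002 - 1127876 + 1 = 127 nt, and from the second TSS 1128002 - 1127955 + 1 = 48 nt. A sketch of extracting it and predicting a secondary structure, assuming the gene is on the forward strand (consistent with the antisense RNA being on the reverse strand) and that the genome sequence is available as a string. The folding here is the Nussinov base-pair-maximisation algorithm, a deliberately simple stand-in; energy-based tools such as RNAfold (ViennaRNA) or mfold give far more realistic structures.

```python
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def extract(genome, start, end):
    """Forward-strand subsequence for 1-based inclusive coordinates, returned as RNA."""
    return genome[start - 1:end].upper().replace("T", "U")


def nussinov(rna, min_loop=3):
    """Maximise the number of nested base pairs (Nussinov); returns (pairs, dot-bracket)."""
    n = len(rna)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])          # i or j left unpaired
            if (rna[i], rna[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)      # i pairs with j
            for k in range(i + 1, j):
                best = max(best, dp[i][k] + dp[k + 1][j])  # split into two substructures
            dp[i][j] = best
    # Traceback to a dot-bracket string
    structure = ["."] * n
    stack = [(0, n - 1)] if n else []
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
        elif dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
        elif (rna[i], rna[j]) in PAIRS and dp[i][j] == dp[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return (dp[0][n - 1] if n else 0), "".join(structure)


if __name__ == "__main__":
    # With a real genome string: leader = extract(genome, 1127876, 1128002)
    print(nussinov("GGGAAAUCCC"))
```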
Transcriptomic data suggest the expression of an antisense RNA at coordinates 1127876:1128036 on the negative (reverse) strand. Explore this transcript and its potential function.
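Because the antisense RNA is transcribed from the reverse strand at the same coordinates, its sequence is the reverse complement of the forward strand, and it is perfectly complementary to any forward-strand transcript it overlaps. A sketch of the coordinate bookkeeping, assuming 1-based inclusive coordinates: the antisense RNA is 161 nt long, shares 127 nt with the leader assumed to run from the first TSS to about 1128002, and extends 34 nt beyond it. One commonly proposed role for such cis-antisense RNAs is base-pairing with the sense transcript to affect its stability or translation, but that would need experimental support.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (A/C/G/T/N)."""
    return seq.upper().translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def minus_strand_transcript(genome, start, end):
    """5'->3' sequence of a reverse-strand transcript from 1-based inclusive forward coordinates."""
    return revcomp(genome[start - 1:end])


def overlap(a, b):
    """Length of the overlap between two 1-based inclusive intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


ANTISENSE = (1127876, 1128036)  # reverse strand, from the transcriptomic data
LEADER = (1127876, 1128002)     # forward-strand leader: first TSS to the approximate end

if __name__ == "__main__":
    print("antisense length:", ANTISENSE[1] - ANTISENSE[0] + 1)  # 161
    print("overlap with leader:", overlap(ANTISENSE, LEADER))    # 127
```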