How to find nucleoid-associated proteins in poorly described bacteria?
11 weeks ago
keki • 0

Hi everyone,

First of all, I started my PhD just a few days ago and I have always been more of a wet-lab person, so I'm not really familiar with anything computer-related, but I'm eager to learn.

My project is about determining genome conformation in different bacteria and its effect on certain phenotypes. My first task is to find chromosome-associated proteins (e.g. HU proteins) that could be targets for mutation, mutate them in the lab, and eventually apply 3C-based techniques to decipher the genome architecture. However, I have no idea where to start. My supervisor told me to do a quick literature search, but my bacteria (the Planctomycetes phylum in general - I need this data) are poorly described, and I don't know whether computationally predicted homologs of proteins already known in other bacteria, as found in databases, would do. What would be your first step for this task?

Sorry for such a basic question - I'm trying to train myself in computational biology, but it takes time and I'm a bit lost with computers at this time.

11 weeks ago
Asaf 9.4k

HU doesn't bind a specific DNA motif so you're out of luck here. I'm sure there's a protocol to sequence HU binding-sites that works in other species, maybe run this first and get the binding sites experimentally. You'll have a lot of computational work analyzing the results.

11 weeks ago
Mensur Dlakic ★ 21k

A general answer to your question is that you find known homologs of your protein of interest, and use them to search against the genome of interest. This search can be carried out in several different ways. My assumption is that you have the genomic sequence and know how to predict genes for it, or already have the predicted genes available.

Finding homologs of your protein in related species can be done by literature search, or by visiting the NCBI website and doing a keyword search. Once you have several proteins, you can use BLASTp to compare them to the proteins predicted in your organism.
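To make the BLASTp step more concrete, here is a minimal Python sketch, not a definitive pipeline. It assumes BLAST+ is installed and that you have already built a protein database from your organism's predicted proteins with makeblastdb. The file names, the function names and the 1e-5 E-value cutoff are placeholders I made up for illustration; the column order is the default of BLAST's tabular output (-outfmt 6).

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def blastp_command(query_fasta, db_prefix, evalue=1e-5):
    """Build (but do not run) a blastp command line as a list of arguments.

    query_fasta: FASTA file with known homologs (hypothetical name).
    db_prefix:   database made beforehand with makeblastdb -dbtype prot.
    """
    return [
        "blastp",
        "-query", query_fasta,
        "-db", db_prefix,
        "-evalue", str(evalue),
        "-outfmt", "6",
    ]


def parse_blast_tabular(text, max_evalue=1e-5):
    """Parse -outfmt 6 text and keep hits at or below max_evalue.

    Returns a list of dicts sorted from best (lowest) to worst E-value.
    The cutoff is an arbitrary starting point, not a recommendation.
    """
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(BLAST_COLUMNS, row))
        hit["pident"] = float(hit["pident"])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] <= max_evalue:
            hits.append(hit)
    return sorted(hits, key=lambda h: h["evalue"])
```

You would pass the list from blastp_command to subprocess.run with capture_output=True and text=True, then give the captured stdout to parse_blast_tabular. Any hit that survives is only a candidate: check the alignment itself before planning a mutant around it.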

Yet another way is to use hidden Markov models (HMMs), which are condensed mathematical representations of multiple sequence alignments for protein families. In principle they work the same way as BLAST, but they are more sensitive. Because a single HMM contains information about many proteins, you can search with one HMM instead of searching with multiple individual homologs. One database with many HMMs is Pfam. Below are search results for nucleoid against Pfam:

https://pfam.xfam.org/search/keyword?query=nucleoid

Not all the hits are going to be useful, and it is up to you to find the relevant ones. This is something for you to research, but I will point you to HMMER as a tool that will handle the search using HMMs.
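As a rough sketch of what the HMMER step could look like, the Python below builds an hmmsearch command and parses the per-sequence table that hmmsearch writes with --tblout. It assumes HMMER 3 is installed and that you have downloaded the relevant Pfam HMMs into one file. The file names, the function names and the 1e-5 E-value cutoff are hypothetical choices of mine, so adjust them to your data.

```python
def hmmsearch_command(hmm_file, proteome_fasta, tblout_path, evalue=1e-5):
    """Build (but do not run) an hmmsearch command line.

    hmm_file:       one or more profile HMMs, e.g. saved from Pfam
                    (hypothetical file name).
    proteome_fasta: predicted proteins of the organism of interest.
    tblout_path:    where hmmsearch writes its per-sequence hit table.
    """
    return [
        "hmmsearch",
        "--tblout", tblout_path,
        "-E", str(evalue),
        hmm_file,
        proteome_fasta,
    ]


def parse_tblout(text, max_evalue=1e-5):
    """Parse the contents of an hmmsearch --tblout file.

    Columns are whitespace-separated; comment lines start with '#'.
    For hmmsearch the target is the protein and the query is the HMM.
    Returns hits sorted from best (lowest) to worst full-sequence E-value.
    """
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # 18 fixed columns, then a free-text description as the last field.
        fields = line.split(None, 18)
        if len(fields) < 18:
            continue  # malformed or truncated line
        hit = {
            "protein": fields[0],
            "hmm_name": fields[2],
            "hmm_accession": fields[3],
            "evalue": float(fields[4]),  # full-sequence E-value
            "score": float(fields[5]),   # full-sequence bit score
        }
        if hit["evalue"] <= max_evalue:
            hits.append(hit)
    return sorted(hits, key=lambda h: h["evalue"])
```

After running the command (for example with subprocess.run), read the file at tblout_path and pass its text to parse_tblout. If you use Pfam models, hmmsearch also has a --cut_ga option that applies each family's own gathering threshold instead of a single E-value cutoff.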

