Hello everyone,
I'm completely new to bioinformatics.
I have to compare two strains of bacteria from a newly discovered species to find out why one grows faster than the other. The two genomes are already sequenced and automatically annotated. I used BRIG, MUMmer and Mauve to align the genomes and found that they are closely related.
I then used BLAST to compare the proteins from the two strains and find which genes are unique to each genome, and I'm currently looking up the role of each of those genes in UniProt.
But now I find myself a bit lost because I have no idea what I should do next. What should I do to find out which gene is responsible for the difference in phenotype?
What should I look for?
Also, is there some kind of course that explains the different stages of a comparative genomics analysis?
Thank you all in advance for any help you can provide me.
Cheers
It's really hard to reach a conclusion based on one comparison. If you had several bacteria with different growth rates you could use comparative tools more extensively. I think your research requires more experiments, such as gene expression profiling or random mutagenesis. Having said that, I would recommend looking at prime suspects such as replication mechanisms, membrane synthesis, toxin-antitoxin systems and so on. Good luck.
Thank you for the answer. Yes, I am starting to realise that a single comparison is not enough data. I'm sorry if my question seems trivial, but how should I look at those prime suspects when I only have the genomes? It may be a stupid question for those used to bioinformatics, but do I have to manually search my annotated genome for each gene that may be related to a prime suspect, look it up in UniProt, and then BLAST the sequence of each of those genes to find out whether it is specific to one strain?
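One way to avoid a fully manual search is to screen the product descriptions from the automated annotation by keyword first, and only then look up the shortlisted genes in UniProt. Below is a rough Python sketch of that idea. The keyword lists, the gene IDs and the `screen_products` helper are all made up for illustration; free-text product names vary between annotation pipelines, so a keyword screen will miss genes and is only a first pass.

```python
# Hypothetical keyword groups for the "prime suspects" mentioned above.
# These lists are illustrative assumptions; adjust them to your organism.
SUSPECT_KEYWORDS = {
    "replication": ["dna polymerase", "dnaa", "helicase", "replication"],
    "membrane synthesis": ["phospholipid", "lipid a", "fatty acid", "peptidoglycan"],
    "toxin-antitoxin": ["toxin", "antitoxin"],
}


def screen_products(annotations, keywords=SUSPECT_KEYWORDS):
    """Group gene IDs by suspect category using their product description.

    annotations: dict mapping gene ID -> free-text product description.
    Returns a dict mapping category -> sorted list of matching gene IDs.
    """
    hits = {category: [] for category in keywords}
    for gene_id, product in annotations.items():
        text = product.lower()
        for category, words in keywords.items():
            # Case-insensitive substring match against any keyword of the category
            if any(word in text for word in words):
                hits[category].append(gene_id)
    return {category: sorted(ids) for category, ids in hits.items()}


# Toy input with invented gene IDs and product names
example = {
    "geneA_0001": "Chromosomal replication initiator protein DnaA",
    "geneA_0002": "hypothetical protein",
    "geneA_0003": "Type II toxin-antitoxin system RelE/ParE family toxin",
    "geneA_0004": "Phospholipid biosynthesis protein",
}
result = screen_products(example)
```

Running the same screen on both strains and comparing the shortlists per category gives a much smaller set of genes to inspect by hand.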
What is your organism? Your best bet would be to find annotated orthologs in other relatives.
I am working on Bradyrhizobium, with the reference strain being Bradyrhizobium elkanii USDA 76, which is annotated on IMG. How would finding orthologs in another strain help me?
I would recommend the RAST or PATRIC annotation. They have good orthology groups, which might add a lot of annotation.
How many genes/regions have you identified as being different between those two strains?
Hello, I have found around 400 genes that differ between the two strains.
400 genes seems a rather high number of differences between two strains of the same bacterial species.
Have you done due diligence on your assemblies/Mauve analysis? How did you reach the conclusion about the differences? Are these genes/sequences really absent (i.e. they can't be found by reciprocal BLAST)?
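For reference, the reciprocal BLAST check can be sketched in a few lines of Python. This assumes you have run BLAST in both directions (strain A proteins against strain B, and B against A) with tabular output (`-outfmt 6`), where the 12th default column is the bit score. The function names and toy IDs are hypothetical, and picking the single best hit by bit score is a simplification that ignores ties and paralogs.

```python
def best_hits(tabular_text):
    """Return {query: (subject, bitscore)} with the top-scoring hit per query.

    Expects default BLAST tabular output (-outfmt 6): 12 tab-separated
    columns, the last of which is the bit score.
    """
    best = {}
    for line in tabular_text.strip().splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return the set of (a_id, b_id) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {
        (query, subject)
        for query, (subject, _) in forward.items()
        if reverse.get(subject, (None, 0.0))[0] == query
    }


def without_partner(all_ids, pairs, position):
    """IDs with no reciprocal best hit (position 0 = strain A, 1 = strain B)."""
    paired = {pair[position] for pair in pairs}
    return sorted(set(all_ids) - paired)


# Toy data with invented IDs; columns follow the default -outfmt 6 layout
a_vs_b = "\n".join([
    "\t".join(["A1", "B1", "98.0", "300", "6", "0", "1", "300", "1", "300", "1e-150", "550"]),
    "\t".join(["A2", "B1", "40.0", "120", "70", "2", "5", "124", "10", "129", "1e-10", "60"]),
])
b_vs_a = "\t".join(["B1", "A1", "98.0", "300", "6", "0", "1", "300", "1", "300", "1e-150", "550"])

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
candidates_a = without_partner(["A1", "A2", "A3"], pairs, 0)
```

Genes left in `candidates_a` are only candidates for being strain-specific; they still need to be checked against the other genome's DNA, since a missing protein may simply be a missed gene call.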
I think this high number of genes is due to the automated annotation, which did not recognise all the genes, so when I BLASTed protein against protein some genes were reported as unique when they weren't. I am currently working on this by trying different automated annotation tools and assembling the data.
Automated annotation is usually very good. You can tblastn a protein against the other genome's DNA to check that it is really unique, independently of the annotation.
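As a rough illustration of that check, here is a Python sketch that filters tblastn tabular output. It assumes a run along the lines of `tblastn -query unique_proteins.faa -subject other_genome.fna -outfmt "6 qseqid sseqid pident qstart qend qlen evalue"` (the file names are placeholders). The identity and coverage cutoffs are arbitrary assumptions, not recommended values, and coverage is computed per hit, so a gene split across several HSPs or contigs could be missed.

```python
def proteins_found_in_dna(tabular_text, min_identity=70.0, min_coverage=0.7):
    """Return the set of query proteins with a convincing tblastn hit.

    Expects 7 tab-separated columns per line:
    qseqid, sseqid, pident, qstart, qend, qlen, evalue.
    The thresholds are illustrative assumptions.
    """
    found = set()
    for line in tabular_text.strip().splitlines():
        if not line or line.startswith("#"):
            continue
        qseqid, _sseqid, pident, qstart, qend, qlen, _evalue = line.split("\t")
        # Fraction of the query protein covered by this single alignment
        coverage = (int(qend) - int(qstart) + 1) / int(qlen)
        if float(pident) >= min_identity and coverage >= min_coverage:
            found.add(qseqid)
    return found


# Toy tblastn output with invented protein and contig names
toy_hits = "\n".join([
    "\t".join(["P1", "contig_3", "99.1", "1", "250", "250", "1e-170"]),
    "\t".join(["P2", "contig_7", "35.0", "40", "90", "300", "2e-05"]),
])
candidates = ["P1", "P2", "P3"]

found = proteins_found_in_dna(toy_hits)
still_unique = [p for p in candidates if p not in found]
```

Here `P1` would be dropped from the list of unique genes because it is present in the other genome's DNA even if the annotation missed it, while `P2` (weak, partial hit) and `P3` (no hit) stay on the list.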
I did it for some of the proteins, and sometimes a protein predicted as unique was in fact present in both genomes, but when I searched the annotation for the gene it wasn't listed. That's why I think the annotation did not capture all the proteins. My question is rather: when you ask yourself "what difference between two genomes can cause a phenotypic difference", what do you have to look for? I think, for example, that SNPs can cause this difference in phenotype, but I can't look at every SNP in the genome.
Sure, but you can't work through 400 genes either (which would be roughly 8-10% of the total genes?).
That is going to be a tough one to answer based on DNA sequence alone, if there are significant differences in sequence/organization/gene counts.
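On the SNP question raised above, one common way to shrink the list is to look only at differences inside orthologous coding sequences and separate those that change the protein from those that don't. The sketch below does that for a single pair of CDS. It assumes the two sequences are in frame, the same length and ungapped (so indels are not handled), and it uses the standard codon table, which gives the same amino acids as the bacterial table for internal codons. The function name and the toy sequences are invented. Note that this ignores promoter and other intergenic changes, which can also affect growth.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_changes(cds_a, cds_b):
    """List codon differences between two in-frame CDS of equal length.

    Returns tuples of
    (codon position (1-based), codon_a, codon_b, aa_a, aa_b, kind),
    where kind is "synonymous" or "nonsynonymous".
    """
    if len(cds_a) != len(cds_b) or len(cds_a) % 3 != 0:
        raise ValueError("sequences must be the same length and a multiple of 3")
    changes = []
    for i in range(0, len(cds_a), 3):
        codon_a = cds_a[i:i + 3].upper()
        codon_b = cds_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        # Codons with ambiguous bases translate to "X"
        aa_a = CODON_TABLE.get(codon_a, "X")
        aa_b = CODON_TABLE.get(codon_b, "X")
        kind = "synonymous" if aa_a == aa_b else "nonsynonymous"
        changes.append((i // 3 + 1, codon_a, codon_b, aa_a, aa_b, kind))
    return changes


# Toy ortholog pair (invented sequences)
cds_strain_a = "ATGGCTAAAGGTTAA"
cds_strain_b = "ATGGCCAAAGATTAA"
changes = classify_codon_changes(cds_strain_a, cds_strain_b)
```

Nonsynonymous changes in the prime-suspect genes would then be the first ones worth a closer look, though as noted above, sequence alone is unlikely to settle which one matters.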