I would like to ask several questions about how to compare 2 genomes in order to find differences. Assume I have 2 datasets of sequencing data from 2 plants of the same species (e.g. Arabidopsis): 1 plant has a normal phenotype, the other has a disease phenotype. Theoretically, the disease phenotype is known to be controlled by a single gene, and these 2 plants should have similar genomes except for the region that is responsible for the phenotype. I would like to compare the 2 genomes to find the differences between the 2 plants (in order to find the disease gene).

I'm a newbie in bioinformatics (and also new to Biostars), and I do not know where to start. Would you mind providing some guidance to help me find an approach to my problem? For example: projects or publications with a similar objective; documents, websites or books that I should read. A suggested pipeline would also be great.

Hi nnhung232, I assume you have a mapping population in which your disease phenotype is segregating, and that your objective is to genetically map the resistance locus. If so, you may try mapping-by-sequencing tools such as SHOREmap, NGM or MutMap. The fundamental principle behind all these tools is the same: bulked segregant analysis (BSA) by sequencing. You can find several reviews about this approach.

If my previous assumptions are correct (you pooled and sequenced plants with extreme phenotypes, resistant and susceptible, from a mapping population), then irrespective of the pipeline/tool of choice you have to perform the following steps:
1) Align both datasets separately to a common reference genome (for your example, that is TAIR10).
2) Call SNPs.
3) Filter for polymorphic SNPs in your dataset. Depending on your experimental setup, you may have to apply a different filtering strategy.
4) Plot the allele frequency/SNP index of the filtered SNPs to identify the mapping interval.
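To make steps 3 and 4 a bit more concrete, here is a minimal Python sketch of the SNP-index idea. It is not the implementation used by SHOREmap, NGM or MutMap. It assumes you have already extracted per-SNP reference and alternate read counts for each bulk from your variant calls; the `Snp` record, the function names, the `min_depth` cutoff of 10 reads and the 1 Mb window size are all hypothetical choices for illustration. The SNP index is taken here as the fraction of reads carrying the non-reference allele, and the difference between the two bulks is averaged per window so that a region where the bulks diverge stands out.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    """Read counts for one SNP in two bulks (hypothetical record layout)."""
    chrom: str
    pos: int
    ref_a: int  # reads supporting the reference allele in bulk A
    alt_a: int  # reads supporting the alternate allele in bulk A
    ref_b: int  # same counts for bulk B
    alt_b: int


def snp_index(ref, alt):
    """Fraction of reads carrying the alternate allele (None if no coverage)."""
    depth = ref + alt
    return alt / depth if depth else None


def delta_snp_index(snps, min_depth=10):
    """Return (chrom, pos, index_A - index_B) for SNPs covered well in both bulks.

    min_depth is an assumed, illustrative cutoff; tune it to your coverage.
    """
    deltas = []
    for s in snps:
        if s.ref_a + s.alt_a < min_depth or s.ref_b + s.alt_b < min_depth:
            continue  # too few reads for a reliable allele frequency
        delta = snp_index(s.ref_a, s.alt_a) - snp_index(s.ref_b, s.alt_b)
        deltas.append((s.chrom, s.pos, delta))
    return deltas


def window_means(deltas, window=1_000_000):
    """Average the delta SNP index in non-overlapping windows per chromosome."""
    sums = {}
    for chrom, pos, delta in deltas:
        key = (chrom, (pos // window) * window)  # window start coordinate
        total, count = sums.get(key, (0.0, 0))
        sums[key] = (total + delta, count + 1)
    return {key: total / count for key, (total, count) in sorted(sums.items())}
```

Plotting the values returned by `window_means` along each chromosome (with any plotting tool you like) gives the kind of picture described in step 4: windows where the averaged difference moves far from zero are candidates for the mapping interval.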

The first place you should look is "aligners". Try out a few sequence aligners; there are many of them out there, e.g. progressiveMauve, MUMmer, mVISTA, MUSCLE, etc.

