Criteria for comparing genomes
Posted 5.1 years ago by zion22

Hello, I have a question for more experienced users. I am comparing the genome of my fungus with one from a published paper (both are the same species). What should I take into consideration to determine whether my assembly, gene prediction, annotation and variant calling are of good quality? Thank you.

Tags: assembly, alignment, SNP, genome, next-gen

Assembly quality is fairly straightforward to assess: use 'standard' assembly metrics such as N50, number of contigs, longest contig length, etc.
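
A minimal Python sketch of what those metrics mean, assuming a plain (uncompressed) FASTA file of contigs. Dedicated tools report these and more, so this is only meant to make the definitions concrete: N50 is the length of the contig at which the running total of contigs, sorted longest first, first reaches half the total assembly length.

```python
def read_fasta_lengths(lines):
    """Return the length of each sequence in FASTA-formatted text lines."""
    lengths = []
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record.
            if current is not None:
                lengths.append(current)
            current = 0
        else:
            if current is None:
                raise ValueError("sequence data before the first header")
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(lengths):
    """Contig count, total length, longest contig and N50."""
    if not lengths:
        raise ValueError("no contigs")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = 0
    for length in ordered:
        running += length
        # First contig at which half of the total length is covered.
        if running * 2 >= total:
            n50 = length
            break
    return {
        "contigs": len(ordered),
        "total_length": total,
        "max_contig": ordered[0],
        "n50": n50,
    }
```

Usage (the file name is hypothetical): `with open("my_assembly.fasta") as fh: print(assembly_stats(read_fasta_lengths(fh)))`. Running it on both your assembly and the published one gives directly comparable numbers.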

If the reference is a complete genome and yours is not, you may want to call variants (SNPs etc.) by aligning your sequencing reads to the published reference, rather than comparing assembly against assembly. For each SNP or other difference, you'll want to report the evidence supporting that variant (number of supporting reads, base and mapping qualities, and so on).
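
As a rough illustration of reporting variant support, the sketch below keeps only variants above a quality and read-depth cutoff from VCF text. It assumes the variant caller writes total read depth as `DP` in the INFO column (many callers do, but check your VCF header), and the cutoffs of 30 and 10 are arbitrary placeholders, not recommendations.

```python
def parse_info(info):
    """Split a VCF INFO string like 'DP=25;MQ=60' into a dict."""
    fields = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
        elif item and item != ".":
            fields[item] = True  # flag-type INFO entry
    return fields


def supported_variants(vcf_lines, min_qual=30.0, min_depth=10):
    """Return (chrom, pos, ref, alt, qual, depth) for well-supported variants.

    Assumption: depth is taken from INFO/DP; records without it are skipped.
    """
    kept = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta-information and header lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, qual, _filter, info = cols[:8]
        if qual == ".":
            continue  # no quality reported
        depth = parse_info(info).get("DP")
        if not isinstance(depth, str):
            continue
        if float(qual) >= min_qual and int(depth) >= min_depth:
            kept.append((chrom, int(pos), ref, alt, float(qual), int(depth)))
    return kept
```

Listing the retained and discarded variants side by side, with their depth and quality, is one simple way to show the support behind each reported difference.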

For gene prediction, a good first check is that both genomes have roughly the same number of predicted CDSs, within a reasonable margin (this will depend on the species, genome size, etc.). You could also do some ortholog clustering to identify genes in your sequence that are absent from, or substantially different in, the published genome. You will then need to judge whether each difference is due to sequencing errors (consult your quality values) or is real.
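
To get comparable CDS counts from two annotations, one option is to count distinct `Parent` IDs of `CDS` features in each GFF3 file, because a single coding sequence is usually split into one CDS line per exon. This is a sketch assuming GFF3 (not GTF) input; the count is per transcript, so alternative isoforms will inflate it, and the 5% tolerance is a hypothetical placeholder.

```python
def count_cds_parents(gff_lines):
    """Count distinct Parent IDs (usually transcripts) of CDS features in GFF3 text."""
    parents = set()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        # Column 3 is the feature type; column 9 holds key=value attributes.
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = dict(
            item.strip().split("=", 1) for item in cols[8].split(";") if "=" in item
        )
        parent = attrs.get("Parent")
        if parent:
            parents.update(parent.split(","))  # GFF3 allows several parents
    return len(parents)


def within_tolerance(mine, published, tolerance=0.05):
    """True if the counts differ by at most `tolerance` (a fraction of the published count)."""
    return abs(mine - published) <= tolerance * published
```

Counting parents rather than CDS lines avoids inflating the total for multi-exon genes, which matters a lot in fungi with intron-rich genes.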

I think that's about all we can say in broad terms. Others may suggest metrics I've missed.

As mentioned by jrj.healey, it is good to compare assemblies with quantitative measures, such as N50 or the number of misassemblies; you can use QUAST for this. You might also want to assess the assemblies for gene content, such as gene set completeness and the number of missing, fragmented or duplicated genes; you can use BUSCO for this.
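
BUSCO writes a one-line summary of the form `C:..%[S:..%,D:..%],F:..%,M:..%,n:..` to its short summary file. Below is a small sketch for putting your result and the published one side by side, assuming that format (some newer BUSCO versions may append extra fields, which this regex simply ignores). Both runs should use the same lineage dataset, otherwise the percentages are not comparable.

```python
import re

# Assumed BUSCO short-summary format: C (complete) split into S (single) and
# D (duplicated), then F (fragmented), M (missing) and n (BUSCOs searched).
_BUSCO_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(text):
    """Extract the C/S/D/F/M percentages and n from a BUSCO summary string."""
    match = _BUSCO_RE.search(text)
    if match is None:
        raise ValueError("no BUSCO summary line found")
    values = {key: float(match.group(key)) for key in "CSDFM"}
    values["n"] = int(match.group("n"))
    return values


def busco_difference(mine, published):
    """Percentage-point differences (mine minus published) for each category."""
    if mine["n"] != published["n"]:
        raise ValueError("different lineage sizes; results are not comparable")
    return {key: round(mine[key] - published[key], 2) for key in "CSDFM"}
```

A clearly higher D or M than the published genome is the kind of signal worth following up with the ortholog comparison described above.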
