How do I identify gene duplication or paralogs in an annotated reference genome assembly
17 days ago

I generated a high-quality de novo genome assembly, which has been annotated using previously published Iso-Seq data from the species. This genome is now the RefSeq assembly for the species. I am currently writing a manuscript detailing the assembly. My PI suggested that I add a quick, biologically relevant analysis to the text to show the utility of my genome. Specifically, she said I should investigate whether any of the candidate genes we have for a specific trait are duplicated or expanded.

To do this, she said I simply need to BLAST the sequence of my gene of interest, as annotated in this assembly, against the whole assembly. I have looked for papers that do this so I can cite them, and in sifting through them I have only succeeded in confusing myself.

As far as I understand, the approach is the same as this method for identifying paralogs using blastp (I am unsure why I would use blastp to identify duplicated genes, but the papers I am reading all seem to agree on blastp instead of blastn):

https://ubwp.buffalo.edu/wnygirahcp/wp-content/uploads/sites/25/2014/05/Module-7.-Duplication-and-Degradation.pdf

where I:

1) Take the FASTA nucleotide sequence for my gene of interest (as determined by the annotation of my genome) and BLAST it (do I use blastn or blastp?), specifying the database as nr (non-redundant protein) and the organism as my species of interest.

2) Once I get my BLAST results back, my top hit will be the same gene I queried.

3) If any other results have a significant E-value and score, are those potential paralogs/duplicated genes? How might I verify or validate that? Or could I only say these are putative paralogs?
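
The filtering described in step 3 can be sketched in a few lines of Python. This is only an illustration under assumptions, not a validated pipeline: it assumes the search was run locally against a protein database built from the assembly's own annotation, with results saved in BLAST's default 12-column tabular format (`-outfmt 6`). The function name `putative_paralogs`, the file names in the comments, and the cutoffs (E-value of at most 1e-5, identity of at least 30%) are hypothetical starting points that would need tuning for the gene family in question.

```python
# Sketch: list putative paralogs of one query gene from BLAST tabular output.
# Assumed input: default 12-column "-outfmt 6" text, e.g. produced by
#   makeblastdb -in proteins.faa -dbtype prot -out my_proteome
#   blastp -query gene.faa -db my_proteome -outfmt 6 -out hits.tsv
# (file and database names are placeholders).
# Columns: qseqid sseqid pident length mismatch gapopen
#          qstart qend sstart send evalue bitscore

def putative_paralogs(blast_tab_text, query_id, max_evalue=1e-5, min_identity=30.0):
    """Return {subject_id: (pident, evalue, bitscore)} for non-self hits."""
    best = {}
    for line in blast_tab_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 12:
            continue  # not a standard tabular row
        qseqid, sseqid = cols[0], cols[1]
        pident, evalue, bitscore = float(cols[2]), float(cols[10]), float(cols[11])
        # Keep only hits for this query, and drop the query hitting itself.
        if qseqid != query_id or sseqid == query_id:
            continue
        # Illustrative thresholds; these are assumptions, not recommendations.
        if evalue > max_evalue or pident < min_identity:
            continue
        # A subject can appear in several alignments; keep its best-scoring one.
        if sseqid not in best or bitscore > best[sseqid][2]:
            best[sseqid] = (pident, evalue, bitscore)
    return best


example = "\n".join([
    "geneA\tgeneA\t100.0\t300\t0\t0\t1\t300\t1\t300\t0.0\t600",
    "geneA\tgeneB\t85.0\t290\t40\t2\t1\t290\t5\t294\t2e-80\t310",
    "geneA\tgeneC\t25.0\t100\t70\t5\t10\t110\t3\t103\t1e-6\t50",
    "geneA\tgeneD\t40.0\t60\t30\t1\t20\t80\t1\t60\t0.5\t30",
])
hits = putative_paralogs(example, "geneA")
print(sorted(hits))
```

Anything this returns is best described as a putative paralog; the sketch checks only the E-value and percent identity, not alignment coverage, genomic location, or phylogeny.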

Am I missing anything? Is there a way to screen all annotated genes for duplications/paralogs that would make more sense than repeatedly BLASTing through the list? I am very lost as to how to proceed.
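
One way to picture a genome-wide screen is an all-vs-all search of the annotated proteins followed by grouping the genes that hit each other. The sketch below assumes the hits have already been filtered to significant query/subject pairs; the function name `gene_families` and the toy gene IDs are hypothetical. It uses simple single-linkage grouping (union-find), which is a simplification: it can chain unrelated genes together through a shared domain, so dedicated clustering tools may be the better choice for real data.

```python
# Sketch: group genes into putative families from all-vs-all BLAST hit pairs.
# Input: an iterable of (query_id, subject_id) pairs that already passed
# whatever significance filter you chose (an assumption of this example).

def gene_families(hit_pairs):
    """Return sorted lists of gene IDs connected by hits (single linkage)."""
    parent = {}

    def find(x):
        # Find the representative of x's group, shortening the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in hit_pairs:
        if a == b:
            continue  # self-hits carry no information about duplication
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    families = {}
    for gene in list(parent):
        families.setdefault(find(gene), set()).add(gene)
    # Report only groups with at least two members, in a stable order.
    groups = [sorted(members) for members in families.values() if len(members) > 1]
    return sorted(groups, key=lambda members: members[0])


pairs = [("g1", "g2"), ("g2", "g3"), ("g4", "g4"), ("g5", "g6")]
families = gene_families(pairs)
print(families)
```

Each candidate gene can then be looked up in the resulting groups instead of being searched one at a time.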

blast duplication paralog genome • 280 views
16 days ago
sansan_96 ▴ 80

Hello,

I think MCScanX could help you; there is a lot of information about it, and it is easy to use:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3326336/

https://github.com/tanghaibao/jcvi/wiki/MCscan-(Python-version)
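
As a rough idea of the input preparation, here is a minimal sketch that converts GFF3 gene records into the four-column, tab-separated gene-position file that, as far as I understand the MCScanX README, the original MCScanX expects (a chromosome label made of a two-letter species code plus chromosome number, then gene ID, start, end). The function name `gff3_to_mcscanx`, the `chrom_names` mapping, and the `xx1` label are hypothetical. The gene IDs written here must match the IDs used in the accompanying BLAST file, which may require mapping protein IDs back to genes. The Python (JCVI) version linked above takes different inputs, so this applies only to the original tool.

```python
# Sketch: turn GFF3 gene features into MCScanX-style position rows.
# Assumed output layout (tab-separated): chrom_label, gene_id, start, end.

def gff3_to_mcscanx(gff3_text, chrom_names, feature_type="gene"):
    """chrom_names maps GFF3 sequence IDs to short labels such as 'xx1'."""
    rows = []
    for line in gff3_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        if cols[0] not in chrom_names:
            continue  # e.g. unplaced scaffolds left out of the mapping
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        gene_id = attrs.get("ID")
        if gene_id is None:
            continue
        rows.append("\t".join([chrom_names[cols[0]], gene_id, cols[3], cols[4]]))
    return "\n".join(rows)


gff = "\n".join([
    "##gff-version 3",
    "NC_000001.1\tRefSeq\tgene\t100\t900\t.\t+\t.\tID=gene-A;Name=A",
    "NC_000001.1\tRefSeq\tmRNA\t100\t900\t.\t+\t.\tID=rna-A;Parent=gene-A",
    "NW_999.1\tRefSeq\tgene\t5\t50\t.\t-\t.\tID=gene-B",
])
positions = gff3_to_mcscanx(gff, {"NC_000001.1": "xx1"})
print(positions)
```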
