Find Disruptions Of Gene Structures Based On Gene Sequence Alignment?
11.4 years ago
Plantae ▴ 390

I have two closely related genomes, both with all genes annotated.

Orthologous gene pairs have been identified between the two genomes. Now I want to find how many genes carry large-effect structural variations, such as exon deletions, frameshifts, premature stop codons, etc.

I have aligned all orthologous gene pairs based on their gene sequences. Each alignment can be viewed in detail to check whether there are exons that could not be aligned properly. However, there are too many gene pairs, so it is impossible to view all of these alignments by eye.
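To make the goal concrete, here is a minimal sketch of the kind of automated scan I have in mind. It assumes each gene pair is already available as two gapped strings of equal length (for example, rows parsed from an aligned FASTA file). The function names `gap_runs` and `classify` and the 30-column threshold for a "large" gap are my own hypothetical choices, not taken from any existing tool. The frameshift label is only meaningful when the gap falls inside a coding exon, which this sketch does not check.

```python
import re

def gap_runs(aligned_ref, aligned_alt):
    """Return (kind, start_column, length) for every gap run in a pairwise alignment.

    A gap run in the reference row is an insertion in the other gene;
    a gap run in the other gene's row is a deletion relative to the reference.
    """
    if len(aligned_ref) != len(aligned_alt):
        raise ValueError("aligned sequences must have equal length")
    events = []
    for kind, row in (("insertion", aligned_ref), ("deletion", aligned_alt)):
        for match in re.finditer(r"-+", row):
            events.append((kind, match.start(), len(match.group())))
    return sorted(events, key=lambda event: event[1])

def classify(events, large=30):
    """Label each gap run; `large` is an arbitrary, hypothetical cutoff in columns."""
    labelled = []
    for kind, start, length in events:
        if length >= large:
            label = "large_" + kind          # candidate exon-scale event
        elif length % 3 != 0:
            label = "frameshift_" + kind     # only meaningful inside a CDS
        else:
            label = "inframe_" + kind
        labelled.append((label, start, length))
    return labelled

if __name__ == "__main__":
    ref = "ATGGCCAAATTTGGG"
    alt = "ATGG-CAAATTTGGG"
    print(classify(gap_runs(ref, alt)))
```

Run over all pairs, the output could be filtered so that only pairs with at least one `large_` or `frameshift_` event are opened for manual review.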

Are there tools that can find structural variations based on long sequence alignments?

These days most people use paired-end (PE) reads to find SVs. I could not find any tools that detect SVs from alignments of assembled contigs/scaffolds to a reference genome.

structural variation genomics alignment • 2.2k views
11.4 years ago
Josh Herr 5.8k

I really think it will be hard to avoid manually inspecting your alignments to see what is actually going on with your structural variants. I'm not aware of any structural variant prediction pipelines for this, but I would think a clustering algorithm might help you. I recommend USEARCH/UCLUST; CD-HIT is also good, and there are others. Write a quick pipeline (which you can do in the shell) to cluster all your gene pairs, let it run overnight, and come back in the morning to a list of consensus clusters for each gene pair. Make sure you QC your sequences first to avoid clustering errors caused by sequencing errors.
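One way to shrink the pile that needs manual inspection is a cheap pre-filter on the coding sequences. The following is only a rough sketch, not part of USEARCH or CD-HIT: it assumes you can extract the CDS of each ortholog as a plain string that starts in frame, it uses the standard genetic code, and the names `first_premature_stop` and `pairs_to_inspect` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)

def first_premature_stop(cds):
    """Return the 0-based codon index of the first in-frame stop that is not
    the final complete codon, or None if there is none."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    # The final complete codon is treated as the expected terminal stop and skipped.
    for k in range(n_codons - 1):
        if cds[3 * k:3 * k + 3] in STOP_CODONS:
            return k
    return None

def pairs_to_inspect(pairs):
    """Given {pair_id: (cds_a, cds_b)}, return the ids where exactly one
    member of the pair has a premature in-frame stop codon."""
    flagged = []
    for pair_id, (cds_a, cds_b) in pairs.items():
        stop_a = first_premature_stop(cds_a) is not None
        stop_b = first_premature_stop(cds_b) is not None
        if stop_a != stop_b:
            flagged.append(pair_id)
    return flagged

if __name__ == "__main__":
    example = {
        "pair1": ("ATGGGGAAATGA", "ATGGGGAAATGA"),  # both intact
        "pair2": ("ATGGGGAAATGA", "ATGTAAAAATGA"),  # second has an early stop
    }
    print(pairs_to_inspect(example))
```

Only the flagged pairs would then need to be opened in an alignment viewer.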

