Question: Find Disruptions Of Gene Structures Based On Gene Sequence Alignment?
7.6 years ago by
Plantae (380) wrote:

I have two closely related genomes, both with all genes annotated.

Ortholog gene pairs have been identified between these two genomes. Now I want to find how many genes carry large-effect structural variations, such as exon deletions, frameshifts, premature stop codons, etc.

I have aligned all ortholog gene pairs based on their gene sequences. Each alignment can be viewed in detail to find whether there are exons that could not be aligned properly. However, there are too many gene pairs, so it is impossible to inspect all of these alignments by eye.

Are there tools that can find structural variations based on long sequence alignments?

These days, most people use paired-end (PE) reads to find SVs. I could not find tools that detect SVs from alignments of assembled contigs/scaffolds to a reference genome.

7.6 years ago by
Josh Herr (5.7k), University of Nebraska, wrote:

I think it will be hard to get away from manually inspecting your alignments to see what is actually going on with your structural variants. I'm not aware of any structural variant prediction pipelines, but I would think a clustering algorithm might help you. I recommend USEARCH/UCLUST, but CD-HIT is also good, and there are others. Write a quick pipeline (which you can do in the shell) to cluster all your gene pairs, let it run overnight, and come back in the morning to a list of consensus clusters for each gene pair. Make sure you QC your sequences first to avoid clustering errors caused by sequencing errors.
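To narrow down which alignments need manual inspection, a simple automated screen may help. The following is only a minimal sketch under stated assumptions, not an established tool: it assumes each ortholog pair has already been aligned as coding sequences (CDS, introns removed, starting in frame at the start codon) and that the two gapped strings have equal length. The function names (`gap_runs`, `first_premature_stop`, `screen_pair`) and the `large_gap` threshold of 60 bp are hypothetical choices for illustration. It flags gap runs whose length is not a multiple of three (frameshift candidates), long gap runs (possible exon loss), and in-frame stop codons before the last codon of the query.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}


def gap_runs(aligned):
    """Return (start, length) for every run of '-' in a gapped sequence."""
    return [(m.start(), len(m.group())) for m in re.finditer(r"-+", aligned)]


def first_premature_stop(cds):
    """Return the 0-based codon index of the first in-frame stop codon
    that occurs before the final complete codon, or None if there is none."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    for i in range(n_codons - 1):  # skip the last codon (expected stop)
        if cds[3 * i:3 * i + 3] in STOP_CODONS:
            return i
    return None


def screen_pair(ref_aln, qry_aln, large_gap=60):
    """Flag candidate disruptions in the query relative to the reference.

    ref_aln, qry_aln: gapped CDS strings of equal length (assumed input).
    large_gap: hypothetical threshold (bp) for reporting a long indel.
    Returns a list of (label, position, length) tuples; positions are
    alignment columns for indels and codon indices for premature stops.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have the same length")

    flags = []
    # Gaps in the query are deletions; gaps in the reference are insertions.
    for kind, aln in (("deletion_in_query", qry_aln),
                      ("insertion_in_query", ref_aln)):
        for start, length in gap_runs(aln):
            if length % 3 != 0:
                flags.append(("frameshift_" + kind, start, length))
            if length >= large_gap:
                flags.append(("large_" + kind, start, length))

    stop = first_premature_stop(qry_aln.replace("-", ""))
    if stop is not None:
        flags.append(("premature_stop", stop, 3))
    return flags
```

Running `screen_pair` over every aligned pair and keeping only pairs with a non-empty result would give a shortlist to inspect by hand. Terminal gaps from partial annotations are also reported by this sketch, so those hits deserve extra scrutiny.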
