Question: Compara API of Ensembl
jrxu.bioinf (United States) wrote, 14 months ago:

Hello,

I have just started learning the Compara API, but I am still not sure whether it can address my questions. Could someone give me some guidance and example scripts?

Here are my questions: (1) I want to identify all pairs of DNA fragments in the human genome that are significantly similar to each other (homologous, as detected by LASTZ or BLASTZ). (2) Then, for each pair of homologous human fragments, I want to find the other species in which both fragments are significantly similar (aligned) to a single genomic region.

Alternatively, I could focus on two genomic regions in one genome, test whether they are homologous, and then find which species have a single genomic region that aligns to both of the human regions.
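The second step described above can be sketched independently of how the alignments are obtained. This is a minimal Python sketch, assuming the pairwise alignment blocks have already been exported into a flat list. The `Block` record, the function names, and the coordinates are all hypothetical illustrations and are not part of the Compara API.

```python
from collections import namedtuple

# Hypothetical record for one pairwise alignment block: a human interval
# aligned to an interval in another species. This is NOT a Compara class.
Block = namedtuple("Block", "h_chr h_start h_end species t_chr t_start t_end")


def overlaps(chr_a, start_a, end_a, chr_b, start_b, end_b):
    """True if two closed intervals on the same sequence share a base."""
    return chr_a == chr_b and start_a <= end_b and start_b <= end_a


def species_with_shared_target(blocks, region_a, region_b):
    """Return the species in which a target interval aligned to region_a
    overlaps a target interval aligned to region_b.

    region_a and region_b are (chr, start, end) tuples on the human genome.
    """
    hits_a, hits_b = {}, {}
    for b in blocks:
        target = (b.t_chr, b.t_start, b.t_end)
        if overlaps(b.h_chr, b.h_start, b.h_end, *region_a):
            hits_a.setdefault(b.species, []).append(target)
        if overlaps(b.h_chr, b.h_start, b.h_end, *region_b):
            hits_b.setdefault(b.species, []).append(target)

    shared = set()
    for species in hits_a.keys() & hits_b.keys():
        if any(overlaps(*ta, *tb)
               for ta in hits_a[species] for tb in hits_b[species]):
            shared.add(species)
    return shared


# Toy data with made-up coordinates, for illustration only.
toy_blocks = [
    Block("1", 100, 200, "mouse", "4", 1000, 1100),
    Block("7", 500, 600, "mouse", "4", 1050, 1150),
    Block("1", 100, 200, "dog", "2", 10, 50),
    Block("7", 500, 600, "dog", "9", 10, 50),
]
print(sorted(species_with_shared_target(
    toy_blocks, ("1", 120, 180), ("7", 510, 590))))  # prints ['mouse']
```

Because every block that overlaps a query region is kept, a region with several hits in the same species is handled naturally, provided the alignment source reports all of those hits and not only the best one.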

In particular, I am wondering about multiple hits. In the human self-alignment, one genomic region may map to several other regions, and the same can happen in the human vs. mouse alignment, where one human region has several hits in the mouse genome. Does Ensembl provide all of these regions or just the best one?

Are there any scripts that can achieve these goals? I am using version 95 of the Compara API.

Thanks!


Based on the API documentation, you should be able to retrieve human self-alignments and cross-species alignments for the regions you'd like to query. Compara works on top of precomputed alignments, so if you think those alignments are not accurate or exhaustive, you may need to redo them using tools of your choice.
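As a rough illustration of querying alignments for a region, here is a minimal Python sketch that uses the public Ensembl REST service instead of the Perl Compara API discussed in this thread. It assumes the `alignment/region` endpoint with its `method` and `species_set` parameters behaves as described in the REST documentation. The REST server follows the current Ensembl release, not version 95, and whether the human self-alignment is exposed there is not something I have verified. The helper names and the coordinates are arbitrary. In the Perl API the comparable route would be the GenomicAlignBlock adaptor (for example `fetch_all_by_MethodLinkSpeciesSet_Slice`), which is worth checking against the version 95 documentation.

```python
import json
import urllib.parse
import urllib.request

REST_SERVER = "https://rest.ensembl.org"


def build_alignment_url(species, region, method, species_set):
    """Build a URL for the REST alignment/region endpoint.

    region is "chr:start-end"; species_set lists the genomes in the alignment.
    """
    path = "/alignment/region/{}/{}".format(
        urllib.parse.quote(species),
        urllib.parse.quote(region, safe=":-."),
    )
    query = urllib.parse.urlencode(
        [("method", method)]
        + [("species_set", s) for s in species_set]
        + [("content-type", "application/json")]
    )
    return REST_SERVER + path + "?" + query


def fetch_alignment_blocks(url):
    """Download and decode the JSON list of alignment blocks (needs network)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


# Arbitrary example region; only the URL is built here, nothing is fetched.
url = build_alignment_url(
    "homo_sapiens", "2:106040000-106040050", "LASTZ_NET",
    ["homo_sapiens", "mus_musculus"],
)
print(url)
```

Calling `fetch_alignment_blocks(url)` would then return whatever blocks the server holds for that region, which is one way to see whether more than one hit is reported for it.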

Comment written 14 months ago by Vitis
Vitis (New York) wrote, 14 months ago:

Paralogs and orthologs are not defined as repeated regions within or across genomes; they are defined by evolutionary relationships, that is, by phylogeny. Orthologs are genes that diverged through a speciation event, whereas paralogs are genes that arose by duplication within a genome. Outside of genes, genomic sequences diverge very quickly after a duplication event because they are under little selective constraint. As a result, evolutionary relationships for non-genic sequences are very hard to infer, and paralogous relationships outside of genes are very hard to investigate. If you read the Ensembl documentation, Compara is largely about genes: evolutionary relationships (paralogous or orthologous) are built by comparing gene trees with species trees. In theory, the same approach could be applied to any sequence, as long as a sequence phylogeny can be established and compared with the species phylogeny, but I don't think Compara does this.


Sorry for the confusion and my misuse of terminology. I have edited my question to focus on the technical issues.

Comment written 14 months ago by jrxu.bioinf