Why are there different hits for the same gene in different species?
0
1
Entering edit mode
6.2 years ago
Farbod ★ 3.3k

Dear friends, hi (I'm not a native English speaker, so be ready for some possible language flaws).

(Please modify the title if it is not clear, and sorry if this is a duplicate question.)

In the search for some candidate genes in my fish de novo transcriptome assembly, I downloaded the sequences of my selected genes (NCBI Nucleotide) from different fish species for each gene (why? because there were several representatives across the fish taxa) and created a collection.fasta file. Then I used the makeblastdb command with -dbtype nucl to convert my assembly.fasta into a searchable database.

Then I ran BLASTN (evalue = 1e-6) with collection.fasta as the query and assembly.fasta as the database.
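For readers who want to reproduce this setup, here is a minimal sketch of the two steps as BLAST+ command lines, built in Python without executing anything. The database name `assembly_db`, the output file `blastn_hits.csv`, and the choice of output columns are assumptions for illustration; the post only states `-dbtype nucl` and the e-value cutoff.

```python
import shlex

# Comma-separated output (format 10) with 13 columns. The column choice is an
# assumption, picked to match the fields quoted later in this thread.
OUTFMT = ("10 qseqid sseqid pident length mismatch gapopen "
          "qstart qend sstart send evalue bitscore stitle")


def build_commands(assembly="assembly.fasta", queries="collection.fasta",
                   db="assembly_db", out="blastn_hits.csv", evalue="1e-6"):
    """Return the makeblastdb and blastn argument lists (nothing is run)."""
    # Step 1: turn the assembly into a nucleotide BLAST database.
    makedb = ["makeblastdb", "-in", assembly, "-dbtype", "nucl", "-out", db]
    # Step 2: search the downloaded reference genes against that database.
    search = ["blastn", "-query", queries, "-db", db, "-evalue", evalue,
              "-outfmt", OUTFMT, "-out", out]
    return makedb, search


makedb_cmd, blastn_cmd = build_commands()
print(shlex.join(makedb_cmd))
print(shlex.join(blastn_cmd))

# To actually run them (requires BLAST+ on PATH):
#   import subprocess
#   subprocess.run(makedb_cmd, check=True)
#   subprocess.run(blastn_cmd, check=True)
```

Keeping the commands as argument lists avoids shell-quoting problems with the multi-word `-outfmt` value.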

In the blastn result, I have:

"TRINITY_DN110178_c1_g2_i5" maps to "Acipenser ruthenus -> Androgen receptor" (gb|KF765735.1)

and

"TRINITY_DN98381_c0_g1_i2" maps to "Larimichthys crocea -> Androgen receptor" (gb|GU479047.1)

This is my question:

What is this telling us?

We have two separate transcripts (genes) for the same thing (here: androgen receptor). Why is that, and what does it show?

gene blast ortholog paralog • 1.4k views
1
Entering edit mode

NOTE: after this, I intend to run TBLASTX for the genes that did not show any hits with BLASTN.

1
Entering edit mode

I don't understand what's confusing here. You made a blast database containing orthologs, blasted against it and got orthologs. That's exactly what you would expect.

1
Entering edit mode

Dear Devon, hi and thank you.

My species is phylogenetically older than both of the species for which BLAST has shown hits.

Also, fishes have undergone several whole-genome duplication (WGD) events (as my species is old, maybe it has undergone fewer WGD events than Larimichthys crocea).

So, do these facts change anything in the situation you described before?

0
Entering edit mode

No, that changes nothing. Maybe the contigs are from a duplication, maybe they're not, we have no way of telling from what you posted.

1
Entering edit mode

By "Maybe the contigs are from a duplication", do you mean that they may be paralogs (instead of orthologs)?

And what extra information would help in this regard (you mentioned "no way of telling from what you posted")?

Thank you

1
Entering edit mode

Are the two contigs aligning to the same part of the androgen receptors or different parts? That's kind of the absolute first thing you need to look at. If they're mapping to the same (or at least largely overlapping) parts, then go back to the raw data and try to assess the quality of those contigs (e.g., with TransRate). If the assemblies look reasonable there, then perhaps you have paralogs (if it's interesting, please do confirm them).

1
Entering edit mode

Dear Devon, Hi and thank you for your help and sorry for my delay.

The assembly quality, as assessed with the Trinity pipeline and TransRate, was acceptable.

Regarding "Are the two contigs aligning to the same part of the AR or different parts?": how can I find that out? From the BLAST result? In the example below, the "alignment length" and the "start and end" positions are shown. If that is what you are looking for, would you please tell me what you think about it?

**query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score, subject description**

gi|686651354|gb|KF765735.1|-**Acipenser-ruthenus**-androgen-receptor-(**AR**)-mRNA,-complete-cds,gnl|BL_ORD_ID|435564,98.04,2702,53,0,1,2702,665,3366,0,4697.15,TRINITY_DN110178_c1_g2_i5

gi|740529840|ref|NM_001303367.1|-**Larimichthys-crocea**-androgen-receptor-(**AR**),-mRNA,gnl|BL_ORD_ID|92411,76.40,178,42,2,1418,1594,1615,1439,1.90148e-17,95.2994,TRINITY_DN98381_c0_g1_i2
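The two rows above can be turned into named fields with a few lines of Python. This is only a sketch: it assumes the 13 columns listed in the header line, and it assumes that the last 12 fields never contain a comma, which holds for these two rows but is not guaranteed in general. The query IDs here do contain commas ("mRNA,-complete-cds"), so the line is split from the right rather than with a plain CSV reader. The bold markers used for emphasis in the post are left out of the strings.

```python
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore", "stitle"]
INT_FIELDS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_FIELDS = {"pident", "evalue", "bitscore"}


def parse_hit(line):
    """Parse one comma-separated BLAST row into a dict of typed fields."""
    # Split from the right: everything left over on the left side is the
    # query id, including any commas it contains.
    parts = line.strip().rsplit(",", len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(parts)))
    hit = dict(zip(FIELDS, parts))
    for key in INT_FIELDS:
        hit[key] = int(hit[key])
    for key in FLOAT_FIELDS:
        hit[key] = float(hit[key])
    return hit


ROWS = [
    "gi|686651354|gb|KF765735.1|-Acipenser-ruthenus-androgen-receptor-(AR)-mRNA,"
    "-complete-cds,gnl|BL_ORD_ID|435564,98.04,2702,53,0,1,2702,665,3366,0,"
    "4697.15,TRINITY_DN110178_c1_g2_i5",
    "gi|740529840|ref|NM_001303367.1|-Larimichthys-crocea-androgen-receptor-(AR),"
    "-mRNA,gnl|BL_ORD_ID|92411,76.40,178,42,2,1418,1594,1615,1439,1.90148e-17,"
    "95.2994,TRINITY_DN98381_c0_g1_i2",
]

hits = [parse_hit(row) for row in ROWS]
for hit in hits:
    print(hit["stitle"], hit["pident"], hit["length"],
          hit["qstart"], hit["qend"], hit["sstart"], hit["send"])
```

Requesting tab-separated output (format 6) from BLAST would avoid the comma ambiguity altogether.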

1
Entering edit mode

Correct. Compare the start and end of the "hits". You may need to align the two hits themselves to see how they relate to each other. Then look at AR genes to see if there are more than one copy/known paralogs in other systems. This is the million-dollar annotation part for the $5k genome.

1
Entering edit mode

Dear genomax, hi.

There is a query start and end, and a subject start and end. By "Compare the start and end of the hits", which one do you mean? And how do I check their relationship? Do you mean their overlap?

About "look at AR genes to see if there are more than one copy/known paralogs in other systems": do you mean finding different paralogs in different organisms (here are some orthologs)?

And about "the million-dollar annotation part for the $5k genome": I do not understand what you meant by that.

1
Entering edit mode
1. The subject start/end.
2. Presumably in other fish.
3. That's another way of saying, "Sequencing is the cheap part in all of this."
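One way to "align the two hits themselves" is to pull the two contigs out of the assembly and compare them directly. The sketch below uses a small hand-written FASTA reader and builds a `blastn -query ... -subject ...` command, which compares two FASTA files without a database. The file names `contig_a.fa` and `contig_b.fa` and the toy sequences are made up for illustration; only the contig IDs come from this thread.

```python
import io


def read_fasta(lines):
    """Yield (id, sequence) pairs; the id is the header up to the first space."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def pick(records, wanted):
    """Keep only the records whose id is in `wanted`."""
    return {name: seq for name, seq in records if name in wanted}


def pairwise_blastn_cmd(query_fa, subject_fa):
    """Build a blastn command that compares two FASTA files directly."""
    return ["blastn", "-query", query_fa, "-subject", subject_fa, "-outfmt", "6"]


# Toy stand-in for assembly.fasta (sequences are invented, not real data).
toy_assembly = io.StringIO(
    ">TRINITY_DN110178_c1_g2_i5 len=12\nACGTACGT\nACGT\n"
    ">some_other_contig\nGGGG\n"
    ">TRINITY_DN98381_c0_g1_i2\nTTTTCCCC\n"
)
wanted = {"TRINITY_DN110178_c1_g2_i5", "TRINITY_DN98381_c0_g1_i2"}
contigs = pick(read_fasta(toy_assembly), wanted)
print(sorted(contigs))
print(pairwise_blastn_cmd("contig_a.fa", "contig_b.fa"))
```

With a real assembly, each selected contig would be written to its own FASTA file before running the command.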
1
Entering edit mode

Dear Devon, hi. This is the situation for (1):

    species                 sstart    send
    Larimichthys crocea     1615      1439
    Acipenser ruthenus      665       3366

What can we conclude from it?

0
Entering edit mode

That one is contained inside the other (though note the strand difference, which could be meaningful). The first contig probably aligns decently to the second.
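The reasoning above boils down to two small checks on the subject coordinates: a hit is on the minus strand when `sstart` is greater than `send`, and one interval is contained in another when both of its ends fall inside. A minimal sketch with the numbers from this thread follows. The function names are made up for illustration, and the containment check is purely numerical: it is only biologically meaningful if both intervals are measured on the same sequence, which is worth verifying here because each hit's subject is a different Trinity contig.

```python
def strand(sstart, send):
    """BLAST reports minus-strand nucleotide hits with sstart > send."""
    return "minus" if sstart > send else "plus"


def interval(start, end):
    """Return the coordinates as an ordered (low, high) pair."""
    return (min(start, end), max(start, end))


def contains(outer, inner):
    """True if `inner` lies entirely within `outer` (both ordered pairs)."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def span(iv):
    """Number of positions covered by an ordered, inclusive interval."""
    return iv[1] - iv[0] + 1


# Subject coordinates (sstart, send) quoted in the thread.
subject_coords = {
    "Acipenser ruthenus": (665, 3366),
    "Larimichthys crocea": (1615, 1439),
}

for species, (sstart, send) in subject_coords.items():
    iv = interval(sstart, send)
    print(species, strand(sstart, send), iv, span(iv))

outer = interval(*subject_coords["Acipenser ruthenus"])
inner = interval(*subject_coords["Larimichthys crocea"])
print("contained:", contains(outer, inner))
```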

1
Entering edit mode

As the transcripts (genes) that map to these two androgen receptor genes are separate/ totally different genes, does it tell us anything about the ortholog condition or something in this regard?

Thank you for all your help.

0
Entering edit mode

"As the transcripts (genes) that map to these two androgen receptor genes are separate/ totally different genes, does it tell us anything about the ortholog condition or something in this regard?"

1
Entering edit mode

Dear Devon, hi. To put it another way:

Suppose two sequences, one from a fish and one from a bird but both belonging to the AR gene, have mapped to one transcript.

How can I compare these three sequences? With phylogenetic trees? And does this comparison have any biological value?

0
Entering edit mode

Looks like you have not run out of things to do with your one RNAseq dataset :-)

It is also possible that those two Trinity sequences are really only one. Since we have no data to look at, consider this speculation.

1
Entering edit mode

Hi my friend, unfortunately no, I have not.

There are many similar, cloned, and useless papers in the field!

0
Entering edit mode

But are you not on a bit of thin ice here? Trying to do evolutionary analysis with predicted transcriptome/protein sequences.

1
Entering edit mode

Maybe I did not understand what you mean.

Are you saying that I am on the wrong path?

0
Entering edit mode

You are likely on the right path, but how confident are you that these protein sequences are real/accurate?

1
Entering edit mode

They were produced by paired-end RNA sequencing, then I used Trinity for the assembly, and then I used BLAST to find homology with the GenBank data, and some of them have very low e-values. So I guess they are (virtually) real/accurate :)