Dear friends, hi. (I'm not a native English speaker, so please be ready for some language flaws.)
(Please modify the title if it is not clear, and sorry if this is a duplicate question.)
While searching for some candidate genes in my de novo fish transcriptome assembly, I downloaded the sequences of my selected genes (NCBI Nucleotide) from several fish species for each gene (why? because there were several representatives among the fish taxa) and combined them into a collection.fasta file. Then I used the makeblastdb command with -dbtype nucl to convert my assembly.fasta into a searchable database.
Then I ran BLASTN (evalue=1e-6) with collection.fasta as the query and assembly.fasta as the database.
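For readers who want to reproduce the two steps above, they could be scripted roughly as follows. This is only a sketch: the FASTA file names and the e-value come from the post, while the database name `assembly_db`, the output file name `blastn_hits.tsv`, and the choice of tabular output (`-outfmt 6`) are assumptions made for illustration. The function only builds the command lines; it does not run BLAST.

```python
import shlex


def build_blast_commands(assembly="assembly.fasta",
                         queries="collection.fasta",
                         db_name="assembly_db",        # assumed database name
                         out_file="blastn_hits.tsv",   # assumed output file
                         evalue="1e-6"):
    """Return the makeblastdb and blastn argument lists (nothing is executed)."""
    # Turn the assembly into a nucleotide BLAST database.
    makedb = ["makeblastdb", "-in", assembly, "-dbtype", "nucl", "-out", db_name]
    # Search the candidate genes (query) against the assembly (database).
    blastn = ["blastn", "-query", queries, "-db", db_name,
              "-evalue", evalue, "-outfmt", "6", "-out", out_file]
    return makedb, blastn


if __name__ == "__main__":
    for cmd in build_blast_commands():
        # Print each command as a shell-ready string.
        print(shlex.join(cmd))
```

The returned lists can be passed to `subprocess.run(cmd, check=True)` on a machine where BLAST+ is installed.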
In the blastn result, I have:
"TRINITY_DN110178_c1_g2_i5" maps to "Acipenser ruthenus -> Androgen receptor" (gb|KF765735.1)
"TRINITY_DN98381_c0_g1_i2" maps to "Larimichthys crocea -> Androgen receptor" (gb|GU479047.1)
This is my question:
What is this telling us?
We have two separate transcripts (genes) for the same thing (here: androgen receptor). Why is that, and what does it show?
NOTE: after this, I intend to run TBLASTX for the genes that did not show any hit with BLASTN.
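To pick out the genes that need the TBLASTX follow-up, one option is to list the query IDs that never appear in the BLASTN table. The sketch below assumes tabular output (`-outfmt 6`, query ID in the first column) and assumes BLAST reports the first whitespace-delimited word of each FASTA header as the query ID. The entry `hypothetical_gene_1` and all numeric values in the example table are made up for illustration.

```python
def queries_without_hits(fasta_text, blast_tsv_text):
    """Return query IDs present in the FASTA but absent from column 1 of the BLAST table."""
    all_ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:                      # skip empty headers
                all_ids.append(fields[0])   # ID = first word of the header
    hit_ids = {line.split("\t")[0]
               for line in blast_tsv_text.splitlines() if line.strip()}
    return [qid for qid in all_ids if qid not in hit_ids]


# Example input; sequences and alignment statistics are placeholders.
fasta = (">KF765735.1 Acipenser ruthenus androgen receptor\nACGT\n"
         ">GU479047.1 Larimichthys crocea androgen receptor\nACGT\n"
         ">hypothetical_gene_1 made-up entry\nACGT\n")
blast = ("KF765735.1\tTRINITY_DN110178_c1_g2_i5\t80.0\t500\t100\t0\t1\t500\t1\t500\t1e-50\t300\n"
         "GU479047.1\tTRINITY_DN98381_c0_g1_i2\t80.0\t500\t100\t0\t1\t500\t1\t500\t1e-50\t300\n")

if __name__ == "__main__":
    print(queries_without_hits(fasta, blast))
```

The IDs returned could then be written to a smaller FASTA file and used as the TBLASTX query.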
I don't understand what's confusing here. You made a blast database containing orthologs, blasted against it and got orthologs. That's exactly what you would expect.
Dear Devon, hi and thank you.
My species is phylogenetically older than both of the species for which BLAST has shown hits.
Also, fishes have undergone several whole genome duplication (WGD) events (as my species is old, it may have undergone fewer WGD events than Larimichthys crocea).
So, do these facts change anything in the situation you described before?
No, that changes nothing. Maybe the contigs are from a duplication, maybe they're not, we have no way of telling from what you posted.
By "Maybe the contigs are from a duplication", do you mean that they may be paralogs (instead of orthologs)?
And what extra information would help me in this regard (as you mentioned: "we have no way of telling from what you posted")?
Are the two contigs aligning to the same part of the androgen receptors or different parts? That's kind of the absolute first thing you need to look at. If they're mapping to the same (or at least largely overlapping) parts, then go back to the raw data and try to assess the quality of those contigs (e.g., with TransRate). If the assemblies look reasonable there, then perhaps you have paralogs (if it's interesting, please do confirm them).
Dear Devon, hi, and thank you for your help. Sorry for my delay.
The assembly quality, as assessed with the Trinity pipeline and TransRate, was acceptable.
Regarding "Are the two contigs aligning to the same part of the AR or different parts?": how can I find that out? From the BLAST result? In the example below, the "alignment length" and the "start and end" are shown. If that is what you are looking for, would you please tell me what you think about it?
Correct. Compare the start and end of the "hits". You may need to align the two hits themselves to see how they relate to each other. Then look at AR genes to see if there is more than one copy (known paralogs) in other systems. This is the million-dollar annotation part for the $5k genome.
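A minimal sketch of what "compare the start and end" can mean in practice: count how many positions two hit intervals share. It assumes both intervals are expressed on the same reference sequence (for example, after aligning both contigs to one AR sequence) and that coordinates are 1-based and inclusive, as in BLAST tabular output. The numbers in the usage lines are made up.

```python
def overlap_length(start_a, end_a, start_b, end_b):
    """Number of positions shared by two 1-based, inclusive intervals.

    Coordinates may be given in either order, so minus-strand hits
    (start > end) are handled as well.
    """
    lo_a, hi_a = sorted((start_a, end_a))
    lo_b, hi_b = sorted((start_b, end_b))
    return max(0, min(hi_a, hi_b) - max(lo_a, lo_b) + 1)


if __name__ == "__main__":
    print(overlap_length(100, 900, 700, 1500))  # shared region is 700..900
    print(overlap_length(100, 300, 400, 600))   # no shared positions
```

A large overlap suggests the two contigs cover the same region of the gene; no overlap suggests they may be two fragments of one transcript.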
Dear genomax, hi,
There is a query start and end and a subject start and end. By "Compare the start and end of the hits", which one do you mean? And how do I check their relationship? Do you mean their overlap?
About "look at AR genes to see if there is more than one copy (known paralogs) in other systems": do you mean finding different paralogs in different organisms ([here are some orthologs])?
And about "the million-dollar annotation part for the $5k genome": I do not understand what you meant by that.
Dear Devon, hi. This is the situation for (1):
What can we conclude from it?
That one is contained inside the other (though note the strand difference, which could be meaningful). The first contig probably aligns decently to the second.
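The two observations above (containment and strand) can be checked directly from the coordinates. The sketch below relies on the BLASTN tabular convention that a minus-strand hit is reported with subject start greater than subject end, and it assumes both intervals refer to the same sequence. The function names and example coordinates are illustrative only.

```python
def hit_strand(sstart, send):
    """Return 'plus' or 'minus' from the subject coordinates of a BLASTN hit."""
    return "plus" if sstart <= send else "minus"


def is_contained(inner, outer):
    """True if interval `inner` lies entirely within interval `outer`.

    Each interval is a (start, end) pair; the order of start and end
    does not matter, so minus-strand coordinates are accepted.
    """
    in_lo, in_hi = sorted(inner)
    out_lo, out_hi = sorted(outer)
    return out_lo <= in_lo and in_hi <= out_hi


if __name__ == "__main__":
    print(hit_strand(1200, 300))                   # reversed coordinates
    print(is_contained((400, 800), (1200, 300)))   # 400..800 inside 300..1200
```

If the strands differ, one contig may simply be the reverse complement of the other, which is worth ruling out before interpreting the hits as two genes.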
Since the transcripts (genes) that map to these two androgen receptor genes are separate, totally different genes, does that tell us anything about orthology or something in this regard?
Thank you for all your help.
Dear @Devon Ryan, hi. Do you have any thoughts about this:
"Since the transcripts (genes) that map to these two androgen receptor genes are separate, totally different genes, does that tell us anything about orthology or something in this regard?"
Dear Devon, hi. To put it another way:
Two sequences, one from a fish and one from a bird, both related to the AR gene, have mapped to one transcript.
How can I compare these three sequences? Using phylogenetic trees? And does this comparison have any biological value?
Looks like you have not run out of things to do with your one RNAseq dataset :-)
It is also possible that those two Trinity sequences are really only one. Since we have no data to look at, consider this speculation.
Hi my friend, unfortunately no, I have not.
There are many similar, cloned, and useless papers in the field!
But are you not on a bit of thin ice here? Trying to do evolutionary analysis with predicted transcriptome/protein sequences.
Maybe I did not understand what you mean.
Are you saying that I am on the wrong path?
You are likely on the right path, but how confident are you that these protein sequences are real/accurate?
They were produced by paired-end RNA sequencing, then I used Trinity for the assembly, and then I used BLAST to find homology with the GenBank data, and some of them have very low e-values. So I guess they are (virtually) real/accurate :)