Forum: Finding present/absent genes between case and control in de novo transcriptome assembly
8.4 years ago
Farbod ★ 3.4k

Dear friends, hi (I'm not a native English speaker, so please excuse any language flaws).

I have transcriptome sequences (Illumina paired-end) from 3 healthy and 3 unhealthy fish, and I have used Trinity to assemble them de novo.

Now I want to collect the genes that are completely present in one condition (e.g. healthy) and completely absent in the other (I have used DESeq2 for the DEG analysis). The problem is that Trinity usually reports several isoforms per gene, and even when one isoform of a gene shows an absent/present pattern, the other isoforms of that gene may be expressed equally in both conditions.
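To make the isoform problem concrete, here is a minimal sketch of a gene-level presence/absence filter. It assumes raw read counts per transcript and the usual Trinity ID layout, where the gene ID is the transcript ID without its trailing `_i<N>` suffix. The function names, the toy counts, and the `min_present` threshold are all hypothetical and only for illustration; this is not part of DESeq2 or Trinity.

```python
from collections import defaultdict


def gene_id(transcript_id):
    """Drop the trailing _i<N> isoform suffix of a Trinity transcript ID."""
    return transcript_id.rsplit("_i", 1)[0]


def presence_absence_genes(counts, groups, min_present=10):
    """Find genes with zero reads in every sample of one group and at
    least `min_present` reads in every sample of the other group.

    counts: {transcript_id: {sample: read_count}}
    groups: {group_name: [sample, ...]} with exactly two groups
    min_present: hypothetical threshold, chosen only for illustration
    Returns {gene_id: name of the group in which the gene is present}.
    """
    # Sum isoform counts per gene, so a gene is only called "absent"
    # when none of its isoforms has any reads.
    gene_counts = defaultdict(lambda: defaultdict(int))
    for tid, per_sample in counts.items():
        for sample, n in per_sample.items():
            gene_counts[gene_id(tid)][sample] += n

    (name_a, samples_a), (name_b, samples_b) = groups.items()
    hits = {}
    for gene, per_sample in gene_counts.items():
        a = [per_sample.get(s, 0) for s in samples_a]
        b = [per_sample.get(s, 0) for s in samples_b]
        if all(x == 0 for x in a) and all(x >= min_present for x in b):
            hits[gene] = name_b
        elif all(x == 0 for x in b) and all(x >= min_present for x in a):
            hits[gene] = name_a
    return hits


# Toy data (invented): gene 1 is silent in all sick fish; gene 2 has
# one isoform that looks healthy-only and another expressed everywhere.
groups = {"healthy": ["H1", "H2", "H3"], "sick": ["S1", "S2", "S3"]}
counts = {
    "TRINITY_DN1_c0_g1_i1": {"H1": 50, "H2": 40, "H3": 60, "S1": 0, "S2": 0, "S3": 0},
    "TRINITY_DN1_c0_g1_i2": {"H1": 5, "H2": 8, "H3": 3, "S1": 0, "S2": 0, "S3": 0},
    "TRINITY_DN2_c0_g1_i1": {"H1": 30, "H2": 35, "H3": 28, "S1": 0, "S2": 0, "S3": 0},
    "TRINITY_DN2_c0_g1_i2": {"H1": 20, "H2": 22, "H3": 19, "S1": 25, "S2": 21, "S3": 24},
}
print(presence_absence_genes(counts, groups))
```

In the toy data only the first gene passes, because the second gene still has an isoform with reads in the sick samples. The same filter could be run on transcript IDs directly if isoform-level markers are wanted.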

My final goal is to find transcripts or genes that I can use as markers for health/illness determination using conventional PCR (not qPCR).

Thank you in advance

Assembly gene sequence RNA-Seq • 2.0k views

Were the transcriptomes from a specific tissue and life-cycle stage? Do the results of the DE analysis look promising? While 3 replicates may have given you good DE results, extending them for diagnostic purposes to a larger population (especially with a de novo transcriptome) will need significant additional sampling/work/luck.

Dear genomax2, hi. Yes, they are from similar tissues and from individuals of the same age. I also have another set of RNA-seq data, produced from male and female gonads, for finding sex markers (unrelated to the healthy/sick project). The results were very good for up- or down-regulation, with a strong biological rationale, but not for an absent/present (0/1) pattern!

I assume there is no reference genome available, since you used Trinity? I have my doubts that such transcripts/genes (absent in one condition, present in the other) exist in the same tissue (except under extreme conditions). I'm not saying you can't detect such cases, but they would probably just reflect very low abundance (and therefore a chance absence from one condition or the other).

Dear WouterDeCoster, hi. Nice to hear from you again! Yes, you are correct. It is de novo, so there is not even a close reference genome. The "multiple isoform" issue seems to be a worldwide problem. Have you heard anything about "unigenes"? Are they related to this situation? I ask because I have read several plant-based articles in PLOS ONE that used them for this kind of transcriptome.

You should look at cd-hit-est to reduce the complexity of your Trinity transcriptome. Also, this thread has some interesting suggestions.

Hi, thank you. I have used it before, and yes, it reduces the number of transcripts (and consequently the DEGs, I guess) by performing a sort of clustering: e.g. it puts all isoforms with more than 90% sequence identity in one cluster with one representative transcript. But since the isoforms really do exist in that species, primer design cannot get around them, and because of that the PCR result is not what I want. (I will look at your link.)
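One way to see why clustering alone does not solve the primer problem: a primer only discriminates if its sequence occurs in the target transcript and nowhere else in the assembly. The following is a rough sketch, not a primer-design tool. It lists the start positions in a target sequence whose k-mer is absent from all other sequences on either strand. The function names and the toy sequences are hypothetical, and it assumes uppercase A/C/G/T only and exact matching (a real primer can still anneal with mismatches).

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq, k):
    """Set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def specific_positions(target, others, k=20):
    """Start positions in `target` whose k-mer is found in none of
    `others`, checking both strands of each other sequence."""
    background = set()
    for s in others:
        background |= kmers(s, k)
        background |= kmers(revcomp(s), k)
    return [i for i in range(len(target) - k + 1)
            if target[i:i + k] not in background]


# Toy example (invented sequences, k shortened to 5 for readability)
print(specific_positions("AAAAACCCCC", ["AAAAAGGGGG"], k=5))
```

Positions that survive this filter are only candidates; isoforms that share every exon with another isoform will return an empty list, which matches the situation described above.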

So you want to make a PCR assay that classifies fish as either healthy or sick? I have my doubts (biologically) but wish you the best of luck with this. I guess your fish need to be very sick for this to have a chance. It will be possible to find expression differences (over/under-expression), but I wouldn't expect the complete absence of a gene/transcript.

Perhaps you can find transcripts/genes which appear absent in one condition, but my guess is that this would just be because they are low in abundance. Using PCR you would still amplify them.
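To put a rough number on the "absent just by chance" argument, here is a back-of-the-envelope sketch. It assumes read counts follow a Poisson distribution with the same expected count in every replicate, which is a simplification: real RNA-seq counts are overdispersed, so chance zeros are, if anything, more likely than this suggests. The function name is hypothetical.

```python
import math


def prob_all_zero(expected_reads, n_replicates=3):
    """Probability that a transcript with `expected_reads` expected reads
    per library gets zero reads in all `n_replicates` independent
    libraries, under a Poisson model: exp(-lambda) ** n."""
    return math.exp(-expected_reads * n_replicates)


for lam in (0.1, 0.5, 1.0, 3.0):
    print(f"expected reads per library = {lam}: "
          f"P(zero in all 3) = {prob_all_zero(lam):.4f}")
```

With thousands of lowly expressed Trinity transcripts in an assembly, even a small per-transcript probability leaves many transcripts that look "absent" in one condition without being so.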

Thank you for all your suggestions. I have even heard that some DEG packages use algorithms that add a small score instead of working with a plain zero/one, and that this is related to Laplace's rule of succession, or a pseudocount! Yes, fish are complicated: they have experienced an additional round of whole-genome duplication, and alternative splicing is another troublemaker here! But as I am looking for something inexpensive that can be used in aquaculture, a marker for simple PCR is my goal.
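For reference, the pseudocount idea can be sketched in a few lines. Laplace's rule of succession estimates a proportion as (successes + 1) / (trials + 2), so zero observations never give an estimate of exactly zero. The same trick keeps a log fold change finite when one condition has no reads. This is only an illustration of the general idea with hypothetical function names, not how any particular DEG package implements it.

```python
import math


def laplace_estimate(successes, trials):
    """Laplace's rule of succession: (s + 1) / (n + 2)."""
    return (successes + 1) / (trials + 2)


def log2_fold_change(mean_a, mean_b, pseudocount=1.0):
    """log2 ratio of two mean counts after adding a pseudocount to
    both, so a zero in either condition does not give +/- infinity."""
    return math.log2((mean_a + pseudocount) / (mean_b + pseudocount))


# A transcript seen in 0 of 3 sick fish still gets a non-zero estimate
print(laplace_estimate(0, 3))
# Mean of 7 reads in healthy vs 0 in sick stays finite
print(log2_fold_change(7, 0))
```

The consequence for a 0/1 marker search is that a zero count becomes a large but finite fold change, so such transcripts are ranked together with strongly down-regulated ones rather than reported as a separate "absent" class.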

Well, I'm working on human genetics and transcriptomics, so my knowledge of differential expression analysis after Trinity assembly is... rather limited :-) Having multiple isoforms per gene is simply how biology works; it adds complexity to your work, but it is natural. UniGene is a database containing transcripts; I'm not sure what you mean by "related to this condition".
