Forum: finding present/absent genes between case and control in de novo transcriptome assembly
3.0 years ago by Farbod (Toronto):

Dear friends, hi. (I'm not a native English speaker, so please excuse any language flaws.)

I have transcriptome sequences (Illumina paired-end) from 3 healthy and 3 unhealthy fish, and I used Trinity to assemble them de novo.

Now I want to collect the genes that are fully present in one condition (e.g. healthy) and fully absent in the other (I have used DESeq2 for the DEG analysis). The problem is that Trinity usually reports several isoforms per gene, and even when one isoform of a gene shows an absent/present pattern, the other isoforms of that gene may be expressed at similar levels in both conditions.
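A minimal sketch of one way to look at this, assuming Trinity's usual transcript naming (e.g. `TRINITY_DN100_c0_g1_i1`, where the trailing `_i<N>` marks the isoform and the rest is the gene): sum the isoform counts per gene, then keep only features that are clearly expressed in every replicate of one condition and have zero reads in every replicate of the other. The count table, the sample order, and the `min_count` threshold of 10 are made up for illustration; real counts would come from the quantification step.

```python
import re

# Hypothetical raw counts per isoform; columns are [H1, H2, H3, S1, S2, S3].
ISOFORM_COUNTS = {
    "TRINITY_DN100_c0_g1_i1": [25, 31, 18, 0, 0, 0],
    "TRINITY_DN100_c0_g1_i2": [40, 38, 45, 42, 39, 51],
    "TRINITY_DN200_c0_g1_i1": [12, 20, 15, 0, 0, 0],
    "TRINITY_DN200_c0_g1_i2": [7, 9, 11, 0, 0, 0],
    "TRINITY_DN300_c1_g2_i1": [0, 0, 0, 33, 27, 41],
}
HEALTHY = [0, 1, 2]  # column indices of the healthy replicates (assumed order)
SICK = [3, 4, 5]     # column indices of the sick replicates (assumed order)


def gene_id(transcript_id):
    """Strip the trailing _i<N> isoform suffix from a Trinity transcript ID."""
    return re.sub(r"_i\d+$", "", transcript_id)


def sum_by_gene(isoform_counts):
    """Add up the counts of all isoforms that share a gene ID."""
    genes = {}
    for tid, counts in isoform_counts.items():
        gid = gene_id(tid)
        previous = genes.get(gid, [0] * len(counts))
        genes[gid] = [a + b for a, b in zip(previous, counts)]
    return genes


def presence_absence(counts_by_feature, present_in, absent_in, min_count=10):
    """Features with >= min_count reads in every 'present' sample
    and exactly 0 reads in every 'absent' sample."""
    hits = []
    for fid, counts in counts_by_feature.items():
        expressed = all(counts[i] >= min_count for i in present_in)
        silent = all(counts[i] == 0 for i in absent_in)
        if expressed and silent:
            hits.append(fid)
    return sorted(hits)


gene_counts = sum_by_gene(ISOFORM_COUNTS)
healthy_only_genes = presence_absence(gene_counts, HEALTHY, SICK)
sick_only_genes = presence_absence(gene_counts, SICK, HEALTHY)
healthy_only_isoforms = presence_absence(ISOFORM_COUNTS, HEALTHY, SICK)

print("healthy-only genes:   ", healthy_only_genes)
print("sick-only genes:      ", sick_only_genes)
print("healthy-only isoforms:", healthy_only_isoforms)
```

In this toy table, `TRINITY_DN100_c0_g1` has one isoform that looks healthy-only, but the gene as a whole does not, because its second isoform is expressed in both conditions. That is the situation described above, and it is why the isoform-level and gene-level lists can disagree.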

My final goal is to find transcripts or genes that I can use as markers for determining health/illness using normal PCR (not qPCR).

Thank you in advance

modified 3.0 years ago • written 3.0 years ago by Farbod

Were the transcriptomes from a specific tissue or life cycle stage? Do the results of the DE analysis look promising? While 3 replicates may have given you good DE results, extending them for diagnostic purposes to a larger population (especially with a de novo transcriptome) will need significant additional sampling, work, and luck.

modified 3.0 years ago • written 3.0 years ago by genomax

Dear genomax2, hi. Yes, they are from similar tissues and from individuals of the same age. I also have another set of RNA-seq data, produced from male and female gonads, for finding sex markers (unrelated to the healthy/sick project). There the results were very good for up- or down-regulation and made good biological sense, but not for an absent/present (0/1) pattern!

written 3.0 years ago by Farbod

I assume there is no reference genome available, since you used Trinity? I have my doubts that such transcripts/genes (absent in one condition, present in the other) exist in the same tissue (except under extreme conditions). I'm not saying you can't detect any, but those you do find are probably just very lowly abundant (and therefore absent from one condition or the other by chance).

written 3.0 years ago by WouterDeCoster

Dear WouterDeCoster, hi. Nice to hear from you again! Yes, you are correct: it is de novo, so there is not even a closely related reference genome. The "multiple isoform" issue seems to be a worldwide problem. Have you heard anything about "unigenes"? Are they related to this situation? I ask because I have read several plant-based articles in PLOS ONE that used them for this kind of transcriptome.

modified 3.0 years ago • written 3.0 years ago by Farbod

You should look at cd-hit-est to reduce the complexity of your Trinity transcriptome. Also, this thread has some interesting suggestions.

written 3.0 years ago by genomax

Hi, thank you. I have used it before, and yes, it reduces the number of transcripts (and consequently the number of DEGs, I guess) by performing a sort of clustering: e.g. it puts all isoforms with more than 90% sequence identity in one cluster with one representative transcript. But since the isoforms really exist in that species, primer design cannot get around them, and because of that the PCR result is not what I want. (I will look at your link.)
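One possible way to approach the primer problem, sketched under strong assumptions: look for short subsequences (k-mers) that occur in the isoform of interest but in none of its sibling isoforms, and treat those regions as candidate primer sites. The sequences, the function names, and the choice of `k` below are all hypothetical. The sketch ignores the reverse strand, the other genes in the assembly, and every real primer-design constraint (melting temperature, GC content, amplicon length).

```python
def kmers(seq, k):
    """Set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def isoform_specific_kmers(target, others, k=20):
    """k-mers present in `target` but in none of the `others` sequences."""
    specific = kmers(target, k)
    for other in others:
        specific -= kmers(other, k)
    return specific


# Toy example: isoform 1 carries an extra exon that isoform 2 skips.
exon_a = "ATGGCTAGCTAGGATCCGATCGTACG"
exon_b = "TTGACCGGTAACCGTTAGCA"   # only in isoform 1 (made-up sequence)
exon_c = "GGCATCGATCGGATTACGCT"

isoform_1 = exon_a + exon_b + exon_c
isoform_2 = exon_a + exon_c

specific = isoform_specific_kmers(isoform_1, [isoform_2], k=8)
print(len(specific), "isoform-specific 8-mers, e.g.", sorted(specific)[:3])
```

A primer placed on such a region (or across an exon junction unique to one isoform) should not anneal to the sibling isoforms. Whether the isoform really is absent in the other condition still has to be confirmed at the bench.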

modified 3.0 years ago • written 3.0 years ago by Farbod

So you want to make a PCR assay to classify fish as either healthy or sick? I have my doubts (biologically), but I wish you the best of luck with this. I guess your fish need to be very sick for this to have a chance. It will be possible to find expression differences (over/under-expression), but I wouldn't expect the complete absence of a gene/transcript.

Perhaps you can find transcripts/genes that appear absent in one condition, but my guess is that this would just be because they are lowly abundant. Using PCR you would still amplify them.
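To put a rough number on "absent by chance": if the read count of a transcript in each sample is modelled as Poisson with some expected value, the probability of seeing zero reads in all replicates of a condition is `exp(-expected * n_replicates)`. This is only a back-of-the-envelope sketch. The Poisson model and the expected values are assumptions, and real RNA-seq counts are overdispersed, so zeros are more likely than this suggests.

```python
import math


def prob_all_zero(expected_count, n_replicates):
    """P(zero reads in every replicate), assuming independent
    Poisson(expected_count) counts in each replicate."""
    return math.exp(-expected_count * n_replicates)


for expected in (0.1, 0.5, 1.0, 3.0):
    p = prob_all_zero(expected, n_replicates=3)
    print(f"expected reads per sample = {expected}: "
          f"P(all 3 replicates are zero) = {p:.4f}")
```

With an expected count of one read per sample, all three replicates come up empty about 5% of the time under this model. Across tens of thousands of lowly expressed Trinity contigs, that is enough to produce many apparent "absent" transcripts that a PCR would still amplify.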

written 3.0 years ago by WouterDeCoster

Thank you for all your suggestions. I have even heard that some DEG packages use algorithms that add a small score in place of zero counts, and that this is related to Laplace's rule of succession, or a pseudocount! Yes, fish are complicated: they have experienced an additional round of whole-genome duplication, and alternative splicing is another source of trouble here. But as I am looking for something inexpensive that can be used in aquaculture, a marker for simple PCR is my goal.
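For reference, the pseudocount idea can be shown in a few lines. This is a generic illustration and not a description of what DESeq2 or any other package does internally. Adding a small constant to both means keeps the log fold change finite when one condition has zero reads, and Laplace's rule of succession applies the same idea to a proportion, `(successes + 1) / (trials + 2)`. The function names and the example numbers are made up.

```python
import math


def log2_fold_change(mean_a, mean_b, pseudocount=1.0):
    """log2 ratio of two mean counts, with a pseudocount added to each
    so that a zero in either condition does not give +/- infinity."""
    return math.log2((mean_a + pseudocount) / (mean_b + pseudocount))


def rule_of_succession(successes, trials):
    """Laplace's estimate of a probability after `successes` out of `trials`."""
    return (successes + 1) / (trials + 2)


# A transcript with a mean of 31 reads in healthy fish and 0 in sick fish.
print(log2_fold_change(31, 0))

# Transcript detected in 0 of 3 sick fish: the estimated detection
# probability is small but not zero.
print(rule_of_succession(0, 3))
```

The second call is the more relevant one for a marker: with only three fish, "never seen in sick animals" still leaves a sizeable estimated chance of seeing it in the next one.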

written 3.0 years ago by Farbod

Well, I'm working on human genetics and transcriptomics, so my knowledge of differential expression analysis after Trinity assembly is... rather limited :-) Multiple isoforms per gene are just how biology is; they add complexity to your work, but they are natural. UniGene is a database of transcripts; I'm not sure what you mean by "related to this condition".

written 3.0 years ago by WouterDeCoster