Question

What's next after differential expression?

0

7.6 years ago

germelcar ▴ 20

Hello everyone:

I am confused about what comes next after a differential expression analysis. I ran a DE analysis with one sample per condition, two conditions, and no replicates (I know I should have replicates, but that is how the experiment was done).

I have the list of transcripts from the DE analysis, but when I open both transcriptomes and look up a transcript (say TRINITY_DN12605_c0_g1_i1), the sequence in the first transcriptome (the one from the first condition) is longer than in the second transcriptome. How should I handle that in order to generate the proteome? Am I doing it right? Which sequence should I use (the one from the first condition or the second)?

Thanks in advance.

~g

DE Differential expression edgeR RNA-Seq • 2.1k views

updated 7.6 years ago by Farbod ★ 3.4k • written 7.6 years ago by germelcar ▴ 20

0

You need to ask yourself what your biological question is that you are trying to answer.

7.6 years ago by WouterDeCoster 47k

0

I would like to know whether exposing the species to pathogens, and so forcing it to express more, generates or activates defense mechanisms against those pathogens. Should I consider the transcripts from both samples (normal and infected) or only those from the infected one?

UPDATE

I was thinking about your answer, and I think I should use only the infected one, because that is the sample that is activating or expressing more in response to the event the species was exposed to (infection). Does that make sense?

Thanks.

7.6 years ago by germelcar ▴ 20

0

Since you used Trinity, I think it's safe to assume your organism of interest doesn't have a reference genome? Why would you want to generate "the proteome"?

Anyway, since you haven't used replicates, I would suggest first validating some differentially expressed genes (e.g. by qPCR, done properly, with replicates) to make sure you are working with "true" biological differential expression and not just artefacts or random shifts in expression.

7.6 years ago by WouterDeCoster 47k

0

Thanks for the reply, WouterDeCoster.

And yes, I assembled them with Trinity, but I don't have any replicates; I only have one sample from each condition. I want to generate the proteome in order to discover peptides.

Thanks.

7.6 years ago by germelcar ▴ 20

1

"I want to generate the proteome in order to discover peptides."

That would be a double prediction, right? You are assembling a putative transcriptome using Trinity and then predicting peptides from that?

If discovering peptides is your aim, why not do a proteomic discovery experiment instead of RNAseq?

Edit: I think you are using public RNAseq data (if this is related to one of your other posts) to do this exercise (so no actual experiment is involved?).

7.6 years ago by GenoMax 141k

0

Thanks for the reply, @genomax2.

Yes, I am using public data; in fact, I am reproducing the experiment described in the paper. I am assembling the transcriptome so that I can later generate the proteome and, finally, study it (I think this is what you mean by proteomic discovery).

Finally I was able to run the scripts mentioned above.

Thanks.

7.6 years ago by germelcar ▴ 20

0

A proteomic discovery experiment would mean taking the extracted protein complement from this organism and identifying peptides/proteins from that prep (after proper purification, digestion, etc.) with mass spectrometry. That would provide direct evidence that those peptides were present in the proteins of the organism and are real. Since you are not actually doing bench experiments, that is not applicable.

7.6 years ago by GenoMax 141k

score 2 · Answer 1 · 2016-09-29

2

7.6 years ago

Farbod ★ 3.4k

Dear germelcar, Hi.

In the Trinity DE analysis outputs, there are two ".subset" files that contain the names and statistics of your healthy and pathogen-encountered DE transcripts, filtered according to your threshold. The first column of each file shows the IDs of the transcripts that are up-regulated in one of your conditions (the subset). You can then extract their sequences and annotate them.
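As a rough illustration of the extraction step, here is a minimal Python sketch. It assumes the subset file is tab-delimited with the transcript ID in the first column and that the assembly is an ordinary FASTA file; the function names are hypothetical and not part of Trinity.

```python
import io
from typing import Dict, Iterator, Set, TextIO, Tuple


def read_ids(subset_handle: TextIO) -> Set[str]:
    """Collect the first tab-separated field of every non-empty line.

    A header line contributes a token too, but it simply never matches
    a FASTA record name, so it is harmless here.
    """
    ids = set()
    for line in subset_handle:
        first = line.rstrip("\n").split("\t")[0].strip()
        if first:
            ids.add(first)
    return ids


def read_fasta(fasta_handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (record name, sequence) pairs; the name is the first word after '>'."""
    name, chunks = None, []
    for line in fasta_handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def extract_sequences(subset_handle: TextIO, fasta_handle: TextIO) -> Dict[str, str]:
    """Return only the FASTA records whose name appears in the subset file."""
    wanted = read_ids(subset_handle)
    return {name: seq for name, seq in read_fasta(fasta_handle) if name in wanted}


if __name__ == "__main__":
    # Tiny in-memory stand-ins for a subset file and an assembly (made-up data).
    subset = io.StringIO("TRINITY_DN1_c0_g1_i1\t3.2\t0.001\n")
    fasta = io.StringIO(">TRINITY_DN1_c0_g1_i1 len=4\nACGT\n>TRINITY_DN2_c0_g1_i1\nTTTT\n")
    for name, seq in extract_sequences(subset, fasta).items():
        print(f">{name}\n{seq}")
```

With real files, the two handles would come from `open(...)` on the subset file and on the Trinity FASTA that the quantification was run against.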

One easy way is to BLAST them against NCBI nr with outfmt 5 (XML); you can then feed the result into Blast2GO for annotation, GO assignment, and so on.
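A hedged sketch of what such a search could look like when launched from Python. It assumes BLAST+ is installed with a local copy of nr, and it uses `blastx` because the queries are nucleotide transcripts and nr is a protein database; that choice, the file names, and the e-value cutoff are illustrative assumptions, not something stated above.

```python
import shlex
from typing import List


def build_blastx_command(
    query_fasta: str,
    output_xml: str,
    database: str = "nr",
    evalue: str = "1e-5",  # illustrative cutoff, not a recommendation
) -> List[str]:
    """Build a blastx argument list that writes XML (-outfmt 5) output."""
    return [
        "blastx",
        "-query", query_fasta,
        "-db", database,
        "-outfmt", "5",
        "-evalue", evalue,
        "-out", output_xml,
    ]


if __name__ == "__main__":
    # Hypothetical file names; print the command instead of running it.
    cmd = build_blastx_command("up_regulated.fasta", "up_regulated.blastx.xml")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems if a path contains spaces.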

7.6 years ago by Farbod ★ 3.4k

0

Thanks for the reply, Farbod.

I couldn't generate the .subset files because I couldn't generate the matrix. Instead, I extracted the quantification table for each sample and merged them by matching transcript IDs (with R's merge function). Then I followed the steps in the section "Identifying DE Features: No Biological Replicates (Proceed with Caution)", that is, point 2.11 of edgeR's user guide.

When I try to run the "abundance_estimates_to_matrix.pl" script, with salmon or kallisto as "--est_method", I get the following error:
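For illustration only, the merge step described above could be sketched in Python as follows. This assumes both tables were quantified against the same set of transcripts, that they are tab-delimited, and that they use salmon-style `Name` and `NumReads` columns (other tools name these differently); the function names are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def read_counts(
    handle: TextIO, id_col: str = "Name", count_col: str = "NumReads"
) -> Dict[str, float]:
    """Read one quantification table into {transcript ID: estimated count}."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[id_col]: float(row[count_col]) for row in reader}


def merge_counts(
    first: Dict[str, float], second: Dict[str, float]
) -> List[Tuple[str, float, float]]:
    """Outer-join two count tables on transcript ID.

    IDs missing from one sample get 0.0 rather than a missing value,
    so the resulting matrix contains no gaps.
    """
    all_ids = sorted(set(first) | set(second))
    return [(tid, first.get(tid, 0.0), second.get(tid, 0.0)) for tid in all_ids]


if __name__ == "__main__":
    # Made-up example tables standing in for two per-sample quantification files.
    healthy = read_counts(io.StringIO("Name\tNumReads\nt1\t10\nt2\t5\n"))
    infected = read_counts(io.StringIO("Name\tNumReads\nt2\t7\nt3\t1\n"))
    for row in merge_counts(healthy, infected):
        print("\t".join(str(value) for value in row))
```

Filling absent IDs with zero is a design choice for this sketch; whether a zero is meaningful depends on both samples having been measured against the same reference transcripts.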

Error in if (any(lib.size == 0L)) warning("Zero library size detected.") : missing value where TRUE/FALSE needed

I don't know if I did this correctly: I generated a de novo transcriptome for each sample and then ran the "align_and_estimate_abundance.pl" script, using each sample's own transcriptome as the value of the "--transcript" argument. Am I doing it the right way?

Thanks.

7.6 years ago by germelcar ▴ 20

2

Hi germelcar,

Nice to hear that you have done most of it correctly.

Please have a look here: "Trinity Group"

7.6 years ago by Farbod ★ 3.4k

0

I have done it. Thanks to all of you for the help!

7.4 years ago by germelcar ▴ 20