RNA-Seq data from two sources
3.9 years ago

Good afternoon,

I am an RNA-Seq novice with a few questions, and any advice, help or recommended reading would be greatly appreciated.
Due to COVID-19 and the loss of access to lab and field work resources, I have had to rethink my master's thesis project and create a desk-based study. My supervisor has RNA-Seq data from a time-series experiment (non-model organism, marine sponge species) and has suggested I carry out a descriptive analysis of the control samples (3 biological replicates), i.e. de novo assembly, functional annotation and so on, which has not yet been done for this species. We have since identified an SRA dataset for the same species (2 biological replicates) and thought we could include it alongside our controls, and perhaps assemble, annotate and carry out differential expression (DE) analysis between the two populations (one is temperate and one is sub-polar).
Both datasets come from 'whole' adult tissue, and the animals were kept in aquaria for a number of weeks prior to sequencing. However, in communicating with the PI of the other study, I found that one of the samples was flash frozen and processed from frozen to aid in identifying cryoprotectant proteins. The two datasets also have different read lengths (75 bp and 125 bp) and used different library prep kits and sequencing platforms.
I haven't seen any other examples of this being done, and I have a sneaking suspicion there is a reason for that. My supervisor's expertise does not fall into this area. If I follow this route, I have a few questions before going forward:
1) Can these datasets be assembled into a single transcriptome? Or are there steps to normalise the data to produce a single assembly for the two populations?
2) Would it be better to do one assembly per population and compare them somehow?
3) Do I need to disregard the frozen sample? If so, it is likely not worth using the SRA data because of the lack of replicates...
4) Should I just uncomplicate my life and stick to the one dataset?

Thanks in advance,

Dominique

RNA-Seq Assembly • 645 views

two populations (one is temperate and one is sub-polar).

Is that referring to the two datasets? You have one type and SRA has another? If it is the same species you could try to assemble a single transcriptome by merging the two datasets but trying to do DE analysis across them may be problematic. Technical batch effects between the datasets may make it difficult to identify biological variation of interest.
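To make the problem concrete, here is a minimal sketch (not from any RNA-Seq package) that checks whether a sample sheet is fully confounded, meaning every batch contains only one condition. The function names, the batch labels `in_house` and `sra`, and the assignment of temperate/sub-polar to each dataset are assumptions made purely for illustration; the thread does not say which dataset is which population.

```python
from collections import defaultdict


def conditions_per_batch(samples):
    """Map each batch label to the set of conditions observed in it.

    `samples` is an iterable of (batch, condition) pairs.
    """
    seen = defaultdict(set)
    for batch, condition in samples:
        seen[batch].add(condition)
    return dict(seen)


def is_fully_confounded(samples):
    """Return True if every batch contains exactly one condition.

    In that case a batch effect and a condition effect cannot be told
    apart from the counts alone.
    """
    per_batch = conditions_per_batch(samples)
    return all(len(conds) == 1 for conds in per_batch.values())


# Hypothetical sample sheet mirroring the thread: 3 replicates in one
# dataset, 2 in the other. Which population belongs to which dataset is
# an assumption for this example.
thread_design = [("in_house", "temperate")] * 3 + [("sra", "sub-polar")] * 2

# A hypothetical design where each batch holds both conditions.
crossed_design = [
    ("batch1", "temperate"), ("batch1", "sub-polar"),
    ("batch2", "temperate"), ("batch2", "sub-polar"),
]

print(is_fully_confounded(thread_design))   # every batch has one condition
print(is_fully_confounded(crossed_design))  # conditions shared across batches
```

With a design like the first one, any difference between populations is also a difference between datasets, so no statistical model can separate the two.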


Agree on this. One thing you have to know about RNA-seq (it is true of every NGS experiment, but especially RNA-seq) is that datasets suffer from notable so-called batch effects. Differences in library prep day, the kits used for RNA extraction and library prep, the way technicians handle samples, the sequencing regime, etc. confound (that is the important term) the data. These batch effects introduce technical variation between datasets that may mask, change, inflate or eliminate the actual biological effect size. In RNA-seq this is especially important because RNA is prone to degradation, even from small RNase contaminations, which one can never fully avoid. I see modest batch effects in my data even for samples (low-input RNA-seq) that I produced in exactly the same way, but on different days. As you can imagine, comparing completely different studies in the same analysis is problematic, as genomax says. It might be better to analyse them separately and then perform something like a meta-analysis to see if the general findings hold true.
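As a rough illustration of what "something like a meta-analysis" can mean, here is a minimal sketch of Fisher's method for combining per-gene p-values from independent studies. This is one classical approach, not a recommendation of a specific pipeline and not how any particular package is implemented. It assumes each study was analysed separately for the same contrast and produced a p-value for the same gene; the function names are made up for this example. Note that Fisher's method ignores the direction of the effect.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square distribution with even df.

    For df = 2k the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combine(p_values):
    """Combine independent p-values for one gene with Fisher's method.

    The statistic -2 * sum(ln p_i) follows a chi-square distribution
    with 2 * len(p_values) degrees of freedom under the null.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    return chi2_sf_even_df(statistic, 2 * len(p_values))


# Hypothetical p-values for one gene from two separately analysed studies.
print(fisher_combine([0.05, 0.05]))
```

This only makes sense when each study contains its own within-study comparison; it does not remove the batch confounding described above.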


Thank you both for your informative replies. That is definitely what I was worried about. @genomax, yes, the separate populations refer to the different datasets. If DE is off the table, would there be any value in assembling a single transcriptome from the two datasets? @ATpoint, that is very helpful. I will analyse them separately and look into meta-analysis for RNA-Seq data. Any recommendations? I have recently seen the new R package 'metaSeq' for RNA-Seq count data.

Thanks again :)
