Question

Shifting from microarray to RNAseq

0

6.4 years ago

Arindam Ghosh ▴ 510

My work involves identification of differentially expressed genes (DEG) within 3 conditions using microarray data. I had planned to use existing data from GEO for this purpose. For microarray, in most datasets all the 3 conditions were present in the same experiment and was pretty good to use for my purpose. Now I read that RNA-seq has an advantage over microarray for the detection of DEG. But all the 3 conditions are rarely present within the same RNA-seq experiment; a typical experiment has 2 of the 3 conditions. The experiments also vary in instrument and library preparation method. Can anyone suggest how I should plan my further work if I want to compare microarray and RNA-seq DEG, or rather shift entirely to RNA-seq? Also, in most of the papers I referred to, the groups use their own samples: they prepare all the conditions themselves and sequence them. Is it a good idea to work with existing RNAseq data?

RNA-Seq ngs microarray • 1.4k views

ADD COMMENT • link updated 6.4 years ago by Friederike 8.9k • written 6.4 years ago by Arindam Ghosh ▴ 510

score 2 · Answer 1 · 2017-12-12

2

6.4 years ago

Kevin Blighe 87k

Hi,

It's not a major issue if the RNA-seq samples differ based on library prep and instrument. Provided that you can access the raw FASTQ files, I would do the following:

Obtain raw counts in each sample using Kallisto
Import counts to DESeq2 using tximport
Normalise counts in DESeq2 with a design model that includes library prep and instrument as factors likely to affect counts. This should mitigate the effects of these, if they exist.

There is a tutorial for this general process here: Analyzing RNA-seq data with DESeq2
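
To make the normalisation step a little more concrete: DESeq2 estimates per-sample size factors with a median-of-ratios approach. The following is only a simplified pure-Python sketch of that idea, not the DESeq2 implementation; the function name `size_factors` and the toy counts are assumptions made up for illustration.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors (simplified sketch).

    counts: dict mapping sample name -> list of raw counts,
            with genes in the same order for every sample.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Log geometric mean per gene, using only genes that are
    # non-zero in every sample (a zero would make the log undefined).
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            mean_log = sum(math.log(v) for v in values) / len(values)
            log_geo_means.append((g, mean_log))

    # Size factor = median over genes of (count / geometric mean).
    factors = {}
    for s in samples:
        log_ratios = [math.log(counts[s][g]) - lgm for g, lgm in log_geo_means]
        factors[s] = math.exp(median(log_ratios))
    return factors

# Toy data (hypothetical): sample_b was sequenced twice as deeply as sample_a.
counts = {
    "sample_a": [100, 200, 50, 400],
    "sample_b": [200, 400, 100, 800],
}
sf = size_factors(counts)
normalised = {s: [c / sf[s] for c in counts[s]] for s in counts}
print(sf)
print(normalised)
```

In this toy case the size factor of sample_b comes out twice that of sample_a, so the normalised counts of the two samples coincide. In a real analysis you would let DESeq2 do this for you, as described in the tutorial.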

ADD COMMENT • link 6.4 years ago by Kevin Blighe 87k

1

It's not a major issue if the RNA-seq samples differ based on library prep and instrument

I don't think I agree with this. Library prep is likely to have a big impact, especially if you compare ribo depletion with polyA sequencing. But even within the same strategy I expect biases.

Including this in the design model might work, but I bet that the biological subgroups will be confounded / not independent from these technical subgroups.
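
One way to check for this kind of confounding before running anything is to look at whether the design matrix for condition plus technical batch has full column rank. DESeq2 itself refuses to fit a design whose model matrix is not full rank. The sketch below does the same check in plain Python; the helper names (`design_matrix`, `matrix_rank`, `is_estimable`) and both sample layouts are hypothetical and only meant to illustrate the point.

```python
from fractions import Fraction

def design_matrix(samples, factors):
    """Intercept plus treatment-coded dummy columns for each factor."""
    columns = [[1] * len(samples)]
    for f in factors:
        levels = sorted({s[f] for s in samples})
        for level in levels[1:]:  # the first level acts as the reference
            columns.append([1 if s[f] == level else 0 for s in samples])
    return [list(row) for row in zip(*columns)]  # one row per sample

def matrix_rank(rows):
    """Rank by exact Gaussian elimination (Fractions avoid rounding issues)."""
    m = [[Fraction(x) for x in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank

def is_estimable(samples, factors=("condition", "batch")):
    """True if condition and batch effects can be separated."""
    x = design_matrix(samples, factors)
    return matrix_rank(x) == len(x[0])

def make(pairs, reps=2):
    return [{"condition": c, "batch": b} for c, b in pairs for _ in range(reps)]

# Each experiment holds 2 of the 3 conditions, and the experiments overlap.
overlapping = make([("A", "exp1"), ("B", "exp1"),
                    ("B", "exp2"), ("C", "exp2"),
                    ("A", "exp3"), ("C", "exp3")])

# Each condition comes from its own experiment: completely confounded.
confounded = make([("A", "exp1"), ("B", "exp2"), ("C", "exp3")])

print(is_estimable(overlapping))
print(is_estimable(confounded))
```

In the overlapping layout the batch effect can in principle be separated from the condition effect. In the fully confounded layout it cannot, no matter which covariates go into the design.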

But if you have some data to back your answer, please share.

ADD REPLY • link 6.4 years ago by WouterDeCoster 47k

1

True, but the OP should therefore elaborate on the specific library prep methods in the RNA-seq samples of interest. I made an assumption that these library prep methods were just different versions of the same kit and/or were targeting the same RNA species. I made this assumption because I had initially assumed that it was obvious that library prep methods targeting different RNA species would not be compatible.

ADD REPLY • link 6.4 years ago by Kevin Blighe 87k

0

I do have valid data that shows how the inclusion of sequencer instrument, library prep method (all ribosome depletion), and read type (single/paired-ends) in the design model can remove these effects.

ADD REPLY • link 6.4 years ago by Kevin Blighe 87k

0

Dear Kevin,

I made an initial attempt with HISAT2, StringTie and Ballgown. StringTie gives values as FPKM or TPM. If I understand correctly, these are normalised values. Do I need to do any further normalisation?

By library preparation, I mean the various kits used, such as TruSeq, Universal kit, etc.

Can you share some papers describing this sort of work?

ADD REPLY • link 6.4 years ago by Arindam Ghosh ▴ 510

0

To do this, my advice is to not use HISAT2 and to not use FPKM / TPM. If you follow my approach (above), your life will be a lot easier. Both the FPKM and TPM normalisation strategies have come under much criticism in recent years, and many avoid them. They are certainly not ideal for what you want to do.

ADD REPLY • link 6.4 years ago by Kevin Blighe 87k

score 1 · Answer 2 · 2017-12-12

1

6.4 years ago

Friederike 8.9k

all the 3 conditions were present in the same experiment and was pretty good to use for my purpose

I'm not sure I understand what you're looking for. Did the microarray data help you address the biological question at hand or not?

Now I read that RNA-seq has an advantage over microarray for the detection of DEG

That depends on the biological question you're interested in. Do you want to detect novel transcripts? Alternative splicing? Very lowly expressed genes? Then yes, probably you're better off using high-quality (!) RNA-seq data.

If you just want to get a list of genes with a 2-fold expression change between conditions, then microarrays should serve you well, provided you follow the proper processing and analysis procedures.

Is it a good idea to work with existing RNAseq data?

If it addresses your biological question and meets basic QC criteria, sure.

ADD COMMENT • link 6.4 years ago by Friederike 8.9k

0

Well yes, the microarray data did address my biological question.
I need to detect differentially expressed genes and further plan to create networks. For now I do not intend to detect any novel transcripts or alternate splicing.
Actually, I need to know whether I can compare data from condition 1 in experiment 1 with data from condition 2 in experiment 2 and data from condition 3 in experiment 3. I was thinking about processing the individual datasets separately and getting the expression values as FPKM/TPM, since this gives a quantitative measure. Is this the correct approach? I could not find any related papers that have taken this road.

ADD REPLY • link 6.4 years ago by Arindam Ghosh ▴ 510

2

I need to detect differentially expressed genes and further plan to create networks. For now I do not intend to detect any novel transcripts or alternate splicing.

Then absolutely do not use HISAT2. You will struggle to normalise these samples together and to deal with the unusual experimental setup, because neither FPKM nor TPM deals with cross-sample differences when normalising. Just derive raw counts with Kallisto or Salmon, and then process these in DESeq2 with instrument and library prep method as covariates in your design model. I gave you the entire tutorial through a link in my answer (above).
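
As a rough illustration of why TPM alone does not handle cross-sample differences: TPM is scaled within each sample so that the values sum to one million, so a change in one highly expressed gene shifts the TPM of every other gene in that sample. This is a toy sketch with made-up counts and gene lengths, and `tpm` is a hypothetical helper, not a function from any of the tools mentioned here.

```python
def tpm(counts, lengths_kb):
    """Transcripts per million for a single sample."""
    rates = [c / l for c, l in zip(counts, lengths_kb)]  # reads per kilobase
    total = sum(rates)
    return [r / total * 1e6 for r in rates]

# Hypothetical gene lengths (kb) and raw counts for two samples.
lengths_kb = [1.0, 2.0, 1.0, 4.0]
sample_1 = [100, 200, 100, 400]
sample_2 = [100, 200, 100, 4000]  # only the last gene differs

tpm_1 = tpm(sample_1, lengths_kb)
tpm_2 = tpm(sample_2, lengths_kb)

# The first gene has identical raw counts in both samples,
# yet its TPM differs because the per-sample total changed.
print(tpm_1[0], tpm_2[0])
```

The first three genes have the same counts in both samples, but their TPM values drop in the second sample purely because the fourth gene takes up a larger share of the total. Between-sample methods such as the size factors in DESeq2 are designed to be more robust to this kind of composition effect.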

ADD REPLY • link 6.4 years ago by Kevin Blighe 87k

0

Can you suggest some papers with similar work?

ADD REPLY • link 6.4 years ago by Arindam Ghosh ▴ 510

2

There are undoubtedly countless published studies that have used Kallisto and DESeq2 with varying covariates in the design model. Just use a search engine with

ncbi rna-seq kallisto deseq2

If you follow the tutorial, which was written by some of the DESeq2 authors, and isolate the relevant parts, then you cannot really go wrong.

ADD REPLY • link 5.2 years ago by Kevin Blighe 87k