Question: Shifting from microarray to RNAseq
ag1805x (India) wrote, 12 months ago:

My work involves identification of differentially expressed genes (DEG) within 3 conditions using microarray data. I had planned to use existing data from GEO for this purpose. For microarray, in most datasets all the 3 conditions were present in the same experiment and was pretty good to use for my purpose. Now I read that RNA-seq has an advantage over microarray for the detection of DEG. But all the 3 conditions are rarely present within the same RNA-seq experiment; typically an experiment has 2 of the 3 conditions. The experiments also vary in instrument and library preparation method. Can anyone suggest how I can plan my further work if I want to compare microarray and RNAseq DEG, or rather shift entirely to RNAseq? Also, in most of the papers I referred to, the group uses its own samples: they prepare all the conditions themselves and sequence them. Is it a good idea to work with existing RNAseq data?

Tags: rna-seq, microarray, ngs
Kevin Blighe (Republic of Ireland) wrote, 12 months ago:

Hi,

It's not a major issue if the RNA-seq samples differ based on library prep and instrument. Provided that you can access the raw FASTQ files, I would do the following:

  1. Obtain raw counts in each sample using Kallisto
  2. Import counts to DESeq2 using tximport
  3. Normalise counts in DESeq2 with a design model that includes library prep and instrument as factors likely to affect counts. This should mitigate the effects of these, if they exist.

There is a tutorial for this general process here: Analyzing RNA-seq data with DESeq2
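To give a feel for what the normalisation in step 3 does, here is a minimal Python sketch of the median-of-ratios idea behind DESeq2's size factors. It is only an illustration on made-up counts: the function name `size_factors` and the toy data are assumptions, and it does not replace DESeq2, which additionally fits the design model with library prep and instrument as factors.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch).

    counts: dict mapping sample name -> list of raw counts,
            with genes in the same order for every sample.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])

    # Log geometric mean of each gene across samples.
    # Genes with a zero in any sample are skipped (marked None).
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)

    # Size factor of a sample = median ratio of its counts to the gene-wise geometric mean.
    factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][g]) - log_geo_means[g]
            for g in range(n_genes)
            if log_geo_means[g] is not None
        ]
        factors[s] = math.exp(median(log_ratios))
    return factors

# Hypothetical toy data: sample_b was sequenced twice as deeply as sample_a.
toy_counts = {
    "sample_a": [100, 200, 50, 0, 400],
    "sample_b": [200, 400, 100, 10, 800],
}
sf = size_factors(toy_counts)
normalised = {s: [c / sf[s] for c in toy_counts[s]] for s in toy_counts}
print(sf)
print(normalised)
```

In this toy case the two size factors differ by a factor of two, so the depth difference disappears after dividing the counts by them.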

written 12 months ago by Kevin Blighe

It's not a major issue if the RNA-seq samples differ based on library prep and instrument

I don't think I agree with this. Library prep is likely to have a big impact, especially if you compare ribo depletion with polyA sequencing. But even within the same strategy I expect biases.

Including this in the design model might work, but I bet that the biological subgroups will be confounded / not independent from these technical subgroups.

But if you have some data to back your answer, please share.
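To make the confounding concern concrete, here is a small Python sketch that checks whether conditions can be separated from technical batches at all. It assumes an additive design of the form batch + condition, where "batch" stands for an experiment or a library prep / instrument combination. Under that assumption, two conditions can only be compared if they are linked through shared batches. The function names and sample tables are hypothetical.

```python
from collections import defaultdict

def condition_groups(samples):
    """Group conditions that are linked through shared batches.

    samples: iterable of (condition, batch) pairs, one per sample.
    Returns a list of sets; conditions in different sets are fully
    confounded with batch under an additive batch + condition design.
    """
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    def union(a, b):
        parent[find(a)] = find(b)

    # Link every condition to every batch it was measured in.
    for condition, batch in samples:
        union(("condition", condition), ("batch", batch))

    groups = defaultdict(set)
    for kind, name in list(parent):
        if kind == "condition":
            groups[find((kind, name))].add(name)
    return list(groups.values())

def conditions_comparable(samples):
    """True if all conditions fall into one linked group."""
    return len(condition_groups(samples)) == 1

# Hypothetical layout 1: each condition comes from a different experiment.
one_condition_per_experiment = [
    ("cond1", "exp1"), ("cond2", "exp2"), ("cond3", "exp3"),
]
# Hypothetical layout 2: each experiment holds 2 of the 3 conditions.
two_of_three_per_experiment = [
    ("cond1", "exp1"), ("cond2", "exp1"),
    ("cond2", "exp2"), ("cond3", "exp2"),
]
print(conditions_comparable(one_condition_per_experiment))
print(conditions_comparable(two_of_three_per_experiment))
```

The first layout prints False because every condition is tied to exactly one experiment, so no model can tell biology from batch. The second prints True because cond2 bridges the two experiments. Even then, the bridge rests on few samples, so the estimates may be noisy.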

written 12 months ago by WouterDeCoster

True, but the OP should therefore elaborate on the specific library prep methods in the RNA-seq samples of interest. I made an assumption that these library prep methods were just different versions of the same kit and/or were targeting the same RNA species. I made this assumption because I had initially assumed that it was obvious that library prep methods targeting different RNA species would not be compatible.

written 12 months ago by Kevin Blighe

I do have valid data that shows how the inclusion of sequencer instrument, library prep method (all ribosome depletion), and read type (single/paired-ends) in the design model can remove these effects.

modified 12 months ago • written 12 months ago by Kevin Blighe

Dear Kevin,

I made an initial attempt with HISAT2, StringTie and Ballgown. StringTie gives values as FPKM or TPM. If I understand correctly, these are normalised values. Do I need to do any further normalisation?

As for the library preparation, I mean the various kits used, such as TruSeq, Universal kit, etc.

Can you share a paper that describes this sort of work?

modified 12 months ago • written 12 months ago by ag1805x

To do this, my advice is to not use HISAT2 and to not use FPKM / TPM. If you follow my approach (above), your life will be a lot easier. Both FPKM and TPM normalisation strategies have come under much criticism in recent years and many avoid them. They are certainly not ideal for what you want to do.

written 12 months ago by Kevin Blighe
Friederike (United States) wrote, 12 months ago:

all the 3 conditions were present in the same experiment and was pretty good to use for my purpose

I'm not sure I understand what you're looking for. Did the microarray data help you address the biological question at hand or not?

Now I read that RNA-seq has an advantage over microarray for the detection of DEG

That depends on the biological question you're interested in. Do you want to detect novel transcripts? Alternative splicing? Very lowly expressed genes? Then yes, probably you're better off using high-quality (!) RNA-seq data.

If you just want to get a list of genes with a 2-fold expression change between conditions, then microarrays should serve you well, provided you follow the proper processing and analysis procedures.

Is it a good idea to work with existing RNAseq data?

If it addresses your biological question and meets basic QC criteria, sure.

written 12 months ago by Friederike
  • Well yes, the microarray data did address my biological question.
  • I need to detect differentially expressed genes and further plan to create networks. For now I do not intend to detect any novel transcripts or alternative splicing.
  • Actually, I need to know whether I can compare data from condition 1 in experiment 1 with data from condition 2 in experiment 2 and data from condition 3 in experiment 3. I was thinking about processing the individual data sets separately and getting the expression values as FPKM/TPM, since this gives a quantitative measure. Is this the correct approach? I could not find any related papers that have taken this road.

written 12 months ago by ag1805x

I need to detect differentially expressed genes and further plan to create networks. For now I do not intend to detect any novel transcripts or alternative splicing.

Then absolutely do not use HISAT2. You will struggle to normalise these samples together and deal with the unusual experimental setup, because neither FPKM nor TPM accounts for cross-sample differences when normalising. Just derive raw counts with Kallisto or Salmon, and then process these in DESeq2 with instrument and library prep method as covariates in your DESeq2 design model. I gave the entire tutorial to you through a link in my answer (above).
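As a rough illustration of the cross-sample problem, here is a minimal Python sketch with made-up numbers. TPM is computed within each sample and always sums to one million, so a large change in one gene shifts the TPM of every other gene in that sample. The function name `tpm`, the gene lengths and the counts are all hypothetical.

```python
def tpm(counts, lengths_kb):
    """Transcripts per million for one sample (illustrative sketch).

    counts: read counts per gene; lengths_kb: gene lengths in kilobases.
    """
    # Length-normalise first, then scale so the sample sums to one million.
    rates = [c / l for c, l in zip(counts, lengths_kb)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]

# Hypothetical toy data: four genes of equal length.
lengths_kb = [1.0, 1.0, 1.0, 1.0]
sample_1 = [100, 100, 100, 100]
sample_2 = [100, 100, 100, 700]  # only the last gene has more reads

tpm_1 = tpm(sample_1, lengths_kb)
tpm_2 = tpm(sample_2, lengths_kb)
print(tpm_1)
print(tpm_2)
```

The first three genes have identical counts in both samples, yet their TPM values in the second sample are 2.5 times lower, purely because the fourth gene takes up a larger share of that library.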

written 12 months ago by Kevin Blighe

Can you suggest some papers with similar work?

written 12 months ago by ag1805x

There are undoubtedly countless published papers that have utilised Kallisto and DESeq2, with varying covariates in the design models. Just use a search engine with

ncbi rna-seq kallisto deseq2

My own big manuscript will soon be coming out, too, where we utilised these.

If you follow the tutorial, which was written by some of the DESeq2 authors, and isolate the relevant parts, then you cannot really go wrong.

modified 12 months ago • written 12 months ago by Kevin Blighe