Question

Combining two rnaseq platforms in one

0

5.3 years ago

zizigolu ★ 4.3k

Hi, I have raw read counts from two targeted RNA-seq platforms: one targets 2256 probes and the other 1450 probes. They have 700 genes in common. The chemistry of the platforms is the same. The correlation between samples for the 700 common genes is 80 percent. The same patients were used for both platforms. I want to merge two platforms to have 3700 genes together. Could I simply take the mean of the raw counts for the 700 common genes? Any suggestions, please. Thank you.

RNA-Seq • 1.7k views

ADD COMMENT • link updated 5.3 years ago by johnsonnathant ▴ 120 • written 5.3 years ago by zizigolu ★ 4.3k

0

"I want to merge two platforms to have 3700 genes together."

Probably important to state what the aim of your downstream analysis is.

ADD REPLY • link 5.3 years ago by WouterDeCoster 47k

0

Thank you. I found very similar differentially expressed genes in both platforms, and the read distribution for the common differentially expressed genes is also the same. So I thought of building a bigger dataset. I heard that for each patient we have read counts from both platforms. I guess that is redundant, but I don't know how to combine the two datasets without losing information. My main aim is to build a predictive model to find prognostic genes that separate responder from non-responder patients.

ADD REPLY • link 5.3 years ago by zizigolu ★ 4.3k

score 3 · Answer 1 · 2019-01-15

3

5.3 years ago

johnsonnathant ▴ 120

Several warning flags come to mind from the description of the data analysis, especially when it comes to developing a predictive model for responder vs non-responder patients. Many factors matter here: the specifics of the numbers, the distribution of the data, the similarity of the probes, and even the disease model, since genetics can play a role. Correlation is an indicator of gene expression consistency but not a 'good' one, since it can be misleading, for example when it is driven by outliers.
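
To illustrate the point about outliers, here is a minimal sketch with invented counts for one gene across six matched samples (the values and the helper names `pearson`, `ranks` and `spearman` are made up for illustration, not taken from the poster's data). It compares the Pearson correlation with the rank-based Spearman correlation:

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average_rank
        start = end + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical counts: five low, unrelated values plus one extreme sample.
panel_a = [10, 12, 11, 13, 9, 5000]
panel_b = [13, 9, 12, 10, 11, 4800]

print(f"Pearson:  {pearson(panel_a, panel_b):.3f}")
print(f"Spearman: {spearman(panel_a, panel_b):.3f}")
```

In this toy case the single extreme sample pushes Pearson close to 1 while Spearman stays below 0.1, and dropping that sample (`pearson(panel_a[:5], panel_b[:5])`) gives -0.6. A high overall correlation alone therefore says little about agreement across the bulk of the data.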

1) Are the probes the same between platforms? - Seems the targeted platforms are not the same since only 700 are common

2) Pretty much guaranteed there is going to be a batch effect between the two platforms, so testing it is wise during the evaluation

3) Are there useful controls to look at variation within the datasets?

4) While the chemistry of platforms may be the same, sample prep and hands play a factor
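
One possible way to test for a platform effect on the shared genes is sketched below. It is a rough illustration, not a recommended pipeline: the patient IDs, gene names and counts are invented, `log2_cpm` and `paired_shift` are hypothetical helpers, and normalising to log2 counts-per-million on the shared genes only is an assumption made here so that the two panels are compared on the same footing.

```python
import math
from statistics import mean, stdev

def log2_cpm(counts, pseudo=0.5):
    """log2 counts-per-million for one sample, with a small pseudo-count."""
    total = sum(counts.values()) + pseudo * len(counts)
    return {g: math.log2((c + pseudo) / total * 1e6) for g, c in counts.items()}

def paired_shift(panel_a, panel_b, genes):
    """Per-gene mean and SD of log2-CPM differences (A minus B) over matched samples."""
    samples = sorted(set(panel_a) & set(panel_b))
    norm_a = {s: log2_cpm({g: panel_a[s][g] for g in genes}) for s in samples}
    norm_b = {s: log2_cpm({g: panel_b[s][g] for g in genes}) for s in samples}
    shifts = {}
    for g in genes:
        diffs = [norm_a[s][g] - norm_b[s][g] for s in samples]
        shifts[g] = (mean(diffs), stdev(diffs))
    return shifts

# Invented raw counts: patient -> gene -> count, one dict per panel.
# Panel B is sequenced twice as deep, and G2 is additionally inflated on it.
panel_a = {
    "P1": {"G1": 100, "G2": 200, "G3": 700},
    "P2": {"G1": 120, "G2": 180, "G3": 700},
    "P3": {"G1": 90, "G2": 210, "G3": 700},
}
panel_b = {
    "P1": {"G1": 200, "G2": 800, "G3": 1400},
    "P2": {"G1": 240, "G2": 720, "G3": 1400},
    "P3": {"G1": 180, "G2": 840, "G3": 1400},
}

shifts = paired_shift(panel_a, panel_b, ["G1", "G2", "G3"])
for gene, (avg, sd) in shifts.items():
    print(f"{gene}: mean log2 difference {avg:+.2f} (SD {sd:.2f})")
```

A mean difference that is consistently far from zero with a small SD points to a systematic platform shift for that gene rather than patient-level noise. Because CPM is compositional, one inflated gene also nudges the others, which is one more reason to treat this only as a screening step.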

These are just a few thoughts to help guide the analysis.

ADD COMMENT • link 5.3 years ago by johnsonnathant ▴ 120

0

Thanks a lot

The technician says the library preparation and everything else is the same; the company simply names one platform Immune, as only 1400 immune-related genes are sequenced, and the other an oncology panel with 2556 genes. The distribution of the same genes in matched samples is very similar, and the correlation between 2 matched samples is more than 80 percent. I am not sure how much I can trust the technician behind the sample preparation, but I want to merge these data to have 3000 genes as samples are the same

ADD REPLY • link 5.3 years ago by zizigolu ★ 4.3k

0

Sorry, what could be controls to look at variation within the datasets?

ADD REPLY • link 5.3 years ago by zizigolu ★ 4.3k

0

Sorry,

Company says that

"because the objective is to make a multiplexing panel, all probes must have the same Tm for an efficient hybridization on targets. This is to say that I will not be surprised that the 731 common probes have different sequence between 2 panels."

I am not sure what do they mean, does the sequence of probe for a common gene has been different for panels?

ADD REPLY • link 5.3 years ago by zizigolu ★ 4.3k

0

From the description, it sounds like what you have is Nanostring data, not RNA-Seq? Just to clarify, as RNA-Seq (generally speaking) is probe-free.

With regards to your comments:

"I want to merge these data to have 3000 genes as samples are the same"

Whenever data is merged, you always need to ask whether the data was acquired in the same way. This means: are they confident that what they are calling a 'gene' is the same in both setups? Speaking from personal experience, if it is Nanostring, they should be, as that is Nanostring's marketing pitch. However, if it is a microarray, probes (i.e. what they target) are known to be dirty (off-target effects).

"what could be controls to look at variation within the datasets?"

If it is Nanostring, they have built-in controls to address several different noise sources. Basically, there are two kinds: biological and technical.

"does the sequence of probe for a common gene has been different for panels?"

To understand what is meant by the Tm and probe comment, you should read up on how PCR (polymerase chain reaction) works.

ADD REPLY • link 5.3 years ago by johnsonnathant ▴ 120

0

Those are not simple RNA-seq but rather HTG EdgeSeq Oncology Biomarker and HTG EdgeSeq Precision Immuno-Oncology Panels. I have two samples as positive controls. Also, the melting temperature of the 700 common probes is likely the same in both panels, and the sequences of the 700 common probes are roughly the same. Now the question is how I could use these positive control samples to check the similarity of expression of the 700 common probes in both panels.
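
One rough way to frame that check, sketched under several assumptions: each positive control was run on both panels, counts are compared as log2 counts-per-million over the shared probes only, and a 2-fold difference is used as an arbitrary cutoff for flagging a probe. The probe names, counts, and the helpers `log2_cpm` and `agreement` are invented for illustration; this is a Bland-Altman style summary, not an HTG-recommended procedure.

```python
import math
from statistics import mean, stdev

def log2_cpm(counts, pseudo=0.5):
    """log2 counts-per-million for one sample, with a small pseudo-count."""
    total = sum(counts.values()) + pseudo * len(counts)
    return {p: math.log2((c + pseudo) / total * 1e6) for p, c in counts.items()}

def agreement(control_a, control_b, cutoff=1.0):
    """Bias, 95% limits of agreement, and probes whose |log2 difference| exceeds cutoff."""
    common = sorted(set(control_a) & set(control_b))
    a = log2_cpm({p: control_a[p] for p in common})
    b = log2_cpm({p: control_b[p] for p in common})
    diffs = {p: a[p] - b[p] for p in common}
    values = list(diffs.values())
    bias, sd = mean(values), stdev(values)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    discordant = [p for p in common if abs(diffs[p]) > cutoff]
    return bias, limits, discordant

# Invented counts for one positive control run on each panel.
control_on_a = {"P1": 100, "P2": 400, "P3": 50, "P4": 1000, "P5": 450, "A_only": 30}
control_on_b = {"P1": 210, "P2": 790, "P3": 95, "P4": 2050, "P5": 100, "B_only": 12}

bias, limits, discordant = agreement(control_on_a, control_on_b)
print(f"bias {bias:+.2f}, limits of agreement {limits[0]:+.2f} to {limits[1]:+.2f}")
print("probes differing more than 2-fold:", discordant)
```

With only two control samples this can flag probes that behave differently between panels, but it cannot say much about how that difference varies from sample to sample.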

ADD REPLY • link 5.3 years ago by zizigolu ★ 4.3k

0

Why not make this easy on yourself and find the HTG software that analyzes their data? We do this for their miRNA kits. I am sure there is some software that goes along with these panels. It is generally only available on the machine that runs the HTG instrument, so you will need to find the core that ran the samples and ask.

ADD REPLY • link 5.3 years ago by GenoMax 141k

0

Sorry, you mean I should ask the company to combine the read counts from the two platforms for me?

ADD REPLY • link 5.3 years ago by zizigolu ★ 4.3k

1

Assuming there is specific software to analyze those two panels, you should complete the analysis independently and then look for gene overlaps. I am not sure what you have been asked to do exactly.
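
As a small illustration of the "analyze independently, then look for overlaps" idea, the sketch below compares two lists of significant genes, restricted to the genes both panels can measure. The gene names, fold changes, and the helper `overlap_summary` are placeholders invented for this example; the inputs are assumed to be gene to log2 fold change mappings from whatever per-panel analysis was run.

```python
def overlap_summary(de_a, de_b, common_genes):
    """Compare significant genes from two panels on the genes measured by both.

    de_a and de_b map gene -> log2 fold change for genes called significant.
    """
    hits_a = set(de_a) & common_genes
    hits_b = set(de_b) & common_genes
    shared = hits_a & hits_b
    # A shared hit is only reassuring if the direction of change also agrees.
    same_direction = {g for g in shared if (de_a[g] > 0) == (de_b[g] > 0)}
    union = hits_a | hits_b
    jaccard = len(shared) / len(union) if union else float("nan")
    return {
        "shared": sorted(shared),
        "same_direction": sorted(same_direction),
        "only_a": sorted(hits_a - hits_b),
        "only_b": sorted(hits_b - hits_a),
        "jaccard": jaccard,
    }

# Placeholder results from two independent per-panel analyses.
de_panel_a = {"GENE1": 1.8, "GENE2": -1.2, "GENE3": 0.9, "GENE_A_ONLY": 2.0}
de_panel_b = {"GENE1": 1.5, "GENE2": 1.1, "GENE4": -0.7, "GENE_B_ONLY": -1.0}
common = {"GENE1", "GENE2", "GENE3", "GENE4"}

summary = overlap_summary(de_panel_a, de_panel_b, common)
print(summary)
```

Genes that are significant on both panels with the same sign are the strongest candidates to carry forward; a shared gene with opposite signs, like GENE2 here, is worth checking before trusting either result.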

ADD REPLY • link 5.3 years ago by GenoMax 141k

0

So far I have used DESeq2 on the panels separately, found differentially expressed genes, and built a model from them. Now I have been asked to combine the data from both panels to get a bigger dataset and more power.

ADD REPLY • link 5.3 years ago by zizigolu ★ 4.3k

0

With HTG assays we have always used the software provided by HTG to do downstream data analysis.

Unless you know DESeq2 is appropriate for this purpose, I suggest that you follow HTG's recommendations for data analysis. Clearly you are analyzing data for someone else. Were you asked to do the analysis this way, or are you just doing this on your own?

ADD REPLY • link 5.3 years ago by GenoMax 141k

0

No, this is a very expensive dataset that I am responsible for analysing. This is my job. I think I read an HTG paper in which people used DESeq2, so I used that. But I am not sure I remember the paper correctly; maybe that was an illusion :(

ADD REPLY • link 5.3 years ago by zizigolu ★ 4.3k

2

If you were not given specific instructions, then my advice is to stop and check with HTG tech support whether this is an appropriate way to proceed or whether they recommend that you use their software. This will save you time and heartache.

Note: These custom assays may have controls and such built in that their software may take into account during analysis. You would not be able to ensure the same standard by using DESeq2 as though this were a normal RNA-seq dataset.

ADD REPLY • link 5.3 years ago by GenoMax 141k

0

I cannot thumbs-up GenoMax's comment enough.

ADD REPLY • link 5.3 years ago by johnsonnathant ▴ 120

0

I have emailed the company; I hope they answer me clearly, as I noticed they were very guarded when I asked about the similarity of probes between the two platforms.

ADD REPLY • link 5.3 years ago by zizigolu ★ 4.3k

0

Sorry,

Here I read that people used DESeq2 for differential expression in an HTG assay:

https://www.htgmolecular.com/assets/htg/publications/2018_Myers_MicroRNA_Biomarkers_for_Parkinsons.pdf

Although this is HTG whole-transcriptome profiling of miRNAs rather than targeted sequencing.

ADD REPLY • link 5.3 years ago by zizigolu ★ 4.3k