Question: How to use data from the SRA database?
7 days ago, dz2353 wrote:

Dear friends, the post title may not describe my problem accurately, but I do not know how else to put it. I want to download some data from the SRA database at NCBI to use as a control group for comparison with my treatment group, so that I can do downstream analysis such as differential gene expression analysis. But I do not know how to choose suitable raw data. My first question is whether I can compare single-end data with paired-end data. If the raw datasets differ in size, can I compare them directly? Another question: after getting the gene expression matrix, do I need to use the TMM method to eliminate the batch effect?

rna-seq next-gen
written 7 days ago by dz2353 • modified 7 days ago by WouterDeCoster

You can only remove a batch effect between different experiments if at least one group is present in both experiments. If I read your design correctly, you want to download controls to compare with your own treatment group, so it sounds like you don't have any group that overlaps between the experiments.
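The overlap requirement can be checked from a sample sheet before any counts are downloaded. The sketch below is only an illustration: the sample IDs, batch labels, and the helper name `batches_with_both` are hypothetical, and it only looks for batches that directly contain both conditions (it ignores indirect links through a third condition).

```python
from collections import defaultdict


def batches_with_both(samples, cond_a, cond_b):
    """Return the batches that contain samples of both conditions.

    samples: iterable of (sample_id, condition, batch) tuples.
    An empty result means the comparison cond_a vs cond_b is
    fully confounded with batch in this simple, direct sense.
    """
    conditions_in_batch = defaultdict(set)
    for _sample_id, condition, batch in samples:
        conditions_in_batch[batch].add(condition)
    return sorted(
        batch
        for batch, conditions in conditions_in_batch.items()
        if {cond_a, cond_b} <= conditions
    )


# Hypothetical sample sheet: own samples plus public controls.
confounded = [
    ("AEC_1", "AEC", "own_lab"),
    ("AEC_2", "AEC", "own_lab"),
    ("AEC_3", "AEC", "own_lab"),
    ("hESC_1", "hESC", "sra_project_A"),
    ("hESC_2", "hESC", "sra_project_B"),
]

# Same sheet, plus one control processed alongside the own samples.
bridged = confounded + [("hESC_3", "hESC", "own_lab")]

print(batches_with_both(confounded, "AEC", "hESC"))  # no shared batch
print(batches_with_both(bridged, "AEC", "hESC"))     # own_lab holds both
```

With the first sheet no batch holds both conditions, so batch and condition cannot be told apart; in the second, the samples in `own_lab` give a within-batch comparison.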

written 7 days ago by b.nota

Maybe I need to make this clearer. I have three amniotic epithelial cell (AEC) samples, and I want to find the differentially expressed genes between AECs and hESCs. However, I do not have hESC data of my own, so I have to download some from the SRA database. I did find hESC samples from several different projects, but the PCA result is not good: hESC samples from different projects do not cluster together. I would like to know how to resolve this issue. Thanks a lot!

written 7 days ago by dz2353

7 days ago, WouterDeCoster wrote:

> I want to download some data from the SRA database at NCBI to use as a control group for comparison with my treatment group, so that I can do downstream analysis such as differential gene expression analysis.

You won't be able to distinguish differential expression between treatment and control from technical differences between the datasets. Differential expression analysis is only valid if there are no technical confounders. Libraries should be made with the same kit, in the same lab, for the same sequencer; ideally by the same person.

> My first question is whether I can compare single-end data with paired-end data.

That's not ideal, but it is not your only problem as described above.

> If the raw datasets differ in size, can I compare them directly?

For differential expression analysis you should use a method such as edgeR or DESeq2, which will take care of normalizing for library size.
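To give an idea of what library-size normalization does, here is a simplified sketch of the median-of-ratios approach that DESeq2 describes for its size factors. It is not the DESeq2 implementation, the function names are made up for this example, and the toy counts are invented; in practice you would let edgeR or DESeq2 do this for you.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one size factor per sample by median of ratios.

    counts: list of genes, each a list of raw counts (one per sample).
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples would be zero.
    """
    n_samples = len(counts[0])
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has non-zero counts in every sample")
    # Log of each gene's geometric mean across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [
            math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)
        ]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts, factors):
    """Divide each sample's counts by that sample's size factor."""
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Invented toy data: sample 2 was sequenced at twice the depth of sample 1.
toy = [[10, 20], [100, 200], [5, 10], [0, 3]]
factors = size_factors(toy)
normalized = normalize(toy, factors)
print(factors)
print(normalized[0])
```

In the toy data the second size factor comes out twice the first, so after division the first three genes have equal values in both samples. This only corrects for sequencing depth; it does nothing about kit, lab, or protocol differences.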

> Another question: after getting the gene expression matrix, do I need to use the TMM method to eliminate the batch effect?

TMM normalizes for library size and composition; it does not remove batch effects. In your design you cannot eliminate the batch effect at all, since it is confounded with your treatment effect.

written 7 days ago by WouterDeCoster

Thanks for your detailed reply. I think I described my case poorly. I have three amniotic epithelial cell (AEC) samples, and I want to find the differentially expressed genes between AECs and hESCs. Obviously, the relationship between AECs and hESCs is not the treatment/control one I described before. I downloaded some hESC sequences from the SRA database. Before doing the DEG analysis, I ran a PCA, but the result is not good: hESC samples from different projects do not cluster together. So I think I may have overlooked something, and that is what I want to figure out. Thanks again.

written 7 days ago by dz2353

You are comparing condition A with condition B, but your comparison between those conditions is confounded by technical differences. AEC vs hESC is the same problem as treatment vs control.

written 7 days ago by WouterDeCoster