Question: How do I use data from the SRA database?
dz2353 wrote, 10 weeks ago:

Dear friends, the post title may not describe my problem accurately, but I do not know how else to put it. I want to download some data from the SRA database at NCBI to use as a control group for comparison with my treatment group, so that I can do downstream analysis such as differential gene expression analysis. However, I do not know how to choose suitable raw data. My first question is whether I can compare single-end data with paired-end data. If the raw data sets differ in size, can I compare them directly? Finally, after getting the gene expression matrix, do I need to use the TMM method to eliminate the batch effect?

rna-seq next-gen • 150 views
modified 10 weeks ago by WouterDeCoster • written 10 weeks ago by dz2353

You can only remove a batch effect between different experiments if at least one group overlaps between them. If I read your design correctly, you want to download controls to compare with your own treatment group, so it sounds like you don't have an overlapping group in both experiments.

written 10 weeks ago by b.nota

Maybe I need to make this clearer. I have three amniotic epithelial cell (AEC) samples, and I want to find the differentially expressed genes between AECs and hESCs. However, I do not have hESC data of my own, so I have to download some from the SRA database. I did find hESC samples from several projects, but the PCA result is not good: hESC samples from different projects do not cluster together. I would like to know how to resolve this. Thanks a lot!

written 10 weeks ago by dz2353
WouterDeCoster wrote, 10 weeks ago:

> I want to download some data from the SRA database at NCBI to use as a control group for comparison with my treatment group, so that I can do downstream analysis such as differential gene expression analysis.

You won't be able to tell differential expression between treatment and control apart from technical differences between the datasets. Differential expression analysis is only valid if there are no technical confounders. Libraries should be made with the same kit, in the same lab, for the same sequencer, ideally by the same person.

> My first question is whether I can compare single-end data with paired-end data.

That's not ideal, but as described above, it is not your only problem.

> If the raw data sets differ in size, can I compare them directly?

For differential expression analysis you should use a method such as edgeR or DESeq2, which will take care of normalizing for library size.
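To give a feel for what "normalizing for library size" means, here is a minimal pure-Python sketch of the median-of-ratios idea that DESeq2 uses to estimate per-sample size factors. It is only an illustration and not the DESeq2 implementation; the function names and the toy counts are made up for this example, and for real data you should use the packages themselves.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one size factor per sample (median-of-ratios idea).

    counts: list of rows, one row per gene, one raw count per sample.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        # A gene with a zero count has no finite log geometric mean; skip it.
        if any(c == 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    if not log_ratios[0]:
        raise ValueError("no gene has non-zero counts in every sample")
    # The median ratio to the per-gene geometric mean is the size factor.
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts):
    """Divide every count by the size factor of its sample."""
    factors = size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Toy data (made up): sample 2 was sequenced about twice as deeply as sample 1.
toy_counts = [
    [10, 20],
    [100, 200],
    [50, 100],
    [0, 3],
]
factors = size_factors(toy_counts)
normalized = normalize(toy_counts)
```

In this toy case the second size factor comes out twice as large as the first, so the normalized counts of the first three genes become equal across the two samples. Note that this only corrects for sequencing depth. It does nothing about technical differences between labs, kits, or protocols.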

> Finally, after getting the gene expression matrix, do I need to use the TMM method to eliminate the batch effect?

You cannot eliminate the batch effect, since it is confounded with your treatment effect.
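A quick way to see the confounding is to tabulate which conditions were sequenced in which batch. The sketch below is a simplified check under my own assumptions: the sample names, batch labels, and the function `batches_with_both` are hypothetical, and it only looks for batches that contain both conditions directly (it ignores designs where two conditions are linked indirectly through a third one).

```python
from collections import defaultdict


def batches_with_both(samples, cond_a, cond_b):
    """Return the batches in which both conditions were measured.

    samples: iterable of (sample_name, condition, batch) tuples.
    An empty result means every difference between the two conditions
    is also a difference between batches, so the two cannot be separated.
    """
    conditions_by_batch = defaultdict(set)
    for _name, condition, batch in samples:
        conditions_by_batch[batch].add(condition)
    return sorted(
        batch
        for batch, conditions in conditions_by_batch.items()
        if cond_a in conditions and cond_b in conditions
    )


# Hypothetical layout resembling the design in this thread:
# own AEC samples, hESC samples downloaded from two other projects.
downloaded_design = [
    ("AEC_1", "AEC", "own_lab"),
    ("AEC_2", "AEC", "own_lab"),
    ("AEC_3", "AEC", "own_lab"),
    ("hESC_1", "hESC", "sra_project_1"),
    ("hESC_2", "hESC", "sra_project_2"),
]

# Hypothetical alternative: both cell types prepared in the same batch.
shared_design = [
    ("AEC_1", "AEC", "own_lab"),
    ("AEC_2", "AEC", "own_lab"),
    ("hESC_1", "hESC", "own_lab"),
    ("hESC_2", "hESC", "own_lab"),
]

confounded = batches_with_both(downloaded_design, "AEC", "hESC")
separable = batches_with_both(shared_design, "AEC", "hESC")
```

For the first layout the result is an empty list: no batch contains both cell types, so no statistical correction can tell a batch difference from a biological one. For the second layout the shared batch is returned.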

written 10 weeks ago by WouterDeCoster

Thanks for your detailed reply. I think I described my case badly. I have three amniotic epithelial cell (AEC) samples, and I want to find the differentially expressed genes between AECs and hESCs. So the relationship between AECs and hESCs is not really the treatment/control one I described before. I downloaded some hESC sequences from the SRA database. Before doing the DEG analysis, I ran a PCA, but the result is not good: hESC samples from different projects do not cluster together. I think I may have overlooked something, and that is what I want to figure out. Thanks again.

written 10 weeks ago by dz2353

You are comparing condition A with condition B, but your comparison between those conditions is confounded by technical differences. AEC vs hESC is the same problem as treatment vs control.

written 10 weeks ago by WouterDeCoster