I have three questions about RNA-seq and datasets:
Is it fine to combine datasets? Suppose I am comparing control tongue epithelial tissue vs. tumor tongue epithelial tissue with a DESeq2 analysis. I have 5 control SRA runs from one experiment and 5 control SRA runs from a second experiment. I also have 5 tumor SRA runs from a third experiment and 5 tumor SRA runs from a fourth. Is that fine because it gives 10 controls vs. 10 tumors, or will the results be skewed by differences in how each experiment produced its files?
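To make the concern concrete: DESeq2 can only separate a condition effect from an experiment (batch) effect if the model matrix is full rank. In the layout described above, each experiment contains only one condition, so the tumor column equals the sum of the tumor-only experiment columns and the two effects cannot be told apart. The sketch below is a hypothetical, pure-Python check (not part of DESeq2; the experiment labels A to D are placeholders) that builds an intercept + experiment + condition design matrix with treatment coding and tests whether it is full rank.

```python
from fractions import Fraction


def design_matrix(samples):
    """Build an intercept + experiment + condition design matrix.

    samples: list of (experiment, condition) tuples. Treatment coding: the
    first experiment (sorted) and "control" are reference levels with no
    column of their own.
    """
    experiments = sorted({exp for exp, _ in samples})
    rows = []
    for exp, condition in samples:
        row = [1]  # intercept
        row += [1 if exp == e else 0 for e in experiments[1:]]  # experiment dummies
        row.append(1 if condition == "tumor" else 0)  # condition dummy
        rows.append(row)
    return rows


def rank(matrix):
    """Matrix rank by Gaussian elimination over exact fractions."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0  # number of pivots found so far
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot: this column is a combination of earlier ones
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def is_confounded(samples):
    """True if condition and experiment effects cannot be estimated separately."""
    x = design_matrix(samples)
    return rank(x) < len(x[0])


# Layout from the question: each experiment contributes only one condition.
posted = ([("A", "control")] * 5 + [("B", "control")] * 5
          + [("C", "tumor")] * 5 + [("D", "tumor")] * 5)
# Hypothetical alternative: each experiment contributes both conditions.
balanced = ([("A", "control")] * 5 + [("A", "tumor")] * 5
            + [("B", "control")] * 5 + [("B", "tumor")] * 5)

print(is_confounded(posted))    # True
print(is_confounded(balanced))  # False
```

A `True` result means that, in this layout, any tumor-vs-control difference is indistinguishable from a difference between experiments, whether or not an experiment term is included in the design.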
My second question is: what is the recommended number of samples for RNA-seq? I have heard that 10 controls vs. 10 tumors is ideal, or 30 samples in total, but what is the most advisable number, given that finding a suitable dataset can be hard? I have also seen people analyze GEO datasets with 200 or more samples. Is it simply "the more the merrier" for better results?
This question isn't really related to the first two, but there are MANY GEO datasets without SRA data, and I find it hard to find usable datasets when there is no SRA link. One example is GSE58911, which is perfect for what I'm looking for but has no FASTQ (.fq) files, which are pretty much necessary for a typical RNA-seq pipeline. Am I doing something wrong, or is there a way to use the .txt files in, say, a Linux-based pipeline?
Sorry for the number of questions, but I've searched long and hard for answers and nothing has helped me yet.