I am building a resource to estimate gene expression levels across many plant tissues using RNA-Seq data. I have collected datasets from different experiments from GEO and other sources. Using HTSeq, I estimated the read counts for each sample (i.e., samples from different experiments). Finally, I merged all the datasets into a single table, so that the expression level of a gene can be viewed across all samples (using a heatmap of the count data). However, I am concerned about whether this approach is valid. Could anyone comment on my strategy?
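For concreteness, the merging step could look roughly like the sketch below. It assumes htseq-count's default tab-separated output (gene ID first, count in the last column, summary rows such as `__no_feature` prefixed with `__`), and it keeps only genes present in every sample, on the assumption that different studies may have been counted against slightly different annotations. The function names, gene IDs and sample names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def parse_htseq_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Parse htseq-count output into {gene_id: count}.

    Assumption: gene ID is the first column and the count is the last
    column. Summary rows (e.g. '__no_feature', '__ambiguous') start
    with '__' and are skipped because they are not genes.
    """
    counts: Dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("__"):
            continue
        fields = line.split("\t")
        counts[fields[0]] = int(fields[-1])
    return counts


def read_htseq_file(path: str) -> Dict[str, int]:
    """Read one htseq-count output file from disk."""
    with open(path) as handle:
        return parse_htseq_counts(handle)


def merge_counts(
    samples: Dict[str, Dict[str, int]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Build a genes x samples count matrix.

    Only genes present in every sample are kept (assumption, see above).
    Returns (gene_ids, sample_names, matrix) with matrix[i][j] being the
    count of gene_ids[i] in sample_names[j].
    """
    if not samples:
        return [], [], []
    sample_names = sorted(samples)
    shared = set.intersection(*(set(samples[s]) for s in sample_names))
    genes = sorted(shared)
    matrix = [[samples[s][g] for s in sample_names] for g in genes]
    return genes, sample_names, matrix


if __name__ == "__main__":
    # Hypothetical samples from two different GEO series.
    leaf = parse_htseq_counts(["AT1G01010\t12", "AT1G01020\t0", "__no_feature\t500"])
    root = parse_htseq_counts(["AT1G01010\t30", "AT1G01020\t4", "AT1G01030\t7"])
    genes, names, matrix = merge_counts({"leaf_GSE1": leaf, "root_GSE2": root})
    print(genes, names, matrix)
```

A matrix built this way only puts the numbers side by side; it does not make counts from different experiments comparable.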
I have two specific questions:
1. Is it valid to merge the data, given that the different experiments may have batch effects?
2. If it is OK to merge the samples, should I use the HTSeq counts or FPKM values for the heatmap?
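To make both questions concrete, here is a minimal sketch of the usual within-sample normalizations plus a log transform for plotting. It assumes gene lengths in base pairs are available from the annotation (htseq-count does not report them) and that the sum of gene counts approximates the number of mapped fragments. `center_by_batch` is a hypothetical, deliberately crude per-batch mean-centering of log values, included only to illustrate question 1. It is not ComBat or limma's `removeBatchEffect`, and if each batch contains a single tissue it will remove the tissue differences along with the batch effect.

```python
import math
from typing import Dict, List, Sequence


def fpkm(counts: Sequence[int], lengths_bp: Sequence[int]) -> List[float]:
    """FPKM for one sample: count * 1e9 / (gene length * total counts).

    Assumption: total = sum of gene counts, used as a stand-in for the
    total number of mapped fragments.
    """
    total = sum(counts)
    return [c * 1e9 / (l * total) for c, l in zip(counts, lengths_bp)]


def tpm(counts: Sequence[int], lengths_bp: Sequence[int]) -> List[float]:
    """TPM for one sample: length-normalize first, then scale to 1e6."""
    rates = [c / (l / 1000) for c, l in zip(counts, lengths_bp)]
    scale = sum(rates)
    return [r / scale * 1e6 for r in rates]


def log2p1(values: Sequence[float]) -> List[float]:
    """log2(x + 1), a common transform before drawing a heatmap."""
    return [math.log2(v + 1) for v in values]


def center_by_batch(row: Sequence[float], batches: Sequence[str]) -> List[float]:
    """Subtract each batch's mean from one gene's log-expression row.

    Crude illustration only: any biology confounded with batch is
    removed too.
    """
    by_batch: Dict[str, List[float]] = {}
    for value, batch in zip(row, batches):
        by_batch.setdefault(batch, []).append(value)
    means = {b: sum(v) / len(v) for b, v in by_batch.items()}
    return [value - means[batch] for value, batch in zip(row, batches)]


if __name__ == "__main__":
    # Two hypothetical genes of 1,000 and 2,000 bp in one sample.
    print(tpm([10, 20], [1000, 2000]))
    print(log2p1(fpkm([10, 20], [1000, 2000])))
    # One gene measured in four samples from two hypothetical batches.
    print(center_by_batch([1.0, 3.0, 10.0, 12.0], ["A", "A", "B", "B"]))
```

FPKM and TPM only adjust for sequencing depth and gene length within each sample; neither removes systematic differences between experiments.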