Hi, I'm new to RNA-seq. I found some RNA-seq data in the ENCODE project and am trying to run a differential gene expression analysis between the control and shRNA samples. However, looking at the .tsv files in the ENCODE dataset, I don't know how to convert this data frame into the count matrix that DESeq2 can process. Has anyone done a differential gene expression analysis using these tsv files before? I noticed there is a count column (expected_count) in the data frame, so I think it should be possible to analyze this data with DESeq2.
Here's the format of the tsv file:
      gene_id transcript_id.s. length effective_length expected_count TPM FPKM
    1   10904            10904     93               18              0   0    0
    2   12954            12954     94               19              0   0    0
    3   12956            12956     72                0              0   0    0
    4   12958            12958     82                7              0   0    0
    5   12960            12960     73                0              0   0    0
    6   12962            12962     72                0              0   0    0

      posterior_mean_count posterior_standard_deviation_of_count pme_TPM pme_FPKM
    1                    0                                     0    2.87     2.59
    2                    0                                     0    2.72     2.46
    3                    0                                     0    0.00     0.00
    4                    0                                     0    7.38     6.67
    5                    0                                     0    0.00     0.00
    6                    0                                     0    0.00     0.00

      TPM_ci_lower_bound TPM_ci_upper_bound FPKM_ci_lower_bound FPKM_ci_upper_bound
    1        3.09304e-05            8.69374         2.79450e-05             7.85469
    2        5.01845e-06            8.05143         4.53004e-06             7.27558
    3        0.00000e+00            0.00000         0.00000e+00             0.00000
    4        1.84137e-04           21.95740         1.66494e-04            19.84690
    5        0.00000e+00            0.00000         0.00000e+00             0.00000
    6        0.00000e+00            0.00000         0.00000e+00             0.00000
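One possible way to get from per-sample tsv files like this to a DESeq2 input is sketched below. It pulls the `expected_count` column out of each file, rounds it to integers (DESeq2 expects integer counts, and these estimated counts can be fractional), and writes a single gene-by-sample matrix. The function names and the sample names and file paths passed in are hypothetical; only the `gene_id` and `expected_count` column names come from the file shown above. Treat it as a minimal sketch rather than the recommended pipeline.

```python
import csv


def read_expected_counts(tsv_path):
    """Return {gene_id: rounded expected_count} for one gene-level tsv file."""
    counts = {}
    with open(tsv_path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            # DESeq2 needs integer counts; expected counts may be fractional.
            counts[row["gene_id"]] = int(round(float(row["expected_count"])))
    return counts


def build_count_matrix(sample_files):
    """sample_files maps a sample name to its tsv path (names are up to you).

    Returns (gene_ids, sample_names, rows), where rows[i][j] is the count
    of gene_ids[i] in sample_names[j].
    """
    per_sample = {name: read_expected_counts(path)
                  for name, path in sample_files.items()}
    sample_names = list(per_sample)
    # Keep only genes that are present in every sample.
    shared = set.intersection(*(set(c) for c in per_sample.values()))
    gene_ids = sorted(shared)
    rows = [[per_sample[s][g] for s in sample_names] for g in gene_ids]
    return gene_ids, sample_names, rows


def write_count_matrix(sample_files, out_path):
    """Write the matrix as CSV: one row per gene, one column per sample."""
    gene_ids, sample_names, rows = build_count_matrix(sample_files)
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["gene_id"] + sample_names)
        for gene, row in zip(gene_ids, rows):
            writer.writerow([gene] + row)
```

The resulting CSV could then be read into R with `read.csv(..., row.names = 1)` and passed to `DESeqDataSetFromMatrix()` together with a sample table. Alternatively, the tximport package can import RSEM gene-level results directly (`type = "rsem"`) and hand them to `DESeqDataSetFromTximport()`, which avoids the manual rounding step.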
Thanks for your reply. As you suggested, I checked the ENCODE raw data. However, I found that the WT group was sequenced single-end while the treated group was sequenced paired-end. In this situation, is it still possible to do the DEG analysis, given that the two groups were sequenced under different conditions? Alternatively, they also provide BAM files; should I start from those?
You can try treating the PE data as SE. The problem with ENCODE data is always that very little background information is available about potential batch effects (were the samples processed together and in exactly the same fashion, or were there different technicians, protocols, sequencing machines, etc.?).
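To make the batch-effect concern concrete, here is a small sketch that checks whether a technical covariate such as library layout is completely confounded with the condition of interest. When every level of the covariate occurs in only one condition, its effect cannot be separated from the biological effect by adding it to the model. The function name, the field names and the sample sheet are hypothetical and only mirror the situation described in this thread (single-end controls, paired-end knockdowns).

```python
from collections import defaultdict


def is_confounded(samples, covariate, condition="condition"):
    """Return True if each level of `covariate` appears in only one condition."""
    conditions_per_level = defaultdict(set)
    for sample in samples:
        conditions_per_level[sample[covariate]].add(sample[condition])
    return all(len(conds) == 1 for conds in conditions_per_level.values())


# Hypothetical sample sheet, for illustration only.
sample_sheet = [
    {"name": "control_rep1", "condition": "control", "layout": "SE"},
    {"name": "control_rep2", "condition": "control", "layout": "SE"},
    {"name": "shRNA_rep1", "condition": "shRNA", "layout": "PE"},
    {"name": "shRNA_rep2", "condition": "shRNA", "layout": "PE"},
]

if is_confounded(sample_sheet, "layout"):
    print("layout is confounded with condition; interpret DE results with caution")
```

In that case, processing all samples identically (for example, treating the PE libraries as SE) removes one source of difference, but it does not remove any unknown differences in how the libraries were prepared or sequenced.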