Question

DESeq2 using RNA-seq data from the ENCODE project


4.8 years ago

jrnick7 • 0

Hi, I'm new to RNA-seq. I found some RNA-seq data in the ENCODE project and tried to run a differential gene expression analysis between control and shRNA samples. However, when I checked the .tsv file in the ENCODE dataset, I didn't know how to convert this data frame into a matrix that DESeq2 can process. Has anyone analyzed differential gene expression using these tsv files before? I noticed there is a count column in this data frame, so I think it should be possible to use DESeq2 to analyze this data.

Here's the format of the tsv file:

      gene_id transcript_id.s. length effective_length expected_count TPM FPKM
    1   10904            10904     93               18              0   0    0
    2   12954            12954     94               19              0   0    0
    3   12956            12956     72                0              0   0    0
    4   12958            12958     82                7              0   0    0
    5   12960            12960     73                0              0   0    0
    6   12962            12962     72                0              0   0    0
      posterior_mean_count posterior_standard_deviation_of_count pme_TPM pme_FPKM
    1                    0                                     0    2.87     2.59
    2                    0                                     0    2.72     2.46
    3                    0                                     0    0.00     0.00
    4                    0                                     0    7.38     6.67
    5                    0                                     0    0.00     0.00
    6                    0                                     0    0.00     0.00
      TPM_ci_lower_bound TPM_ci_upper_bound FPKM_ci_lower_bound FPKM_ci_upper_bound
    1        3.09304e-05            8.69374         2.79450e-05             7.85469
    2        5.01845e-06            8.05143         4.53004e-06             7.27558
    3        0.00000e+00            0.00000         0.00000e+00             0.00000
    4        1.84137e-04           21.95740         1.66494e-04            19.84690
    5        0.00000e+00            0.00000         0.00000e+00             0.00000
    6        0.00000e+00            0.00000         0.00000e+00             0.00000

RNA-Seq • 1.4k views


score 2 · Accepted Answer · 2019-06-28


4.8 years ago

ATpoint 81k

None of these values can be fed into DESeq2, because it expects raw counts. Why => please read its paper and manual. If you want to use DESeq2, check whether you can get raw counts from ENCODE, or download the fastq files and either align them (hisat2, STAR) or quantify against a transcriptome (salmon, kallisto). I would use salmon, as it offers GC bias correction, deals with multimappers, and is computationally inexpensive and fast. There is plenty of good literature on RNA-seq analysis out there. Start with this and get a good background first.
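As a rough illustration of the salmon route, the sketch below assembles a `salmon quant` command line with GC bias correction switched on. The index path, FASTQ names and output directory are placeholders (not real ENCODE file names), and `salmon_quant_cmd` is a hypothetical helper that only builds the argument list; it does not run salmon.

```python
from typing import List, Optional


def salmon_quant_cmd(index: str, out_dir: str, reads_1: str,
                     reads_2: Optional[str] = None,
                     threads: int = 4) -> List[str]:
    """Build (but do not run) a `salmon quant` command for one sample."""
    cmd = [
        "salmon", "quant",
        "-i", index,         # transcriptome index built beforehand with `salmon index`
        "-l", "A",           # let salmon infer the library type
        "--gcBias",          # enable fragment GC bias correction
        "-p", str(threads),  # number of threads
        "-o", out_dir,       # output directory (will contain quant.sf)
    ]
    # Read files go last, after the library type has been specified.
    if reads_2 is None:
        cmd += ["-r", reads_1]                 # single-end reads
    else:
        cmd += ["-1", reads_1, "-2", reads_2]  # paired-end mates
    return cmd


if __name__ == "__main__":
    # Placeholder names, for illustration only.
    print(" ".join(salmon_quant_cmd("salmon_index",
                                    "quant/control_rep1",
                                    "control_rep1.fastq.gz")))
```

To actually run it, the list could be passed to `subprocess.run(cmd, check=True)`. The resulting `quant.sf` files are usually imported into R with tximport before building the DESeq2 object.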



Thanks for your reply. As you suggested, I checked the ENCODE raw data. However, I found that the WT group was sequenced single-end while the treated group was sequenced paired-end. In this situation, is it still possible to do a DEG analysis, given that the two groups were sequenced with different layouts? ENCODE also provides BAM files; should I start from those instead?

Reply • 4.8 years ago by jrnick7 • 0


You can try treating the PE data as SE, for example by using only read 1 of each pair. The problem with ENCODE data is always that very little background information is available on potential batch effects: whether the samples were processed together and in exactly the same fashion, or by different technicians, with different protocols, on different sequencing machines, etc.
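A minimal sketch of what "treat PE as SE" could look like when preparing the inputs: keep only the first FASTQ file of every sample, so that single-end and paired-end libraries go through the same single-end workflow. The sample names and file names are made up, and `as_single_end` is a hypothetical helper; it assumes the first file listed for a paired-end sample is read 1.

```python
from typing import Dict, Tuple


def as_single_end(samples: Dict[str, Tuple[str, ...]]) -> Dict[str, str]:
    """Map each sample to one FASTQ file, dropping read 2 of paired-end samples."""
    single = {}
    for name, fastqs in samples.items():
        if not 1 <= len(fastqs) <= 2:
            raise ValueError(f"{name}: expected 1 or 2 FASTQ files, got {len(fastqs)}")
        # Assumption: for paired-end samples the first entry is read 1.
        single[name] = fastqs[0]
    return single


# Placeholder sample sheet: one SE control and one PE knockdown sample.
samples = {
    "control_rep1": ("control_rep1.fastq.gz",),
    "shRNA_rep1": ("shRNA_rep1_R1.fastq.gz", "shRNA_rep1_R2.fastq.gz"),
}

for name, fastq in as_single_end(samples).items():
    print(name, fastq)
```

This only harmonises the input layout. It does not remove any batch effect, and here the sequencing layout is completely confounded with the condition.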

Reply • 4.8 years ago by ATpoint 81k