Question

Help creating DESeq2 count matrix from separate files

3.2 years ago

Nai ▴ 50

I have two separate count matrices, one for normal and one for control. If I use DESeqDataSetFromHTSeqCount, do I have to run it separately for each matrix? I want to create GE.txt for further analysis. Please help me input the data into DESeq2. The manual refers to "treated" samples, and I did not understand that point. Does list.files mean that I have to list each sample's count file? Could you kindly explain it?

or Sample matrix • 3.6k views

ADD COMMENT • link updated 3.1 years ago by Ram 44k • written 3.2 years ago by Nai ▴ 50

Hi, can you try this very simple [but effective] advice: How to input data for DESeq2 from individual HTSeq count?

ADD REPLY • link 3.2 years ago by Kevin Blighe 88k

Dear Kevin,

Thank you. I have normal and cancer samples; each folder has 100 patients, and the patient IDs are the same in both folders. I want to calculate gene expression in two ways. (i) Using the matrices:

Eg. Normal folder

all.sam.normal.count.mat-tsv

Cancer folder

all.sam.cancer. count.mat.tsv

Should I keep them in the same folder and set conditions <- c("normal", "cancer")? Is that what I have to do for the matrices?

(ii) Using the individual per-patient sample files (these are not technical replicates) in one folder:

Eg. Normal folder

Patient_1.sam.count

Cancer folder

patient_1.sam.count

Should I keep them in the same folder, and how do I set up conditions <- ()? What do I have to do with the sample data?

ADD REPLY • link updated 3.1 years ago by Kevin Blighe 88k • written 3.1 years ago by Nai ▴ 50

Hi, you probably need to column-bind (bind by column) the files all.sam.normal.count.mat-tsv and all.sam.cancer. count.mat.tsv.

In your BASH (shell) terminal, can you show the output of:

head all.sam.normal.count.mat-tsv | cut -f1-5

?

You need to get your data in this format:

Expression data:

       Normal1 Normal2 Tumour1 Tumour2
geneA  3       4       6       2
geneB  55      35      23      67
geneC  34      3       55      21

Metadata:

         Condition  patient
Normal1  healthy    1
Normal2  healthy    2
Tumour1  cancer     1
Tumour2  cancer     2
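
As a minimal sketch of that layout in R (the object names `counts` and `metadata` are made up for illustration, and the numbers are the toy values from the tables above), the key requirement is that the columns of the count matrix and the rows of the metadata describe the same samples in the same order:

```r
# Toy count matrix: genes in rows, samples in columns (values from the example above).
counts <- matrix(
  c(3, 4, 6, 2,
    55, 35, 23, 67,
    34, 3, 55, 21),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("Normal1", "Normal2", "Tumour1", "Tumour2"))
)

# Metadata: one row per sample, with row names matching the matrix column names.
metadata <- data.frame(
  Condition = factor(c("healthy", "healthy", "cancer", "cancer"),
                     levels = c("healthy", "cancer")),
  patient = factor(c(1, 2, 1, 2)),
  row.names = colnames(counts)
)

# Both objects must describe the same samples in the same order.
stopifnot(identical(colnames(counts), rownames(metadata)))
print(counts)
print(metadata)
```

With DESeq2 installed, objects shaped like these could then be passed to `DESeqDataSetFromMatrix(countData = counts, colData = metadata, design = ~ Condition)`.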

ADD REPLY • link 3.1 years ago by Kevin Blighe 88k

Gene Id                          P_1           P_2      P3

ENSG00000000003.15  1020    842 938 1077
ENSG00000000005.6   2   1   2   4
ENSG00000000419.14  488 447 343 423
ENSG00000000457.14  351 320 226 331
ENSG00000000460.17  92  69  45  75
ENSG00000000938.13  254 331 140 212
ENSG00000000971.16  918 2010    317 571
ENSG00000001036.14  817 547 507 630
ENSG00000001084.13  876 721 656 1054
ENSG00000001167.15  524 521 503 632

All of these belong to normal.

ADD REPLY • link updated 3.1 years ago by Kevin Blighe 88k • written 3.1 years ago by Nai ▴ 50

Perfect - you need to bind (by column) this data to the tumour data. When doing this, please verify that both datasets are aligned by Gene Id. Later, you should import [to DESeq2] the column-bound ('merged') data.

ADD REPLY • link 3.1 years ago by Kevin Blighe 88k

I have no idea how to merge the columns. Both files have the same column names. How can I align them by Gene Id?

ADD REPLY • link 3.1 years ago by Nai ▴ 50

I have one folder named Normal and another named cancer, containing BAM files with the same sample numbers. I ran htseq-count separately for each sample in each directory. I also ran htseq-count on all samples in cancer together and on all samples in normal together. So I have:

Per-sample .count files: Normal_patient1.count and so on, and cancer_patient1.count and so on.
Combined matrices: all_samples_normal.count.mat.tsv and all_samples_cancer.count.mat.tsv

As you showed me, I think I did something wrong in HTSeq. Please guide me through every step. I would be very grateful.

ADD REPLY • link 3.1 years ago by Nai ▴ 50

I think that you could enlist some help locally, if possible. Keep in mind that textual descriptions of files, code, etc., make it difficult for us to assist you. We would basically need you to provide concrete information about file paths, directory names, contents of the files (even in part), etc.

There is very simple but effective advice here, as I mentioned earlier: How to input data for DESeq2 from individual HTSeq count?

ADD REPLY • link 3.1 years ago by Kevin Blighe 88k

If you have two data files, one with normal samples and one with tumor samples (as implied above), you can do something like the following:

# read in the files for each sample type
normal <- read.delim(file="all_samples_normal.count.mat.tsv", sep="\t", header=TRUE, as.is=TRUE, row.names=1)
tumor <- read.delim(file="all_samples_cancer.count.mat.tsv", sep="\t", header=TRUE, as.is=TRUE, row.names=1)

# make sure they have the same structure/order
all(rownames(normal) == rownames(tumor))

# if the above statement returns TRUE, create one table of all the data
alldata <- cbind(normal, tumor)
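
If that check returns FALSE, or if both tables use the same column names (for example the same patient IDs), something along the following lines may help. This is only a sketch: the two small data frames are hypothetical stand-ins for the real files, and the `Normal_` and `Tumor_` prefixes are an arbitrary choice. The idea is to make the column names distinct and to put the rows in the same gene order before binding.

```r
# Hypothetical stand-ins for the two tables, with gene IDs as row names.
normal <- data.frame(P_1 = c(10, 20, 30), P_2 = c(11, 21, 31),
                     row.names = c("geneA", "geneB", "geneC"))
tumor <- data.frame(P_1 = c(32, 12, 22), P_2 = c(33, 13, 23),
                    row.names = c("geneC", "geneA", "geneB"))  # different row order

# Both tables reuse the same patient labels, so prefix them to keep columns distinct.
colnames(normal) <- paste0("Normal_", colnames(normal))
colnames(tumor) <- paste0("Tumor_", colnames(tumor))

# Keep only genes present in both tables and index both by the same gene vector,
# which puts the tumor rows in the same order as the normal rows.
shared <- intersect(rownames(normal), rownames(tumor))
normal <- normal[shared, , drop = FALSE]
tumor <- tumor[shared, , drop = FALSE]
stopifnot(identical(rownames(normal), rownames(tumor)))

alldata <- cbind(normal, tumor)
print(alldata)
```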

You can use something of the form above to build your matrix even if every sample is in its own file. If these steps don't make sense, I'm not sure how you'll be able to sensibly execute DESeq2 commands, and I would suggest you go back to basics to learn how to read a file into R, check the order or rownames of a matrix or dataframe, and how to combine columns to form a new dataframe. These are very very basic concepts that you should know how to execute.
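
For the one-file-per-sample case, a rough sketch could look like this. The directory, file names and counts are invented so that the example runs on its own, and the files are assumed to be two-column, tab-separated, headerless tables (gene, count):

```r
# Write three tiny count files (gene <tab> count, no header) to a temporary
# directory. File names and values are made up for illustration.
count_dir <- file.path(tempdir(), "counts_demo")
dir.create(count_dir, showWarnings = FALSE)
genes <- c("geneA", "geneB", "geneC")
toy <- list(N1.count = c(3, 55, 34), N2.count = c(4, 35, 3), C1.count = c(6, 23, 55))
for (f in names(toy)) {
  write.table(data.frame(genes, toy[[f]]), file.path(count_dir, f),
              sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)
}

# Read every file into a one-column data frame named after the file.
files <- list.files(count_dir, pattern = "\\.count$", full.names = TRUE)
per_sample <- lapply(files, function(f) {
  x <- read.delim(f, header = FALSE, row.names = 1, col.names = c("gene", "count"))
  setNames(x, sub("\\.count$", "", basename(f)))
})

# Check that every file lists the same genes in the same order, then bind.
same_genes <- vapply(per_sample,
                     function(x) identical(rownames(x), rownames(per_sample[[1]])),
                     logical(1))
stopifnot(all(same_genes))
alldata <- do.call(cbind, per_sample)
print(alldata)
```

Real htseq-count output also ends with summary rows such as `__no_feature` and `__ambiguous`; those would need to be dropped before using the table as a count matrix.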

ADD REPLY • link 3.1 years ago by seidel 11k

Dear Kevin and Seidel

Thank you, this is really helpful. I am very grateful to you both. I will try it and get back to you.

ADD REPLY • link 3.1 years ago by Nai ▴ 50

Can you please stop using the answer field for comments. I moved multiple of these already. Use ADD REPLY to keep the thread logically organized.

ADD REPLY • link 3.1 years ago by ATpoint 84k

I need your help designing the metadata for DESeq2. How can I prepare the metadata file?

library(Deseq2)
countsName <- read.delim(file.tsv, sep = \t, header = TRUE, as.is=TRUE, row.names = 1)

Here I don't have any column names. I also do not understand the following:

library("DESeq2")
ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable,
                                   directory = directory,
                                   design= ~ condition)

Here, is sampleTable = count matrix.tsv? What should directory and design be? (I have 50 samples from cancer and 50 samples from normal.) I am new to R. Kindly help me specify the data in DESeq2.

ADD REPLY • link updated 3.1 years ago by Ram 44k • written 3.1 years ago by Nai ▴ 50

ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory, design= ~ condition)

I got an error:

Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") : cannot open file 'file path'

ADD REPLY • link updated 3.1 years ago by Kevin Blighe 88k • written 3.1 years ago by Nai ▴ 50

Hi, what is the output of:

directory
list.files(directory)
list.files('.')
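
For reference, the same `list.files()` output can be used to assemble the `sampleTable` that `DESeqDataSetFromHTSeqCount` expects. The sketch below is hedged: the temporary directory and toy files exist only so that the code runs, and deriving the condition from the first letter of the file name (N = normal, C = cancer) is an assumption based on the naming used in this thread.

```r
# Hypothetical directory holding one count file per sample. Toy files are
# created here only so that the sketch runs on its own.
directory <- file.path(tempdir(), "htseq_demo")
dir.create(directory, showWarnings = FALSE)
for (f in c("N1.bam.count", "N2.bam.count", "C1.bam.count", "C2.bam.count")) {
  writeLines(c("geneA\t3", "geneB\t55", "geneC\t34"), file.path(directory, f))
}

# File names are given relative to `directory`, not to the working directory.
sampleFiles <- list.files(directory, pattern = "\\.bam\\.count$")

# One row per sample: sample name, file name, then the variables used in the design.
sampleTable <- data.frame(
  sampleName = sub("\\.bam\\.count$", "", sampleFiles),
  fileName   = sampleFiles,
  condition  = factor(ifelse(startsWith(sampleFiles, "N"), "normal", "cancer"),
                      levels = c("normal", "cancer")),
  stringsAsFactors = FALSE
)
print(sampleTable)

# Every listed file should exist under `directory`; a FALSE here would be
# consistent with a "cannot open the connection" error.
stopifnot(all(file.exists(file.path(directory, sampleTable$fileName))))
```

With DESeq2 installed, a table like this would be passed as `DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory, design = ~ condition)`. The first two columns are read as sample name and file name, and `design` names the column to compare.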

ADD REPLY • link 3.1 years ago by Kevin Blighe 88k

Dear Kevin, list.files(directory) now shows the file names, but I am still getting the same error.

My files are named, e.g., N1.bam.count and so on.

ADD REPLY • link 3.1 years ago by Nai ▴ 50

Also, in your code above, you're leaving out quotes when you supply strings as arguments, e.g. file="myfile.csv" and sep="\t"; these are important. Follow the examples very carefully. Experiment with simple steps to get basic things to work. For example, can you read a simple text file in your current directory? This requires that you (1) know how to create a file, (2) know how to find and specify your current directory, and (3) know how to use R to read in a file. If you can do this, you should be able to correctly get DESeq2 to read a file. Read the manual very carefully.
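
A small self-contained version of those three steps might look like this (the file name `demo_counts.tsv` and its contents are invented for the exercise):

```r
# (1) Create a small tab-separated file in the current working directory.
demo_file <- "demo_counts.tsv"
writeLines(c("gene\tS1\tS2", "geneA\t3\t4", "geneB\t55\t35"), demo_file)

# (2) Find out where R is looking, and confirm the file is visible from there.
print(getwd())
print(file.exists(demo_file))

# (3) Read it back. Note the quotes around the separator; the file argument is
#     either a quoted string or a variable that holds one.
demo <- read.delim(file = demo_file, sep = "\t", header = TRUE, row.names = 1)
print(demo)

# Remove the demo file again.
unlink(demo_file)
```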

ADD REPLY • link 3.1 years ago by seidel 11k

ddsHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable = sampleTable, directory = directory2, design =~ condition)
Warning message:
In DESeqDataSet(se, design = design, ignoreRank) :
  some variables in design formula are characters, converting to factors

> ddsHTSeq
class: DESeqDataSet
dim: 47051 100
metadata(1): version
assays(1): counts
rownames(47051): A1BG A1BG-AS1 ... ZZZ3 bA395L14.12
rowData names(0):
colnames(100): C1.bam.count C10.bam.count ... N8.bam.count
  N9.bam.count
colData names(1): condition

Now I am not sure whether this is correct. What should I do next?

ADD REPLY • link updated 3.1 years ago by Ram 44k • written 3.1 years ago by Nai ▴ 50

ddsHTSeq <- DESeq(ddsHTSeq)
estimating size factors
estimating dispersions
Error in checkForExperimentalReplicates(object, modelMatrix) :

    The design matrix has the same number of samples and coefficients to fit,
so estimation of dispersion is not possible. Treating samples
as replicates was deprecated in v1.20 and no longer supported since v1.

ADD REPLY • link updated 3.1 years ago by Ram 44k • written 3.1 years ago by Nai ▴ 50
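
A possible cause of this error (an assumption, since the sampleTable used above is not shown) is that the `condition` column holds a different value for every sample, for instance the sample or file names, instead of two shared labels such as "normal" and "cancer". The base R sketch below uses made-up data for six samples to show how that changes the shape of the design matrix:

```r
# Hypothetical sample annotations for six samples.
# Case 1: `condition` differs for every sample (e.g. it holds sample names).
bad <- data.frame(condition = factor(c("C1", "C2", "C3", "N1", "N2", "N3")))
# Case 2: `condition` has two levels, each shared by several samples.
good <- data.frame(condition = factor(c("cancer", "cancer", "cancer",
                                        "normal", "normal", "normal"),
                                      levels = c("normal", "cancer")))

# Base R's model.matrix() shows the design matrix implied by ~ condition.
mm_bad  <- model.matrix(~ condition, data = bad)
mm_good <- model.matrix(~ condition, data = good)

print(dim(mm_bad))   # as many coefficients (columns) as samples (rows)
print(dim(mm_good))  # two coefficients for six samples, so replicates remain

# A tally of samples per level is a quick way to spot the problem.
print(table(bad$condition))
print(table(good$condition))
```

On the real object, `table(ddsHTSeq$condition)` would give the same kind of tally. If every level appears only once, the condition column of the sampleTable would need to be rebuilt with one label per group.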