Question

Which package to be used for RNA-seq normalization(Limma-voom)

0

Entering edit mode

6.5 years ago

KVC_bioinfo ▴ 590

Hello,

I have RNA-seq count data obtained from HT-Seq. I want to normalize the data using quantile normalization in R. Can I do that using Bioconductor package in?

Any help is appreciated.

Thanks!

RNA-Seq limma voom • 3.1k views

ADD COMMENT • link updated 13 months ago by Ram 43k • written 6.5 years ago by KVC_bioinfo ▴ 590

score 3 · Accepted Answer · 2017-10-26

3

Entering edit mode

6.5 years ago

Kevin Blighe 87k

Hello,

Yes, as per our previous exchange, Limma-Voom will allow you to normalise via quantile normalisation.

Other methods, like DESeq2, normalise via geometric normalisation on a negative binomial distribution of counts.

Kevin

ADD COMMENT • link 6.5 years ago by Kevin Blighe 87k

0

Entering edit mode

Thank you. so far I just have the raw counts of genes expressed in each sample. Do I have to first convert it into the Matrix? I am not sure how should I proceed?

ADD REPLY • link 6.5 years ago by KVC_bioinfo ▴ 590

1

Entering edit mode

I do not know if there is a specific import function in Limma/Voom for HTseq counts. However, you can read your data into a single data-frame (or data-matrix) by generally following this code:

#Get a list of all HTseq counts files
files <- dir(path="/home/KVC_bioinfo/htseq/", full.names=TRUE, recursive=TRUE)
files <- files[grep("htseq.counts$", files)]

#Create the bare bones of the data-frame that will house all raw counts
rawcounts <- data.frame(row.names=MyTranscriptNamesVector)

#Loop through each file and add the counts to the data-frame
for (i in 1:length(files))
{

    #sample <- read.table(gzfile(files[i]), sep="\t")[1:60483, 2] #if files are gzipped
    sample <- read.table(files[i], sep="\t")[1:60483, 2]

    rawcounts <- data.frame(rawcounts, sample)

    colnames(rawcounts)[i] <- files[i]
}

If you have a really large amount of files and a lot of transcripts, you may consider the fread() function in the data.table package, which can be used in place of read.table() (which I use above in my code) and that is quicker.

Other things that you will have to define beforehand are:

MyTranscriptNamesVector, a vector containing the trancsript names over which you have performed read count abundance with HTseq
The number of transcripts (in my case above, there were 60,483 transcripts; thus, I only read 60,483 rows from the HTseq files).

Check the format of your HTseq files very carefully and be sure that you understand what exactly you are doing.

Trust that this has helped

Kevin

ADD REPLY • link 6.5 years ago by Kevin Blighe 87k

1

Entering edit mode

Thank you very much, Kevin. This is extremely helpful.

ADD REPLY • link 6.5 years ago by KVC_bioinfo ▴ 590

1

Entering edit mode

No problem mate. Good luck with it.

ADD REPLY • link 6.5 years ago by Kevin Blighe 87k