Question: Which package to be used for RNA-seq normalization(Limma-voom)
0
gravatar for KVC_bioinfo
2.8 years ago by
KVC_bioinfo450
Boston
KVC_bioinfo450 wrote:

Hello,

I have RNA-seq count data obtained from HT-Seq. I want to normalize the data using quantile normalization in R. Can I do that using Bioconductor package in???

Any help is appreciated.

Thanks!

ADD COMMENTlink modified 2.8 years ago by Kevin Blighe63k • written 2.8 years ago by KVC_bioinfo450
3
gravatar for Kevin Blighe
2.8 years ago by
Kevin Blighe63k
Kevin Blighe63k wrote:

Hello,

Yes, as per our previous exchange, Limma-Voom will allow you to normalise via quantile normalisation.

Other methods, like DESeq2, normalise via geometric normalisation on a negative binomial distribution of counts.

Kevin

ADD COMMENTlink written 2.8 years ago by Kevin Blighe63k

Thank you. so far I just have the raw counts of genes expressed in each sample. Do I have to first convert it into the Matrix? I am not sure how should I proceed?

ADD REPLYlink written 2.8 years ago by KVC_bioinfo450
1

I do not know if there is a specific import function in Limma/Voom for HTseq counts. However, you can read your data into a single data-frame (or data-matrix) by generally following this code:

#Get a list of all HTseq counts files
files <- dir(path="/home/KVC_bioinfo/htseq/", full.names=TRUE, recursive=TRUE)
files <- files[grep("htseq.counts$", files)]

#Create the bare bones of the data-frame that will house all raw counts
rawcounts <- data.frame(row.names=MyTranscriptNamesVector)

#Loop through each file and add the counts to the data-frame
for (i in 1:length(files))
{

    #sample <- read.table(gzfile(files[i]), sep="\t")[1:60483, 2] #if files are gzipped
    sample <- read.table(files[i], sep="\t")[1:60483, 2]

    rawcounts <- data.frame(rawcounts, sample)

    colnames(rawcounts)[i] <- files[i]
}

If you have a really large amount of files and a lot of transcripts, you may consider the fread() function in the data.table package, which can be used in place of read.table() (which I use above in my code) and that is quicker.

Other things that you will have to define beforehand are:

  • MyTranscriptNamesVector, a vector containing the trancsript names over which you have performed read count abundance with HTseq
  • The number of transcripts (in my case above, there were 60,483 transcripts; thus, I only read 60,483 rows from the HTseq files).

Check the format of your HTseq files very carefully and be sure that you understand what exactly you are doing.

Trust that this has helped

Kevin

ADD REPLYlink modified 2.8 years ago • written 2.8 years ago by Kevin Blighe63k
1

Thank you very much, Kevin. This is extremely helpful.

ADD REPLYlink written 2.8 years ago by KVC_bioinfo450
1

No problem mate. Good luck with it.

ADD REPLYlink written 2.8 years ago by Kevin Blighe63k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 769 users visited in the last hour