Question: Which package to be used for RNA-seq normalization(Limma-voom)
0
gravatar for KVC_bioinfo
24 months ago by
KVC_bioinfo410
Boston
KVC_bioinfo410 wrote:

Hello,

I have RNA-seq count data obtained from HT-Seq. I want to normalize the data using quantile normalization in R. Can I do that using Bioconductor package in???

Any help is appreciated.

Thanks!

ADD COMMENTlink modified 24 months ago by Kevin Blighe49k • written 24 months ago by KVC_bioinfo410
3
gravatar for Kevin Blighe
24 months ago by
Kevin Blighe49k
Kevin Blighe49k wrote:

Hello,

Yes, as per our previous exchange, Limma-Voom will allow you to normalise via quantile normalisation.

Other methods, like DESeq2, normalise via geometric normalisation on a negative binomial distribution of counts.

Kevin

ADD COMMENTlink written 24 months ago by Kevin Blighe49k

Thank you. so far I just have the raw counts of genes expressed in each sample. Do I have to first convert it into the Matrix? I am not sure how should I proceed?

ADD REPLYlink written 24 months ago by KVC_bioinfo410
1

I do not know if there is a specific import function in Limma/Voom for HTseq counts. However, you can read your data into a single data-frame (or data-matrix) by generally following this code:

#Get a list of all HTseq counts files
files <- dir(path="/home/KVC_bioinfo/htseq/", full.names=TRUE, recursive=TRUE)
files <- files[grep("htseq.counts$", files)]

#Create the bare bones of the data-frame that will house all raw counts
rawcounts <- data.frame(row.names=MyTranscriptNamesVector)

#Loop through each file and add the counts to the data-frame
for (i in 1:length(files))
{

    #sample <- read.table(gzfile(files[i]), sep="\t")[1:60483, 2] #if files are gzipped
    sample <- read.table(files[i], sep="\t")[1:60483, 2]

    rawcounts <- data.frame(rawcounts, sample)

    colnames(rawcounts)[i] <- files[i]
}

If you have a really large amount of files and a lot of transcripts, you may consider the fread() function in the data.table package, which can be used in place of read.table() (which I use above in my code) and that is quicker.

Other things that you will have to define beforehand are:

  • MyTranscriptNamesVector, a vector containing the trancsript names over which you have performed read count abundance with HTseq
  • The number of transcripts (in my case above, there were 60,483 transcripts; thus, I only read 60,483 rows from the HTseq files).

Check the format of your HTseq files very carefully and be sure that you understand what exactly you are doing.

Trust that this has helped

Kevin

ADD REPLYlink modified 24 months ago • written 24 months ago by Kevin Blighe49k
1

Thank you very much, Kevin. This is extremely helpful.

ADD REPLYlink written 24 months ago by KVC_bioinfo410
1

No problem mate. Good luck with it.

ADD REPLYlink written 24 months ago by Kevin Blighe49k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 984 users visited in the last hour