Which package to be used for RNA-seq normalization(Limma-voom)
1
0
Entering edit mode
6.5 years ago
KVC_bioinfo ▴ 590

Hello,

I have RNA-seq count data obtained from HT-Seq. I want to normalize the data using quantile normalization in R. Can I do that using Bioconductor package in?

Any help is appreciated.

Thanks!

RNA-Seq limma voom • 3.1k views
ADD COMMENT
3
Entering edit mode
6.5 years ago

Hello,

Yes, as per our previous exchange, Limma-Voom will allow you to normalise via quantile normalisation.

Other methods, like DESeq2, normalise via geometric normalisation on a negative binomial distribution of counts.

Kevin

ADD COMMENT
0
Entering edit mode

Thank you. so far I just have the raw counts of genes expressed in each sample. Do I have to first convert it into the Matrix? I am not sure how should I proceed?

ADD REPLY
1
Entering edit mode

I do not know if there is a specific import function in Limma/Voom for HTseq counts. However, you can read your data into a single data-frame (or data-matrix) by generally following this code:

#Get a list of all HTseq counts files
files <- dir(path="/home/KVC_bioinfo/htseq/", full.names=TRUE, recursive=TRUE)
files <- files[grep("htseq.counts$", files)]

#Create the bare bones of the data-frame that will house all raw counts
rawcounts <- data.frame(row.names=MyTranscriptNamesVector)

#Loop through each file and add the counts to the data-frame
for (i in 1:length(files))
{

    #sample <- read.table(gzfile(files[i]), sep="\t")[1:60483, 2] #if files are gzipped
    sample <- read.table(files[i], sep="\t")[1:60483, 2]

    rawcounts <- data.frame(rawcounts, sample)

    colnames(rawcounts)[i] <- files[i]
}

If you have a really large amount of files and a lot of transcripts, you may consider the fread() function in the data.table package, which can be used in place of read.table() (which I use above in my code) and that is quicker.

Other things that you will have to define beforehand are:

  • MyTranscriptNamesVector, a vector containing the trancsript names over which you have performed read count abundance with HTseq
  • The number of transcripts (in my case above, there were 60,483 transcripts; thus, I only read 60,483 rows from the HTseq files).

Check the format of your HTseq files very carefully and be sure that you understand what exactly you are doing.

Trust that this has helped

Kevin

ADD REPLY
1
Entering edit mode

Thank you very much, Kevin. This is extremely helpful.

ADD REPLY
1
Entering edit mode

No problem mate. Good luck with it.

ADD REPLY

Login before adding your answer.

Traffic: 2576 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6