2.8 years ago
simplysarah • 0
I am very new to bioinformatics and just learning how to do RNA-seq analysis.
I have 63 samples, each with an htseq-count output file, and I need to combine them into a single matrix with one column per sample. How do I go about doing this in RStudio? Please give me some tips and pointers, since I have no idea where to start in making a table with so many samples and extracting the data from each one!
EDIT: tximport does not work with HTSeq (as stated by Mike Love), as HTSeq provides gene-level and not transcript-level counts. DESeq2 has an HTSeq import function built into its workflow (DESeqDataSetFromHTSeqCount) that handles HTSeq.
Have you looked at the tximport vignette? It is created precisely for this purpose of importing RNAseq quantification from multiple files and combining relevant data into a single matrix.
I would like to do this in a simple way using basic R functions
You can use cbind to do it. There could be several edge cases and problems that tximport would address, so I would recommend against reinventing the wheel.
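For anyone who wants to see roughly what the cbind route looks like, here is a minimal sketch in base R. It assumes each htseq-count file is tab-separated with two columns (gene ID, count) and no header, and that every file lists the same genes in the same order. The function name read_htseq_counts and the directory name in the usage comment are made up for illustration.

```r
# Read several htseq-count output files and column-bind them into one matrix.
# Assumption: two tab-separated columns (gene ID, count), no header line.
read_htseq_counts <- function(files, sample_names = basename(files)) {
  tables <- lapply(files, function(f) {
    read.delim(f, header = FALSE,
               col.names = c("gene", "count"),
               colClasses = c("character", "numeric"))
  })

  # cbind is only safe if every file lists the same genes in the same order
  genes <- tables[[1]]$gene
  same_genes <- vapply(tables, function(tb) identical(tb$gene, genes), logical(1))
  if (!all(same_genes)) {
    stop("Gene IDs differ between files; merge by gene ID instead of using cbind.")
  }

  counts <- do.call(cbind, lapply(tables, function(tb) tb$count))
  dimnames(counts) <- list(genes, sample_names)

  # Drop the htseq-count summary rows such as __no_feature and __ambiguous
  counts[!startsWith(genes, "__"), , drop = FALSE]
}

# Usage (hypothetical directory and file extension):
# files <- list.files("htseq_counts", pattern = "\\.htseq$", full.names = TRUE)
# count_matrix <- read_htseq_counts(files, sub("\\.htseq$", "", basename(files)))
```

The gene ID check is one of the edge cases mentioned above: if the files were produced with different annotations, a plain cbind would silently misalign the rows.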
Thank you. If my file format is .htseq, can I still use the tximport method? I looked at tximport but I am still confused about how I would do this. Can you please guide me?
It looks like tximport doesn't do htseq - but you can use DESeq2 to read htseq output and then get counts() as a matrix from that. Check out this post: How to input data for DESeq2 from individual HTSeq count?
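As a rough sketch of the DESeq2 route: DESeqDataSetFromHTSeqCount takes a sample table whose first column is the sample name and second column is the file name, plus the directory and a design formula, and counts() then returns the matrix. The wrapper name htseq_to_matrix, the .htseq extension, and the placeholder condition column are assumptions for illustration. For a real analysis you would fill in your actual sample groups and use a matching design instead of ~ 1.

```r
# Sketch: let DESeq2 read the htseq-count files, then extract the count matrix.
# Requires the Bioconductor package DESeq2 to be installed.
htseq_to_matrix <- function(directory, pattern = "\\.htseq$") {
  files <- list.files(directory, pattern = pattern)

  # First column: sample name, second column: file name, then sample metadata
  sample_table <- data.frame(
    sampleName = sub(pattern, "", files),
    fileName   = files,
    condition  = factor(rep("unknown", length(files)))  # placeholder metadata
  )

  dds <- DESeq2::DESeqDataSetFromHTSeqCount(
    sampleTable = sample_table,
    directory   = directory,
    design      = ~ 1  # no experimental design; enough for building the matrix
  )

  # Genes in rows, samples in columns
  DESeq2::counts(dds)
}

# Usage (hypothetical directory):
# count_matrix <- htseq_to_matrix("htseq_counts")
```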
BTW, please don't add an answer unless it answers the top level question. Use Add Comment or Add Reply instead.