Question

Bowtie2 > SAM > BAM > featureCounts - how to get into DESeq2(?)

1

Entering edit mode

3.5 years ago

dnljmrs ▴ 20

Hi all, I have RNAseq data in fastq format, and I've done alignment with bowtie2, output to SAM, sorted and converted to BAM, then used featureCounts to generate table/matrix of counts for samples.

I've edited the delim file so that it doesn't have the header, or unnecessary columns, but does contain the geneid column, and then subsequent columns for each of my 6 samples (3 of each condition) with counts underneath for each gene.

I want to use DESeq2 to do the differential expression but I'm having a lot of trouble starting out. I can't seem to find anything of how to import the file into DESeq2 to get it to analyse, and all the tutorials I'm finding suggest using salmon, or tximport (which I dont think supports the pipeline I've used..or certainly in the help file, only lists salmon, sailfish, kallisto etc).

Can anyone point me in the right direction of where to find some other info, or give me an idea of where to start? I'm still early in my learning of bioinformatics, and appreciate your support.

RNA-Seq deseq2 • 4.5k views

ADD COMMENT • link updated 3.5 years ago by ATpoint 82k • written 3.5 years ago by dnljmrs ▴ 20

1

Entering edit mode

You can create a read count matrix using featureCounts by feeding it all of your BAM files in the order you want them in the matrix columns (genes will be in rows, samples in columns). You can then read the matrix into DESeq2 using these instructions in DESeq2 vignette.

ADD REPLY • link 3.5 years ago by GenoMax 142k

0

Entering edit mode

Thanks genomax - will take a look at those instructions

ADD REPLY • link 3.5 years ago by dnljmrs ▴ 20

score 0 · Answer 1 · 2020-10-20

0

Entering edit mode

3.5 years ago

ATpoint 82k

The following is valid if you use the command line version of featureCounts, not the one from Rsubread. Assuming you ran something like featureCounts -a your.gtf -o raw_featurecounts.txt *.bam

I recommend not changing anything in the output file. You can do all that in R. Assuming you have the raw featureCounts output and (in this dummy case here) two groups with two samples, here 2xKO and 2xWT, do:

library(DESeq2)

#/ simply skip the first line which contains the command line used:
featurecounts <- read.delim("raw_featurecounts.txt", header = TRUE, skip = 1)

#/ the first column right of Length is where the counts start, until the end of the matrix
column.from = which(colnames(featurecounts) == "Length") + 1
column.end  = ncol(featurecounts)

#/ colData, this you need to modify to match your experimental design, I have four samples here with two groups
coldata = data.frame(Condition = c("KO", "KO", "WT", "WT"), stringsAsFactors = TRUE)

dds <- 
DESeqDataSetFromMatrix(countData = featurecounts[, column.from:column.end],
                       colData = coldata,
                       design = ~Condition)

#/ add gene names to dds
rownames(dds) <- featurecounts$Geneid

ADD COMMENT • link 3.5 years ago by ATpoint 82k

1

Entering edit mode

By the way, you use bowtie2, are you working with a non-splicing organism? bowtie2 is not splice-aware.

Edit: Just saw your post from a couple of days ago, you are a microbiologist apparently, so no need for a splice-aware aligner apparently, never mind then.

Can you share the command line of your featureCounts run though. Probably you did not use a GTF file then with it, did you?

ADD REPLY • link 3.5 years ago by ATpoint 82k

0

Entering edit mode

Thats great - thank you. I am using command line for featureCounts, so I will re-run it as I didn't keep the original file (rookie error) before removing headers etc. Here's what I ran. Paired end reads, specify gene (because exon only picks up the RNA genes in my annotated file..i found out after numerous attempts at this)

/Volumes/Daniel_HDD/RNAseq_old2/bin/subread-2.0.1/bin/featureCounts -p -t gene -a /Volumes/Daniel_HDD/RNAseq/GCF_000026665.1_ASM2666v1_genomic.gtf -o /Volumes/Daniel_HDD/RNAseq/featurecounts_combine_nomulti /Volumes/Daniel_HDD/RNAseq/WT1.sort.bam /Volumes/Daniel_HDD/RNAseq/WT2.sort.bam /Volumes/Daniel_HDD/RNAseq/WT3.sort.bam /Volumes/Daniel_HDD/RNAseq/zomB1.sort.bam /Volumes/Daniel_HDD/RNAseq/zomB2.sort.bam /Volumes/Daniel_HDD/RNAseq/zomB3.sort.bam

Yep, using prokaryote (bacteria) so bowtie is fine for what I need (or so I've been recommended, at least). I'll follow your instructions after this and see how that goes, thanks again

Edit: I'm getting 90-93% assigned alignment for the featureCounts output (82% in one)

ADD REPLY • link 3.5 years ago by dnljmrs ▴ 20