Question: Bam to read counts for differential expression analysis
1
gravatar for Vasu
2.3 years ago by
Vasu420
Vasu420 wrote:

Hello,

I have a bam file for the samples of RNA-Seq data. I need to get read counts for these to perform differential expression analysis. Which tools should I use for this and how?

Thanq

ADD COMMENTlink modified 2.3 years ago by manuelmendoza50 • written 2.3 years ago by Vasu420

There are many tools for that such as featureCounts, htseq, bedtools (multicov) etc... Also they are well documented.

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by arta580
3
gravatar for genomax
2.3 years ago by
genomax80k
United States
genomax80k wrote:

featureCounts is the easiest option.

ADD COMMENTlink written 2.3 years ago by genomax80k
1

...as well as multithreaded and tailored to RNA-seq. Go with featureCounts. You will need a GFF/GTF file that fits your organism/the assembly you aligned to and the BAM files.

### Assign reads to exons based on the GTF annotation:
# --a = the GTF/GFF file
# --F = specify the format of -a
# --p = data are paired-end (unset it if you have single-end)
# --T = set number of threads
# -P  = only consider pairs with ISIZE defined by -d & -D, default 50-600bp
# -o  = output file
$FCOUNTS -a $GTF -F GTF -p -T 8 -P -o countMatrix_all.txt *.bam
ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by ATpoint32k

the bam file I have is not sorted one. And I'm also not sure whether this is paired end or single end. Any idea how to check this and sort the bam file before getting the counts data.

ADD REPLYlink written 2.3 years ago by Vasu420

Sorted BAMs are not necessary, FC will sort internally if you have paired-end data (still, sorting a file is as easy as samtools sort -o sorted.bam unsorted.bam). I edited my post in that regard. For the checking, simply use samtools view -f 1 in.bam | head and then check if some reads pop up. -f 1 means only keep paired reads. If there are none, your data are not PE. Alternatively, use samtools view -H in.bam to get the header of the file, and then check the @PG line at the bottom where the alignment command is saved. If you need help, please post the output of both commands, so we can have a look.

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by ATpoint32k

It is single end. After giving the command [samtools view -f 1 in.bam | head] I didnt see anything pop up. This means no need to give -p in the feature counts command. Am I right? And may I know from where I can get the gtf file? [hg19 genome]

ADD REPLYlink written 2.3 years ago by Vasu420

No need for -p. You need to get a GTF file that matches your genome (important because UCSC uses chr1 where as Ensembl just 1). What kind of chromosome names do you have?

ADD REPLYlink written 2.3 years ago by genomax80k

I actually gave it like this samtools view -H file.bam

I see this

@HD VN:1.4  SO:coordinate   GO:none
@SQ SN:NR_039978    LN:984
@SQ SN:NM_130786    LN:1766
@SQ SN:NM_138932    LN:9293
@SQ SN:NM_033110    LN:2678
@SQ SN:NM_000014    LN:4678
@SQ SN:NM_144670    LN:5192
@SQ SN:NR_040112    LN:1201
ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by Vasu420

You appear to have aligned your data to a set of mRNA. This is going to cause problems for counting since you are going to have to make a GTF file up yourself for these. Why did you do this instead of aligning against the genome?

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by genomax80k

The bam files are from IonTorrent server

ADD REPLYlink written 2.3 years ago by Vasu420

Then Torrent Server should have also produced a file with counts.

ADD REPLYlink written 2.3 years ago by genomax80k

Ok. So I see an option - Download absolute reads table [This is read counts data] Am I right?

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by Vasu420

Likely. I have not used Torrent Server recently but should be easy to check. Make sure the reads counts are raw (not normalized), if you intend to use them with DESeq2/edgeR.

ADD REPLYlink written 2.3 years ago by genomax80k

Yes, I see both. Absolute reads table and normalized table. I got the absolute reads table ans using edgeR for DE analysis now.

ADD REPLYlink written 2.3 years ago by Vasu420

Is there any other way to get the counts from bam files. When I did the DE analysis with the absolute reads which I got from Ion torrent server with edgeR, I dont see the gene which I need in the differential expressed genes list.

ADD REPLYlink written 2.3 years ago by Vasu420

I dont see the gene which I need in the differential expressed genes list.

While we like to get an expected result for our experiments, biology does not always work that way. Perhaps the difference is subtle (or variation is too high) and is not supported by number of samples/replicates etc.

ADD REPLYlink modified 2.3 years ago • written 2.3 years ago by genomax80k

Ok. Could you please check this link [http://10.16.32.45/output/Home/Re-analysis_full_report_2_108/plugin_out/RNASeqAnalysis_out.144/RNASeqAnalysis.html]. Go down and you see - Download absolute reads table. Could you please check whether they are read counts or not.

ADD REPLYlink written 2.3 years ago by Vasu420
0
gravatar for manuelmendoza
2.3 years ago by
manuelmendoza50 wrote:

I use StringTie... Another option is Cufflinks

ADD COMMENTlink written 2.3 years ago by manuelmendoza50
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1276 users visited in the last hour