Question: Bam to read counts for differential expression analysis
0
gravatar for Vasu
15 months ago by
Vasu320
Vasu320 wrote:

Hello,

I have a bam file for the samples of RNA-Seq data. I need to get read counts for these to perform differential expression analysis. Which tools should I use for this and how?

Thanq

ADD COMMENTlink modified 15 months ago by manuelmendoza40 • written 15 months ago by Vasu320

There are many tools for that such as featureCounts, htseq, bedtools (multicov) etc... Also they are well documented.

ADD REPLYlink modified 15 months ago • written 15 months ago by arta540
2
gravatar for genomax
15 months ago by
genomax63k
United States
genomax63k wrote:

featureCounts is the easiest option.

ADD COMMENTlink written 15 months ago by genomax63k
1

...as well as multithreaded and tailored to RNA-seq. Go with featureCounts. You will need a GFF/GTF file that fits your organism/the assembly you aligned to and the BAM files.

### Assign reads to exons based on the GTF annotation:
# --a = the GTF/GFF file
# --F = specify the format of -a
# --p = data are paired-end (unset it if you have single-end)
# --T = set number of threads
# -P  = only consider pairs with ISIZE defined by -d & -D, default 50-600bp
# -o  = output file
$FCOUNTS -a $GTF -F GTF -p -T 8 -P -o countMatrix_all.txt *.bam
ADD REPLYlink modified 15 months ago • written 15 months ago by ATpoint14k

the bam file I have is not sorted one. And I'm also not sure whether this is paired end or single end. Any idea how to check this and sort the bam file before getting the counts data.

ADD REPLYlink written 15 months ago by Vasu320

Sorted BAMs are not necessary, FC will sort internally if you have paired-end data (still, sorting a file is as easy as samtools sort -o sorted.bam unsorted.bam). I edited my post in that regard. For the checking, simply use samtools view -f 1 in.bam | head and then check if some reads pop up. -f 1 means only keep paired reads. If there are none, your data are not PE. Alternatively, use samtools view -H in.bam to get the header of the file, and then check the @PG line at the bottom where the alignment command is saved. If you need help, please post the output of both commands, so we can have a look.

ADD REPLYlink modified 15 months ago • written 15 months ago by ATpoint14k

It is single end. After giving the command [samtools view -f 1 in.bam | head] I didnt see anything pop up. This means no need to give -p in the feature counts command. Am I right? And may I know from where I can get the gtf file? [hg19 genome]

ADD REPLYlink written 15 months ago by Vasu320

No need for -p. You need to get a GTF file that matches your genome (important because UCSC uses chr1 where as Ensembl just 1). What kind of chromosome names do you have?

ADD REPLYlink written 15 months ago by genomax63k

I actually gave it like this samtools view -H file.bam

I see this

@HD VN:1.4  SO:coordinate   GO:none
@SQ SN:NR_039978    LN:984
@SQ SN:NM_130786    LN:1766
@SQ SN:NM_138932    LN:9293
@SQ SN:NM_033110    LN:2678
@SQ SN:NM_000014    LN:4678
@SQ SN:NM_144670    LN:5192
@SQ SN:NR_040112    LN:1201
ADD REPLYlink modified 15 months ago • written 15 months ago by Vasu320

You appear to have aligned your data to a set of mRNA. This is going to cause problems for counting since you are going to have to make a GTF file up yourself for these. Why did you do this instead of aligning against the genome?

ADD REPLYlink modified 15 months ago • written 15 months ago by genomax63k

The bam files are from IonTorrent server

ADD REPLYlink written 15 months ago by Vasu320

Then Torrent Server should have also produced a file with counts.

ADD REPLYlink written 15 months ago by genomax63k

Ok. So I see an option - Download absolute reads table [This is read counts data] Am I right?

ADD REPLYlink modified 15 months ago • written 15 months ago by Vasu320

Likely. I have not used Torrent Server recently but should be easy to check. Make sure the reads counts are raw (not normalized), if you intend to use them with DESeq2/edgeR.

ADD REPLYlink written 15 months ago by genomax63k

Yes, I see both. Absolute reads table and normalized table. I got the absolute reads table ans using edgeR for DE analysis now.

ADD REPLYlink written 15 months ago by Vasu320

Is there any other way to get the counts from bam files. When I did the DE analysis with the absolute reads which I got from Ion torrent server with edgeR, I dont see the gene which I need in the differential expressed genes list.

ADD REPLYlink written 15 months ago by Vasu320

I dont see the gene which I need in the differential expressed genes list.

While we like to get an expected result for our experiments, biology does not always work that way. Perhaps the difference is subtle (or variation is too high) and is not supported by number of samples/replicates etc.

ADD REPLYlink modified 15 months ago • written 15 months ago by genomax63k

Ok. Could you please check this link [http://10.16.32.45/output/Home/Re-analysis_full_report_2_108/plugin_out/RNASeqAnalysis_out.144/RNASeqAnalysis.html]. Go down and you see - Download absolute reads table. Could you please check whether they are read counts or not.

ADD REPLYlink written 15 months ago by Vasu320
0
gravatar for manuelmendoza
15 months ago by
manuelmendoza40 wrote:

I use StringTie... Another option is Cufflinks

ADD COMMENTlink written 15 months ago by manuelmendoza40
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1144 users visited in the last hour