Question: convert bam file to raw read count file for DESeq or EdgeR
4
gravatar for illinois.ks
5.4 years ago by
illinois.ks160
Korea, Republic Of
illinois.ks160 wrote:

I was working on bam files generated from Tophat in order to convert into the raw count file. 

Firstly, I just converted directly bam file into the raw count file using the following command.

  • htseq-count -f bam KDR_pre_thout/accepted_hits.bam genes.gtf > KDR_pre_raw_count.txt 

which directly conver the bam file into the raw count file. 

 

But I found the following paper. (http://www.nature.com/nprot/journal/v8/n9/pdf/nprot.2013.099.pdf), which suggest that converting the bam file into the sam file and then, convert sam file into the raw count file. 

I thought it would be same.. but NOT. 

 

  • samtools sort -n KDR_pre_thout/accepted_hits.bam KDR_pre_sn
  • samtools view -o KDR_pre_sn.sam KDR_pre_sn.bam
  • htseq-count -s no -a 10 KDR_pre_sn.sam gene.gtf > KDR_pre_sn.count

=============================================================================

I have used these two ways in order to convert the bam file into the count file.

And I found that these results are different. Which I have to follow?? Second as paper mentioned(maybe sort is the important difference) Then, why??? Anybody knows why sort is necessary??

Please help me with this.!

 

Thanks.

rna-seq • 11k views
ADD COMMENTlink modified 5.4 years ago by Devon Ryan92k • written 5.4 years ago by illinois.ks160
3
gravatar for Devon Ryan
5.4 years ago by
Devon Ryan92k
Freiburg, Germany
Devon Ryan92k wrote:

Until recently, htseq-count didn't support coordinate-sorted files, so if you used an older version then that'd be the cause of the difference. Otherwise, the results should be identical and any difference would be due to a bug. Presuming this is reproducible, you should notify Simon Anders so he can fix the bug.

BTW, the results from the name-sorted file will be more reliable, solely because that's been debugged more thoroughly (there's no a priori reason for there to be a difference).

ADD COMMENTlink written 5.4 years ago by Devon Ryan92k

Hello Devon 

 

Thank you for your comments!. so you are mentioning that it should be same whether I have followed the first trial (directly convert from bam to count file without sort) or the second trail ( convert bam to sam and then sam to count file with sorting option). 

 

Since result difference is relatively big ( as you mentioned if it is a bug) , htseq-count should be fixed. ( I hope Simon Anders could see this posting!! ) 

 

BTW, I just realized that my one of option for htseq-count was different, which was -s no option... 

THe default was "yes". I didn't set up this option for the first trial, which implicitly means that "yes" but my second trial, it was set as "no" . 

 

I suspect that it would also cause the difference. so changed it.. as third trial, but all my three trial return the different count file, which is weired.!!  hm... 

option 1 ) 

  • htseq-count -f bam KDR_pre_thout/accepted_hits.bam genes.gtf > KDR_pre_raw_count.txt 

option2 ) 

  • samtools sort -n KDR_pre_thout/accepted_hits.bam KDR_pre_sn
  • samtools view -o KDR_pre_sn.sam KDR_pre_sn.bam
  • htseq-count -s no -a 10 KDR_pre_sn.sam gene.gtf > KDR_pre_sn.count

option 3) 

  • samtools sort -n KDR_pre_thout/accepted_hits.bam KDR_pre_sn
  • samtools view -o KDR_pre_sn.sam KDR_pre_sn.bam
  • htseq-count -a 10 KDR_pre_sn.sam gene.gtf > KDR_pre_sn.count

==================================================================

All the results from three trial looks different. 

 

Could you please let me know what is "strand-specific assay" means???

ADD REPLYlink written 5.4 years ago by illinois.ks160
1

Given that you switched between the strand-specific and unstranded counting as well as changing the minimum quality score for option 1, it's completely expected that you'd get different results. You need to do an apples-to-apples comparison for the results to be meaningful.

Regarding strand-specific libraries, I and others have explained this many many times. Here's one example. Just google around for more explanations.

ADD REPLYlink written 5.4 years ago by Devon Ryan92k

Hello Devon

Thank you so much for your explanation!! It really helped!!!  I will try to search and google more carefully!!! :)

BTW, as I reported before, given the exact same option (strand specific option, quality option), but still it returns the large difference in result  when I try to do the sort the bam file or not..  I hope it can be fixed soon!! :) 

 

ADD REPLYlink modified 5.3 years ago • written 5.3 years ago by illinois.ks160

did this issue fixed? or that you finally figured out why it is like that....

ADD REPLYlink written 3.8 years ago by colonppg100
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1966 users visited in the last hour