Raw counts - FeatureCounts (Rsubread), HT Seq and BEdtools
0
1
Entering edit mode
8.1 years ago
Yaseen Ladak ▴ 30

Dear All, I would really appreciate some help. I have a aligned BAM file which is sorted. This file was obtained from paired end sequencing. I am trying to find the raw counts which fall in the regions mentioned in the GTF.

The GTF is below:


1dh10b  ena     exon    12163   14079   .       +       .       gene_id "ECDH10B_0014"; transcript_id "ACB01219"; exon_number "1"; gene_name "dnaK"; gene_source "ena"; gene_biotype "protein_coding"; transcript_name "dnaK-1"; transcript_source "ena"; transcript_biotype "protein_coding"; exon_id "ACB01219-1";
1dh10b  ena     exon    4040647 4043550 .       +       .       gene_id "ECDH10B_3947"; transcript_id "ECDH10B_3947-1"; exon_number "1"; gene_name "rrlC"; gene_source "ena"; gene_biotype "rRNA"; transcript_name "rrlC"; transcript_source "ena"; transcript_biotype "rRNA"; exon_id "ECDH10B_3947-1";

When I ran Rsubreads in the paired end command and results mentioned below:


The code for feature count:

fc1<-featureCounts(files=c("R1.bam"),
                  annot.ext="/data/tellis/analysis/genome/combined/dh10b_gfp_psb1c3_h3/test_part.gtf",
                  isGTFAnnotationFile=TRUE,GTF.featureType="exon",
                  isPairedEnd=TRUE)
dge1 <- DGEList(counts=fc1$counts, genes=fc1$annotation)
write.table(dge1$counts,file="/data/tellis/analysis/genome/combined/dh10b_gfp_psb1c3_h3/test_part_gtf_pe.txt",sep="\t")

Result obtained: "R1.bam" "ECDH10B_0014" 6528 "ECDH10B_3947" 255754

When I run the above with single end mode i.e. isPairedEnd=FALSE, I get the following results: "R1.bam" "ECDH10B_0014" 12049 "ECDH10B_3947" 440677

I wanted to double check this with bedtools:

bedtools coverage -a test_part.gtf -b R1.bam > cov_prt_gtf_bam.txt

The counts I got were: ECDH10B_0014 12049 ECDH10B_3947 4040647

However, because bedtools results dont account for paired end the results were similar to feature counts in single end mode.

But when I ran HTseq command as below

htseq-count -f bam -s no R1.bam test_part.gtf > htseq_test_part_gtf.txt

Results ECDH10B_3947 0 ECDH10B_0014 6509

When I checked on the IGV browser for the location on where the gene ECDH10B_3947 is I could hardly find any reads but bedtools in single mode picked up counts similar to feature counts single mode. There were also almost half the counts in feature counts in paired end mode. But the HT seq counts were zero and on IGV also there were no reads. ITs strange that for ECDH10B_0014 gene the counts of HT seq was similar to feature count in paired end mode.

Can someone please help? I wold be grateful.

Thanks, Yaseen


RNA-Seq next-gen • 4.0k views
ADD COMMENT
1
Entering edit mode

It's likely that IGV and htseq-count are filtering a number of reads (possibly appropriately). You can check this by changing the filtering options in IGV. As a rule, if htseq-count and featureCounts disagree greatly the only reason is a setting different between them. Otherwise they're identical most of the time.

ADD REPLY
0
Entering edit mode

Thank you. Thats very helpful :)

ADD REPLY
0
Entering edit mode

Hello Yaseen Ladak!

It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=72405

This is typically not recommended as it runs the risk of annoying people in both communities.

ADD REPLY
0
Entering edit mode

Thanks, I would be more careful next time.

ADD REPLY

Login before adding your answer.

Traffic: 1750 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6