Question: Raw counts - FeatureCounts (Rsubread), HT Seq and BEdtools
1
gravatar for Yaseen Ladak
3.1 years ago by
Yaseen Ladak30
United Kingdom, London, Imperial College London
Yaseen Ladak30 wrote:

Dear All, I would really appreciate some help. I have a aligned BAM file which is sorted. This file was obtained from paired end sequencing. I am trying to find the raw counts which fall in the regions mentioned in the GTF.

The GTF is below:


1dh10b  ena     exon    12163   14079   .       +       .       gene_id "ECDH10B_0014"; transcript_id "ACB01219"; exon_number "1"; gene_name "dnaK"; gene_source "ena"; gene_biotype "protein_coding"; transcript_name "dnaK-1"; transcript_source "ena"; transcript_biotype "protein_coding"; exon_id "ACB01219-1";
1dh10b  ena     exon    4040647 4043550 .       +       .       gene_id "ECDH10B_3947"; transcript_id "ECDH10B_3947-1"; exon_number "1"; gene_name "rrlC"; gene_source "ena"; gene_biotype "rRNA"; transcript_name "rrlC"; transcript_source "ena"; transcript_biotype "rRNA"; exon_id "ECDH10B_3947-1";

When I ran Rsubreads in the paired end command and results mentioned below:


The code for feature count:

fc1<-featureCounts(files=c("R1.bam"),
                  annot.ext="/data/tellis/analysis/genome/combined/dh10b_gfp_psb1c3_h3/test_part.gtf",
                  isGTFAnnotationFile=TRUE,GTF.featureType="exon",
                  isPairedEnd=TRUE)
dge1 <- DGEList(counts=fc1$counts, genes=fc1$annotation)
write.table(dge1$counts,file="/data/tellis/analysis/genome/combined/dh10b_gfp_psb1c3_h3/test_part_gtf_pe.txt",sep="\t")

Result obtained: "R1.bam" "ECDH10B_0014" 6528 "ECDH10B_3947" 255754

When I run the above with single end mode i.e. isPairedEnd=FALSE, I get the following results: "R1.bam" "ECDH10B_0014" 12049 "ECDH10B_3947" 440677

I wanted to double check this with bedtools:

bedtools coverage -a test_part.gtf -b R1.bam > cov_prt_gtf_bam.txt

The counts I got were: ECDH10B_0014 12049 ECDH10B_3947 4040647

However, because bedtools results dont account for paired end the results were similar to feature counts in single end mode.

But when I ran HTseq command as below

htseq-count -f bam -s no R1.bam test_part.gtf > htseq_test_part_gtf.txt

Results ECDH10B_3947 0 ECDH10B_0014 6509

When I checked on the IGV browser for the location on where the gene ECDH10B_3947 is I could hardly find any reads but bedtools in single mode picked up counts similar to feature counts single mode. There were also almost half the counts in feature counts in paired end mode. But the HT seq counts were zero and on IGV also there were no reads. ITs strange that for ECDH10B_0014 gene the counts of HT seq was similar to feature count in paired end mode.

Can someone please help? I wold be grateful.

Thanks, Yaseen


rna-seq next-gen • 1.7k views
ADD COMMENTlink modified 3.1 years ago • written 3.1 years ago by Yaseen Ladak30
1

It's likely that IGV and htseq-count are filtering a number of reads (possibly appropriately). You can check this by changing the filtering options in IGV. As a rule, if htseq-count and featureCounts disagree greatly the only reason is a setting different between them. Otherwise they're identical most of the time.

ADD REPLYlink written 3.1 years ago by Devon Ryan93k

Thank you. Thats very helpful :)

ADD REPLYlink written 3.0 years ago by Yaseen Ladak30

Hello Yaseen Ladak!

It appears that your post has been cross-posted to another site: http://seqanswers.com/forums/showthread.php?t=72405

This is typically not recommended as it runs the risk of annoying people in both communities.

ADD REPLYlink written 3.1 years ago by WouterDeCoster42k

Thanks, I would be more careful next time.

ADD REPLYlink written 3.1 years ago by Yaseen Ladak30
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1721 users visited in the last hour