Question

featureCounts output interpretation

Asked by dr.genetics, 7.6 years ago

I ran featureCounts on a DNA-seq alignment file and got the following output (c is the object returned by featureCounts):

> head(cbind(c$counts, c$annotation));
          GACTCCTCAATGTC.sam    GeneID
DDX11L1                    3   DDX11L1
WASH7P                     3    WASH7P
FAM138A                    0   FAM138A
FAM138F                    0   FAM138F
OR4F5                      0     OR4F5
LOC729737                  4 LOC729737
                                                             Chr
DDX11L1                                           chr1;chr1;chr1
WASH7P    chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1
FAM138A                         chr1;chr1;chr1;chr19;chr19;chr19
FAM138F                         chr1;chr1;chr1;chr19;chr19;chr19
OR4F5                                                       chr1
LOC729737                                         chr1;chr1;chr1
                                                                      Start
DDX11L1                                                   11874;12613;13221
WASH7P    14362;14970;15796;16607;16858;17233;17606;17915;18268;24738;29321
FAM138A                                 34611;35277;35721;76220;76886;77330
FAM138F                                 34611;35277;35721;76220;76886;77330
OR4F5                                                                 69091
LOC729737                                              134773;139790;140075
                                                                        End
DDX11L1                                                   12227;12721;14409
WASH7P    14829;15038;15947;16765;17055;17368;17742;18061;18366;24891;29370
FAM138A                                 35174;35481;36081;76783;77090;77690
FAM138F                                 35174;35481;36081;76783;77090;77690
OR4F5                                                                 70008
LOC729737                                              139696;139847;140566
                         Strand Length
DDX11L1                   +;+;+   1652
WASH7P    -;-;-;-;-;-;-;-;-;-;-   1769
FAM138A             -;-;-;-;-;-   2260
FAM138F             -;-;-;-;-;-   2260
OR4F5                         +    918
LOC729737                 -;-;-   5474

But I am a little confused about the results:

Why is the count for WASH7P only 3? It looks like 11 segments are mapped to the gene.
Why does FAM138A have a count of 0? I understand the gene is located on two chromosomes (chr1 and chr19), but it has three segments on each chromosome.
OR4F5 has a single segment, and it spans exactly from the TSS to the TES? I guess I am misunderstanding something here.

Thanks.

Tags: software error, next-gen, gene

Those are the start and end coordinates of the exons that make up each gene (featureCounts lists every feature belonging to a meta-feature). Perfectly normal output if you ask me.
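The Start and End lists can be checked against the Length column in the output above: summing the inclusive, 1-based length of every listed segment reproduces Length exactly for each gene shown, which fits the segments being exons rather than reads (WASH7P has 11 exons but only 3 reads assigned). A minimal Python sketch; total_exon_length is a hypothetical helper name, and the coordinate strings are copied from the printed annotation.

```python
def total_exon_length(starts: str, ends: str) -> int:
    """Sum the inclusive lengths of segments given as ';'-separated 1-based coordinates."""
    start_list = [int(s) for s in starts.split(";")]
    end_list = [int(e) for e in ends.split(";")]
    return sum(end - start + 1 for start, end in zip(start_list, end_list))


# Start/End strings copied from the featureCounts annotation printed above
annotation = {
    "DDX11L1": ("11874;12613;13221", "12227;12721;14409"),
    "WASH7P": ("14362;14970;15796;16607;16858;17233;17606;17915;18268;24738;29321",
               "14829;15038;15947;16765;17055;17368;17742;18061;18366;24891;29370"),
    "FAM138A": ("34611;35277;35721;76220;76886;77330",
                "35174;35481;36081;76783;77090;77690"),
    "OR4F5": ("69091", "70008"),
    "LOC729737": ("134773;139790;140075", "139696;139847;140566"),
}

for gene, (starts, ends) in annotation.items():
    # gene name, number of segments (exons), summed length
    print(gene, len(starts.split(";")), total_exon_length(starts, ends))
# DDX11L1 3 1652
# WASH7P 11 1769
# FAM138A 6 2260
# OR4F5 1 918
# LOC729737 3 5474
```

A plain sum is enough here only because no two segments of the same gene overlap; overlapping segments would have to be merged first.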

I'm not sure I understand what you are trying to achieve. This is DNA sequencing, so what is the aim?

Reply by WouterDeCoster, 7.6 years ago

The thread "How to map BAM/SAM files to genes with abundance levels?" has the background on this. It is still not 100% clear to me what dr.genetics wants to count or find the depth for.

Reply by GenoMax, 7.6 years ago

Our experiments generate double-strand breaks (DSBs), and we are trying to see whether there are any DSB hotspots. The DNA-seq technique we use (GUIDE-seq) captures DNA fragments whose 5' or 3' end lies at a DSB. The more often we see a DNA fragment, the more likely it is that a DSB occurs at one of its ends. So we are looking at the abundance of DNA fragments, and thus the frequency of DSBs, at specific genomic loci.

In other words, I am looking for the abundance of DNA fragments in the BAM/SAM files and how those fragments relate to genes.

I assume the counts (3, 0, etc.) are the number of DNA fragments in the BAM/SAM file that map within a particular gene? If so, that does not give enough resolution, because we are also interested in exactly where the DSBs are.

Reply by dr.genetics, 7.6 years ago

Interesting experiment! Telling us that up front would have gotten you to helpful replies much faster ;)

featureCounts counts reads per meta-feature (in this case per gene, over all of its exons), so WASH7P does indeed have 3 reads assigned to it, not 11. You will probably want finer resolution than that, though.
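A toy model of that gene-level rule, written in Python as a sketch rather than featureCounts' actual implementation: a read is assigned to a gene if it overlaps any of the gene's exons, and it is counted once even if it spans several exons. Mirroring featureCounts' default (allowMultiOverlap = FALSE in Rsubread; the command-line -O option changes it), a read overlapping more than one gene is left unassigned. FAM138A and FAM138F have identical exon coordinates in the output above, so any read falling there would overlap both and, under that default, count for neither; that is one possible explanation for their 0 counts, besides simply having no reads in that region. GENES is a trimmed copy of the printed annotation, count_per_gene is a hypothetical helper, and the reads are made up.

```python
from collections import Counter

# Exons as (chrom, start, end), 1-based inclusive, copied from the annotation above.
# Only the first three of WASH7P's 11 exons and the chr1 copies of FAM138A/F are listed.
GENES = {
    "WASH7P": [("chr1", 14362, 14829), ("chr1", 14970, 15038), ("chr1", 15796, 15947)],
    "FAM138A": [("chr1", 34611, 35174), ("chr1", 35277, 35481), ("chr1", 35721, 36081)],
    "FAM138F": [("chr1", 34611, 35174), ("chr1", 35277, 35481), ("chr1", 35721, 36081)],
}


def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def count_per_gene(reads, genes):
    """Toy gene-level counting: each read goes to at most one gene, at most once."""
    counts = Counter({gene: 0 for gene in genes})
    unassigned = Counter()
    for chrom, start, end in reads:
        # Genes with at least one exon overlapping this read
        hits = {
            gene
            for gene, exons in genes.items()
            if any(c == chrom and overlaps(start, end, s, e) for c, s, e in exons)
        }
        if len(hits) == 1:
            counts[hits.pop()] += 1        # one count even if several exons are hit
        elif hits:
            unassigned["ambiguity"] += 1   # overlaps more than one gene
        else:
            unassigned["no_features"] += 1  # overlaps no annotated exon
    return counts, unassigned


# Made-up read alignments (chrom, start, end) for illustration only
reads = [
    ("chr1", 14450, 14550),  # inside WASH7P exon 1
    ("chr1", 14800, 15000),  # spans WASH7P exons 1 and 2, still one count
    ("chr1", 35000, 35100),  # overlaps both FAM138A and FAM138F
    ("chr1", 50000, 50100),  # no annotated exon
]

counts, unassigned = count_per_gene(reads, GENES)
print(dict(counts))      # {'WASH7P': 2, 'FAM138A': 0, 'FAM138F': 0}
print(dict(unassigned))  # {'ambiguity': 1, 'no_features': 1}
```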

I would suggest borrowing ideas from ChIP-seq analysis and clustering those reads (as peak callers do) to find hotspots.
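A very rough Python sketch of the clustering idea: sort break positions per chromosome, merge neighbours that lie within a chosen distance, and keep clusters supported by enough reads. This is a simple gap-based merge, not what a ChIP-seq peak caller does (peak callers such as MACS2 also model background and significance); cluster_breaks, max_gap, min_reads and the positions are all hypothetical.

```python
from collections import defaultdict


def cluster_breaks(breaks, max_gap=50, min_reads=3):
    """Group (chrom, pos) break positions into clusters of nearby positions.

    Consecutive sorted positions on the same chromosome that are at most
    max_gap bp apart are merged; clusters with fewer than min_reads
    positions are dropped. Returns (chrom, first_pos, last_pos, n_reads).
    """
    by_chrom = defaultdict(list)
    for chrom, pos in breaks:
        by_chrom[chrom].append(pos)

    clusters = []
    for chrom in sorted(by_chrom):  # lexicographic order (chr10 sorts before chr2)
        positions = sorted(by_chrom[chrom])
        current = [positions[0]]
        for pos in positions[1:]:
            if pos - current[-1] <= max_gap:
                current.append(pos)   # close enough: extend the current cluster
            else:
                clusters.append((chrom, current))
                current = [pos]       # start a new cluster
        clusters.append((chrom, current))

    return [
        (chrom, members[0], members[-1], len(members))
        for chrom, members in clusters
        if len(members) >= min_reads
    ]


# Made-up break positions (chrom, 1-based position) for illustration only
breaks = [("chr1", 1000), ("chr1", 1010), ("chr1", 1030), ("chr1", 5000),
          ("chr2", 200), ("chr2", 220), ("chr2", 230), ("chr2", 240)]
print(cluster_breaks(breaks))
# [('chr1', 1000, 1030, 3), ('chr2', 200, 240, 4)]
```

The lone break at chr1:5000 forms its own cluster and is dropped by min_reads; on real data the gap and minimum support would need tuning, and strand may need to be taken into account.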

Reply by WouterDeCoster, 7.6 years ago

Great, thanks.

Can I just use the chromosome, start and end information in the SAM file for each read? Are there complications such as overlapping reads, reverse-complemented reads, etc.? If not, it seems I can simply use that information directly.
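SAM has no end column, so a few things need handling when pulling chromosome, start and end out of each record. A minimal sketch in plain Python (parse_sam_line is a hypothetical helper, and the records are made up), following the SAM specification: POS (column 4) is the 1-based leftmost aligned base on the forward reference strand, even for reverse-strand reads; the end is POS plus the number of reference bases consumed by the CIGAR (operations M, D, N, = and X) minus one; FLAG bit 0x10 marks a reverse-strand read, whose sequence is stored reverse-complemented, so its 5' end is the right-hand coordinate; and unmapped (0x4), secondary (0x100) and supplementary (0x800) records are skipped. Which end of the read marks the break depends on the library design, so the five_prime value is an assumption to check against the GUIDE-seq protocol.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def parse_sam_line(line):
    """Return (chrom, start, end, strand, five_prime) for a primary mapped read, else None."""
    if line.startswith("@"):
        return None  # header line
    fields = line.rstrip("\n").split("\t")
    flag, chrom, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
    if flag & 0x4 or flag & 0x100 or flag & 0x800 or cigar == "*":
        return None  # unmapped, secondary or supplementary
    ref_len = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)
    start = pos
    end = pos + ref_len - 1  # 1-based, inclusive
    strand = "-" if flag & 0x10 else "+"
    five_prime = end if strand == "-" else start  # reverse reads start at the right
    return chrom, start, end, strand, five_prime


# Made-up records (QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL)
seq, qual = "A" * 50, "I" * 50
records = [
    "\t".join(["read1", "0", "chr1", "1000", "60", "50M", "*", "0", "0", seq, qual]),
    "\t".join(["read2", "16", "chr1", "2000", "60", "10S30M5D10M", "*", "0", "0", seq, qual]),
    "\t".join(["read3", "4", "*", "0", "0", "*", "*", "0", "0", seq, qual]),
]
for rec in records:
    print(parse_sam_line(rec))
# ('chr1', 1000, 1049, '+', 1000)
# ('chr1', 2000, 2044, '-', 2044)
# None
```

Soft-clipped bases (S) do not consume reference, so the reported positions refer to aligned bases only. In practice pysam exposes the same values as AlignedSegment.reference_start (0-based), reference_end (one past the last aligned base) and is_reverse.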