featureCounts output interpretation
7.6 years ago
dr.genetics ▴ 60

I ran featureCounts on a DNA-seq data file and got the following (c is the value returned by featureCounts):

> head(cbind(c$counts, c$annotation));
          GACTCCTCAATGTC.sam    GeneID
DDX11L1                    3   DDX11L1
WASH7P                     3    WASH7P
FAM138A                    0   FAM138A
FAM138F                    0   FAM138F
OR4F5                      0     OR4F5
LOC729737                  4 LOC729737
                                                             Chr
DDX11L1                                           chr1;chr1;chr1
WASH7P    chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1
FAM138A                         chr1;chr1;chr1;chr19;chr19;chr19
FAM138F                         chr1;chr1;chr1;chr19;chr19;chr19
OR4F5                                                       chr1
LOC729737                                         chr1;chr1;chr1
                                                                      Start
DDX11L1                                                   11874;12613;13221
WASH7P    14362;14970;15796;16607;16858;17233;17606;17915;18268;24738;29321
FAM138A                                 34611;35277;35721;76220;76886;77330
FAM138F                                 34611;35277;35721;76220;76886;77330
OR4F5                                                                 69091
LOC729737                                              134773;139790;140075
                                                                        End
DDX11L1                                                   12227;12721;14409
WASH7P    14829;15038;15947;16765;17055;17368;17742;18061;18366;24891;29370
FAM138A                                 35174;35481;36081;76783;77090;77690
FAM138F                                 35174;35481;36081;76783;77090;77690
OR4F5                                                                 70008
LOC729737                                              139696;139847;140566
                         Strand Length
DDX11L1                   +;+;+   1652
WASH7P    -;-;-;-;-;-;-;-;-;-;-   1769
FAM138A             -;-;-;-;-;-   2260
FAM138F             -;-;-;-;-;-   2260
OR4F5                         +    918
LOC729737                 -;-;-   5474

But I am a little confused by the results:

  • Why is the count for WASH7P only 3? It looks like there are 11 segments mapped to the gene.

  • Why does FAM138A have a count of 0? I understand the gene is located on two chromosomes, chr1 and chr19, but it seems to have 3 counts on each chromosome.

  • OR4F5 has one read, and the segment spans exactly from the TSS to the TES? I guess I am misunderstanding something here.

Thanks.


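Those are the start and end coordinates of the annotated features (exons) that make up each gene, including those from alternative transcripts; they come from the annotation, not from your reads. Perfectly normal output if you ask me.

I'm not sure what you are trying to achieve. This is DNA sequencing; what is the aim?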
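A quick way to check this reading of the table: if Start and End list the annotated exons of each gene, the exon lengths should add up to the Length column. The sketch below (in Python rather than R, purely for illustration) uses coordinates copied from the output in the question; the function name `summed_exon_length` is made up here, and the simple sum assumes 1-based inclusive coordinates and exons that do not overlap each other.

```python
# Start, End and Length values copied from the featureCounts annotation in the question.
annotation = {
    "DDX11L1": ("11874;12613;13221", "12227;12721;14409", 1652),
    "FAM138A": ("34611;35277;35721;76220;76886;77330",
                "35174;35481;36081;76783;77090;77690", 2260),
    "OR4F5": ("69091", "70008", 918),
    "LOC729737": ("134773;139790;140075", "139696;139847;140566", 5474),
}


def summed_exon_length(starts: str, ends: str) -> int:
    """Sum exon lengths from semicolon-separated start/end strings.

    Assumes 1-based inclusive coordinates and non-overlapping exons.
    """
    start_values = [int(s) for s in starts.split(";")]
    end_values = [int(e) for e in ends.split(";")]
    return sum(end - start + 1 for start, end in zip(start_values, end_values))


for gene, (starts, ends, reported_length) in annotation.items():
    computed = summed_exon_length(starts, ends)
    print(gene, computed, reported_length, computed == reported_length)
```

For these four genes the computed sum matches the reported Length, which fits the columns describing the annotation rather than where reads landed. The counts (3, 0, 4, ...) are the only read-derived values in the table.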


The thread "How to map BAM/SAM files to genes with abundance levels?" has the background on this. It is not 100% clear (to me) what dr.genetics wants to count or find the depth for.


Our experiments generate double-strand breaks (DSBs), and we are trying to see whether there are any DSB hotspots. The DNA-seq technique we used (GUIDE-seq) captures sequences with a DSB at their 5' or 3' end. The more often we see a DNA fragment, the more likely it is that a DSB occurs at one of its ends. So we are looking for the abundance of DNA fragments and thus the frequency of DSBs at specific genomic loci.

In other words, I am looking for the abundance of DNA fragments in the BAM/SAM files and the relationship of those fragments to genes.

I assume the "count" of 3, 0, etc. is the number of DNA fragments in the BAM/SAM file that map within a particular gene? If so, it does not give enough resolution, because we are also interested in exactly where the DSBs are.


Interesting experiment, and you would have gotten helpful replies much quicker if you had told us that earlier ;)

featureCounts counts reads per feature (in this case per gene), so there are indeed 3 reads counted in WASH7P. But your data can probably give you better resolution than that.

I would suggest thinking along the lines of ChIP-seq experiments and clustering those reads to find hotspots.
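As a very rough illustration of that idea, read positions can be binned into fixed-size genomic windows, and windows with unusually many reads flagged as candidate hotspots. This is only a sketch: real ChIP-seq peak callers use statistical background models, whereas the fixed window size, the threshold of 2, and the example positions below are all invented for illustration.

```python
from collections import Counter


def count_per_window(positions, window_size=1000):
    """Count positions per fixed-size window.

    positions: iterable of (chromosome, 1-based position) pairs.
    Returns a Counter keyed by (chromosome, 1-based window start).
    """
    counts = Counter()
    for chrom, pos in positions:
        window_start = ((pos - 1) // window_size) * window_size + 1
        counts[(chrom, window_start)] += 1
    return counts


# Hypothetical break positions, not taken from any real data set.
breaks = [
    ("chr1", 14370),
    ("chr1", 14402),
    ("chr1", 14990),
    ("chr1", 69100),
    ("chr19", 76230),
]

windows = count_per_window(breaks, window_size=1000)

# Arbitrary threshold: report windows containing at least 2 positions.
hotspots = [(window, n) for window, n in windows.items() if n >= 2]
print(hotspots)
```

The window size sets the resolution: smaller windows localize breaks more precisely but spread the same reads over more bins, so the threshold would need to be chosen with the sequencing depth in mind.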


Great, thanks.

Can I just use the chr, start, and end info in the SAM file for each sequence? Are there complications such as overlapping reads, reverse-complemented reads, etc.? If not, it seems that I can simply use that info directly?


That would work, yes, if you find a sensible way to aggregate those.
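One complication worth keeping in mind is strand. In a SAM record, POS is always the leftmost aligned reference position, so for a read on the reverse strand (FLAG bit 0x10) the read's 5' end sits at the rightmost aligned position, which has to be derived from the CIGAR string. The sketch below assumes the 5' end is the position of interest (for your library it may be the 3' end instead), treats every record independently, and ignores read pairing and secondary or supplementary alignments. The function name and the example records are hypothetical.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# CIGAR operations that advance along the reference sequence.
REF_CONSUMING = set("MDN=X")


def five_prime_end(sam_line):
    """Return (chromosome, 1-based position, strand) of a read's 5' end.

    Returns None for header lines and unmapped reads.
    """
    if sam_line.startswith("@"):
        return None
    fields = sam_line.rstrip("\n").split("\t")
    flag, chrom, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
    if flag & 0x4 or cigar == "*":
        return None
    if flag & 0x10:
        # Reverse strand: the 5' end is the rightmost aligned reference base.
        ref_length = sum(
            int(length) for length, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING
        )
        return chrom, pos + ref_length - 1, "-"
    return chrom, pos, "+"


# Hypothetical SAM records (tab-separated), invented for illustration.
sam_lines = [
    "@SQ\tSN:chr1\tLN:249250621",
    "read1\t0\tchr1\t14370\t60\t50M\t*\t0\t0\t*\t*",
    "read2\t16\tchr1\t14321\t60\t50M\t*\t0\t0\t*\t*",
    "read3\t0\tchr1\t14370\t60\t10S40M\t*\t0\t0\t*\t*",
    "read4\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
    "read5\t16\tchr1\t69000\t60\t20M5D30M\t*\t0\t0\t*\t*",
]

ends = [five_prime_end(line) for line in sam_lines]
end_counts = Counter(end for end in ends if end is not None)
for (chrom, pos, strand), n in sorted(end_counts.items()):
    print(chrom, pos, strand, n)
```

Counting identical (chromosome, position, strand) triples is the simplest aggregation. Whether to merge the two strands, or to pool nearby positions, depends on how precisely the library preparation places the break relative to the read end.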

