featureCounts output file chromosome identifier issue
7 weeks ago
tanbiswas6


I have used featureCounts to generate a WTS count file (paired-end whole-transcriptome sequencing data) from aligned, sorted BAM files. The featureCounts output looks like this:

Program:featureCounts v1.6.3; Command:"featureCounts" "-T" "4" "-s" "2" "-a" "/Tools/hg38.refGene.gtf" "-o" "6_aligned_sorted_duprm.bam"
# Geneid      Chr   Start   End Strand  Length  6_aligned_sorted_duprm.bam
DDX11L1 chr1;chr1;chr1  11874;12613;13221   12227;12721;14409   +;+;+   1652    0
WASH7P  chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1;chr1  14362;14970;15796;16607;16858;17233;17606;17915;18268;24738;29321   14829;15038;15947;16765;17055;17368;17742;18061;18366;24891;29370   -;-;-;-;-;-;-;-;-;-;-   1769    706
MIR6859-1   chr1;chr1;chr16;chr15   17369;187891;17052;101973524    17436;187958;17119;101973591    -;-;-;+ 272 0
MIR1302-11  chr1;chr19;chr9;chr15   30366;71973;30144;101960459 30503;72110;30281;101960596 +;+;+;- 552 0
FAM138A chr1;chr1;chr1;chr19;chr19;chr19;chr9;chr9;chr9 34611;35277;35721;76220;76886;77330;34394;35060;35504   35174;35481;36081;76783;77090;77690;34957;35264;35864   -;-;-;-;-;-;-;-;-   3390    0

As you can see, a single gene is reported with several chromosomal locations. Why is this happening? Is it a fault in how I ran featureCounts, and how can I solve it?

Thank you.



featureCounts RNASeq transcriptome WTS sequencing
7 weeks ago
GenoMax

> You can see that for a single gene the chr position is showing different chromosomal locations. Why is this happening?

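Each gene has multiple exons. People generally count at the exon level and then summarize the counts at the gene level with -t exon -g gene_id, so you get one count per gene. These are featureCounts' defaults for a GTF annotation, so your command already does this. The Chr, Start, End and Strand columns therefore list one semicolon-separated entry per exon of the gene, and Start holds the start coordinate of each exon (e.g. https://ncbi.nlm.nih.gov/gene/100287102 for DDX11L1). This is expected output, not an error.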
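To see what one of these rows contains, here is a minimal Python sketch that splits a row back into its exons. The function name expand_exons is just illustrative, and it assumes the standard tab-delimited featureCounts table; the DDX11L1 values are copied from the output in the question. For this row the three exon widths add up to the Length column (1652), which fits the reading that each semicolon-separated entry is one exon of the same gene.

```python
def expand_exons(line):
    """Split one featureCounts data row into per-exon coordinates.

    featureCounts writes a tab-delimited table with one row per gene
    (meta-feature); Chr, Start, End and Strand hold one
    semicolon-separated value per exon.
    """
    fields = line.rstrip("\n").split("\t")
    gene_id, chroms, starts, ends, strands, length = fields[:6]
    counts = [int(x) for x in fields[6:]]  # one count column per BAM file
    exons = list(zip(chroms.split(";"),
                     [int(s) for s in starts.split(";")],
                     [int(e) for e in ends.split(";")],
                     strands.split(";")))
    return gene_id, exons, int(length), counts


# DDX11L1 row copied from the output above, re-joined with tabs
row = "\t".join(["DDX11L1", "chr1;chr1;chr1", "11874;12613;13221",
                 "12227;12721;14409", "+;+;+", "1652", "0"])
gene, exons, length, counts = expand_exons(row)
for chrom, start, end, strand in exons:
    print(gene, chrom, start, end, strand)

# Coordinates are 1-based and inclusive; these exons do not overlap,
# so their summed widths equal the Length column.
exon_bases = sum(end - start + 1 for _, start, end, _ in exons)
print(exon_bases, length)
```

The same check works for the other rows in the question (e.g. the eleven WASH7P exons sum to 1769).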

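Judging by the file name (e.g. 6_aligned_sorted_duprm.bam), you seem to have removed duplicates. This should not be done for RNA-seq unless you have UMIs: highly expressed genes legitimately produce many reads with identical coordinates, so removing them as PCR duplicates discards real signal.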

Hi GenoMax,

Thank you for looking into this. Take MIR6859-1, for example: it shows chr1;chr16;chr15 in the chromosome column. How is this possible?

I have also used fastp to deduplicate the raw FASTQ files. Should this be done? Also, is it important to count multi-mapping reads with featureCounts?

Regards, Tanay

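Since microRNAs are short, the same sequence can occur at several places in the genome, and your annotation file evidently annotates MIR6859-1 at all of them under one gene_id. featureCounts merges every exon that shares a gene_id into a single meta-feature, so all of those loci, even on different chromosomes, end up in one row.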
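The grouping step can be illustrated with a small Python sketch. It is not featureCounts' own implementation; it simply collects GTF exon lines by their gene_id, which is what -t exon -g gene_id asks featureCounts to do. The GTF lines are hypothetical (source column "example", placeholder transcript IDs, and the helper gtf_exon is made up for the demo), but the coordinates are the MIR6859-1 and DDX11L1 ones from the output in the question. The Geneid column in that output shows that this refGene GTF uses gene symbols as gene_id, so every locus annotated as MIR6859-1 collapses into one group.

```python
import re


def group_by_gene(gtf_lines, feature_type="exon", attribute="gene_id"):
    """Group GTF features into per-gene lists, in file order.

    Illustrates the idea behind featureCounts meta-features
    (-t exon -g gene_id); it is not featureCounts' own code.
    """
    pattern = re.compile(r'(?:^|;)\s*' + re.escape(attribute) + r' "([^"]+)"')
    genes = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if cols[2] != feature_type:
            continue
        match = pattern.search(cols[8])
        if match:
            genes.setdefault(match.group(1), []).append(
                (cols[0], int(cols[3]), int(cols[4]), cols[6]))
    return genes


def gtf_exon(chrom, start, end, strand, gene_id, tx_id):
    # Hypothetical helper that builds one minimal GTF exon line
    attrs = f'gene_id "{gene_id}"; transcript_id "{tx_id}";'
    return "\t".join([chrom, "example", "exon", str(start), str(end),
                      ".", strand, ".", attrs])


# MIR6859-1 and DDX11L1 coordinates taken from the featureCounts output
# above; transcript IDs are made-up placeholders.
gtf = [
    gtf_exon("chr1", 11874, 12227, "+", "DDX11L1", "tx_a"),
    gtf_exon("chr1", 12613, 12721, "+", "DDX11L1", "tx_a"),
    gtf_exon("chr1", 13221, 14409, "+", "DDX11L1", "tx_a"),
    gtf_exon("chr1", 17369, 17436, "-", "MIR6859-1", "tx_b"),
    gtf_exon("chr1", 187891, 187958, "-", "MIR6859-1", "tx_c"),
    gtf_exon("chr16", 17052, 17119, "-", "MIR6859-1", "tx_d"),
    gtf_exon("chr15", 101973524, 101973591, "+", "MIR6859-1", "tx_e"),
]

genes = group_by_gene(gtf)
for gene_id, exons in genes.items():
    chr_column = ";".join(chrom for chrom, _, _, _ in exons)
    n_chroms = len({chrom for chrom, _, _, _ in exons})
    print(gene_id, chr_column, "chromosomes:", n_chroms)
```

Running it shows DDX11L1 confined to chr1, while MIR6859-1 spans three chromosomes (chr1, chr16 and chr15), matching the Chr column in the question.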

You should not deduplicate raw FASTQ files from RNA-seq data before alignment. People generally ignore multi-mapped reads, and featureCounts does so by default (-M makes it count them). If you wish to use them, use a tool like salmon, which uses statistical modeling to distribute multi-mapped reads among their possible sources.
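To make "statistical modeling to distribute multi-mapped reads" more concrete, here is a toy expectation-maximization (EM) sketch in Python. It is a deliberate simplification, not salmon's algorithm: it assumes all genes have the same effective length, ignores fragment-level probabilities, and uses made-up read data. It compares three treatments of reads that align equally well to two genes: dropping them (the featureCounts default), splitting each one evenly (roughly what featureCounts -M --fraction does), and splitting them in proportion to each gene's estimated abundance.

```python
def unique_only(read_sets):
    """Count only reads compatible with exactly one gene (multi-mappers dropped)."""
    counts = {}
    for compatible in read_sets:
        if len(compatible) == 1:
            (gene,) = compatible
            counts[gene] = counts.get(gene, 0) + 1
    return counts


def fractional(read_sets):
    """Give each read 1/n to each of its n candidate genes."""
    counts = {}
    for compatible in read_sets:
        for gene in compatible:
            counts[gene] = counts.get(gene, 0.0) + 1.0 / len(compatible)
    return counts


def em_counts(read_sets, n_iter=200):
    """Toy EM: split multi-mapped reads in proportion to estimated abundance.

    Assumes every gene has the same effective length; real tools such as
    salmon use a much richer model.
    """
    genes = sorted(set().union(*read_sets))
    abundance = {g: 1.0 / len(genes) for g in genes}
    for _ in range(n_iter):
        # E-step: expected number of reads assigned to each gene
        counts = dict.fromkeys(genes, 0.0)
        for compatible in read_sets:
            total = sum(abundance[g] for g in compatible)
            for g in compatible:
                counts[g] += abundance[g] / total
        # M-step: new abundances are the normalized expected counts
        abundance = {g: counts[g] / len(read_sets) for g in genes}
    return counts


# Hypothetical data: 6 reads only fit gene A, 2 only fit gene B,
# and 4 align equally well to both.
reads = [{"A"}] * 6 + [{"B"}] * 2 + [{"A", "B"}] * 4
print(unique_only(reads))  # {'A': 6, 'B': 2}
print(fractional(reads))   # {'A': 8.0, 'B': 4.0}
print({g: round(c, 3) for g, c in em_counts(reads).items()})  # {'A': 9.0, 'B': 3.0}
```

Because gene A has more uniquely assigned reads, EM gives it a larger share of the ambiguous reads than an even split would; that abundance-aware allocation is the core idea behind EM-based quantifiers.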


