Rsubread featureCounts with built-in mm10 vs Ensembl GRCm38.96.gff3
4.9 years ago
akh22 ▴ 110

I am trying to get counts from BAM files generated by STAR. I used Rsubread's featureCounts with the Ensembl Mus_musculus.GRCm38.96.gff3 annotation:

fc <- featureCounts(files = "10-IT-1-21-19.bam", nthreads = 24, isPairedEnd = TRUE, isGTFAnnotationFile = TRUE, annot.ext = "Mus_musculus.GRCm38.96.gff3", GTF.attrType = "Name")

and here is the resulting summary:

                          Status Skin.25.IT.1.21.19.bam
1                       Assigned                7686415
2            Unassigned_Unmapped                      0
3      Unassigned_MappingQuality                      0
4             Unassigned_Chimera                      0
5      Unassigned_FragmentLength                      0
6           Unassigned_Duplicate                      0
7        Unassigned_MultiMapping                      0
8           Unassigned_Secondary                      0
9            Unassigned_NonSplit                      0
10         Unassigned_NoFeatures                2985131

If I run this with the built-in mm10 annotation as follows:

fc <- featureCounts(files = "10-IT-1-21-19.bam", nthreads = 24, isPairedEnd = TRUE, annot.inbuilt = "mm10")

the summary is:

                          Status Skin.25.IT.1.21.19.bam
1                       Assigned               18096135
2            Unassigned_Unmapped                      0
3      Unassigned_MappingQuality                      0
4             Unassigned_Chimera                      0
5      Unassigned_FragmentLength                      0
6           Unassigned_Duplicate                      0
7        Unassigned_MultiMapping                      0
8           Unassigned_Secondary                      0
9            Unassigned_NonSplit                      0
10         Unassigned_NoFeatures                4960586
11 Unassigned_Overlapping_Length                      0
12          Unassigned_Ambiguity                 911357

As you can see, the number of reads assigned with the Ensembl Mus_musculus.GRCm38.96.gff3 annotation is much lower than with the built-in mm10 annotation. I expected the Ensembl GFF3 annotation to have broader coverage than mm10. I'd appreciate any comments on this.
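For what it's worth, the two summaries can also be compared as proportions rather than raw counts. The sketch below only re-uses the numbers from the two tables in this post (zero rows are left out, and `assigned_fraction` is just a helper name made up for this example). It also prints the column totals, which come to about 10.7 million for the GFF3 table and about 24.0 million for the mm10 table, so the two tables may not come from the same input or settings.

```r
# Counts copied from the two summary tables in the post (zero rows omitted).
gff3 <- c(Assigned = 7686415, Unassigned_NoFeatures = 2985131)
mm10 <- c(Assigned = 18096135, Unassigned_NoFeatures = 4960586,
          Unassigned_Ambiguity = 911357)

# Hypothetical helper: share of all counted fragments that were assigned.
assigned_fraction <- function(stat) unname(stat["Assigned"] / sum(stat))

totals <- c(gff3 = sum(gff3), mm10 = sum(mm10))
rates  <- c(gff3 = assigned_fraction(gff3), mm10 = assigned_fraction(mm10))

print(totals)           # total fragments reported in each table
print(round(rates, 3))  # assigned fraction for each annotation
```

Looking at the rates side by side makes it easier to tell whether the difference comes from the annotation or from the amount of input.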

Thanks.

RNA-Seq R assembly • 2.5k views

What reference was used to make the bam?


I believe it was Ensembl GRCm38, though I am not 100% certain since the read mapping was done by another lab.


Try the GTF file from Ensembl instead (never use GFF files unless you have no other choice).
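A minimal sketch of what that call could look like, assuming the matching Ensembl GTF has been downloaded and unpacked next to the BAM file. The GTF file name and the wrapper `count_with_gtf` are assumptions for illustration, not something stated in this thread; the `featureCounts` arguments themselves are standard Rsubread ones, with `GTF.featureType = "exon"` and `GTF.attrType = "gene_id"` spelled out even though they are the defaults.

```r
# Sketch only: count_with_gtf is a hypothetical wrapper around
# Rsubread::featureCounts, summarising exon features per gene_id.
count_with_gtf <- function(bam, gtf, threads = 4) {
  Rsubread::featureCounts(
    files = bam,
    annot.ext = gtf,
    isGTFAnnotationFile = TRUE,
    GTF.featureType = "exon",     # feature lines to count on
    GTF.attrType = "gene_id",     # attribute used to group exons into genes
    isPairedEnd = TRUE,
    nthreads = threads
  )
}

bam <- "10-IT-1-21-19.bam"           # BAM name from the original post
gtf <- "Mus_musculus.GRCm38.96.gtf"  # assumed file name, adjust to your download

# Only run when Rsubread and both files are actually available.
if (requireNamespace("Rsubread", quietly = TRUE) && all(file.exists(bam, gtf))) {
  fc <- count_with_gtf(bam, gtf)
  print(fc$stat)
}
```

The guard at the end just keeps the snippet from failing on a machine without the files; in a real analysis you would call the function directly.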


What is the reason for using GTF over GFF3? I thought GFF3 was the preferred annotation format over GTF.

Thanks.


So I tried the Ensembl GTF and got, on average, 80% assigned reads. GTF definitely looks more promising than GFF3.


It's kind of a dirty little secret that GFF files are tough to support.
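One plausible reason, offered as a sketch rather than a diagnosis of this particular run: featureCounts reads the grouping attribute (`GTF.attrType`) from the feature lines themselves. A GTF exon line carries `gene_id` directly, whereas a GFF3 exon line typically only points to its transcript through `Parent`, so there is no gene-level attribute on that line to group by. The attribute strings below are made up for illustration (they are not real Ensembl records), and both parser functions are hypothetical helpers written for this example.

```r
# Toy column-9 attribute strings, invented for illustration only.
gff3_attr <- "Parent=transcript:TX0001;Name=EXON0001;exon_id=EXON0001"
gtf_attr  <- 'gene_id "GENE0001"; transcript_id "TX0001"; exon_id "EXON0001";'

# GFF3 style: key=value pairs separated by ';'
parse_gff3_attr <- function(x) {
  fields <- strsplit(x, ";", fixed = TRUE)[[1]]
  kv <- strsplit(fields, "=", fixed = TRUE)
  setNames(vapply(kv, `[`, character(1), 2),
           vapply(kv, `[`, character(1), 1))
}

# GTF style: key "value"; pairs
parse_gtf_attr <- function(x) {
  fields <- trimws(strsplit(x, ";", fixed = TRUE)[[1]])
  fields <- fields[nzchar(fields)]
  keys <- sub(" .*$", "", fields)
  vals <- gsub('"', "", sub("^[^ ]+ ", "", fields), fixed = TRUE)
  setNames(vals, keys)
}

print(names(parse_gff3_attr(gff3_attr)))  # attributes on the toy GFF3 exon line
print(names(parse_gtf_attr(gtf_attr)))    # attributes on the toy GTF exon line
```

In this toy case, grouping the GFF3 exon by `Name` would group by an exon identifier rather than by gene, which would change what "assigned" means in the summary.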


Yeah, I am beginning to realize this, despite what I was told by "experts".

Thanks.
