Hi,
I am currently analysing a RNAseq dataset of human lymphocytes containing a vector (which has mCHERRY as reporter gene and a different gene A).
In order to include the annotation for the new genes I have followed the instructions of Alexander Dobin (https://groups.google.com/forum/#!topic/rna-star/FGQRotrCB1Q) and it seems like STAR (v2.7.3a) is detecting them and including them in the BAM files.
However, when I try summarising the BAM files using featureCounts (I am using Rsubread 1.34.4 in my university's computer cluster and subread 2.0.0 at my local machine) these genes are not listed (not even with 0 counts). I have also tried adding an additional line for each gene changing "exon" for "gene", but it does not solve the issue.
Do you have any suggestions of why this might be?
This is the code I am using in the Rsubread:
feature_counts <- featureCounts( nthreads = 6,
isGTFAnnotationFile = TRUE,
GTF.featureType = "exon",
annot.ext = "genome/gencode.v31.primary_assembly.annotation.gtf",
GTF.attrType = "gene_id",
primaryOnly = TRUE,
isPairedEnd = TRUE,
files = paste( "alignment/",
list.files( path = "alignment/",
pattern = ".bam$",
recursive = TRUE,
include.dirs = TRUE )[-1],
sep = "" ) )
And in subread:
featureCounts -T 2 -t exon -a ./gencode.v31.primary_assembly.annotation.gtf -t exon -g gene_id -M --primary -p -o ./count_matrix/counts.txt \./alignment_test/AG*/*.bam
Many thanks!!
Eduardo
Did you grep for the added "chromosome" to be sure it is in the GTF? Please show the respective line of that file.
When I check the chrName.txt the name of the chromosome is there (mCHERRY) and when I look at the GTF file I get this:
Edit: only the mCHERRY line, fo it to be clearer:
Can you try changing
AddedGenes
toENSEMBL
to be consistent with rest of the file? The entry for that gene does have the same name>mCHERRY
in your reference genome fasta and it is present in your alignment file?Hi, I am generating now the genome indices changing AddedGenes to ENSEMBL.
As for the FASTA, the name is the same in the FASTA file name, in the first line of the FASTA (>mCHERRY) and in the GTF.
After STAR alignment (without changing AddedGenes) I was getting mCHERRY in the the Log.out file as a chromosome:
And these are the last two lines when grepping for mCHERRY in the BAM file after alignment:
I will wait to see if changing AddedGenes for ENSEMBL makes a difference.
Great. Looks like everything else seems to be in order.
Hi,
I have repeated the genome indexing, the alignment and the summarisation with featureCounts after changing AddedGenes to ENSEMBL and still the additional genes are not summarised in featureCounts.
I also tried to convert my "old"/AddedGenes GTF to SAF using the flattenGTF function (from Rsubread) in case it would make a difference. The genes I included (e.g. mCHERRY) are not in the SAF produced by the function, which makes me think that it is a problem of the Subread package interpreting the GTF file (or at least the modifications I made). Could it be?