RNA-seq gene labeling and mRNA filtering from bulk RNA data
10 months ago
Yeeshouw ▴ 10

Hello,

I have BAM files sent to me by a sequencing company (I have access to the FASTQ files as well, if those are required), and I generated a count matrix using the featureCounts() function from the Rsubread package. I have also run DESeq2 on the count matrix and produced a filtered list of significant DEGs. However, I am noticing that a fair portion of the DEGs do not correspond to mRNA transcripts.

My question is: is there a way, during the counting process, to label the genes being counted? For example, if gene X is a protein-coding gene according to the reference annotation, then once the count table is generated, is there some metadata column or output denoting the type of gene X? Ultimately, I want to be able to select the types of genes analyzed downstream, e.g. mRNA.

Thank you in advance, Yeeshouw Wang

RNA-Seq RSubreads • 1.1k views

This information is annotated in GTF files, which you can get for almost every annotated species from Ensembl. The attributes column contains a gene_biotype (Ensembl) or gene_type (GENCODE) attribute whose value is protein_coding or another gene type. You can use that for filtering.
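As a rough illustration of that filtering step, here is a minimal sketch in plain Python. The two-line GTF excerpt, the gene IDs, and the `counts` table are made up for the example, and `gene_biotypes` is a hypothetical helper, not part of any package. It assumes every attribute value is quoted, as in Ensembl-style GTF files.

```python
import re

# Hypothetical two-line GTF excerpt; real files come from Ensembl or GENCODE.
GTF_TEXT = '''\
1\thavana\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG_A"; gene_name "A"; gene_biotype "protein_coding";
1\thavana\tgene\t14404\t29570\t.\t-\t.\tgene_id "ENSG_B"; gene_name "B"; gene_biotype "lncRNA";
'''

# Matches key "value" pairs in the ninth (attributes) column.
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gene_biotypes(gtf_text):
    """Map gene_id -> biotype using the 'gene' lines of a GTF file."""
    biotypes = {}
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        # Ensembl calls the attribute gene_biotype, GENCODE calls it gene_type.
        biotype = attrs.get("gene_biotype") or attrs.get("gene_type")
        if "gene_id" in attrs and biotype:
            biotypes[attrs["gene_id"]] = biotype
    return biotypes


# Hypothetical count table: gene_id -> counts per sample.
counts = {"ENSG_A": [10, 12], "ENSG_B": [3, 0]}

biotypes = gene_biotypes(GTF_TEXT)
protein_coding = {
    gene: row for gene, row in counts.items()
    if biotypes.get(gene) == "protein_coding"
}
```

The same join can be done in R by merging the count matrix with a gene_id-to-biotype table; the point is only that the biotype lives in the GTF attributes and is keyed by gene_id.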


Thank you very much for the information! I am able to see the attribute you mention.

10 months ago
rfran010 ▴ 900

featureCounts can extract this information and output it as an extra column during counting, provided you input a GTF file.

I believe the option is --extraAttributes on the command line.


Yes, I am inputting a GTF file. I reviewed the featureCounts documentation and could not find an extraAttributes parameter. I do see GTF.attrType and GTF.featureType; would one of these be the parameter you mention? I suspect it is GTF.attrType, set to "gene_biotype" or "gbkey"?


What version are you running? I use v2.0.3 on the command line.

It looks like this option may have been added to the R function in a later version, as GTF.attrType.extra: https://rdrr.io/bioc/Rsubread/man/featureCounts.html

The parameters you mentioned determine how featureCounts selects and groups annotation lines. GTF.featureType selects which lines are used: if it is set to "exon", only 'exon' lines are counted. GTF.attrType sets how those lines are grouped: if it is set to "gene_id", all lines with gene_id "ENSG00000245848" are counted as the same gene.
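To make the select-then-group idea concrete, here is a small conceptual sketch in Python. It is not how featureCounts is implemented; the rows, the second gene ID, and the `build_meta_features` helper are all hypothetical. The keyword arguments only mirror the roles of GTF.featureType, GTF.attrType, and GTF.attrType.extra.

```python
from collections import defaultdict

# Hypothetical, already-parsed GTF rows: (feature type, start, end, attributes).
ROWS = [
    ("gene", 100, 900, {"gene_id": "ENSG00000245848", "gene_biotype": "protein_coding"}),
    ("exon", 100, 300, {"gene_id": "ENSG00000245848", "gene_biotype": "protein_coding"}),
    ("exon", 600, 900, {"gene_id": "ENSG00000245848", "gene_biotype": "protein_coding"}),
    ("exon", 2000, 2400, {"gene_id": "ENSG_OTHER", "gene_biotype": "lncRNA"}),
]


def build_meta_features(rows, feature_type="exon", attr_type="gene_id",
                        attr_extra="gene_biotype"):
    """Keep rows of one feature type, group them by one attribute,
    and carry one extra attribute along as annotation."""
    grouped = defaultdict(lambda: {"exons": [], attr_extra: None})
    for feature, start, end, attrs in rows:
        if feature != feature_type:
            continue  # role of GTF.featureType: which lines are used
        entry = grouped[attrs[attr_type]]  # role of GTF.attrType: grouping key
        entry["exons"].append((start, end))
        entry[attr_extra] = attrs.get(attr_extra)  # role of GTF.attrType.extra
    return dict(grouped)


meta = build_meta_features(ROWS)
```

Here the 'gene' row is ignored, the two exons of ENSG00000245848 end up in one group, and each group keeps its gene_biotype as side information rather than as a grouping key.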


I see. I believe I was running an older version or missed the option. I have updated to v2.10.5 and can see it now. Thank you, I now understand what GTF.attrType is for.

I tried setting GTF.attrType.extra to "gbkey"; however, the count matrix does not seem to contain any extra column or information for this attribute. Is the input format or name incorrect? I have double-checked the GTF file, and it does have an attribute labeled "gbkey".

Edit: I have found that this information is stored in the $annotation element of the object returned by featureCounts, not in the count matrix itself. Thank you for your help.
