How to add gene function/protein name, start and end position to DESeq2 results for bacteria
2
0
Entering edit mode
3.1 years ago
Miguel • 0

Hello,

Background:

I am analyzing data for an RNA-seq experiment made on a bacterial strain. I have a previously assembled genome that was annotated with PGAP. I processed my raw RNA-seq reads, aligned them to the genome with STAR, and quantified the alignments with HTseq. I fed the HTseq read counts to DESeq2 and now I have a table with the results, though I end up with a gene_id in the first column of the table and DESeq2 results.

Problem: The gene_id is not very informative. I want the gene_id, the CDS start and end position (to control for gene length bias), and also the transcript product (protein) or function. I have this information in a GTF annotation file of course file but I can't find any sort of tools or code to add this information to my DESeq2 results. DESeq2 results table

I thought it would be a simple task, but either it is not, or I am not understanding something properly. I have seen some tutorials using Biomart or other tools like that, but it seems like it's just for eukaryotes and it's using online databases. I guess I don't understand why one would not simply use the GTF annotation file?

My plan going forward is to annotate with GO terms or KEGG to do an enrichment analysis. The step I am stuck on seems easy but for some reason, I can't find a solution.

Any help is welcome, Thank you.

deseq2 annotation • 2.1k views
ADD COMMENT
0
Entering edit mode

I want the gene_id, the CDS start and end position (to control for gene length bias

This information should be in the GTF/GFF file that you used. Can you show us a few lines of that file?

ADD REPLY
0
Entering edit mode

Yes, it is in that file. I just did not know how to join it with DESeq2 results. I will try the solution from post below and see if it works.

Here's a line:

contig_Chromosome1  GenBank CDS 366179  367354  .   -   1   gene_id "JT734_01855"; transcript_id "JT734_01855.t01"; ID "JT734_01855.p01"; Name "ftsZ"; Note "Derived by automated computational analysis using gene prediction method: Protein Homology."; Parent "JT734_01855.t01"; codon_start "1"; inference "COORDINATES: similar to AA sequence:RefSeq:YP_004997066.1"; locus_tag "JT734_01855"; product "cell division protein FtsZ"; protein_id "PRJNA701820:JT734_01855"; transl_table "11"; translation "length.391";
ADD REPLY
0
Entering edit mode

I don't know how you did the read counting but featureCounts can assign the gene name using the ID you specify. You should have used Name as the id.

ADD REPLY
0
Entering edit mode

I used HTseq which is integrated into STAR. Perhaps I could try featureCounts yes.

ADD REPLY
1
Entering edit mode

With htseq-count you could have used -i Name or --additional-attr to add a column with names.

ADD REPLY
1
Entering edit mode
3.1 years ago
Michael 54k

I'd say this is more a question for Bioconductor Support https://support.bioconductor.org/ but I am going to answer this on a general basis. Yes, it is a simple task if simplicity is defined using existing packages in R. The only difficulty is to know the right package to use. A search for "R Import GTF file" yields https://www.biostars.org/p/272889/ as top hit. That gives you your answer for how to import a GTF file. Once you have rtracklayer installed, check the help on width, start, and end. Using R function merge you can join your GTF annotation data or any other annotation with the DEseq results as long as the annotation also contains the same gene ids. Described here: https://stackoverflow.com/questions/1299871/how-to-join-merge-data-frames-inner-outer-left-right

In addition, you might need to use as.data.frame to convert the classes to generic data.frames. This function works on most things in R (with some mixed results), and works very well in this case.

ADD COMMENT
1
Entering edit mode
3.1 years ago

It's not easy to add to the results object itself, so make a data frame object from the result object, then merge it with your gtf info

ADD COMMENT

Login before adding your answer.

Traffic: 1895 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6