Question: Differential gene expression DESeq2
10 months ago, umeshtanwar2 wrote:

Hi all

I am working with RNA-seq data from Arabidopsis thaliana. I have done a differential gene expression analysis using DESeq2, and I now have a list of DE genes (gene IDs) in a CSV file. How can I add the gene names to this file? My file looks like this:

baseMean    log2FoldChange  lfcSE   stat    pvalue  padj
gene:AT5G65080  122.875394083372    1.69474723920958    0.145231051625416   11.7004967984981    1.26709498309877E-31    5.85423224091292E-28
gene:AT1G64380  168.896650212747    1.30834505764162    0.132353055400632   9.78909978664265    1.25407410840269E-22    2.41419716485088E-19
gene:AT5G65070  82.6936482493549    1.26016444752325    0.14553940276928    8.66897373013599    4.36032539676582E-18    3.59742417823883E-15

Any guidance would be very helpful.

Thank you

rna-seq

I presume you mean that you want to convert your current IDs (for example, AT5G65080) to gene symbols? From where did you get the data in the first place? If a GTF was used in the original count abundance step (prior to DESeq2), then the corresponding gene symbols may be in that file.

If you literally just want to read the file back into R, then use read.csv()
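For the first case, where the symbols are already in the annotation, a minimal Python sketch of pulling an ID-to-symbol table out of a GTF could look like the following. It assumes the file carries `gene_id` and `gene_name` attributes in column 9 and that IDs may have a `gene:` prefix; the records and the function name `id_to_symbol` are made up for illustration.

```python
import re

# Patterns for the two attributes of interest in column 9 of a GTF record.
GENE_ID = re.compile(r'gene_id "([^"]*)"')
GENE_NAME = re.compile(r'gene_name "([^"]*)"')


def id_to_symbol(gtf_lines):
    """Map gene IDs to gene names for records that carry both attributes."""
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        gene_id = GENE_ID.search(fields[8])
        gene_name = GENE_NAME.search(fields[8])
        if gene_id and gene_name:
            # Drop a leading "gene:" prefix, if present, from the key.
            key = gene_id.group(1).split(":", 1)[-1]
            mapping[key] = gene_name.group(1)
    return mapping


# Illustrative records only: IDs, symbols and coordinates are placeholders.
example = [
    '1\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "gene:GENE_A"; gene_name "SYM_A";',
    '1\tsrc\texon\t200\t300\t.\t+\t.\tgene_id "gene:GENE_B";',
]
mapping = id_to_symbol(example)
print(mapping)
```

Records without a `gene_name` are simply left out of the table, so the second placeholder gene does not appear in the result. An open file handle can be passed in place of the list.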

modified 10 months ago • written 10 months ago by Kevin Blighe

Thank you Kevin. I used STAR to align the reads to the reference, with annotations in GFF3 format. I then ran featureCounts before DESeq2, after converting the GFF3 annotation file to GTF for use in featureCounts. Does that make a difference?

written 10 months ago by umeshtanwar2

From where did you obtain that gff3 file? If you look inside the file, you may see a field for gene symbol.

written 10 months ago by Kevin Blighe

I obtained the gff3 file from:

Arabidopsis release 42

Is it correct to convert the GFF3 to GTF for featureCounts?

modified 10 months ago • written 10 months ago by umeshtanwar2

In the GFF3 file, the gene symbol is given under the name tag, where available.

Yes, it is okay to convert GFF3 to GTF. If you want gene symbols, specify GTF.attrType='name' when using featureCounts in R, or -g name when using featureCounts in a Linux / cluster environment. The value has to match the attribute key exactly as it is written in the 9th column of your file.

written 10 months ago by Kevin Blighe

Thank you so much @Kevin. I will do this when using featureCounts.

written 10 months ago by umeshtanwar2

Okay, but not all genes appear to have a gene name. I am not too familiar with A. thaliana annotations. Best of luck.
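One way to cope with genes that have no name is to fall back to the gene ID when annotating the results table. The Python sketch below assumes a comma-separated results file whose first column holds IDs such as `gene:AT5G65080`; the lookup table and the symbol `PLACEHOLDER1` are hypothetical, not real annotations, and the numeric values are shortened.

```python
import csv
import io

# Hypothetical ID -> symbol lookup; "PLACEHOLDER1" is not a real gene symbol.
symbols = {"AT5G65080": "PLACEHOLDER1"}

# Two rows in the layout of a DESeq2 results CSV (first header cell is empty).
results_csv = """,baseMean,log2FoldChange,padj
gene:AT5G65080,122.87,1.69,5.85e-28
gene:AT1G64380,168.90,1.31,2.41e-19
"""


def annotate(csv_text, symbols):
    """Return the rows with a 'symbol' column added after the gene ID.

    Genes missing from `symbols` keep their ID in the symbol column.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    annotated = [["gene_id", "symbol"] + header[1:]]
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene_id = row[0].split(":", 1)[-1]  # drop the "gene:" prefix
        annotated.append([gene_id, symbols.get(gene_id, gene_id)] + row[1:])
    return annotated


annotated = annotate(results_csv, symbols)
for row in annotated:
    print("\t".join(row))
```

Keeping the ID as the fallback means no row is dropped from the DE list just because the annotation has no name for it.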

written 10 months ago by Kevin Blighe

I am facing this problem:

//================================= Running ==================================\
||                                                                            ||
|| Load annotation file oldArabidopsis_thaliana.TAIR10.42.gtf ...             ||

Warning: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
The specified gene identifier attribute is 'gene_name'
The attributes included in your GTF annotation are 'transcript_id "transcript:AT1G03987.1"; gene_id "gene:AT1G03987";'

written 10 months ago by umeshtanwar2

Can you post a few lines from your GTF file? It sounds like there is no attribute called "gene_name" in the file, so you might need a different attribute.
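A quick way to check which attributes are actually present, and which records lack the one you asked for, is sketched below in Python. It assumes a tab-separated GTF with `key "value";` pairs in column 9. The first record reuses the attributes quoted in the warning, but its coordinates and source column are placeholders, and the second record is entirely made up.

```python
import re
from collections import Counter

# Captures the key of every  key "value";  pair in the attribute column.
ATTRIBUTE = re.compile(r'(\w+) "[^"]*"')


def attribute_keys(gtf_line):
    """Return the set of attribute keys found in column 9 of one GTF line."""
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) < 9:
        return set()
    return set(ATTRIBUTE.findall(fields[8]))


def summarize(gtf_lines, wanted):
    """Count records per attribute key and collect records lacking `wanted`."""
    counts = Counter()
    missing = []
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        keys = attribute_keys(line)
        counts.update(keys)
        if wanted not in keys:
            missing.append(line)
    return counts, missing


# Placeholder coordinates; only the first record's attributes come from the warning.
records = [
    '1\tsrc\texon\t1\t100\t.\t+\t.\ttranscript_id "transcript:AT1G03987.1"; gene_id "gene:AT1G03987";',
    '1\tsrc\texon\t200\t300\t.\t+\t.\ttranscript_id "t1"; gene_id "g1"; gene_name "SYM_A";',
]
counts, missing = summarize(records, "gene_name")
print(dict(counts))
print(len(missing), "record(s) without gene_name")
```

If the wanted key shows up in some records but not in all of them, the attribute exists in the file but is not set for every gene.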

written 10 months ago by shawn.w.foley

1   araport11   exon    3631    3913    .   +   .   transcript_id "transcript:AT1G01010.1"; gene_id "gene:AT1G01010"; gene_name "NAC001"; Name "AT1G01010.1.exon1"; constitutive "1"; ensembl_end_phase "1"; ensembl_phase "-1"; rank "1"; biotype "protein_coding"; transcript_id "AT1G01010.1"; protein_id "AT1G01010.1";
1   araport11   exon    3996    4276    .   +   .   transcript_id "transcript:AT1G01010.1"; gene_id "gene:AT1G01010"; gene_name "NAC001"; Name "AT1G01010.1.exon2"; constitutive "1"; ensembl_end_phase "0"; ensembl_phase "1"; rank "2"; biotype "protein_coding"; transcript_id "AT1G01010.1"; protein_id "AT1G01010.1";
1   araport11   exon    4706    5095    .   +   .   transcript_id "transcript:AT1G01010.1"; gene_id "gene:AT1G01010"; gene_name "NAC001"; Name "AT1G01010.1.exon4"; constitutive "1"; ensembl_end_phase "0"; ensembl_phase "0"; rank "4"; biotype "protein_coding"; transcript_id "AT1G01010.1"; protein_id "AT1G01010.1";
1   araport11   exon    5174    5326    .   +   .   transcript_id "transcript:AT1G01010.1"; gene_id "gene:AT1G01010"; gene_name "NAC001"; Name "AT1G01010.1.exon5"; constitutive "1"; ensembl_end_phase "0"; ensembl_phase "0"; rank "5"; biotype "protein_coding"; transcript_id "AT1G01010.1"; protein_id "AT1G01010.1";

My GTF file looks like this.

modified 10 months ago • written 10 months ago by umeshtanwar2

Which extra information do you want, exactly? Your gene IDs are already the official symbols for A. thaliana. Please help us by being as specific as you can be.

written 10 months ago by Kevin Blighe

Please follow up on this thread rather than opening new ones. If you have trouble, explain what the problem is and show what you have already tried.

written 10 months ago by ATpoint