Question: Fasta to gff with custom set of genes
tlorin (Switzerland) wrote, 4 weeks ago:

Dear all,

I have a custom list of 100 genes that I manually curated to obtain the full CDS, and I would like to perform differential expression (DE) analysis between samples for this specific subset. I know I cannot simply map all the reads onto this subset and perform DE analysis, because the normalization (in DESeq2 or edgeR) would be biased, so I need to map all the reads to the whole genome.
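To illustrate the normalization concern, here is a minimal sketch of a median-of-ratios size-factor calculation in the spirit of what DESeq2 does, written in plain Python. The counts are invented toy numbers, and the function is a simplification for illustration, not DESeq2's actual code. Sample B is sequenced twice as deeply as sample A, and the three "curated" genes are truly 4-fold up in B.

```python
import math
from statistics import median

def size_factors(counts):
    """Simplified median-of-ratios size factors.

    counts: dict mapping sample name -> list of per-gene counts
    (same gene order in every sample). Genes with a zero count in
    any sample are skipped.
    """
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Log of the per-gene geometric mean across samples.
    log_geo_means = []
    for g in range(n_genes):
        values = [counts[s][g] for s in samples]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)
    factors = {}
    for s in samples:
        log_ratios = [
            math.log(counts[s][g]) - log_geo_means[g]
            for g in range(n_genes)
            if log_geo_means[g] is not None
        ]
        factors[s] = math.exp(median(log_ratios))
    return factors

# Toy data (hypothetical): B has 2x the depth of A.
background = {"A": [100, 200, 50, 400, 80], "B": [200, 400, 100, 800, 160]}
# Curated genes: truly 4-fold up in B, so raw counts are 8x higher.
curated = {"A": [10, 20, 30], "B": [80, 160, 240]}
all_genes = {s: background[s] + curated[s] for s in background}

sf_all = size_factors(all_genes)     # normalization on the whole gene set
sf_subset = size_factors(curated)    # normalization on the subset only

def normalized_fold_change(a, b, sf):
    """Fold change B/A for one gene after dividing by size factors."""
    return (b / sf["B"]) / (a / sf["A"])

fc_all = normalized_fold_change(10, 80, sf_all)
fc_subset = normalized_fold_change(10, 80, sf_subset)
print(round(fc_all, 3), round(fc_subset, 3))
```

With size factors estimated from all genes, the first curated gene keeps its 4-fold change. With size factors estimated from the curated subset alone, the shared up-regulation is absorbed into the size factors and the fold change collapses to about 1.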

Fortunately, I also have the raw sequence of a genome (multi-FASTA file) as well as an automatic annotation, with the corresponding GFF file. The problem is that this annotation is not good enough for the 100 curated genes.

My plan was to (1) run BLAT to get the exact genomic coordinates of my manually curated set of genes, (2) merge the newly obtained GFF with the first (automatic, non-curated) one using Cufflinks gffcompare, and (3) run DESeq2 using this new annotation.
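Step (1) of this plan needs a conversion from BLAT output to GFF. A minimal sketch of that conversion is below, assuming a DNA-versus-DNA BLAT run written in the standard 21-column PSL format without a header. The function name `psl_to_gff3`, the `source` label, and the example alignment are hypothetical. Each PSL block is written as an `exon` feature; because BLAT also splits blocks at small gaps, a real pipeline would probably want to merge nearby blocks and add CDS features with phase, which is left out here.

```python
def psl_to_gff3(psl_line, source="BLAT"):
    """Convert one untranslated PSL alignment line to GFF3 lines.

    PSL coordinates are 0-based, half-open; GFF3 is 1-based, inclusive.
    Only single-character strands (DNA vs DNA) are handled, because in
    that case tStarts are already given on the target's forward strand.
    """
    f = psl_line.rstrip("\n").split("\t")
    if len(f) != 21:
        raise ValueError("expected 21 tab-separated PSL columns")
    strand, q_name, t_name = f[8], f[9], f[13]
    if len(strand) != 1:
        raise ValueError("translated PSL (two-character strand) not supported")
    t_start, t_end = int(f[15]), int(f[16])
    block_sizes = [int(x) for x in f[18].rstrip(",").split(",")]
    t_starts = [int(x) for x in f[20].rstrip(",").split(",")]

    # Parent feature spanning the whole alignment on the target.
    rows = [(t_name, source, "mRNA", t_start + 1, t_end,
             ".", strand, ".", f"ID={q_name}")]
    # One child feature per gapless alignment block.
    for i, (start, size) in enumerate(zip(t_starts, block_sizes), 1):
        rows.append((t_name, source, "exon", start + 1, start + size,
                     ".", strand, ".",
                     f"ID={q_name}.exon{i};Parent={q_name}"))
    return ["\t".join(str(x) for x in row) for row in rows]

# Hypothetical alignment: a 300 bp CDS split into two blocks on chr1.
example_psl = "\t".join([
    "300", "0", "0", "0",      # matches, misMatches, repMatches, nCount
    "0", "0", "1", "500",      # qNumInsert, qBaseInsert, tNumInsert, tBaseInsert
    "+", "geneA", "300", "0", "300",       # strand, qName, qSize, qStart, qEnd
    "chr1", "100000", "1000", "1800",      # tName, tSize, tStart, tEnd
    "2", "100,200,", "0,100,", "1000,1600,",  # blockCount, blockSizes, qStarts, tStarts
])

gff_lines = psl_to_gff3(example_psl)
print("\n".join(gff_lines))
```

In practice one would first keep only the best hit per curated gene (for example by the number of matches) before converting, since BLAT can report several alignments for the same query.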

Would any of you have any suggestion regarding this protocol or any alternative tools to suggest?

Many thanks!

rna-seq deseq2 gff genome fasta
Macspider (Vienna - BOKU) wrote, 4 weeks ago:

"I know I cannot simply map all the reads onto this subset and perform DE analysis because I would have normalization bias (using DESeq2 or edgeR), so I need to map all the reads on the whole genome."

I don't agree completely. If your reads are RNA-seq reads and you map them against a transcriptome, it shouldn't be a problem.

"Would any of you have any suggestion regarding this protocol or any alternative tools to suggest?"

A suggestion would be to try using GMAP with GFF output, so you map the sequences to the genome and get a GFF as output automatically. It's really handy.
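As a rough sketch of what this could look like, the snippet below assembles the two command lines involved: `gmap_build` to index the genome, then `gmap` with `-f gff3_gene` to write gene-style GFF3. The file names, database directory, and database name are hypothetical placeholders, and the options should be checked against the help output of the installed GMAP version before use.

```python
import subprocess

def gmap_commands(genome_fasta, cds_fasta, db_dir="gmap_db", db_name="genome"):
    """Return (build, align) argument lists for a GMAP run.

    db_dir and db_name are arbitrary placeholder names for the index.
    """
    # Build the genome index once.
    build = ["gmap_build", "-D", db_dir, "-d", db_name, genome_fasta]
    # Align the curated CDS sequences and request GFF3 gene output.
    # "-n 1" asks for at most one alignment path per sequence.
    align = ["gmap", "-D", db_dir, "-d", db_name,
             "-f", "gff3_gene", "-n", "1", cds_fasta]
    return build, align

def run_gmap(genome_fasta, cds_fasta, out_gff3):
    """Run both steps; GMAP writes its GFF3 to standard output."""
    build, align = gmap_commands(genome_fasta, cds_fasta)
    subprocess.run(build, check=True)
    with open(out_gff3, "w") as out:
        subprocess.run(align, check=True, stdout=out)

# Only print the commands here; call run_gmap(...) to actually execute them.
build_cmd, align_cmd = gmap_commands("genome.fa", "curated_cds.fa")
print(" ".join(build_cmd))
print(" ".join(align_cmd))
```

Compared with BLAT, this avoids a separate conversion step, because the spliced alignments come out directly as gene, mRNA, exon, and CDS features.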

tlorin replied, 4 weeks ago:

"I don't agree completely. If your reads are RNASeq reads and you map them against a transcriptome, it shouldn't be a problem."

Definitely! I was just saying that you have to map your reads to a whole genome or transcriptome, not "simply" to 100 or 200 genes.