Question: Fasta to gff with custom set of genes
tlorin (Switzerland) wrote, 6 months ago:

Dear all,

I have a custom list of 100 genes that I manually curated to obtain the full CDS, and I would like to perform a differential expression (DE) analysis between samples for this subset only. I know I cannot simply map all the reads onto this subset and perform DE analysis, because I would have a normalization bias (using DESeq2 or edgeR), so I need to map all the reads to the whole genome.

Fortunately, I also have the raw sequence of a genome (multi-FASTA file) as well as an automatic annotation with the corresponding GFF file. The problem is that this annotation is not good enough for the 100 curated genes.

My plan was to (1) run BLAT to get the exact genomic coordinates of my manually curated set of genes, (2) merge the newly obtained GFF with the first (automatic, non-curated) one using Cufflinks gffcompare, and (3) run DESeq2 using this new annotation.
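Step (1) needs a conversion, because BLAT writes PSL by default rather than GFF. A minimal sketch of that conversion is given here, assuming an untranslated nucleotide-vs-genome BLAT run (single-character strand column) and treating every alignment block as one exon. The function name `psl_to_gff3`, the `gene`/`mRNA`/`exon` layout, and the ID scheme are illustrative choices, not part of the original plan.

```python
def psl_to_gff3(psl_line, source="BLAT"):
    """Convert one PSL alignment line into GFF3 gene/mRNA/exon lines.

    Assumes a nucleotide query aligned to a nucleotide target, so the
    strand column is a single '+' or '-' and tStarts are on the target's
    forward strand.
    """
    f = psl_line.rstrip("\n").split("\t")
    if len(f) < 21:
        raise ValueError("expected 21 tab-separated PSL columns")
    strand, q_name, t_name = f[8], f[9], f[13]
    if strand not in ("+", "-"):
        raise ValueError("translated PSL (two-character strand) is not handled")
    t_start, t_end = int(f[15]), int(f[16])
    block_sizes = [int(x) for x in f[18].split(",") if x]
    t_starts = [int(x) for x in f[20].split(",") if x]

    def row(feature, start, end, attrs):
        return "\t".join([t_name, source, feature, str(start), str(end),
                          ".", strand, ".", attrs])

    # PSL is 0-based, half-open; GFF3 is 1-based, inclusive:
    # add 1 to the start and keep the end as it is.
    lines = [
        row("gene", t_start + 1, t_end, f"ID={q_name}"),
        row("mRNA", t_start + 1, t_end, f"ID={q_name}.t1;Parent={q_name}"),
    ]
    for i, (size, start) in enumerate(zip(block_sizes, t_starts), 1):
        attrs = f"ID={q_name}.t1.exon{i};Parent={q_name}.t1"
        lines.append(row("exon", start + 1, start + size, attrs))
    return lines


# Hypothetical two-block alignment of a 300 bp CDS, for illustration only.
example = "\t".join([
    "300", "0", "0", "0", "0", "0", "1", "100", "+", "geneA", "300", "0",
    "300", "chr1", "10000", "999", "1399", "2", "100,200,", "0,100,",
    "999,1199,",
])
print("\n".join(psl_to_gff3(example)))
```

Small gaps caused by indels would also split a block, so in practice adjacent blocks separated by only a few bases may need to be merged before they are called exons.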

Would any of you have suggestions regarding this protocol, or alternative tools to recommend?

Many thanks!

Tags: rna-seq, deseq2, gff, genome, fasta
Macspider (Vienna - BOKU) wrote, 6 months ago:

> I know I cannot simply map all the reads onto this subset and perform DE analysis, because I would have a normalization bias (using DESeq2 or edgeR), so I need to map all the reads to the whole genome.

I don't agree completely. If your reads are RNASeq reads and you map them against a transcriptome, it shouldn't be a problem.

> Would any of you have suggestions regarding this protocol, or alternative tools to recommend?

One suggestion is to try GMAP with GFF output: you map the sequences to the genome and get a GFF file as output directly. It's really handy.
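As a rough sketch of what that GMAP run could look like when driven from Python: the genome is indexed once with `gmap_build`, then the curated CDS sequences are aligned with `-f gff3_gene` so that GFF3 is written to standard output. The file names, database name, and index directory below are hypothetical placeholders, and `-n 1` (at most one path per sequence) is an assumption about what is wanted for curated genes.

```python
import subprocess


def gmap_commands(genome_fasta, cds_fasta,
                  db_name="genome_db", db_dir="gmap_index"):
    """Return the argument lists for indexing the genome and aligning the CDS."""
    # Build the GMAP database (index) from the genome multi-FASTA.
    build = ["gmap_build", "-D", db_dir, "-d", db_name, genome_fasta]
    # Align the CDS sequences; -f gff3_gene prints gene/mRNA/exon/CDS features.
    align = ["gmap", "-D", db_dir, "-d", db_name,
             "-f", "gff3_gene", "-n", "1", cds_fasta]
    return build, align


def run_gmap(genome_fasta, cds_fasta, out_gff3):
    """Run both steps and save GMAP's standard output as a GFF3 file."""
    build, align = gmap_commands(genome_fasta, cds_fasta)
    subprocess.run(build, check=True)
    with open(out_gff3, "w") as out:
        subprocess.run(align, stdout=out, check=True)


# Only show the commands here; call run_gmap(...) where GMAP is installed.
for cmd in gmap_commands("genome.fa", "curated_cds.fa"):
    print(" ".join(cmd))
```

Keeping command construction separate from execution makes it easy to inspect the exact command lines before anything is run.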

tlorin replied, 6 months ago:

> I don't agree completely. If your reads are RNASeq reads and you map them against a transcriptome, it shouldn't be a problem.

Definitely! I was just saying that you have to map your reads to a whole genome or transcriptome, not "simply" to 100 or 200 genes.
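The normalization concern raised in this thread can be illustrated with a toy calculation. The sketch below is a simplified pure-Python version of median-of-ratios size factors (the approach DESeq2 uses by default), not DESeq2's own code, and all counts are invented for illustration: 1000 unchanged background genes plus 10 curated genes, 8 of which are truly 4x higher in the second sample.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts maps each gene to a list of raw counts, one per sample.
    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples is zero.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for gene_counts in counts.values():
        if any(c == 0 for c in gene_counts):
            continue
        log_geo_mean = sum(math.log(c) for c in gene_counts) / n_samples
        for j, c in enumerate(gene_counts):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


# Toy data (invented): two samples sequenced to the same depth.
background = {f"bg{i}": [100, 100] for i in range(1000)}        # unchanged
curated = {f"cur{i}": [100, 400] for i in range(8)}             # truly 4x up
curated.update({f"cur{i}": [100, 100] for i in range(8, 10)})   # unchanged

whole = size_factors({**background, **curated})
subset = size_factors(curated)
print("whole transcriptome:", whole)
print("curated subset only:", subset)
```

With these toy numbers the whole-transcriptome factors stay at about 1 for both samples, while the subset-only factors come out near 0.5 and 2.0. Dividing by the subset-only factors turns 100 and 400 into 200 and 200, so the real fourfold change disappears and the two unchanged curated genes appear to go down instead.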