Best way to determine gene positions in a related strain from a well-annotated reference (plus some RNAseq)?
6.0 years ago
Daniel ★ 3.9k

I have a fully assembled genome of Arabidopsis thaliana (Ler) for which I am trying to produce a GFF annotation (or a similar format). I'm particularly interested in getting accurate transcription start sites, but also coding start sites and, hopefully, intron-exon boundaries.

The major A. thaliana ecotype (Col) is highly annotated and pretty much a gold standard, and there aren't a huge number of differences in my closely related ecotype. However, the coordinates start slipping relative to Col, so using the Col annotation directly is not ideal.

I have a few RNAseq datasets which I can use to predict gene models, but as I already have a strong, closely related reference, I figure that should be able to help. I have tried delving into MAKER, but I'm drowning in options, and so far I haven't been able to maintain gene models, only strings of exon matches. I thought a straight-up BLAST with the related ecotype's cds.fasta might work, but then I would leave out my RNAseq and miss any of the changes between the two ecotypes.

Any suggestions would be appreciated!

maker anotation • 1.3k views
6.0 years ago
shaun ▴ 80

You can use the CDS and/or protein sequences from the A. thaliana Col ecotype as input for BLAT against your genome. Then map the RNAseq reads to validate the BLAT-based gene models. Use Cufflinks to compare the models.
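
To turn the BLAT hits into something closer to gene models rather than strings of exon matches, the alignment blocks of each hit can be grouped under one parent feature. Below is a minimal sketch in Python, assuming untranslated (DNA vs DNA) PSL output, where the target block starts are 0-based on the plus strand. The function name `psl_to_gff3` and the `min_intron` threshold are hypothetical choices for illustration, not part of BLAT; blocks separated by a gap shorter than `min_intron` are treated as one exon interrupted by a small indel.

```python
def psl_to_gff3(psl_line, source="BLAT", min_intron=20):
    """Convert one untranslated PSL alignment line into GFF3 lines.

    Returns one mRNA line followed by one line per exon.
    Assumption: PSL header lines have already been skipped by the caller.
    """
    fields = psl_line.rstrip("\n").split("\t")
    if len(fields) != 21:
        raise ValueError("expected 21 tab-separated PSL columns")

    strand, q_name, t_name = fields[8], fields[9], fields[13]
    if strand not in ("+", "-"):
        # Translated alignments use a two-character strand and need
        # different coordinate handling, which this sketch does not cover.
        raise ValueError("only untranslated (DNA vs DNA) PSL is handled")

    # blockSizes and tStarts are comma-separated lists with a trailing comma.
    sizes = [int(x) for x in fields[18].rstrip(",").split(",")]
    t_starts = [int(x) for x in fields[20].rstrip(",").split(",")]

    # Merge blocks whose gap on the target is shorter than min_intron
    # (hypothetical threshold): such gaps are more likely indels than introns.
    exons = []
    for start, size in zip(t_starts, sizes):
        end = start + size
        if exons and start - exons[-1][1] < min_intron:
            exons[-1][1] = end
        else:
            exons.append([start, end])

    def gff_line(feature, start0, end0, attributes):
        # PSL is 0-based half-open; GFF3 is 1-based inclusive.
        return "\t".join([t_name, source, feature, str(start0 + 1),
                          str(end0), ".", strand, ".", attributes])

    lines = [gff_line("mRNA", exons[0][0], exons[-1][1], "ID=" + q_name)]
    for i, (start, end) in enumerate(exons, 1):
        attrs = "ID={0}.exon{1};Parent={0}".format(q_name, i)
        lines.append(gff_line("exon", start, end, attrs))
    return lines
```

The resulting exon boundaries are only as good as the alignment, so they are best treated as candidates to be checked against the mapped RNAseq reads.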


Thanks, I've given BLAT a go and it looks like a good starting point to which I can then apply the RNAseq.
