Question

Refseq vs original genome assembly and issues

5.4 years ago

Biogeek ▴ 470

A long-winded post with several questions, to gauge the consensus on the 'correct' approach to RNA-seq alignment when a genome exists both as the originally published assembly and as a RefSeq version.

My scenario: the published genome is available as v1.0 and v1.1 (GenBank), and also on RefSeq.

I've aligned my mRNA-seq reads with STAR to Aiptasia v1.1 using the GenBank (GCA_) files available in the above link. I've now performed gene counts and differential expression. When I use http://aiptasia.reefgenomics.org/download/aipgene_to_kxj.tsv.gz to convert the NCBI gene accessions back to the original AIPGENE accessions and get functional gene annotations, I notice that there are 5 more genes in v1.1 than in the v1.0 functional annotations (the aipgene_to_kxj.tsv file is what I use to map back). This has me questioning whether to use version 1.1 at all. I've looked for these 5 missing genes on NCBI and they do indeed have functional annotations there.
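For reference, the comparison described above can be sketched roughly as follows. This is only a minimal illustration, not the exact commands I ran: it assumes the mapping file is a two-column TSV with the AIPGENE ID first and the NCBI accession second (the real column order may differ), and the IDs in the toy data are made up. The function names `load_mapping` and `unmapped_genes` are hypothetical helpers.

```python
import csv
import io


def load_mapping(handle):
    """Read a TSV mapping into a dict keyed by NCBI accession.

    ASSUMPTION: column 1 is the AIPGENE ID and column 2 is the NCBI
    accession. Check the real file and swap the indices if needed.
    """
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank, short, or comment lines.
        if len(row) < 2 or row[0].startswith("#"):
            continue
        aipgene_id, ncbi_id = row[0], row[1]
        mapping[ncbi_id] = aipgene_id
    return mapping


def unmapped_genes(counted_gene_ids, mapping):
    """Return the counted gene IDs that have no entry in the mapping."""
    return sorted(set(counted_gene_ids) - set(mapping))


# Toy data with made-up IDs, standing in for the real mapping file
# and for the gene IDs found in the count table.
toy_tsv = "AIPGENE1\tKXJ00001\nAIPGENE2\tKXJ00002\n"
mapping = load_mapping(io.StringIO(toy_tsv))
counted = ["KXJ00001", "KXJ00002", "KXJ00003"]

missing = unmapped_genes(counted, mapping)
print(missing)
```

With the real download, the handle could come from `gzip.open("aipgene_to_kxj.tsv.gz", "rt")` instead of the in-memory string, and `counted` would be the gene ID column of the count table.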

My questions are:

Do I revert to v1.0, where everything seems to be complete (the original version, which was published before it was submitted to GenBank), or do I download the RefSeq GFF3 and scaffolds and redo my analysis? If the latter, how can I obtain the functional gene annotations from RefSeq to know what the genes are, and how can I then get GO terms for downstream analysis?

Thanks for your time!

refseq genome alignment annotation • 1.1k views

score 0 · Answer 1 · 2018-12-11

5.4 years ago

GenoMax 141k

Prior discussion about this also took place in: NCBI genome version vs published genome version - what's 'better'?
