Question: Refseq vs original genome assembly and issues
gravatar for Biogeek
7 months ago by
Biogeek350 wrote:

A long winded post with multiple questions to gauge the consensus of the 'correct' approach to RNA-seq alignment when there is a Refseq vs published assembly version of a genome present.

I've got the scenario where there is a published genome available as V1.0 and V1.1 (Genbank), and also on refseq.

I've aligned my mRNA-seq reads with STAR to Aiptasiav1.1 using the available Genbank (GCA_) files available in the above link. I've now performed gene counts and differential expression. When I use to convert the NCBI gene accessions to the original AIPGENE concessions to get functional gene annotations, I notice that there are 5 more genes in the v1.1 version compared to the v1.0 functional annotations (aipgene_to_kxj.tsv file is used for to map back). This has me questioning using version 1.1 all together. Ive looked for these 5 missing genes on NCBI and they indeed have functional annotations on NCBI.

My questions are:

Do I revert back to v 1.0 where everything seems to be complete (the original version which was published before it was submitted to Genbank) or do I download the Refseq gff3 and scaffolds and re-do my analysis. If so, how can I obtain the functional gene annotations from refseq to know what the genes are, and then how can I go about getting GO terms for downstream analysis?

Thanks for your time!

ADD COMMENTlink written 7 months ago by Biogeek350
gravatar for genomax
7 months ago by
United States
genomax69k wrote:

Prior discussion about this also took place in: NCBI genome version vs published genome version - what's 'better'?

ADD COMMENTlink written 7 months ago by genomax69k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 543 users visited in the last hour