Question: Refseq vs original genome assembly and issues
17 months ago by
Biogeek380 wrote:

A long-winded post with multiple questions, to gauge the consensus on the 'correct' approach to RNA-seq alignment when both a RefSeq version and a published assembly version of a genome exist.

My scenario: a published genome is available as v1.0 and v1.1 (GenBank), and also on RefSeq.

I've aligned my mRNA-seq reads with STAR to Aiptasia v1.1 using the GenBank (GCA_) files available in the above link. I've now performed gene counts and differential expression. When I convert the NCBI gene accessions back to the original AIPGENE accessions to get functional gene annotations (the aipgene_to_kxj.tsv file is used to map back), I notice that there are 5 more genes in v1.1 than in the v1.0 functional annotations. This has me questioning whether to use version 1.1 at all. I've looked for these 5 missing genes on NCBI and they do have functional annotations there.
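For anyone wanting to reproduce the check, the comparison can be sketched roughly as below. This is only a minimal sketch: it assumes aipgene_to_kxj.tsv is a two-column, tab-separated file with the AIPGENE ID first and the NCBI (KXJ) accession second, which may not match the real layout. The function names and the IDs in the toy data are made-up placeholders, not real accessions.

```python
import csv
import io


def load_mapping(handle):
    """Read an assumed two-column TSV (AIPGENE ID, NCBI accession).

    Returns a dict keyed by NCBI accession so that IDs coming from the
    count table can be mapped back to AIPGENE IDs.
    """
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank, short, or comment lines.
        if len(row) < 2 or row[0].startswith("#"):
            continue
        aipgene_id, ncbi_id = row[0].strip(), row[1].strip()
        mapping[ncbi_id] = aipgene_id
    return mapping


def unmapped_genes(gene_ids, mapping):
    """Return the gene IDs that have no entry in the mapping, sorted."""
    return sorted(set(gene_ids) - set(mapping))


# Toy data standing in for the real mapping file and count table
# (placeholder IDs only).
demo_tsv = io.StringIO("AIPGENE1\tKXJ00001.1\nAIPGENE2\tKXJ00002.1\n")
demo_mapping = load_mapping(demo_tsv)
demo_missing = unmapped_genes(
    ["KXJ00001.1", "KXJ00002.1", "KXJ00003.1"], demo_mapping
)
print(demo_missing)
```

With the real files, the handle would come from open("aipgene_to_kxj.tsv") and the gene IDs from the first column of the count table; whatever is left in the result is the set of genes without a v1.0 functional annotation.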

My questions are:

Do I revert to v1.0, where everything seems to be complete (the original version, which was published before it was submitted to GenBank), or do I download the RefSeq GFF3 and scaffolds and redo my analysis? If the latter, how can I obtain the functional gene annotations from RefSeq to know what the genes are, and how can I then get GO terms for downstream analysis?

Thanks for your time!

17 months ago by
United States
genomax83k wrote:

Prior discussion about this also took place in the thread "NCBI genome version vs published genome version - what's 'better'?"
