Question: GCA vs GCF or both
12 days ago, cyberbrainedu wrote:


I have just started learning bioinformatics and pangenomics, so I apologize in advance if this question seems basic.

NCBI's genome database offers two kinds of assemblies: RefSeq (GCF accessions) and GenBank (GCA accessions). I have read about the differences between the two, but I am curious which one would be preferred for pangenome studies. For example, would a phylogeny built from RefSeq sequences differ from one built from GenBank sequences? As I understand it, RefSeq is already curated and annotated, with contaminants removed. Will this affect the phylogeny we build with a pangenome pipeline?

I am particularly interested in non-synonymous mutations (frameshifts or stop codons inside coding sequences). Do RefSeq curators also remove these kinds of unusual errors or mutations? If so, it would be difficult to study such mutations in RefSeq sequences.

Someone might say that a combination of GenBank and RefSeq would be good, after removing the duplicates. In that case, which copy should I remove, the RefSeq one or the GenBank one? As I read in this previous post on Biostar (
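To make the duplicate-removal step concrete, here is a minimal sketch of what I have in mind. It assumes that a paired GenBank/RefSeq assembly shares the same numeric part of the accession (for example GCA_000005845.2 and GCF_000005845.2), which is my understanding of NCBI's accession scheme. The function name `deduplicate` and the `prefer` parameter are hypothetical, made up for illustration, and the accessions are only sample inputs.

```python
import re

# Assembly accessions look like GCA_000005845.2 (GenBank) or GCF_000005845.2 (RefSeq).
ACCESSION_RE = re.compile(r"^(GC[AF])_(\d{9})\.(\d+)$")


def deduplicate(accessions, prefer="GCF"):
    """Keep one accession per numeric identifier.

    Assumption: a GCA/GCF pair with the same nine-digit number is the same
    assembly. `prefer` picks which prefix wins when both are present.
    """
    chosen = {}
    for acc in accessions:
        match = ACCESSION_RE.match(acc)
        if match is None:
            raise ValueError(f"not an assembly accession: {acc!r}")
        prefix, number, version = match.group(1), match.group(2), int(match.group(3))
        # Rank by preferred prefix first, then by higher version number.
        rank = (prefix == prefer, version)
        if number not in chosen or rank > chosen[number][0]:
            chosen[number] = (rank, acc)
    return sorted(acc for _, acc in chosen.values())


sample = ["GCA_000005845.2", "GCF_000005845.2", "GCA_000006945.2"]
print(deduplicate(sample))                # keeps the RefSeq copy of the pair
print(deduplicate(sample, prefer="GCA"))  # keeps the GenBank copy instead
```

Matching on the accession number alone does not check that the sequences in the two records are actually identical, so this only removes the obvious pairs. My question is which value of `prefer` makes sense for a pangenome analysis.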


snp assembly genome • 72 views