Hi, I have done a pan-genome analysis with the BPGA tool using a 0.5 (50%) identity cutoff. It gave me a core reference sequence file, an accessory reference file, and a unique sequences file. I now have a list of sequences that I have aligned against all three files (core, accessory, and unique). Some genes show alignments with both core genome and accessory genome sequences. My parameters are >=50% identity, qcovhsp >=90%, and E-value 0.0001. How can I assign genes to core or accessory if they align with both files?

2.1 years ago
Mensur Dlakic

Not sure why you need to segregate anything when BPGA has already done it for you. The groupings were done based on distribution in multiple genomes, which is a broader relationship criterion than simple sequence similarity.

For genes that have paralogs, one of them may be in the core group while the others are in the accessory group. Those paralogs could still retain relatively high percent identity and coverage, and most definitely an E-value lower than 1e-4. If they are not present in all genomes, they will go into the accessory group.

I understand that BPGA segregates genes into core, accessory, and unique groups. Now I have a list of genes, and I want to determine whether each one belongs to core or accessory. For that I ran BLAST, and I am getting the same genes matching in core as well as accessory at 50% identity with >=90% query coverage. Can a single gene belong to both core and accessory? Can you please tell me why? I am getting very confused.

Not sure why you are repeating your question, as I already answered it. A single gene can't be both in core and accessory groups, and it doesn't seem that you are observing anything that contradicts that.

Let's say you have a gene A, with two paralogs: A1 and A2. Gene A is present in all the species in your pangenome, so it goes into the core group. Genes A1 and A2 are present only in some species but not all, so they go into the accessory group. When you BLAST A1 or A2, they are similar enough to match A in the core group, which seems to be what you are seeing. Similarly, BLASTing A would identify A1 and A2 in the accessory group, because many paralogs are similar enough to match your identity and coverage criteria.
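
To make the A / A1 / A2 example concrete, here is a minimal Python sketch of grouping by distribution across genomes. It is only an illustration of the idea, not BPGA's actual implementation; the genome names, the extra gene B, and the classify function are all hypothetical.

```python
# Hypothetical presence/absence data: gene -> set of genomes that carry it.
# Genome names (g1..g4) and gene B are made up for illustration.
ALL_GENOMES = {"g1", "g2", "g3", "g4"}

presence = {
    "A": {"g1", "g2", "g3", "g4"},  # found in every genome
    "A1": {"g1", "g3"},             # paralog of A, found in some genomes
    "A2": {"g2", "g4"},             # paralog of A, found in some genomes
    "B": {"g3"},                    # found in a single genome
}


def classify(carriers, all_genomes):
    """Assign a group from distribution alone, ignoring sequence similarity."""
    if carriers == all_genomes:
        return "core"
    if len(carriers) == 1:
        return "unique"
    return "accessory"


groups = {gene: classify(carriers, ALL_GENOMES) for gene, carriers in presence.items()}

for gene, group in groups.items():
    print(gene, group)
```

Note that similarity between A, A1 and A2 never enters the decision here, which is why a BLAST hit against both files does not mean a gene sits in both groups.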

If you lower the E-value cutoff in your search to 1e-30 or so, you will likely see few if any of these cross-matches.
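
As a rough sketch of the effect of a stricter cutoff, the Python below filters BLAST tabular hits by E-value and reports which groups each query still matches. It assumes the default 12-column tabular output (-outfmt 6); the hit lines, sequence names, and function names are invented for illustration and are not real results.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

# Hypothetical hits: one query against the core file and the accessory file.
CORE_HITS = "geneX\tcore_A\t98.5\t300\t4\t0\t1\t300\t1\t300\t1e-150\t550\n"
ACCESSORY_HITS = "geneX\tacc_A1\t55.0\t290\t130\t2\t5\t295\t3\t292\t2e-12\t70\n"


def read_hits(text, max_evalue):
    """Return (query, subject) pairs whose E-value is at or below max_evalue."""
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        record = dict(zip(COLUMNS, row))
        if float(record["evalue"]) <= max_evalue:
            hits.append((record["qseqid"], record["sseqid"]))
    return hits


def groups_per_query(max_evalue):
    """Map each query to the set of groups it matches under the cutoff."""
    matched = {}
    for text, group in ((CORE_HITS, "core"), (ACCESSORY_HITS, "accessory")):
        for query, _subject in read_hits(text, max_evalue):
            matched.setdefault(query, set()).add(group)
    return matched


loose = groups_per_query(1e-4)    # the cutoff used in the question
strict = groups_per_query(1e-30)  # the stricter cutoff suggested above

print("E-value <= 1e-4: ", loose)
print("E-value <= 1e-30:", strict)
```

With these made-up numbers the weaker paralog hit disappears under the stricter cutoff. The same threshold can instead be applied at search time with the BLAST+ -evalue option.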

