Question: How to separate plasmid proteins from main chromosome proteins in a GenBank assembly record?
3.0 years ago by
svetlana.lockwood20 wrote:

I'm working on a mega-project involving all proteobacterial proteomes. NCBI recently relocated all assembly records here. Previously, records for plasmids were kept separately from those for main chromosomes, but now they are combined in one file, for example GCA_000010825.1_ASM1082v1_protein.faa.gz.

Question: within such a file, how can I tell which proteins came from plasmids and which came from chromosomes?

If I were working on one genome, I could trace each protein individually, but with more than 2,000 complete genomes that is not feasible. Also, I cannot rely on sequence annotations such as "plasmid backbone", since not all plasmid proteins are necessarily "plasmid backbone" proteins.

Any ideas?

genbank assembly genome • 1.1k views
3.0 years ago by
5heikki8.1k wrote:

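Use the feature table file that accompanies each assembly (*_feature_table.txt). Its fifth column, seq_type, says whether each feature lies on a chromosome or a plasmid: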

cut -f5 -d $'\t' GCA_000010825.1_ASM1082v1_feature_table.txt | sort | uniq -c
 5400 chromosome
 844 plasmid
 1 seq_type

Parse the protein accessions from that file (column 11, product_accession), then use them to extract the matching sequences from the .faa file:

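# Column 5 is seq_type; column 11 is product_accession.
# grep . drops rows whose accession field is empty.
awk -F '\t' '$5 == "chromosome" {print $11}' GCA_000010825.1_ASM1082v1_feature_table.txt |\
    grep . > GCA_000010825.1_ASM1082v1_chromosome_protein_coding.acc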
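
For the second step (pulling the sequences out of the .faa file), something along these lines might work. It is only a sketch: the function name split_by_seq_type and the output file names are made up for illustration, and it assumes that the first word of each FASTA header is the same accession that appears in column 11 of the feature table, that the .faa file has already been decompressed, and that each accession belongs to a single seq_type.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an NCBI tool): split a protein FASTA into one file
# per seq_type, using the assembly feature table as the lookup.
# Usage: split_by_seq_type FEATURE_TABLE PROTEIN_FAA OUT_PREFIX
split_by_seq_type() {
    local table=$1 faa=$2 prefix=$3
    awk -F '\t' -v prefix="$prefix" '
        # Pass 1 (feature table): remember seq_type (col 5) for every
        # non-empty product_accession (col 11); skip the header row.
        NR == FNR {
            if ($11 != "" && $5 != "seq_type") type[$11] = $5
            next
        }
        # Pass 2 (FASTA): on each header, take the first word after ">"
        # as the accession and choose the output file for this record.
        /^>/ {
            split(substr($0, 2), hdr, " ")
            if (hdr[1] in type) {
                label = type[hdr[1]]
                gsub(/[^A-Za-z0-9]/, "_", label)   # keep file names safe
            } else {
                label = "unassigned"               # accession not in the table
            }
            out = prefix "_" label ".faa"
        }
        # Write header and sequence lines to the file chosen above.
        out != "" { print > out }
    ' "$table" "$faa"
}
```

It could then be called per assembly, for example:

gunzip -k GCA_000010825.1_ASM1082v1_protein.faa.gz
split_by_seq_type GCA_000010825.1_ASM1082v1_feature_table.txt GCA_000010825.1_ASM1082v1_protein.faa GCA_000010825.1_ASM1082v1

This would write GCA_000010825.1_ASM1082v1_chromosome.faa and GCA_000010825.1_ASM1082v1_plasmid.faa, plus an _unassigned.faa file if any header has no match in the table. Reading both files in one awk pass avoids needing a separate FASTA toolkit, which helps when looping over a couple of thousand assemblies.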

Wow! Thank you very much! I would have never known this.

written 3.0 years ago by svetlana.lockwood

Very useful, thanks.

written 18 months ago by sinumolgeorge