How to add ptt file content to fasta header
7.1 years ago
Promi ▴ 10

Hey,

I just downloaded all.faa.tar.gz and all.ptt.tar.gz from ftp://ftp.ncbi.nlm.nih.gov/genomes/archive/old_refseq/Bacteria/. Now I would like to merge the content of the ptt files into the FASTA header line of each corresponding protein sequence. The only thing the two files have in common is the PID (gene ID). As I am a beginner in bioinformatics, I would like to know how to do this in Python.

Thanks :)

python fasta ptt file • 2.3k views

Curious as to why you are using old archival RefSeq data?


Because the new RefSeq includes several versions of each assembly, which makes it quite hard to manipulate. I want to set up a local BLAST database for bacterial proteins or genomes.


I don't think this is the right solution.

You could get the current bacterial RefSeq genomes summary file here. The last column in the file contains direct links to the latest assembly folders of all bacterial genomes. From there it is a matter of getting the .faa files.
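For what it's worth, pulling those links out of the summary file could look roughly like this in Python. This is only a sketch: it assumes the file is tab-separated and that its header lines start with `#`, and it takes the statement that the links sit in the last column at face value, so check the layout of the file you actually download. The function name `latest_assembly_links` is made up for illustration.

```python
def latest_assembly_links(summary_lines):
    """Yield the last tab-separated field of every data line.

    Assumptions (not confirmed in this thread): fields are separated by
    tabs and header/comment lines start with '#'.
    """
    for line in summary_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        yield line.split("\t")[-1]
```

You would call it with an open file handle, e.g. `with open(path) as handle: links = list(latest_assembly_links(handle))`, where `path` points at the summary file you saved locally.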


Thank you for the suggestion :)


These files are large as far as FASTA files go. Can you show the first header from each file?


Fasta header:

>gi|336319135|ref|YP_004599103.1| chromosomal replication initiator protein DnaA [[Cellvibrio] gilvus ATCC 13127]
MAQDEELSRVWGHVVTTLEESPDITQRQLAFVRLAQPLGLLDGTIILAVGNEYTKEYLETKVRAEVTSAL
GSALGRDGRFAITVDPSLVDDAPPAVRAMTSAPELGVVTDGTDERGAPNRTVPTDADTGRHERSPMLSES
AEPTRPVRETASSRRPAAEPARLNPHYLFETFVIGSSNRFAHAAAVAVAEAPAKAYNPLFIYGDSGLGKT
HLLHAIGHYAQNLYPSVRVRYVNSEEFTNDFINSISEGKAGAFQRRYREVDVLLIDDIQFLQGKEQTMEE

PTT file header:

Cellvibrio gilvus ATCC 13127 chromosome, complete genome - 1..3526441
3164 proteins
Location Strand Length PID Gene Synonym Code COG Product
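Given the two headers above, one possible way to do the merge in Python is sketched below. It is not a tested pipeline. It assumes the PTT columns are tab-separated in the actual file (they appear space-separated as pasted here), that the PID column holds the same number as the `gi|...|` field of the FASTA header, and that every FASTA header starts with `>gi|`. The function names and the choice of which columns to append are made up for illustration.

```python
def read_ptt(lines):
    """Return {PID: {column: value}} from the lines of one .ptt file."""
    lines = iter(lines)
    # Skip the title and "N proteins" lines; stop at the column-name line.
    for line in lines:
        if line.startswith("Location"):
            columns = line.rstrip("\n").split("\t")
            break
    else:
        raise ValueError("no 'Location ...' column line found")
    table = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != len(columns):
            continue  # ignore blank or malformed rows
        row = dict(zip(columns, fields))
        table[row["PID"]] = row
    return table


def annotate_fasta(fasta_lines, table,
                   keep=("Location", "Strand", "Gene", "Synonym", "COG")):
    """Yield FASTA lines, appending PTT fields to headers with a known PID."""
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">gi|"):
            pid = line.split("|")[1]  # second '|'-separated field is the GI
            row = table.get(pid)
            if row is not None:
                extra = " ".join(f"{name}={row[name]}" for name in keep)
                line = f"{line} {extra}"
        yield line


def merge_files(ptt_path, faa_path, out_path):
    """Write an annotated copy of faa_path using the matching ptt_path."""
    with open(ptt_path) as ptt:
        table = read_ptt(ptt)
    with open(faa_path) as faa, open(out_path, "w") as out:
        for line in annotate_fasta(faa, table):
            out.write(line + "\n")
```

Headers whose PID is not found in the PTT table are written unchanged, and sequence lines pass through as they are. Since the archives contain one .faa and one .ptt per replicon, you would call `merge_files` once per matching pair of files.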
