I have a FLAT annotation text file for 20 genes. I just want to extract certain lines (e.g. the DT lines) for each gene. My plan is to match my UniProt ID (e.g. P15519) against the AC line of each entry (entries are delimited by the // lines), then pull out the lines that start with DT from the matching entry using a regular Perl script, but it does not look that straightforward to me. I heard that the Ensembl Perl API modules can also extract certain information from FLAT annotation files. What is the best way to get these lines out of the FLAT file for every single gene?
Here is a part of the FLAT annotation text file.
........
.........
//
ID SPG1_DICDI Reviewed; 127 AA.
AC P15519; Q54KF8;
DT 01-APR-1990, integrated into UniProtKB/Swiss-Prot.
DT 01-APR-1990, sequence version 1.
DT 01-MAY-2013, entry version 59.
DE RecName: Full=Spore germination protein
GN Name=gerA; ORFNames=DDB_G0287293;
DR RefSeq; XP_637265.1; XM_632173.1.
DR KEGG; ddi:DDB_G0287293; -.
DR dictyBase; DDB_G0287293;
FT CHAIN 26 127 Spore germination protein 1.
FT /FTId=
FT CARBOHYD 118 118 N-linked (GlcNAc...) (Potential).
SQ SEQUENCE 127 AA; 14049 MW; 8D6FEAF74A06C742 CRC64;
MNIKNSLILI ISTIFVLSMI NGGLTSDPTC VGAPDGQVYL FSSWDFQGER YVYNISQGYL
SLPDGFRNNI QSFVSGADVC FVKWYPSEQY QITAGESHRN YAALTNFGQR MDAIIPGNCS
NIVCSPK
//
ID CO1A1_MOUSE Reviewed; 1453 AA.
AC P11087; Q53WT0; Q60635; Q61367; Q61427; Q63919; Q6PCL3; Q810J9;
DT 01-JUL-1989, integrated into UniProtKB/Swiss-Prot.
DT 06-DEC-2005, sequence version 4.
DT 01-MAY-2013, entry version 136.
RP NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORMS 1 AND 2).
DR EMBL; U08020; AAA88912.1; -; mRNA.
DR EMBL; AL662790; CAI25880.1; -; Genomic_DNA.
FT SIGNAL 1 22
SQ SEQUENCE 1453 AA; 138032 MW; 0B7F06BBB9A1D5EA CRC64;
MFSFVDLRLL LLLGATALLT HGQEDIPEVS CIHNGLRVPN GETWKPEVCL ICICHNGTAV
CDDVQCNEEL DCPNPQRREG ECCAFCPEEY VSPNSEDVGV EGPKGDPGPQ GPRGPVGPPG
RDGIPGQPGL PGPPGPPGPP GPPGLGGNFA SQMSYGYDEK SAGVSVPGPM GPSGPRGLPG
PPGAPGPQGF QGPPGEPGEP GGSGPMGPRG PPGPPGKNGD DGEAGKPGRP GERGPPGPQG
ARGLPGTAGL PGMKGHRGFS GLDGAKGDAG PAGPKGEPGS PGENGAPGQM
//
.......
......
In awk, the default action for a pattern that has no action block is to print the matching line, so you can simplify the command to: awk '/^DT/' input-file.txt
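That one-liner prints the DT lines of every entry. If you only want the DT lines for one accession, a minimal sketch along the same lines is to set a flag when the AC line lists your ID and clear it at each // separator. The function name dt_for_accession is made up for illustration, and the sketch assumes the AC line(s) come before the DT lines within an entry, as they do in the excerpt in the question.

```shell
# Hypothetical helper: print the DT lines of the entry whose AC line lists the accession.
# Usage: dt_for_accession ACCESSION FLATFILE
#   e.g. dt_for_accession P15519 input-file.txt
dt_for_accession() {
  awk -v id="$1" '
    $1 == "//" { hit = 0; next }     # entry separator: reset the flag
    $1 == "AC" {                     # accession line, may list several IDs
      for (i = 2; i <= NF; i++) {
        acc = $i
        sub(/;$/, "", acc)           # drop the trailing semicolon
        if (acc == id) hit = 1       # exact match against any listed accession
      }
    }
    $1 == "DT" && hit                # no action block, so awk prints the line
  ' "$2"
}
```

Comparing whole fields rather than using a regex avoids matching an ID that happens to be a substring of another accession. For 20 genes you could call the function in a loop over your list of IDs.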