Question: Filtering sequences with multiple headers by length
dod wrote (19 months ago):

Hi,

I have an .fna file downloaded from a database that contains all CDS of a bacterial strain, in the format shown below. I would like to filter out (remove) every CDS shorter than 200 nt. How can I do this from the command line?

I've looked into the previous posts related to this topic, but the awk did not work.

Thanks!

>lcl|AL111168.1_cds_CAL34182.1_1 [gene=dnaA] [locus_tag=Cj0001] [db_xref=EnsemblGenomes-Gn:Cj0001,EnsemblGenomes-Tr:CAL34182,GOA:Q9PJB0,InterPro:IPR001957,InterPro:IPR003593,InterPro:IPR010921,InterPro:IPR013159,InterPro:IPR013317,InterPro:IPR018312,InterPro:IPR020591,InterPro:IPR024633,InterPro:IPR027417] [protein=chromosomal replication initiator protein] [protein_id=CAL34182.1] [location=1..1323] [gbkey=CDS]
ATGAATCCAAGCCAAATACTTGAAAATTTAAAAAAAGAATTAAGTGAAAACGAATACGAAAACTATTTATCAAATTTAAA
ATTCAACGAAAAACAAAGCAAAGCAGATCTTTTAGTTTTTAATGCTCCAAATGAACTCATGGCTAAATTCATACAAACAA
AATACGGCAAAAAAATCGCGCATTTTTATGAAGTGCAAAGCGGAAATAAAGCCATCATAAATATACAAGCACAAAGTGCT
AAACAAAGCAACAAAAGCACAAAAATCGACATAGCTCATATAAAAGCACAAAGCACGATTTTAAATCCTTCTTTTACTTT
>lcl|AL111168.1_cds_CAL34183.1_2 [gene=dnaN] [locus_tag=Cj0002] [db_xref=EnsemblGenomes-Gn:Cj0002,EnsemblGenomes-Tr:CAL34183,GOA:Q0PCC3,InterPro:IPR001001,InterPro:IPR022634,InterPro:IPR022635,InterPro:IPR022637,UniProtKB/TrEMBL:Q0PCC3] [protein=DNA polymerase III, beta chain] [protein_id=CAL34183.1] [location=1483..2550] [gbkey=CDS]
ATGAAGTTAAGTATCAATAAAAATACTTTAGAATCTGCAGTGATTTTATGTAATGCTTATGTAGAAAAAAAAGACTCAAG
CACCATTACTTCTCATCTTTTTTTTCATGCTGATGAAGATAAACTTCTTATTAAAGCTAGTGATTATGAAATAGGTATCA
ACTATAAAATAAAAAAAATCCGCGTAGAATCAAGTGGTTTTGCTACTGCAAATGCAAAAAGTATTGCAGATGTTATTAAA
AGCTTAAACAATGAAGAAGTTGTTTTAGAAACCATTGATAATTTTTTATTTGTAAGACAAAAAAGTACAAAATACAAACT

. . .


I've looked into the previous posts related to this topic, but the awk did not work.

What do you mean by this?

written 19 months ago by Sej Modha

I added markup to your post for increased readability. You can do this by selecting the text and clicking the 101010 button, which appears in the toolbar when you compose or edit a post.

written 19 months ago by WouterDeCoster

Biopython is a solution.

written 19 months ago by Bastien Hervé
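
Nobody in the thread posted actual code, so here is a minimal sketch of the length filter asked about in the question. It uses only the Python standard library rather than Biopython; with Biopython installed, `SeqIO.parse` and `SeqIO.write` could replace the hand-written reader and writer. The function names, the script name `filter_cds.py`, and the 80-column line width are assumptions made for illustration. Sequence lines are joined before measuring, so a CDS split over several lines is counted as one sequence, and the long bracketed headers are kept unchanged.

```python
import sys


def read_fasta(handle):
    """Yield (header, sequence) pairs; wrapped sequence lines are joined."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_len=200):
    """Keep only records whose sequence has at least min_len nucleotides."""
    for header, seq in records:
        if len(seq) >= min_len:
            yield header, seq


def write_fasta(records, handle, width=80):
    """Write records back out, wrapping sequences at `width` characters."""
    for header, seq in records:
        handle.write(header + "\n")
        for i in range(0, len(seq), width):
            handle.write(seq[i:i + width] + "\n")


# Hypothetical usage: python filter_cds.py input.fna output.fna
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as fin, open(sys.argv[2], "w") as fout:
        write_fasta(filter_by_length(read_fasta(fin)), fout)
```

Records with exactly 200 nt are kept, matching "remove those less than 200"; change `min_len` if a different cutoff is wanted.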