Question: Remove part of the header from multi-fasta file (another one)
10 months ago by
macielrodriguez2 wrote:

Hi!!!

I have a multifasta file with headers like:

>trnN-GUU_INIA601-ARAGORN_v1.2.38 ccsA_INIA601-blatX
>rpl16_INIA601-blatX ndhF_INIA601-blatX psbJ_INIA601-blatX
>trnW-CCA-I_INIA601-ARAGORN_v1.2.38 trnL-UAG_INIA601-ARAGORN_v1.2.38
>psaC_INIA601-blatX trnR-UCU_INIA601-ARAGORN_v1.2.38 ndhA_INIA601-blatX
>trnC-ACA_INIA601-ARAGORN_v1.2.38 trnW-CCA-II_INIA601-ARAGORN_v1.2.38

I would like some way to only leave the name of the gene, like:

>rpl16 
>trnW 
>psaC 
>trnC

Thank you so much for your kind help :)


with seqkit:

$ seqkit replace -p "[-_].*" -r "" input.fa

Check whether it makes sense for your data to remove "_INIA601" and everything after it from the FASTA headers. Note that this pattern also cuts at the first "-", so "trnN-GUU" becomes "trnN".

written 10 months ago by cpad0112
10 months ago by
zubenel wrote:

Looking at the file, I assume that the tRNA gene names include the anticodon sequence, e.g. "trnN-GUU", "trnW-CCA-I", "trnC-ACA". If you omit the anticodon you lose information and can no longer distinguish cases such as "trnR-UCG" and "trnR-CCG". So if you need to keep the full gene names you can use:

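perl -pe 's/_.*// if /^>/' multifasta_file

On header lines (those starting with >), this regular expression matches everything from the first _ to the end of the line and replaces it with nothing. Sequence lines are left untouched.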

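Otherwise, if you want the result exactly as you wrote it, you can use:

perl -pe 's/[_-].*// if /^>/' multifasta_file

On header lines, this expression removes everything from the first _ or - onward. The .* part is greedy, so the match always extends to the end of the line. Restricting the substitution to header lines matters here, because a sequence line containing gap characters (-) would otherwise be truncated as well.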
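
To see what the two variants do, here is a small sketch that applies them to three of the headers from the question. It uses sed instead of perl (same regular expressions, limited to lines starting with >). The file name sample.fa, the sequence lines and the function names full_names and short_names are made up for illustration.

```shell
# Hypothetical sample file: headers from the question, made-up sequence lines.
cat > sample.fa <<'EOF'
>trnN-GUU_INIA601-ARAGORN_v1.2.38 ccsA_INIA601-blatX
ACGT-ACGT
>rpl16_INIA601-blatX ndhF_INIA601-blatX psbJ_INIA601-blatX
TTGACC
>trnW-CCA-I_INIA601-ARAGORN_v1.2.38 trnL-UAG_INIA601-ARAGORN_v1.2.38
GGCATT
EOF

# Keep the full gene name: on header lines only, drop everything from the first "_".
full_names() { sed '/^>/ s/_.*//' "$1"; }

# Keep only the part before the first "_" or "-", as asked in the question.
short_names() { sed '/^>/ s/[_-].*//' "$1"; }

full_names sample.fa    # headers become >trnN-GUU, >rpl16, >trnW-CCA-I
short_names sample.fa   # headers become >trnN, >rpl16, >trnW
```

In both cases the gapped sequence line ACGT-ACGT passes through unchanged, because the substitution only runs on header lines. Redirect the output to a new file once the headers look right.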
