Question: Extract several parts from fasta header
rah20 wrote:

I'm looking for a way to create a text file containing some information about sequence reads, extracted from a .fasta file, using grep, sed or awk.

I have several FASTA sequences which I have trimmed. Here is an example of a header from a trimmed FASTA file, which contains both the original and the trimmed length:

>ca51a0fa-e6e5-4fd7-bd00-91cba70ca87e runid=f51153f9c3ec50d37d212f8f83dc387ac416f3c9 read=3826 ch=60 start_time=2018-11-21T16:47:21Z barcode=barcode01 trim=0-1060

The information I want from this header is:

read name: ca51a0fa-e6e5-4fd7-bd00-91cba70ca87e

original read length: 3826

trimmed length: 0-1060

So far I have this:

grep -o -E "^>\w+|.read=\w+|.trim=\w+" test.fasta

Which yields the output

>ca51a0fa
read=3826
trim=0

What I'm looking for is either this

>ca51a0fa
read=3826
trim=0-1060

Or this

>ca51a0fa-e6e5-4fd7-bd00-91cba70ca87e
read=3826
trim=0-1060

I can't get it to work. Does anyone have a suggestion? Thanks

bash unix sed grep fasta • 195 views
written 4 months ago by rah20

Why not use awk, delimit on space and then print the fields you need?

modified 4 months ago • written 4 months ago by genomax70k
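
The awk suggestion above could be sketched roughly as follows. This is only an illustration, not the commenter's own command: it assumes the header fields are separated by single spaces as in the example from the question, and the file names test.fasta and header_info.txt are placeholders. The read= and trim= fields are looked up by prefix rather than by column number, in case the field order differs between files.

```shell
# Placeholder input: the header from the question plus a dummy sequence line.
cat > test.fasta <<'EOF'
>ca51a0fa-e6e5-4fd7-bd00-91cba70ca87e runid=f51153f9c3ec50d37d212f8f83dc387ac416f3c9 read=3826 ch=60 start_time=2018-11-21T16:47:21Z barcode=barcode01 trim=0-1060
ACGT
EOF

# On header lines only: print the read name (field 1),
# then every field that starts with read= or trim=.
awk '/^>/ {
    print $1
    for (i = 2; i <= NF; i++)
        if ($i ~ /^(read|trim)=/) print $i
}' test.fasta > header_info.txt

cat header_info.txt
```

Because awk splits on whitespace by default, the hyphens inside the read name and the trim range are kept intact.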

Because I didn't think of that. All of the examples I could find for handling FASTA headers used grep, so I thought I might as well stay with grep. Awk worked perfectly, thanks.

written 4 months ago by rah20
$ grep -o -E "^>\w+|.read=\w+|.trim=\w+\W\w+" test.txt
>ca51a0fa
 read=3826
 trim=0-1060


$ grep -Eio ">(\w+\W){5}|read=\w+|trim=\w+\W\w+" test.txt
>ca51a0fa-e6e5-4fd7-bd00-91cba70ca87e 
read=3826
trim=0-1060
modified 4 months ago • written 4 months ago by cpad011211k
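
For context on why the original pattern stopped early: \w matches only letters, digits and underscore, so it ends at the first hyphen in both the read name and the trim range. A possible variant, offered as a sketch, matches "everything up to the next space" instead. It assumes the fields are space-separated as in the example header, and the file names test.fasta and grep_info.txt are placeholders.

```shell
# Placeholder input: the header from the question plus a dummy sequence line.
cat > test.fasta <<'EOF'
>ca51a0fa-e6e5-4fd7-bd00-91cba70ca87e runid=f51153f9c3ec50d37d212f8f83dc387ac416f3c9 read=3826 ch=60 start_time=2018-11-21T16:47:21Z barcode=barcode01 trim=0-1060
ACGT
EOF

# [^ ]+ consumes any run of non-space characters, hyphens included,
# so the full read name and the full trim range are captured.
grep -o -E '^>[^ ]+|read=[^ ]+|trim=[^ ]+' test.fasta > grep_info.txt

cat grep_info.txt
```

Dropping the leading dot from the read= and trim= alternatives also avoids the leading space in the output.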

Thanks for your suggestions for both options.

written 4 months ago by rah20

SEDA (https://www.sing-group.org/seda/) has an operation for processing FASTA headers that can do this kind of thing. It is called 'Rename header' (https://www.sing-group.org/seda/manual/operations.html#rename-header) and may be useful to you. You do not even need to install SEDA: you can use the Docker image with the latest version, available at Docker Hub (https://hub.docker.com/r/pegi3s/seda/). Regards!

written 4 months ago by Hugo150

It looks really useful. Thanks!

written 4 months ago by rah20