Extract several parts from fasta header
5.1 years ago
rah ▴ 20

I'm looking for a way to create a text file containing some information about sequence reads, extracted from a .fasta file, using grep, sed, or awk.

Basically, I have several FASTA sequences that I have trimmed. Here is an example header from a trimmed FASTA file, which records both the original and the trimmed length:

>ca51a0fa-e6e5-4fd7-bd00-91cba70ca87e runid=f51153f9c3ec50d37d212f8f83dc387ac416f3c9 read=3826 ch=60 start_time=2018-11-21T16:47:21Z barcode=barcode01 trim=0-1060

So the information I want from this header is:

read name: ca51a0fa-e6e5-4fd7-bd00-91cba70ca87e

original read length: 3826

trimmed length: 0-1060

So far I have this:

grep -o -E "^>\w+|.read=\w+|.trim=\w+" test.fasta

which yields the output:

>ca51a0fa
read=3826
trim=0

What I'm looking for is either this:

>ca51a0fa
read=3826
trim=0-1060

or this:

>ca51a0fa-e6e5-4fd7-bd00-91cba70ca87e
read=3826
trim=0-1060

I can't get it to work. Would any of you have a suggestion? Thanks.

fasta grep sed unix bash • 1.7k views

Why not use awk, delimit on space and then print the fields you need?


Because I didn't think of that. All of the examples I could find for handling FASTA headers used grep, so I thought I might as well stay with grep. Well, that worked perfectly, thanks.
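
For anyone landing here later, the awk suggestion could look something like the sketch below. It assumes the header fields are whitespace-separated, as in the example header, and that the wanted fields start with read= and trim=. The scratch file and the function name extract_header_fields are made up for illustration and are not from the thread.

```shell
# Scratch FASTA file holding the example header from the question.
fasta=$(mktemp)
cat > "$fasta" <<'EOF'
>ca51a0fa-e6e5-4fd7-bd00-91cba70ca87e runid=f51153f9c3ec50d37d212f8f83dc387ac416f3c9 read=3826 ch=60 start_time=2018-11-21T16:47:21Z barcode=barcode01 trim=0-1060
ACGTACGT
EOF

# extract_header_fields is a hypothetical helper name.
# For each header line, print the first field (">" plus the read name),
# then every later field that starts with read= or trim=.
extract_header_fields() {
  awk '/^>/ {
    print $1
    for (i = 2; i <= NF; i++)
      if ($i ~ /^(read|trim)=/) print $i
  }' "$1"
}

extract_header_fields "$fasta"
```

Because awk splits on whitespace rather than on word characters, the hyphens in the read name and in the trim range are kept. Redirecting the output to a file should give the text file asked for in the question.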

$ grep -o -E "^>\w+|.read=\w+|.trim=\w+\W\w+" test.txt
>ca51a0fa
 read=3826
 trim=0-1060


$ grep -Eio ">(\w+\W){5}|read=\w+|trim=\w+\W\w+" test.txt
>ca51a0fa-e6e5-4fd7-bd00-91cba70ca87e 
read=3826
trim=0-1060
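
For context, the pattern in the question stops early because \w matches only letters, digits, and underscores, so it ends at the first hyphen. A possible variant, sketched below, spells out which characters each field may contain instead of relying on \W. It assumes the read name never contains a space and that read= and trim= carry only digits (plus one hyphen for trim=). The scratch file and the function name grep_header_fields are made up for illustration.

```shell
# Scratch FASTA file holding the example header from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
>ca51a0fa-e6e5-4fd7-bd00-91cba70ca87e runid=f51153f9c3ec50d37d212f8f83dc387ac416f3c9 read=3826 ch=60 start_time=2018-11-21T16:47:21Z barcode=barcode01 trim=0-1060
ACGTACGT
EOF

# grep_header_fields is a hypothetical helper name.
# ^>[^ ]+            : ">" and everything up to the first space (keeps hyphens)
# read=[0-9]+        : the read= field, digits only
# trim=[0-9]+-[0-9]+ : the trim= field as start-end
grep_header_fields() {
  grep -oE '^>[^ ]+|read=[0-9]+|trim=[0-9]+-[0-9]+' "$1"
}

grep_header_fields "$sample"
```

Unlike the patterns above, this one does not include the space before read= or after the read name, so the output lines carry no leading or trailing blanks.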

Thanks for your suggestions for both options.


SEDA (https://www.sing-group.org/seda/) has an operation for processing FASTA headers that handles this kind of task. It is called 'Rename header' (https://www.sing-group.org/seda/manual/operations.html#rename-header) and may be useful to you. You do not even need to install SEDA: you can use the Docker image with the latest version, available at Docker Hub (https://hub.docker.com/r/pegi3s/seda/). Regards!


It looks really useful. Thanks!
