Question

Problem with filtering "Sequence unavailable"

7.4 years ago

ashkan ▴ 160

I have a file like this small example:

>ENSG00000004142|ENST00000003607|POLDIP2|||2118
Sequence unavailable
>ENSG00000003056|ENST00000000412|M6PR|9099001;9102084|9099001;9102551|2756
CCAGGTTGTTTGCCTCTGGTCGGAAAGGGAAACTACCCCTGCTTCCACTCTGACAGCAGA

My file contains many "Sequence unavailable" records, and I want to get rid of those transcripts. The result should look like this:

>ENSG00000003056|ENST00000000412|M6PR|9099001;9102084|9099001;9102551|2756
CCAGGTTGTTTGCCTCTGGTCGGAAAGGGAAACTACCCCTGCTTCCACTCTGACAGCAGA

I tried to filter those records out in bash with:

grep -v "$(grep -B 1 "Sequence unavailable" file.txt)" file.txt

but it gave this error:

Argument list too long

How can I filter them out in bash or Python?

sequence • 1.8k views

updated 7.4 years ago by Ram 43k • written 7.4 years ago by ashkan ▴ 160

How about this (it should work as long as the first record is "Sequence unavailable"; you can be creative otherwise):

grep -A 2 "Sequence" your.fa | grep -v "\-\-" | sed -n '/Sequence/!p' > new.fa
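Since the question also asks about Python, here is a minimal sketch that does not depend on where the "Sequence unavailable" records sit in the file. It reads the file record by record and keeps only records whose body is a real sequence. The names `drop_unavailable`, `filter_file`, and `filtered.fa` are illustrative and not from the thread, and the sketch assumes every record starts with a `>` header line, as in the example.

```python
from typing import Iterable, Iterator, List

UNAVAILABLE = "Sequence unavailable"  # placeholder text seen in the example file


def _has_sequence(record: List[str]) -> bool:
    """True if the record has a header, at least one body line, and no placeholder."""
    if len(record) < 2:
        # Header-only (or empty) records are dropped as well.
        return False
    return all(body.strip() != UNAVAILABLE for body in record[1:])


def drop_unavailable(lines: Iterable[str]) -> Iterator[str]:
    """Yield the lines (without newlines) of every record that has a real sequence."""
    record: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new header closes the previous record.
            if _has_sequence(record):
                yield from record
            record = [line]
        elif line.strip() and record:
            # Body line of the current record; blank lines are skipped.
            record.append(line)
    # Flush the last record in the file.
    if _has_sequence(record):
        yield from record


def filter_file(src: str, dst: str) -> None:
    """Hypothetical helper: write the filtered records from src to dst."""
    with open(src) as handle, open(dst, "w") as out:
        for kept in drop_unavailable(handle):
            out.write(kept + "\n")
```

Calling `filter_file("file.txt", "filtered.fa")` would write the kept records to a new file. Because the input is streamed line by line, nothing is passed on the command line, so the "Argument list too long" error from the question should not come up.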

7.4 years ago by GenoMax 141k

It would be nice to give feedback on the solution proposed by genomax2. In addition, you have more questions that you left "open/unsolved" after people tried to help you. That's not respectful.

I pledged to help you on your previous thread, but my questions remain unanswered, although it's clear that you have been active multiple times on biostars since my comment. You shouldn't take our help for granted.

7.4 years ago by WouterDeCoster 47k

Dear ashkan, please respond to questions and give follow-up comments on your past posts. Abandoning a question after you ask it borders on troll-like behavior. Unless you follow up on your past questions, your future questions may not be taken seriously, or your posts may be treated even more sternly.

7.4 years ago by Ram 43k