grep sequence
28 days ago

Hi,

I have a FASTA file with sequences like the following. Each pair of sequences shares a similar header. I want to generate a file containing only the sequences whose header does not contain "shuffled". How can I do that in bash?

>AABR03119176.1/72910-72785
UCCCCCAGAGUCUGGGCUUGGUGCUUUGCAGUGCUGGCGACCUAUUCCCUUUGACGAUCCCUAGGUGGAGAUGGGGCAUGAGGAUCCUCCAGGGGAAUAGCUCACCGCCACUGGGCAACAGGCCUA
>AABR03119176.1/72910-72785-shuffled
CCGCUAGCGUGAUUGGGGACGGGAUCGACCGGUGGCCCGCCGACGCCUCACCUCAUACUCGUAUGUGAUGCCGAGGGCUAGGUAAGAUGGUUGAACGCUCUAGAGUGCCCUCUGAACUUAGCCUCU
>AANN01820944.1/1549-1423
UUUCCCUCAGAAUAGGCUUGUUGCUUUACAGUACUGGUGAUCCAUUCUCUUUGAUGAUCCCcUAGGUGGAGAUGGGGCAUGAGGAUCCUCCAAGGGAAAGACUCAUCAUCACUGGGCAACAGCCUUA
>AANN01820944.1/1549-1423-shuffled
AGGCUCUGACAUAGACUCUUCUUUAGUGGGCGCGCCGACACAUACCUGUcUGAGGAGAUCGAAAUGUGUAGUCCGACAGAACUAAACAAGACUCGUCGGUGCUUAGACUUCUUUCCUGUUUGCGAUU

Try these:

$ sed '/^>/ s/-shuffled$//' test.fa

or

$ awk -F "-shuffled" '{print $1}' test.fa

or

$ awk -v RS=">" -v OFS="\n" 'NR>1 {sub("-shuffled$","",$1); print ">"$1,$2}' test.fa

Note that these commands only strip the "-shuffled" suffix from the headers, so you will end up with sequences that have identical headers. This could be a problem in downstream tools.
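To make the duplicate-header effect concrete, here is a minimal sketch. The file names demo.fa and demo_stripped.fa and the two records are made up for illustration; the sketch strips the suffix with the sed command above and then lists every header that occurs more than once.

```shell
# Hypothetical two-record input (names and sequences are made up).
printf '>seq1/1-10\nACGU\n>seq1/1-10-shuffled\nUGCA\n' > demo.fa

# Strip the "-shuffled" suffix from header lines only.
sed '/^>/ s/-shuffled$//' demo.fa > demo_stripped.fa

# List headers that now occur more than once.
grep '^>' demo_stripped.fa | sort | uniq -d
```

If this prints anything, the stripped file contains duplicate headers and may need renaming before use elsewhere.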

28 days ago
cat <yourFile> | paste - - | grep -v 'shuffled' | sed 's/\t/\n/g' > new_file

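This cats your file, puts each header and its sequence on one line (paste), keeps only the lines that do not match 'shuffled' (grep -v), and then splits header and sequence back onto two lines (sed). It assumes each sequence occupies a single line, as in your example.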
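If the sequences might wrap over several lines, the same filtering can be sketched in awk instead. This is only an illustration under assumptions: the function name drop_shuffled and the demo file names are hypothetical, and the input is made up. The idea is to decide at each header line whether to keep the record, then print lines while that flag is set.

```shell
# Hypothetical helper: print only records whose header lacks "shuffled".
# Works for single-line and multi-line sequences.
drop_shuffled() {
    awk '/^>/ { keep = ($0 !~ /shuffled/) } keep' "$1"
}

# Made-up input for illustration (sequences wrapped over two lines).
printf '>a/1-4\nACGU\nGGCC\n>a/1-4-shuffled\nUGCA\nCCGG\n>b/5-8\nUUUU\n' > demo_multi.fa

# Write the filtered records to a new file.
drop_shuffled demo_multi.fa > demo_filtered.fa
```

Unlike the paste-based pipeline, this does not rely on tabs being absent from the headers.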


As an additional note: I have provided a working solution here, but you could have found this yourself with some searching, as this has been asked and answered a number of times before.


Thanks, but I get an error. How can I solve it?

sed: -e expression #1, char 6: unterminated `s' command

Apologies for that; the sed expression was missing a trailing /. I have fixed it in the command line above.
