Extract sequence ID from Fasta for a given sequence
3.1 years ago
MAPK ★ 1.9k

I have a fasta file myfasta.fasta like this:

>aat.2.2344.a
ATTGCCGGTTTAATATTA
>aat.2.d2344.acc
ATTGCCGGTTTAATAAA
>aat.2.2bb344.a
ATTGCCGGTTTAATAGGAGAGAATT
>aat.2.2ccc344.a
ATTGCCGGTTTAATAGGGAG
>aat.2.2344.acc
ATTGCCGGTTTAATAAA

I also have a text file my.txt which contains a sequence that matches some of the sequences in the fasta file above:

ATTGCCGGTTTAATAAA

Based on this sequence, I want to extract all matched IDs for this sequence. Can someone please help me with this? Thanks!

The result I want is:

>aat.2.2344.acc
>aat.2.d2344.acc

Are the sequences all on one line? If so, you can just use grep -B 1 ...


Yes, they are 50 bp reads.


Dear MAPK, if you usually work with FASTA files you may find SEDA (http://www.sing-group.org/seda/) a useful tool. It offers a wide range of operations to manipulate, filter, and transform FASTA files (see the manual for the full list: https://www.sing-group.org/seda/manual/index.html). It also lets you explore a set of FASTA files and extract only the information you need, such as the sequence identifiers (see https://www.sing-group.org/seda/manual/graphical-user-interface.html#the-input-area).

With best regards, Hugo.

3.1 years ago
Joe 19k

This works:

grep --no-group-separator -B 1 -F -f my.txt myfasta.fasta | grep -v -F -f my.txt

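The only downside is that my.txt is read twice. There are non-grep approaches that could avoid this, but this one is simple. Note that --no-group-separator is a GNU grep option, and -B 1 relies on each sequence being on a single line.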
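As one possible non-grep approach, here is a minimal awk sketch that reads my.txt only once. It assumes single-line sequences, as in the question, and the function name fasta_ids_for_seqs is made up for illustration. Unlike grep -F, which also matches substrings, this compares whole sequence lines.

```shell
# Hypothetical helper: print the header of every FASTA record whose
# sequence line exactly equals a line in the query file.
# Assumes each sequence occupies a single line.
fasta_ids_for_seqs() {
    # $1 = query file (one sequence per line), $2 = FASTA file
    awk '
        NR == FNR { if ($0 != "") wanted[$0] = 1; next }  # first file: store query sequences
        /^>/      { header = $0; next }                   # FASTA file: remember current header
        $0 in wanted { print header }                     # sequence line equals a query
    ' "$1" "$2"
}
```

Called as fasta_ids_for_seqs my.txt myfasta.fasta, this should print the matching headers in the order they appear in the FASTA file.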

3.1 years ago
michael.ante ★ 3.7k

Hi MAPK,

I find fasgrep from the FAST suite quite handy:

fasgrep -s ATTGCCGGTTTAATAAA myfasta.fasta

It reports the full entry, so you can pipe the output through grep '^>' to keep only the header lines.

[EDIT] Since you have multiple sequences (as I have only just read), you can provide them as a single regex: "ATTGCCGGTTTAATAAA|CCCCGCGC|ATATATATA"

Cheers,

Michael
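If the query sequences live in a file such as my.txt, the alternation regex mentioned above could be built rather than typed by hand. A small sketch, where seqs_to_regex is a hypothetical helper name and the input is assumed to hold one plain sequence per line:

```shell
# Hypothetical helper: join the non-empty lines of a file with "|"
# so the result can be used as a single alternation regex.
seqs_to_regex() {
    # $1 = file with one sequence per line
    grep -v '^$' "$1" | paste -s -d '|' -
}
```

The result could then presumably be passed along the lines of fasgrep -s "$(seqs_to_regex my.txt)" myfasta.fasta | grep '^>', assuming fasgrep accepts the pattern as shown in the answer above.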
