How can I extract sequences from a FASTA file using OTU IDs?
5 weeks ago

I have a FASTA file with sequences and headers, and an OTU table. I want to replace the current headers in the FASTA file with just the OTU IDs from the .txt file. The files look something like this:

FASTA file:

>9d4544fa-322a-4fac-bcf5-f8ca410c7e8a runid=b48dcb2029ef589b8cdb55fe529d93b3baadbc7c sampleid=28012021COVID3 read=57150 ch=310 start_time=2021-01-28T18:23:35Z
TCAGTTACGTATTGCTGGTGCTGAAGAAAGTTGTCGGTGTCTTTGTGTTAACCTTTCTGTTGGTGCTGATATTGCAGTCGCCCCCACCTCCTTGGATCCGCCAGGTTAAACACCCAAGCAGACGCCGAAGATAGAGCGACAGGCAAGTAGGTTAACACAAAGACACCGACAACTTTTCTTCCAGCACCAGCAGCCCGTAACT
>a8d321b9-97f1-42aa-a17a-fa1c4609e60e runid=b48dcb2029ef589b8cdb55fe529d93b3baadbc7c sampleid=28012021COVID3 read=46034 ch=83 start_time=2021-01-28T18:24:16Z
CGTTCGGTGCGTATTGCTGGTGCTGAAGAAAGTTGTCCGGTGTCTTTGTGTTAACCTTTCTGTTGGTGCTGATATTGCGGCGTCTGGCAGGTGTTAACCTGGCCTCGAGAGAGTTTGATCCTGGCTGGGATGAACGCTGACGTGCCTAATACATGCAAGTAG
>6d97cca8-6795-4c59-b4d1-05880735b957 runid=b48dcb2029ef589b8cdb55fe529d93b3baadbc7c sampleid=28012021COVID3 read=46363 ch=227 start_time=2021-01-28T18:24:08Z
GTTCAGTGCATATTGCTGTGGTGCTGCGAAAGTTTGTGGTGTCTTTGTGTTAACGCTTTCTGTTGGTGCTGATATTGCGGCGGCGTCTGCGCGGTGTTTAACCTGGCGGATCAAGGAGGTGTTCAGCCGCACCTTCCGTTTTGACTACGCCATTACGACTGCCCAATCTGTTTTACCCTAGGCCCGATCCTTGCGGTCACGGACTTCAGGCACCCCCGGCTTTCATGGCTTGACAGGCGGTGTGTACAAGGCCCGGGAACATTCACCCGCGCCATGGCTGATGCGCGATTACTAGCGAATCCCAGCTTCACGGGTCGGGTTGCA
>772e0bef-fad4-4767-b0e6-d88e6f16b2ed runid=b48dcb2029ef589b8cdb55fe529d93b3baadbc7c sampleid=28012021COVID3 read=49763 ch=349 start_time=2021-01-28T18:24:10Z
CGTTCAGTTACGTATTGCTGGTGCTGAAGAAAGTTGTCGGTGTCTTTAGGTTAACCTTTCTGTTGGTGCTGATATTGCGGCGTCTGCTTGGGTGTTTAACCTGGCCTCGAGAAGTTTGATCCTGGCTCAGGATGAACGCTAGCTTCTGAGCTTAACATGCAGAGTCCAGGGCAGCATGGAAGAAACTGCTTCTTCTGATGGCGACAACGCACGGGTGCATGCGCGTATCAAACCTGCCTCATA
>2b673c48-69c9-48a4-995b-4ad630e23502 runid=b48dcb2029ef589b8cdb55fe529d93b3baadbc7c sampleid=28012021COVID3 read=71966 ch=124 start_time=2021-01-28T18:24:11Z

OTU table:

OTU ID  IA1 IA2 IA3 IA4 IA5 IS6
otu500  7   10  3   0   5   0
otu502  7   8   4   0   11  1
otu504  4   5   6   2   4   2
otu505  4   4   6   2   8   3

My desired output is a .fasta file that looks like this:

>otu500
tggggaatcttagacaatgggcgcaagcctgatctagccatgccg
>otu502
tgaggaatattggtcaatggaggcaactctgaaccagccatgccttggtcaatggaggc
>otu504
gcgaacaggattagataccctggtagtccacgccgtaaacttggtcaatggaggc
>otu505
ttggtcaatggaggcttggtcaatggaggctaccctggttaccctggt
>otu506
5 weeks ago
GenoMax 102k

If you need to extract the FASTA sequences for specific IDs from a large file, you can use one of the solutions here:
Extract fasta sequences based on ids file
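
For illustration, here is a minimal pure-Python sketch of the "extract by IDs file" idea. It is not taken from the linked thread. The function names (`read_otu_ids`, `iter_fasta`, `extract_by_ids`) are hypothetical. It assumes the OTU table is whitespace-delimited with the ID in the first column and a single header row, and that the first word of each FASTA header is the ID to match. The reads shown in the question carry UUID headers rather than OTU IDs, so this only applies to a FASTA file whose headers already contain the OTU IDs.

```python
def read_otu_ids(table_handle):
    """Return the first column of an OTU table, skipping the header row."""
    ids = []
    header_seen = False
    for line in table_handle:
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines between rows
        if not header_seen:
            header_seen = True  # assumption: first non-blank line is the header
            continue
        ids.append(fields[0])
    return ids


def iter_fasta(fasta_handle):
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header, chunks = None, []
    for line in fasta_handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def extract_by_ids(fasta_handle, wanted_ids, out_handle):
    """Write records whose ID is in wanted_ids, keeping only the ID as header.

    Returns the number of records written.
    """
    wanted = set(wanted_ids)
    written = 0
    for header, seq in iter_fasta(fasta_handle):
        parts = header.split()
        record_id = parts[0] if parts else ""
        if record_id in wanted:
            out_handle.write(f">{record_id}\n{seq}\n")
            written += 1
    return written
```

To use it, open the OTU table, the input FASTA and an output file with `open(...)`, pass the result of `read_otu_ids` to `extract_by_ids`, and compare the returned count with the number of IDs to spot any OTUs that were not found. Records are streamed one at a time, so the whole FASTA file is never held in memory.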
