How to filter a FASTA file by sequence ID
3.5 years ago

Hello everybody! I have to filter out some sequences from a FASTA file. The FASTA looks like this:

>0 RH1_6550
CAGCAGCCGCGGTAATACATAAACTCAAAGGAATTGACGGGCAGCAGCCGCGGTAATACGAAGGGGGCTAGCGTTGCTCGGAATTACTGGGCGTAAAGGGCGCGTAGGCGGACATTTAAGTCAGGGGTGAAATCCCAGAGCTCAACTCTGGAACTGCCTTTGATACTGGGTGTCTTGAGTGTGAGAGAGGTATGTGGAACTCCGAGTGTAGAGGTGAAATTCGTAGATATTCGGAAGAACACCAGTGGCGAAGGCGACATACTGGCTCATTACTGACGCTGAGGCGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGATTGCTAGTTGTCGGGCTGCATGCAGTTCGGTGACGCAGCTAACGCATTAAGCAATCCGCCTGGGGAGTACGGTCGCAAGATTAAAACTCAAAGGAATTGACGG
>1 LH1_2535
CAGCAGCCGCGGTAATAGCAGCCGCGGTAATACGGAGGGTCCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTTGTTAAGCCAGATGTGAAATCCCCGGGCTCAACCTGGGAATTGCATTTGGAACTGGCGAACTAGAGTCTTGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACAAAGACTGACGCTCATGCACGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCTACTCGGAGTTTGGTGTCTTGAACACTGGGCTCTCAAGCTAACGCATTAAGTAGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAAGGAATTGACG
>10 LPI1_10041
CAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGAATTACTGGGCGTAAAGCGTACGTAGGCGGATCAGAAAGTAGGGGGTGAAATCCCAGGGCTCAACCCTGGAACTGCCTCCTAAACTCCTGGTCTTGAGTTCGAGAGAGGTGAGTGGAATTCCAAGTGTAGAGGTGAAATTCGTAGATATTTGGAGGAACACCAGTGGCGAAGGCGGCTCACTGGCTCGATACTGACGCTGAGGTACGAAAGTGTGGGGAGCAAACAGGATTAGATACCCCGGTAGTCCACACCGTAAACGATGAATGCCAGTCGTCGGGCAGTATACTGTTCGGTGACACACCTAACGGATTAAGCATTCCGCCTGGGGAGTACGGTCGCAAGATTAAAACTCAAAGGAATTGACGG
>100 RH3_70592
CAGCAGCCGCGGTAATACGGGGGGTGCGAGCGTTATTCGGAATTACTGGGCGTAAAGAGCGCGTAGGCGGTCTCTTAAGTCAGGTGTGAAAGCCCGGGGCTCAACCCCGGAAGTGCACTTGAAACTAAGAGACTTGAGTATGGGAGAGGGAAGTGGAATTCCTGGTGTAGCGGTGAAATGCGTAGATATCAGGAGGAACATCAGTGGCGAAGGCGACTTCCTGGACCAATACTGACGCTGAGGCGCGAAGGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCAGTAAACGGTGAACACTAGGTGTAGCGGGTATTGACCCCTGCTGTGCCGCAGCAAACGCATTAAGTGTTCCGCCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAAGGAATTGACGG
>1000 SHI1_76884
CAGCAGCCGCGGTAATACAGAGGTCCCGAGCGTTGTTCGGATTCACTGGGCGTAAAGGGAGCGTAGGCGGTCGGCAAAGTCTGATGTGAAATCTCCGGGCCCAACCCGGAAACTGCATCGGATACTGGTCGGCTAGAGGATTGGAGGGGGGACTGGAATTCTCGGTGTAGCAGTGAAATGCGTAGATATCGAGAGGAACACCAGTGGCGAAGGCGAGTCCCTGGACAATTCCTGACGCTGAGGCACGAAAGCTAGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCTAGCCGTAAATGGTGCACGCTTGCTGTGGGCGGAATCGACCCCGTCCGTGGCGTAGCTAACGCGTTAAGCGTGCCGCCTGGGGAGTACGACCGCAAGGTTGAAACTCAAAGGAATTGACGG

and I have a list of IDs to exclude, given either as the leading number (0, 10, 100) or as the entire identifier (>0 RH1_6550).

I tried QIIME 1's filter_fasta.py -f inseqs.fasta -o list_filtered_seqs.fasta -s seqs_to_keep.txt, but it returned an empty FASTA file.

Any clue how to do it?

Thanks in advance!! Best, Chiara

rna-seq • 887 views
3.5 years ago
GenoMax 141k

The faSomeRecords utility from Jim Kent is one of the best options. I linked the Linux version, but a macOS build is also available. After downloading, add execute permissions with chmod a+x faSomeRecords. You will want to use the -exclude option with the full FASTA header.

faSomeRecords - Extract multiple fa records
usage:
   faSomeRecords in.fa listFile out.fa
options:
   -exclude - output sequences not in the list file.
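
If the binary is not an option, the same exclusion can be sketched in a few lines of Python. This is only a minimal illustration, not how faSomeRecords or filter_fasta.py work internally: the function names `filter_fasta`, `read_ids` and `filter_fasta_file` are made up for this example, and it assumes the list file has one ID per line, either the leading number (0) or the full header (0 RH1_6550), with or without the ">".

```python
def filter_fasta(lines, exclude_ids):
    """Yield FASTA lines, skipping every record whose ID is in exclude_ids.

    A record is excluded if its full header (without '>') or its first
    whitespace-delimited token matches an entry exactly, so "1" does not
    accidentally remove "10" or "100".
    """
    skip = False
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            first_token = header.split()[0] if header else ""
            skip = header in exclude_ids or first_token in exclude_ids
        if not skip:
            yield line


def read_ids(path):
    """Read one ID per line, dropping blank lines and any leading '>'."""
    with open(path) as handle:
        return {ln.strip().lstrip(">").strip() for ln in handle if ln.strip()}


def filter_fasta_file(in_path, ids_path, out_path):
    """Write every record of in_path that is not listed in ids_path."""
    exclude_ids = read_ids(ids_path)
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_fasta(src, exclude_ids))
```

Calling `filter_fasta_file("inseqs.fasta", "ids_to_exclude.txt", "filtered.fasta")` would write the kept records; the file names here are placeholders. Multi-line sequences are handled because the skip flag only changes at header lines.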

It's not working. The out.fa it generates is an empty file.
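
The thread does not establish why the output is empty, so the following is only a diagnostic sketch, not a fix. It checks how the entries in the list file line up with the headers in the FASTA, after stripping a leading ">" and stray whitespace such as Windows line endings. The helper names `normalize_id` and `match_report` are invented for this example.

```python
def normalize_id(text):
    """Strip surrounding whitespace (including '\r') and a leading '>'."""
    return text.strip().lstrip(">").strip()


def match_report(fasta_lines, list_lines):
    """Classify each list entry by how it matches the FASTA headers.

    Returns a dict with three lists: entries equal to a full header,
    entries equal only to the first token of a header, and entries
    that match nothing.
    """
    headers = [normalize_id(ln) for ln in fasta_lines if ln.startswith(">")]
    full = set(headers)
    first = {h.split()[0] for h in headers if h}
    wanted = [normalize_id(ln) for ln in list_lines if ln.strip()]
    return {
        "full_header": [w for w in wanted if w in full],
        "first_token": [w for w in wanted if w in first and w not in full],
        "unmatched": [w for w in wanted if w not in full and w not in first],
    }
```

Passing the two open file handles, for example `match_report(open("in.fa"), open("listFile"))`, shows at a glance whether the list holds full headers, bare numbers, or entries that match nothing at all.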
