Question: QIIME filter_fasta.py not removing chimeric sequences
10 months ago, samanthabird wrote:

Hey everyone,

I am working with 16S data and have flagged my chimeric sequences using vsearch, which wrote all suspect sequences to a .txt file. I am trying to remove these sequences from my original fasta file using the QIIME filter_fasta.py command.

This is what I have tried:

filter_fasta.py -f <filename>.fasta -o <newfilename>.fasta -s chimeraout.txt -n

But when I grep the original fasta file and the new fasta file using the command below, they have the same number of sequences. The chimeras are not being removed from the original file.

grep "^>" <filename>.fasta | wc -l

I have tried troubleshooting this in the following ways:

First, I noticed that my original fasta had this as the header: >M00307:50:000000000-BT3VT:1:1101:15779:1247 1:N:0:GCGTAGTA+CGTCTAAT

while my chimeric sequence txt file had this as the header: >M00307:50:000000000-BT3VT:1:1101:15779:1247

so I edited the original fasta file to remove the barcode portion of the header. No luck.

Then, I realized that my original fasta file had the sequence outputted to one line, while my .txt file outputted as separate lines as below:

M00307:50:000000000-BT3VT:1:1101:15779:1247
TGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCAACGCCGCGTGAAGGATGAAGGTTTTCGGATCGTAAACTTTT
GTCTTAGGGGACGAGGAAGGACGGTACCCTAGGAGGAAGCCACGGCTAATTACGTGCCAGCAGCCGCGGTAACACGTAAG
CCCCTAGCGTTGTTCGGAATTATTGGGCGTAAAGGGCATGTAGGCGGTCAGGCAAGTCTGGTGTGAAATCTCGTGGCTCA

so I removed the line breaks and tried again, but still nothing was removed. If I drop the -n parameter from my command, the output file is empty, so I know that QIIME is reading the command properly. It just does not appear to be recognizing the chimeric sequences. Any suggestions on how I can fix this would be greatly appreciated!!
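One way to test the header mismatch described above is to count how many entries in the ID file literally match the IDs in the fasta file. The sketch below is only a diagnostic under stated assumptions: the function names fasta_ids and count_matching_ids are made up for illustration, and an ID is assumed to be the header text up to the first whitespace.

```shell
# Print the unique IDs in a FASTA file: the header text after ">"
# up to the first whitespace, sorted for use with comm.
fasta_ids() {
  grep '^>' "$1" | sed -e 's/^>//' -e 's/[[:space:]].*$//' | LC_ALL=C sort -u
}

# Count how many entries in an ID list ($2) also occur as IDs in a
# FASTA file ($1). The first field of each non-blank line in the list
# is compared literally, so a leading ">" in the list prevents a match.
count_matching_ids() {
  tmp_fasta=$(mktemp) && tmp_list=$(mktemp) || return 1
  fasta_ids "$1" > "$tmp_fasta"
  awk 'NF { print $1 }' "$2" | LC_ALL=C sort -u > "$tmp_list"
  LC_ALL=C comm -12 "$tmp_fasta" "$tmp_list" | awk 'END { print NR }'
  rm -f "$tmp_fasta" "$tmp_list"
}
```

For example, if `count_matching_ids <filename>.fasta chimeraout.txt` prints 0, none of the entries match as written, which would be consistent with nothing being filtered out.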


Have you had a look at the qiime webpage for the filter_fasta.py command?

http://qiime.org/scripts/filter_fasta.html

It looks as if the file passed with the -s parameter should just have a list of the IDs of the sequences you want to remove, rather than the actual sequences. Try and see if this helps.
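If the chimera file is FASTA-formatted, a plain ID list could be derived from it along these lines. This is a minimal sketch, not an official QIIME or vsearch recipe; the function name fasta_to_id_list and the output filename chimera_ids.txt are hypothetical.

```shell
# Print one sequence ID per line from a FASTA-formatted file:
# keep header lines only, drop the leading ">" and anything after
# the first whitespace (the description).
fasta_to_id_list() {
  grep '^>' "$1" | sed -e 's/^>//' -e 's/[[:space:]].*$//'
}
```

Running `fasta_to_id_list chimeraout.txt > chimera_ids.txt` would then give a file that can be passed to -s in place of the original .txt file.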

written 10 months ago by mastal511
10 months ago, toralmanvar wrote:

As mastal511 said, please make sure that you are using the chimeric .txt file as input. Also, the header lines in the chimeric .txt file should not start with the ">" symbol.

Otherwise, the command you are using is correct. However, looking at your fasta sequence IDs, I suspect you have not first run the "split_libraries_fastq.py" command, which converts fastq to fasta and rewrites the headers in the form "samplename_readnumber".

For instance, if the name of your sample is 'sampleA', then while converting the file from fastq to fasta, it will modify the headers to:

sampleA_1
ATCCCCCC.....
sampleA_2
TCCCCAAAA....

I run the three commands below before running "filter_fasta.py", and everything runs fine:

  1. validate_mapping_file.py
  2. split_libraries_fastq.py
  3. identify_chimeric_seqs.py

Thank you both for your answers! I wasn't actually running through the QIIME pipeline, I had used vsearch to remove chimeras and this was the way it had outputted the text file. However, removing the chevron worked perfectly - even with the sequences still in the file! I appreciate the help!
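For anyone landing here with the same problem, the fix reported above (removing only the chevron) might look roughly like the sketch below. The function names strip_chevrons and count_seqs are made up for illustration, and the leftover sequence lines are assumed to be harmless because they do not match any sequence ID.

```shell
# Remove only the leading ">" from header lines; sequence lines
# pass through unchanged.
strip_chevrons() {
  sed 's/^>//' "$1"
}

# Count FASTA records (lines starting with ">") in a file.
count_seqs() {
  grep -c '^>' "$1"
}
```

After writing the result of `strip_chevrons chimeraout.txt` to a new file and rerunning filter_fasta.py with it, comparing `count_seqs` on the original and filtered fasta files should show the count dropping by the number of chimeras removed.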

written 10 months ago by samanthabird