Question: How to remove reads from a FASTQ file that match a set of reads in my FASTA file?
0
16 days ago by
MAPK1.1k
United States
MAPK1.1k wrote:

I have a FASTQ file that appears to be contaminated by sequences from my reagents, introduced during library preparation. If I know which reads came from the reagents and have them in FASTA format, can I eliminate those reads from my FASTQ file? I want to remove every contaminating read. How can I work this out?

contamination fastq fasta • 185 views
ADD COMMENTlink modified 16 days ago by WouterDeCoster24k • written 16 days ago by MAPK1.1k
1
- Assess and QC the FASTQ
- Convert the FASTQ to FASTA
- BLAST the reads against the reagent FASTA
- Parse the BLAST results and the FASTA (from the FASTQ), removing reads that hit the reagents
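
The format-conversion step above could be sketched in Python roughly as follows. This is a minimal sketch, assuming an uncompressed FASTQ with exactly four lines per record (no wrapped sequences); the function name `fastq_to_fasta` is made up for illustration and does not come from any package.

```python
import io


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Write one FASTA record per 4-line FASTQ record; return the record count.

    Assumption: every record is exactly four lines (header, sequence, '+', quality).
    """
    count = 0
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines between records
        if not header.startswith("@"):
            raise ValueError("expected a FASTQ header starting with '@': %r" % header)
        seq = fastq_handle.readline().rstrip("\n")
        fastq_handle.readline()  # '+' separator line, not needed in FASTA
        fastq_handle.readline()  # quality line, not needed in FASTA
        fasta_handle.write(">" + header[1:].rstrip("\n") + "\n" + seq + "\n")
        count += 1
    return count


if __name__ == "__main__":
    demo = io.StringIO("@r1 desc\nACGT\n+\nIIII\n")
    out = io.StringIO()
    fastq_to_fasta(demo, out)
    print(out.getvalue(), end="")
```

With real files you would pass handles from `open("reads.fastq")` and `open("reads.fasta", "w")` instead of the in-memory demo.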
ADD REPLYlink modified 16 days ago • written 16 days ago by st.ph.n2.0k

I think you forgot to include a link to the program that does this.

ADD REPLYlink modified 16 days ago • written 16 days ago by genomax40k

The OP asked "How can I work this out?". The comment above outlines a pipeline for completing the OP's task, so there is no single program.

  1. FastQC and a quality trimmer
  2. A converter program from FASTQ to FASTA (several exist, e.g. FASTX-Toolkit)
  3. BLAST
  4. Several posts on Biostars are available for reference in parsing out sequences from FASTA files based on BLAST results
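
For step 4, a rough Python sketch of the parsing could look like the following. It assumes BLAST was run with tabular output (`-outfmt 6`, where the columns are qseqid, sseqid, pident, length, ...) and that the FASTQ has four lines per record. The function names and the identity/length cutoffs are illustrative assumptions, not recommended values.

```python
import io


def read_hit_ids(blast_tsv_handle, min_identity=90.0, min_length=20):
    """Collect query IDs (column 1) of BLAST tabular hits passing the cutoffs."""
    hits = set()
    for line in blast_tsv_handle:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        # Column 3 is percent identity, column 4 is alignment length.
        if float(fields[2]) >= min_identity and int(fields[3]) >= min_length:
            hits.add(fields[0])
    return hits


def filter_fastq(fastq_handle, out_handle, hit_ids):
    """Copy FASTQ records whose read ID is not in hit_ids; return (kept, removed)."""
    kept = removed = 0
    while True:
        record = [fastq_handle.readline() for _ in range(4)]
        if not record[0].strip():
            break  # end of file (or a blank line where a header should be)
        # BLAST reports the first whitespace-delimited token of the header as the ID.
        read_id = record[0][1:].split()[0]
        if read_id in hit_ids:
            removed += 1
        else:
            out_handle.writelines(record)
            kept += 1
    return kept, removed


if __name__ == "__main__":
    blast = io.StringIO("r2\treagent1\t100.0\t30\t0\t0\t1\t30\t1\t30\t1e-10\t60.0\n")
    reads = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n")
    clean = io.StringIO()
    print(filter_fastq(reads, clean, read_hit_ids(blast)))
```

Filtering the original FASTQ by ID, rather than the converted FASTA, keeps the quality scores for downstream tools.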

    I was unaware, until your answer @genomax, that the BBMap suite had this option.

ADD REPLYlink modified 16 days ago • written 16 days ago by st.ph.n2.0k

The way you wrote that made it seem like you had copied/pasted it from the GitHub description of a package :-)

BTW: @Brian includes a sequencing_artifacts.fa.gz file (in the resources directory) that I assume includes contaminants (which may be seen at other places, but I assume are seen at JGI).

Various things that BBMap suite can do are here, if you have not seen this post before.

ADD REPLYlink modified 16 days ago • written 16 days ago by genomax40k

Trying to avoid black-box/turnkey solutions, so one can learn in the process.

ADD REPLYlink written 16 days ago by st.ph.n2.0k
4
16 days ago by
genomax40k
United States
genomax40k wrote:

Use bbduk.sh from BBMap. Provide the contaminants as a multi-FASTA file with the ref= option.
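
As a hedged illustration, a bbduk.sh call for this could be assembled from Python as below. The file names are placeholders, and the `k=31` and `hdist=1` values are assumptions chosen for the example rather than settings from this answer, so check them against the BBDuk documentation for your data. Reads that match the reference go to `outm=`, the rest to `out=`.

```python
import subprocess


def bbduk_command(reads, contaminants, clean_out, matched_out, k=31, hdist=1):
    """Build the bbduk.sh argument list for k-mer filtering against a reference.

    k and hdist (allowed mismatches per k-mer) are illustrative values only.
    """
    return [
        "bbduk.sh",
        "in=" + reads,           # input FASTQ
        "out=" + clean_out,      # reads that do NOT match the contaminants
        "outm=" + matched_out,   # reads that DO match the contaminants
        "ref=" + contaminants,   # multi-FASTA of contaminant sequences
        "k=" + str(k),
        "hdist=" + str(hdist),
    ]


def run_bbduk(*args, **kwargs):
    """Run bbduk.sh; requires BBMap to be installed and on the PATH."""
    subprocess.run(bbduk_command(*args, **kwargs), check=True)


if __name__ == "__main__":
    # Print the equivalent shell command without running anything.
    print(" ".join(bbduk_command(
        "reads.fastq", "contaminants.fasta", "clean.fastq", "matched.fastq")))
```

Keeping the matched reads in a separate file makes it easy to inspect how much was removed before trusting the cleaned output.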

ADD COMMENTlink modified 16 days ago • written 16 days ago by genomax40k
0
16 days ago by
Belgium
WouterDeCoster24k wrote:

My NanoLyse script is written for that, using the minimap2 aligner under the hood. It's mainly intended for long reads (Oxford Nanopore/PacBio).

ADD COMMENTlink written 16 days ago by WouterDeCoster24k