Question: How to remove reads from a FASTQ file that match a set of sequences in my FASTA file?
MAPK (United States) wrote, 6 months ago:

I have a FASTQ file that seems to be contaminated by sequences that came from my reagents during library preparation. If I know which sequences came from the reagents and I have them in FASTA format, can I eliminate the matching reads from my FASTQ file? I want to remove every contaminating read. How can I work this out?

Tags: contamination, fastq, fasta
(question last edited by WouterDeCoster, 6 months ago)
- Assess and QC the FASTQ file.
- Convert the FASTQ to FASTA.
- BLAST the reads against the reagent FASTA.
- Parse the BLAST results and remove the reads with hits to reagent sequences from the FASTA (made from the FASTQ).
(comment by st.ph.n, 6 months ago)
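The FASTQ-to-FASTA step in this pipeline can be done with existing converters, but it is also short enough to write yourself. A minimal Python sketch, assuming standard 4-line FASTQ records with no line wrapping; the script name and function names are hypothetical:

```python
import gzip
import sys
from typing import Iterable, Iterator, TextIO


def open_text(path: str) -> TextIO:
    """Open a plain or gzip-compressed file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Convert 4-line FASTQ records into 2-line FASTA records.

    Assumes header, sequence, '+' separator and quality each sit on a
    single line (no wrapping), which is the usual layout for Illumina data.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines, e.g. at the end of the file
        try:
            seq = next(it).rstrip("\r\n")
            plus = next(it)
            next(it)  # the quality string is not needed in FASTA
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield f">{header[1:]}\n{seq}\n"


if __name__ == "__main__" and len(sys.argv) == 2:
    # usage: python fastq_to_fasta.py reads.fastq[.gz] > reads.fasta
    with open_text(sys.argv[1]) as fh:
        sys.stdout.writelines(fastq_to_fasta(fh))
```

The converter streams one record at a time, so memory use stays flat even for large runs, and gzip-compressed input is handled by the file extension alone.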

I think you forgot to include a link to the program that does this.

(comment by genomax, 6 months ago)

The OP asked "How can I work this out?" The comment above outlines a pipeline for completing the OP's task, so it is not a single program:

  1. FastQC and a quality trimmer
  2. A FASTQ-to-FASTA converter (several exist, e.g. FASTX-Toolkit)
  3. BLAST
  4. For parsing sequences out of FASTA files based on BLAST results, several posts on Biostars are available for reference

@genomax, I was unaware until your answer that the BBMap suite had this option.

(comment by st.ph.n, 6 months ago)
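Step 4 of the pipeline can also be sketched in Python. The sketch below assumes BLAST was run with tabular output (`-outfmt 6`, default columns); it collects the IDs of reads with a hit to a reagent sequence and then drops those reads from the original FASTQ rather than the FASTA, so base qualities are preserved. The e-value and percent-identity cut-offs, the script name and the function names are hypothetical placeholders to tune for your data:

```python
import sys
from typing import Iterable, Iterator, Set


def contaminated_ids(blast_lines: Iterable[str],
                     max_evalue: float = 1e-5,
                     min_pident: float = 90.0) -> Set[str]:
    """Return the query (read) IDs that have a qualifying BLAST hit.

    Expects tabular output in the default -outfmt 6 column order:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    The cut-offs are placeholders, not recommended values.
    """
    hits: Set[str] = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        cols = line.rstrip("\r\n").split("\t")
        qseqid, pident, evalue = cols[0], float(cols[2]), float(cols[10])
        if evalue <= max_evalue and pident >= min_pident:
            hits.add(qseqid)
    return hits


def filter_fastq(lines: Iterable[str], drop: Set[str]) -> Iterator[str]:
    """Yield the 4-line FASTQ records whose read ID is not in `drop`.

    The read ID is the header text after '@' up to the first whitespace,
    which is normally what BLAST reports as qseqid for a FASTA built
    from the same reads.
    """
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue
        try:
            record = [header, next(it), next(it), next(it)]
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header!r}") from None
        read_id = header[1:].split()[0]
        if read_id not in drop:
            yield from record


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python filter_reads.py hits.tsv reads.fastq > clean.fastq
    with open(sys.argv[1]) as blast_fh:
        drop_ids = contaminated_ids(blast_fh)
    with open(sys.argv[2]) as fq_fh:
        sys.stdout.writelines(filter_fastq(fq_fh, drop_ids))
```

Only the set of contaminated IDs is held in memory; the FASTQ itself is streamed, which keeps this workable for large read sets as long as the fraction of contaminated reads is modest.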

The way you wrote that made it seem like you had copied and pasted it from the GitHub description of a package :-)

BTW: @Brian includes a sequencing_artifacts.fa.gz file (in the resources directory) that I assume contains contaminants seen at JGI (which may also show up elsewhere).

Various things the BBMap suite can do are listed here, if you have not seen this post before.

(comment by genomax, 6 months ago)

I'm trying to avoid black-box/turnkey solutions, so one can learn in the process.

(comment by st.ph.n, 6 months ago)
genomax (United States) answered, 6 months ago:

Use bbduk.sh from BBMap. Provide the contaminants as a multi-FASTA file with the ref= option.

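bbduk's filtering is based on k-mer matching between the reads and the reference. The Python sketch below only illustrates that idea and is not bbduk's implementation: it uses exact k-mers (bbduk can also tolerate mismatches), indexes both strands of the reference, and discards any read sharing at least one k-mer with it. The value k=31, the script name and the function names are assumptions for illustration; gzip-compressed references such as sequencing_artifacts.fa.gz are read transparently:

```python
import gzip
import sys
from typing import Iterable, Iterator, List, Set, TextIO

K = 31  # assumed k-mer length for this sketch

# Complement table; IUPAC codes other than N are left unchanged.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def open_text(path: str) -> TextIO:
    """Open a plain or gzip-compressed file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield each sequence (upper-cased) of a possibly line-wrapped multi-FASTA."""
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if chunks:
                yield "".join(chunks)
            chunks = []
        elif line:
            chunks.append(line.upper())
    if chunks:
        yield "".join(chunks)


def build_kmer_index(ref_seqs: Iterable[str], k: int = K) -> Set[str]:
    """Collect every k-mer of every reference sequence, on both strands."""
    index: Set[str] = set()
    for seq in ref_seqs:
        for strand in (seq, reverse_complement(seq)):
            for i in range(len(strand) - k + 1):
                index.add(strand[i:i + k])
    return index


def is_contaminant(read_seq: str, index: Set[str], k: int = K) -> bool:
    """True if the read shares at least one exact k-mer with the index.

    Reads shorter than k have no k-mers and are never flagged.
    """
    read_seq = read_seq.upper()
    return any(read_seq[i:i + k] in index for i in range(len(read_seq) - k + 1))


def kmer_filter_fastq(lines: Iterable[str], index: Set[str], k: int = K) -> Iterator[str]:
    """Yield only the 4-line FASTQ records not flagged as contaminants."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue
        try:
            seq, plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header!r}") from None
        if not is_contaminant(seq.strip(), index, k):
            yield from (header, seq, plus, qual)


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python kmer_filter.py contaminants.fa[.gz] reads.fastq[.gz] > clean.fastq
    with open_text(sys.argv[1]) as ref_fh:
        kmer_index = build_kmer_index(read_fasta(ref_fh))
    with open_text(sys.argv[2]) as fq_fh:
        sys.stdout.writelines(kmer_filter_fastq(fq_fh, kmer_index))
```

Holding every reference k-mer in a Python set is fine for a small panel of reagent contaminants, but it uses far more memory than bbduk would for large references, so for real data bbduk remains the practical choice.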
WouterDeCoster (Belgium) answered, 6 months ago:

My NanoLyse script is written for that, using the minimap2 aligner under the hood. It's mainly intended for long reads (Oxford Nanopore/PacBio).
