Question

How to remove reads from a fastq file that match a set of reads in my fasta file?

1

7.5 years ago

MAPK ★ 2.1k

I have a fastq file that seems to be contaminated by sequences from my reagents, picked up during library preparation. If I know which reads came from the reagents and have them in fasta format, can I eliminate those reads from my fastq file? I want to remove every contaminating read. How can I work this out?

fastq fasta contamination • 5.9k views

updated 7.5 years ago by WouterDeCoster 48k • written 7.5 years ago by MAPK ★ 2.1k

1

- Assess and QC the fastq
- Convert the fastq to fasta
- BLAST against the reagent fasta
- Parse the BLAST results and the fasta (from the fastq), removing reads that hit the reagents
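
The fastq-to-fasta step in the list above could be sketched roughly as follows. This is only an illustration, not code from the thread: the function names `read_fastq` and `fastq_to_fasta` are made up here, and the sketch assumes the common four-lines-per-record FASTQ layout with unwrapped sequences.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (header, sequence, quality) tuples from an iterable of FASTQ lines.

    Assumption: every record is exactly four lines (no wrapped sequences).
    """
    it = iter(lines)
    while True:
        record = [line.rstrip("\r\n") for line in islice(it, 4)]
        if not record:
            return  # clean end of input
        if (len(record) < 4
                or not record[0].startswith("@")
                or not record[2].startswith("+")):
            raise ValueError(f"malformed FASTQ record: {record!r}")
        # Drop the leading '@' from the header; keep sequence and quality as-is.
        yield record[0][1:], record[1], record[3]


def fastq_to_fasta(lines):
    """Yield one two-line FASTA entry (as a single string) per FASTQ record."""
    for header, sequence, _quality in read_fastq(lines):
        yield f">{header}\n{sequence}\n"
```

With real files this could be used as `fasta_out.writelines(fastq_to_fasta(fastq_in))`, where `fastq_in` and `fasta_out` are open file handles of your choosing.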

7.5 years ago by st.ph.n ★ 2.7k

0

I think you forgot to include a link to the program that does this.

7.5 years ago by GenoMax 152k

0

The OP asked "How can I work this out?". The comment above outlines a pipeline for completing the OP's task, so it is not one single program:

- FastQC and a quality trimmer
- A converter program from FASTQ to FASTA (several exist, e.g. FASTX-Toolkit)
- BLAST
- Several posts on Biostars are available for reference on parsing sequences out of FASTA files based on BLAST results
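
As a rough sketch of the last step (parsing BLAST results and dropping the reads that hit the reagents), assuming BLAST was run with tabular output (`-outfmt 6`, where column 1 is the query ID, column 3 the percent identity and column 4 the alignment length). The function names and the identity/length cutoffs below are illustrative assumptions, not values from this thread; tune them to your data.

```python
from itertools import islice


def contaminated_ids(blast_lines, min_identity=95.0, min_length=50):
    """Collect query IDs whose BLAST hit passes the (assumed) cutoffs."""
    ids = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query_id = fields[0]
        identity = float(fields[2])   # pident column
        length = int(fields[3])       # alignment length column
        if identity >= min_identity and length >= min_length:
            ids.add(query_id)
    return ids


def filter_fastq(fastq_lines, bad_ids):
    """Yield the lines of every 4-line FASTQ record whose read ID is not in bad_ids."""
    it = iter(fastq_lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) < 4 or not record[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        # The read ID is the header text up to the first whitespace,
        # which is also what BLAST reports as the query ID.
        parts = record[0][1:].split()
        read_id = parts[0] if parts else ""
        if read_id not in bad_ids:
            yield from record
```

Filtering the original fastq rather than the converted fasta keeps the quality scores for the reads that survive.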

I was unaware, until your answer @genomax, that the BBMap suite had this option.

7.5 years ago by st.ph.n ★ 2.7k

0

The way you wrote that made it seem like you had copied/pasted it from the GitHub description of a package :-)

BTW: @Brian includes a sequencing_artifacts.fa.gz file (in the resources directory) that I assume contains contaminants (which may be seen elsewhere, but I assume are seen at JGI).

The various things that the BBMap suite can do are here, if you have not seen this post before.

7.5 years ago by GenoMax 152k

0

Trying to avoid black box/turnkey solutions, so one can learn in the process.

7.5 years ago by st.ph.n ★ 2.7k

0

Hi, did you find a solution for this? If both my contaminant reads and my true reads are in fastq files, how do I remove the contaminant reads?

6.3 years ago by jeccy.J ▴ 60

0

I gave you two additional answers in the other thread you posted this in: C: Subtracting one FASTAq file Reads from other FASTAq reads

6.3 years ago by GenoMax 152k

0

7.5 years ago

WouterDeCoster 48k

My NanoLyse script is written for that, using the minimap2 aligner under the hood. It's mainly intended for long reads (Oxford Nanopore/PacBio).

7.5 years ago by WouterDeCoster 48k

score 6 · Accepted Answer · 2018-01-02

6

7.5 years ago

GenoMax 152k

Use bbduk.sh from the BBMap suite. Provide the contaminants as a multi-fasta file with the ref= option.
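
A minimal sketch of what such a call might look like, written as a small Python helper that only assembles the command line. The helper name `bbduk_command` and all file names are hypothetical. The parameters used (`in=`, `ref=`, `out=`, `outm=`, `k=`, `hdist=`) are bbduk.sh options, but the values `k=31` and `hdist=1` are assumed starting points, not settings given in this answer.

```python
def bbduk_command(reads, contaminants, clean_out, matched_out, k=31, hdist=1):
    """Build a bbduk.sh argument list for kmer-based contaminant filtering.

    Reads sharing a kmer with the reference go to `matched_out`;
    everything else goes to `clean_out`.
    """
    return [
        "bbduk.sh",
        f"in={reads}",            # input fastq
        f"ref={contaminants}",    # multi-fasta of contaminant sequences
        f"out={clean_out}",       # reads that did NOT match the reference
        f"outm={matched_out}",    # reads that matched (kept for inspection)
        f"k={k}",                 # kmer length (assumed value)
        f"hdist={hdist}",         # mismatches allowed per kmer (assumed value)
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once BBMap is installed and bbduk.sh is on your PATH. Keeping the matched reads in a separate file makes it easy to check how much was removed.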

7.5 years ago by GenoMax 152k