Question: How can I check whether a FASTQ file is contaminated with other strains?
askif4 wrote (23 days ago):

I have some FASTQ files from mouse (later mapped to the B6 = mm10 reference sequence).

However, when I looked at the BAM files in IGV, some reads turned out to come from rat (https://www.ncbi.nlm.nih.gov/assembly/GCF_000001895.5).

I used web BLASTn to check some of the reads, but it is impossible to check every read one by one.

I installed BLASTn on Linux, but I couldn't figure out how to use it to compare the reads against a limited set of reference sequences (in my case, only the mouse and rat reference sequences).

If you could help me, I would be grateful.

Thank you.
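For the local BLAST route, one approach is to put the mouse and rat references into a single database (e.g. `makeblastdb -in mouse_rat.fa -dbtype nucl -out mouse_rat`), BLAST the reads converted to FASTA against it (`blastn -query reads.fa -db mouse_rat -outfmt 6 -out hits.tsv`) and then give each read the species of its best hit. Below is a minimal sketch of that last step. It assumes the default 12-column `-outfmt 6` layout (bitscore in the last column). `classify_blast_hits`, the `species_of` callable and the `mouse_`/`rat_` subject-id prefixes are hypothetical names used for illustration.

```python
from collections import defaultdict


def classify_blast_hits(outfmt6_lines, species_of):
    """Assign each read to the species with its highest-bitscore hit.

    outfmt6_lines: lines of BLAST tabular output (-outfmt 6, default 12 columns).
    species_of: hypothetical callable mapping a subject id to a species label.
    Reads whose best score is tied between species are called "ambiguous".
    """
    best = defaultdict(dict)  # best[read_id][species] = best bitscore seen
    for line in outfmt6_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        read_id, subject_id = fields[0], fields[1]
        bitscore = float(fields[11])  # 12th column is bitscore in default outfmt 6
        species = species_of(subject_id)
        if bitscore > best[read_id].get(species, float("-inf")):
            best[read_id][species] = bitscore

    calls = {}
    for read_id, scores in best.items():
        top = max(scores.values())
        winners = [sp for sp, s in scores.items() if s == top]
        calls[read_id] = winners[0] if len(winners) == 1 else "ambiguous"
    return calls


if __name__ == "__main__":
    hits = [
        "read1\tmouse_chr1\t100.0\t150\t0\t0\t1\t150\t1000\t1149\t1e-70\t278",
        "read1\trat_chr1\t95.3\t150\t7\t0\t1\t150\t5000\t5149\t2e-60\t244",
        "read2\trat_chr2\t100.0\t150\t0\t0\t1\t150\t200\t349\t1e-70\t278",
    ]
    # Hypothetical naming scheme: subject ids prefixed with "mouse_" or "rat_".
    print(classify_blast_hits(hits, lambda sid: sid.split("_")[0]))
```

Tied calls are worth keeping separate: reads that match mouse and rat equally well are exactly the "shared sequence" case and should not be counted as contamination.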

"But when I looked at bam files with IGV, some reads were found out to be Rat's"

How did you decide that? Did you realign the data to the rat genome, or did you just select the rat genome instead of mouse in IGV? I am surprised IGV let you view an alignment against an unrelated genome.

Rat is not a mouse strain but a different species.

(GenoMax, 23 days ago)

Ah, I was searching for mutation sites with low variant allele frequencies (VAFs) and found some suspicious reads. I copied those read sequences and pasted them into web BLASTn. That's how I found out the data were contaminated.

The result looked like this:

https://ibb.co/5jD8qtV

(askif4, 23 days ago)

Maybe mouse and rat genuinely share those sequences. You can check by mapping the reads to the mouse and rat reference sequences with bowtie2; BLASTn is too slow for this.

(shenwei356, 23 days ago)
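One way to do the bowtie2 check is competitive mapping: concatenate the mouse and rat references into one FASTA, build a single index (`bowtie2-build mouse_rat.fa mouse_rat`), map (`bowtie2 -x mouse_rat -U reads.fastq -S out.sam`) and count where each read's primary alignment lands. The sketch below does the counting. It assumes the contigs were renamed with hypothetical `mouse_`/`rat_` prefixes before indexing, and it uses an illustrative MAPQ cutoff of 10, on the assumption that reads aligning equally well to both genomes typically receive a low MAPQ from bowtie2.

```python
from collections import Counter

UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800


def count_reads_per_genome(sam_lines, min_mapq=10):
    """Count primary alignments per genome from a combined mouse+rat SAM.

    Assumes contig names carry a genome prefix such as 'mouse_chr1' or
    'rat_chr1' (a hypothetical renaming done before bowtie2-build).
    min_mapq is an illustrative cutoff, not a recommended value.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        flag, rname, mapq = int(fields[1]), fields[2], int(fields[4])
        if flag & UNMAPPED:
            counts["unmapped"] += 1
            continue
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, via its primary alignment
        if mapq < min_mapq:
            counts["low_mapq"] += 1  # likely shared between mouse and rat
            continue
        counts[rname.split("_", 1)[0]] += 1
    return counts


if __name__ == "__main__":
    with open("out.sam") as sam:  # hypothetical file name
        print(count_reads_per_genome(sam))
```

A large `rat` count among high-MAPQ reads would point to real contamination, while reads that hit conserved regions should mostly end up in `low_mapq`.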

Thank you for your reply, I will try it.

(askif4, 23 days ago)

In addition to what has been said, you might also consider using the BlobToolKit pipeline (paper). I have never used it; I have only read the paper, but it seems able to provide useful insights when there is contamination. That said, the other options mentioned seem more straightforward to follow.

(antonioggsousa, 23 days ago)

cpad0112 (Hyderabad, India) wrote (23 days ago):

Use FastQ Screen (fastq_screen).

  1. Download the genomes of the suspected contaminant organisms.
  2. Index them.
  3. Run FastQ Screen to check for contamination.
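Steps 2 and 3 come together in FastQ Screen's config file, which lists the aligner and one `DATABASE` line per indexed genome (a display name plus the index basename). The helper below is a minimal sketch that writes such a config for a mouse + rat screen. It assumes Bowtie2 indexes already exist (e.g. from `bowtie2-build rat.fa /refs/rat/rat`). The function name and all paths are hypothetical.

```python
def make_fastq_screen_conf(databases, bowtie2="bowtie2", threads=4):
    """Return the text of a FastQ Screen config file.

    databases: mapping of display name -> Bowtie2 index basename
    (the prefix passed to bowtie2-build, not an individual .bt2 file).
    """
    lines = [f"BOWTIE2\t{bowtie2}", f"THREADS\t{threads}"]
    for name, index_basename in databases.items():
        lines.append(f"DATABASE\t{name}\t{index_basename}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical index locations for the two genomes of interest.
    conf = make_fastq_screen_conf({
        "Mouse": "/refs/mouse/mm10",
        "Rat": "/refs/rat/rat",
    })
    print(conf, end="")
```

Saving the printed text as `fastq_screen.conf` and pointing `fastq_screen --conf` at it restricts the screen to just these two genomes.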

By default, FastQ Screen checks against a few model-organism genomes and common contaminating vector sequences, but you can supply your own genomes and sequences and check for contamination against them. It also screens only a subset of the reads by default; you can increase this number.
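The read count is set with `--subset`, which defaults to 100000 reads; `--subset 0` screens the whole file. Below is a minimal sketch that builds the argument list for a bowtie2-based run. The helper name, config path and FASTQ name are hypothetical.

```python
import shlex


def fastq_screen_command(fastq, conf="fastq_screen.conf", subset=100000, outdir=None):
    """Build a fastq_screen argument list; subset=0 screens every read."""
    cmd = ["fastq_screen", "--conf", conf, "--aligner", "bowtie2",
           "--subset", str(subset)]
    if outdir:
        cmd += ["--outdir", outdir]
    cmd.append(fastq)
    return cmd


if __name__ == "__main__":
    cmd = fastq_screen_command("sample_R1.fastq.gz", subset=0)
    # Pass cmd to subprocess.run(cmd, check=True) to actually execute it.
    print(shlex.join(cmd))
```

Screening every read is slower, but it is safer when the contamination is expected to be a small fraction, such as the few low-VAF reads described in the question.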

(cpad0112, 23 days ago)

Thank you! I will try that tool.

(askif4, 23 days ago)

This worked perfectly! Thank you again.

(askif4, 22 days ago)

Powered by Biostar version 2.3.0