Question

Finding Common Reads Across Multiple Fastq Files

0

12.3 years ago

Abhi ★ 1.6k

Hi All

We have some metagenome samples (multiple Illumina lanes). I would like to find the percentage of reads that are common among these FASTQ files, allowing up to N mismatches.

I think I can take a subsample of the reads from each FASTQ file/bin and compare them, but I am wondering whether there is a slicker approach to the comparison.

Thanks! -Abhi

fastq • 4.2k views

ADD COMMENT • link updated 12.3 years ago by Manu Prestat 4.1k • written 12.3 years ago by Abhi ★ 1.6k

0

Do you allow differences in quality scores?

ADD REPLY • link 12.3 years ago by Manu Prestat 4.1k

0

@Manu: I haven't thought about that yet. I was wondering whether we can compare the reads at the base level; allowing 2-4 mismatches between reads should cover differences in quality scores.

ADD REPLY • link 12.3 years ago by Abhi ★ 1.6k

score 3 · Answer 1 · 2012-01-19

3

12.3 years ago

Mikael Huss 4.8k

I'd start by looking at the tools contained in vmatch. There are probably many ways to approach this problem but it seems sensible to use some kind of indexing on the fastq files prior to doing the comparisons.
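To illustrate the "index first, then compare" idea in general terms, here is a minimal Python sketch. It is not how vmatch works internally; it is a simple pigeonhole seed index under several assumptions of mine: reads are compared only against reads of the same length, only substitutions count as mismatches (no indels), reverse complements are ignored, and one file fits in memory. All function names are hypothetical.

```python
from collections import defaultdict


def read_fastq_seqs(lines):
    """Yield sequences from an iterable of FASTQ lines (assumes 4 lines per record)."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def seed_slices(length, n_mismatches):
    """Split [0, length) into n_mismatches + 1 contiguous pieces.

    Pigeonhole principle: two equal-length reads with at most N mismatches
    must agree exactly on at least one of N + 1 pieces.
    """
    parts = n_mismatches + 1
    bounds = [length * i // parts for i in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def build_index(reads, n_mismatches):
    """Map (read length, piece start, piece sequence) -> reads containing that piece."""
    index = defaultdict(list)
    for read in reads:
        for start, end in seed_slices(len(read), n_mismatches):
            index[(len(read), start, read[start:end])].append(read)
    return index


def fraction_shared(query_reads, index, n_mismatches):
    """Fraction of query reads within n_mismatches of some indexed read."""
    total = hits = 0
    for read in query_reads:
        total += 1
        found = False
        for start, end in seed_slices(len(read), n_mismatches):
            key = (len(read), start, read[start:end])
            # Only candidates sharing an exact piece are verified in full.
            for candidate in index.get(key, ()):
                if hamming(read, candidate) <= n_mismatches:
                    found = True
                    break
            if found:
                break
        hits += found
    return hits / total if total else 0.0
```

With real files you would pass open file handles to `read_fastq_seqs`, build the index from one lane, and query it with the reads from another. For full-size lanes a dedicated tool will be far more memory-efficient than this sketch.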

ADD COMMENT • link 12.3 years ago by Mikael Huss 4.8k

0

Neat software, I did not know about it before.

ADD REPLY • link 12.3 years ago by Istvan Albert 100k

score 1 · Answer 2 · 2012-01-19

1

12.3 years ago

Manu Prestat 4.1k

cd-hit would do what you want. By the way, I just learned that you can use a FASTQ file directly as input. You can also take a look at uclust (usearch).

ADD COMMENT • link 12.3 years ago by Manu Prestat 4.1k