Question

Identifying contaminants (cow, pig) in oral rinse exome sequencing data

0

9.6 years ago

AbiB1 • 0

I have exome sequencing data from oral rinse samples that were tested for hereditary cancers, and I am looking for potential non-human (cow or pig) contaminants. What would be the best method to detect them? This is what I have done so far:

1. Align the FASTQ files to the human reference (bwa-mem).
2. Use samtools to extract the unmapped reads from the aligned BAM files.
3. Build an index of the unmapped reads, which are assumed to be potential contaminants [samtools view -u -f 12 -F 256] (both mates unmapped).
4. Map the unmapped reads to the cow and pig references (bwa-mem).
5. Extract the mapped reads from these alignments.
6. Confirm which reads map exclusively to one reference.

[Quality checks and coverage are calculated; reads with MQ 8 are considered for analysis.]

Any suggestions are appreciated.
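
For step 6, one way to check exclusivity is to compare the names of the reads that align to each reference. The sketch below is only an illustration, not the poster's actual code: the function names, file names, and toy read names are hypothetical, and it assumes that "MQ 8" means a minimum mapping quality of 8 (`samtools view -q 8`).

```shell
set -eu

# Hypothetical helper: unique names of reads aligned with MAPQ >= min_mapq.
# Needs samtools on PATH; it is defined here but not called in the toy demo.
mapped_names() {
    bam=$1
    min_mapq=${2:-8}   # assumption: "MQ 8" read as MAPQ >= 8
    # -F 4 drops unmapped reads, -q drops alignments below min_mapq
    samtools view -F 4 -q "$min_mapq" "$bam" | cut -f1 | LC_ALL=C sort -u
}

# Split two sorted name lists into cow-only, pig-only, and shared reads.
split_exclusive() {
    cow_names=$1
    pig_names=$2
    outdir=$3
    LC_ALL=C comm -23 "$cow_names" "$pig_names" > "$outdir/cow_only.txt"
    LC_ALL=C comm -13 "$cow_names" "$pig_names" > "$outdir/pig_only.txt"
    LC_ALL=C comm -12 "$cow_names" "$pig_names" > "$outdir/ambiguous.txt"
}

# Toy demonstration with made-up read names (not real data).
demo=$(mktemp -d)
printf 'read1\nread2\nread3\n' > "$demo/cow.txt"
printf 'read2\nread4\n' > "$demo/pig.txt"
split_exclusive "$demo/cow.txt" "$demo/pig.txt" "$demo"
cat "$demo/cow_only.txt"   # reads seen only in the cow alignment
```

With real data, the two input lists would come from something like `mapped_names cow.bam > cow.txt` and `mapped_names pig.bam > pig.txt`. Reads that end up in `ambiguous.txt` align to both references and cannot be assigned to one species by this check alone.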

Exome genome sequencing blast alignment • 2.5k views

updated 9.6 years ago by igor 13k • written 9.6 years ago by AbiB1 • 0

3

An alternative could be to bin the reads that map to pig/cow (or to human) using BBSplit from BBMap. This tool is designed for this specific application.
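
A BBSplit call for this could look roughly like the sketch below. The file names are placeholders, and `ambiguous2=toss` (discard reads that map equally well to more than one reference) is an assumed choice rather than a recommendation from this thread; check the usage text printed by `bbsplit.sh` for the options in your BBMap version.

```shell
set -eu

# Placeholder inputs: paired FASTQ files and one FASTA per candidate species.
R1=sample_R1.fastq.gz
R2=sample_R2.fastq.gz
REFS=cow.fa,pig.fa

# Compose the command as a string so it can be reviewed before running.
# % in basename becomes the reference name, # the read number (1 or 2).
BBSPLIT_CMD="bbsplit.sh in1=$R1 in2=$R2 ref=$REFS \
basename=split_%_#.fastq.gz \
outu1=nohit_R1.fastq.gz outu2=nohit_R2.fastq.gz \
ambiguous2=toss refstats=refstats.txt"

echo "$BBSPLIT_CMD"
# Once BBMap is installed and the inputs exist, run it with:
#   eval "$BBSPLIT_CMD"
```

The `refstats` file summarises how many reads were assigned to each reference, which gives a quick per-species contamination estimate.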

9.6 years ago by GenoMax 154k

score 2 · Answer 1 · 2016-04-23

2

9.6 years ago

igor 13k

FastQ Screen does exactly what you want:

FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.
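
As a rough sketch of how that could be set up for this question, the snippet below writes a minimal FastQ Screen configuration with human, cow, and pig databases. The index paths are placeholders and the labels are arbitrary; it assumes BWA indices have already been built for each genome.

```shell
set -eu

# Write a minimal FastQ Screen configuration (tab-separated fields).
# Each DATABASE line holds a label and the path to an aligner index basename.
# All paths below are placeholders and must be replaced with real indices.
conf=$(mktemp)
printf 'DATABASE\t%s\t%s\n' Human /path/to/indices/human/genome >  "$conf"
printf 'DATABASE\t%s\t%s\n' Cow   /path/to/indices/cow/genome   >> "$conf"
printf 'DATABASE\t%s\t%s\n' Pig   /path/to/indices/pig/genome   >> "$conf"

grep -c '^DATABASE' "$conf"   # number of genomes that will be screened

# With FastQ Screen installed and real indices in place, a run could look like:
#   fastq_screen --aligner bwa --conf "$conf" --outdir screen_out sample_R1.fastq.gz
```

The resulting report lists, per database, the share of reads that hit that genome only versus several genomes, which is close to the "maps exclusively to one reference" check in the question.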