Question: Identifying contaminants (cow, pig) in oral rinse exome sequencing data
3.8 years ago by
United States
AbiB10 wrote:

I have exome sequencing data for oral rinse samples tested for hereditary cancers, and I am looking for potential non-human (cow or pig) contaminants. What would be the best methodology to detect them? This is what I have done so far:

1. Align the FASTQs to the human reference (bwa-mem).
2. Use samtools to extract unmapped reads from the aligned BAM files.
3. Build an index of the unmapped reads, which are assumed to be potential contaminants. [samtools view -u -f 12 -F 256] (both mates unmapped)
4. Map the unmapped reads to the cow and pig references (bwa-mem).
5. Extract the mapped reads from this alignment.
6. Confirm those which map exclusively to one reference.

[Quality checks and coverages are calculated; reads with MQ 8 are considered for analysis]
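For readers unsure what the `-f 12 -F 256` filter in the steps above selects, here is a minimal Python sketch of the same bitwise test on a SAM FLAG value. The bit values come from the SAM specification. The function name `passes_filter` is only illustrative and is not part of samtools.

```python
# SAM FLAG bits (from the SAM specification).
READ_UNMAPPED = 0x4    # 4:   the read itself is unmapped
MATE_UNMAPPED = 0x8    # 8:   the read's mate is unmapped
SECONDARY = 0x100      # 256: secondary alignment


def passes_filter(flag: int, require: int = 12, exclude: int = 256) -> bool:
    """Mimic `samtools view -f <require> -F <exclude>` for one FLAG value.

    -f keeps a record only if ALL bits in `require` are set.
    -F drops a record if ANY bit in `exclude` is set.
    """
    has_all_required = (flag & require) == require
    has_no_excluded = (flag & exclude) == 0
    return has_all_required and has_no_excluded


# Some common paired-end FLAG values:
#  77 = paired + read unmapped + mate unmapped + first in pair
# 141 = paired + read unmapped + mate unmapped + second in pair
#  73 = paired + mate unmapped + first in pair (the read itself is mapped)
#  99 = paired + proper pair + mate reverse + first in pair (both mapped)
for flag in (77, 141, 73, 99):
    print(flag, passes_filter(flag))
```

Since 12 = 4 + 8, only records where both the read and its mate are unmapped pass, which matches the "both mates unmapped" note. Pairs in which only one mate is unmapped (such as flag 73) are not extracted by this filter.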

Any suggestions are appreciated.


An alternative would be to bin the reads that map to pig/cow (or to human) using BBSplit from BBMap. This tool is designed for this specific application.
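To illustrate what binning means here, the sketch below assigns each read to whichever reference gives it the best alignment score. This is only a toy illustration of the idea and is not BBSplit's implementation or scoring. The function `bin_read`, the score values, and the "ambiguous"/"unassigned" labels are all assumptions made up for the example.

```python
from typing import Dict, Optional


def bin_read(scores: Dict[str, Optional[int]]) -> str:
    """Assign one read to the reference with the highest alignment score.

    `scores` maps a reference name to the read's alignment score against
    that reference, or None if the read did not align there. Returns the
    winning reference name, "ambiguous" if several references tie for the
    best score, or "unassigned" if the read aligned nowhere.
    """
    aligned = {ref: s for ref, s in scores.items() if s is not None}
    if not aligned:
        return "unassigned"
    best = max(aligned.values())
    winners = [ref for ref, s in aligned.items() if s == best]
    return winners[0] if len(winners) == 1 else "ambiguous"


# Hypothetical scores for three reads against three references.
reads = {
    "read1": {"human": 148, "cow": 90, "pig": None},
    "read2": {"human": None, "cow": 140, "pig": 140},
    "read3": {"human": None, "cow": None, "pig": None},
}
for name, scores in reads.items():
    print(name, bin_read(scores))
```

Compared with aligning to human first and only then to cow and pig, scoring all references together means a read that merely failed to align to human is not automatically treated as a contaminant, and reads from regions conserved across species end up flagged as ambiguous instead of being assigned to one species.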

written 3.8 years ago by genomax78k
3.8 years ago by
United States
igor9.6k wrote:

FastQ Screen does exactly what you want:

FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.
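As a rough idea of the kind of composition summary such a screen produces, here is a small sketch that turns per-read hits into percentages per genome. It does not use FastQ Screen or reproduce its report format. The function `screen_summary`, the input layout, and the `no_hits` key are assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def screen_summary(hits: Dict[str, Set[str]],
                   genomes: Iterable[str]) -> Dict[str, float]:
    """Percent of reads hitting each genome, plus percent with no hit.

    `hits` maps a read name to the set of genomes it aligned to. A read
    that hits several genomes is counted once for each of them, so the
    percentages can add up to more than 100.
    """
    total = len(hits)
    if total == 0:
        return {}
    counts: Counter = Counter()
    for matched in hits.values():
        counts.update(matched)
    summary = {g: 100.0 * counts[g] / total for g in genomes}
    unmatched = sum(1 for matched in hits.values() if not matched)
    summary["no_hits"] = 100.0 * unmatched / total
    return summary


# Hypothetical hits for four reads.
example = {
    "r1": {"human"},
    "r2": {"human", "cow"},
    "r3": set(),
    "r4": {"pig"},
}
print(screen_summary(example, ["human", "cow", "pig"]))
```

For an exome library from a human oral rinse, nearly all reads would be expected to hit human. A noticeable fraction hitting only cow or only pig would be the signal worth following up on.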
