Identifying contaminants (cow, pig) in oral rinse exome sequencing data
8.0 years ago
AbiB1 • 0

I have exome sequencing data for oral rinse samples tested for hereditary cancers. I am looking for potential non-human (cow or pig) contaminants. What would be the best methodology to detect them? This is what I have done so far:

1. Align the fastqs to the human reference (bwa-mem).
2. Use samtools to extract the unmapped reads from the aligned bam files.
3. Build an index of the unmapped reads, which are assumed to be potential contaminants. [samtools view -u -f 12 -F 256] (both mates unmapped)
4. Map the unmapped reads to the cow and pig references (bwa-mem).
5. Extract the mapped reads from this alignment.
6. Confirm which reads map exclusively to one reference. [Quality checks and coverages are calculated; reads with MQ 8 are considered for analysis]
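
To make the flag filter and the exclusivity check in step 6 concrete, here is a minimal Python sketch. It assumes the SAM flag bits from the SAM specification (0x4 read unmapped, 0x8 mate unmapped, 0x100 secondary alignment) and assumes that "MQ 8" means a mapping quality of at least 8. The function names `passes_filter` and `exclusive_species` are hypothetical and not part of samtools or bwa.

```python
# SAM flag bits, as defined in the SAM specification
READ_UNMAPPED = 0x4    # 4
MATE_UNMAPPED = 0x8    # 8
SECONDARY = 0x100      # 256


def passes_filter(flag, require=12, exclude=256):
    """Mimic `samtools view -f REQUIRE -F EXCLUDE` for one flag value.

    -f keeps a read only if ALL bits in `require` are set;
    -F drops a read if ANY bit in `exclude` is set.
    """
    return (flag & require) == require and (flag & exclude) == 0


def exclusive_species(cow_mapq, pig_mapq, min_mapq=8):
    """Classify one read from its MAPQ against each reference.

    cow_mapq / pig_mapq: MAPQ of the alignment, or None if unmapped.
    min_mapq=8 is an assumption about what "MQ 8" means in the post.
    Returns 'cow', 'pig', 'both', or None.
    """
    cow_ok = cow_mapq is not None and cow_mapq >= min_mapq
    pig_ok = pig_mapq is not None and pig_mapq >= min_mapq
    if cow_ok and pig_ok:
        return "both"   # not exclusive, so not evidence for either species
    if cow_ok:
        return "cow"
    if pig_ok:
        return "pig"
    return None


# Flag 77 = paired + read unmapped + mate unmapped + first in pair: kept
print(passes_filter(77))
# Flag 73 = paired + mate unmapped + first in pair (read itself mapped): dropped
print(passes_filter(73))
print(exclusive_species(30, None))
```

Reads that come back as 'both' are the ones to be careful with: conserved exonic sequence can align well to several mammalian genomes, so they say little about which species is present.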

Any suggestions are appreciated.

Exome genome sequencing blast alignment • 2.2k views

An alternative could be to bin the reads that map to pig/cow (or to human) using BBSplit from BBMap. The tool is designed for this specific application.
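
As a rough illustration of what binning means here (a conceptual sketch only, not BBSplit's actual implementation or interface): every read is compared against all references at once and placed in the bin of the reference it matches best, with ties set aside as ambiguous. The per-reference alignment scores, the read names, and the functions `bin_read` and `bin_reads` are hypothetical.

```python
from collections import Counter


def bin_read(scores):
    """Assign one read to the reference with the best alignment score.

    scores: dict mapping reference name -> alignment score (higher is
    better); references the read did not align to are simply absent.
    Returns a reference name, 'ambiguous' on a tie, or 'unassigned'.
    """
    if not scores:
        return "unassigned"
    best = max(scores.values())
    winners = [ref for ref, score in scores.items() if score == best]
    return winners[0] if len(winners) == 1 else "ambiguous"


def bin_reads(read_scores):
    """Count how many reads fall into each bin."""
    return Counter(bin_read(scores) for scores in read_scores.values())


# Hypothetical scores for four reads against three references
example = {
    "read1": {"human": 100, "cow": 60},
    "read2": {"cow": 95, "pig": 80},
    "read3": {"cow": 90, "pig": 90},
    "read4": {},
}
counts = bin_reads(example)
print(counts["human"], counts["cow"], counts["ambiguous"], counts["unassigned"])
```

The difference from the stepwise approach in the question is that the human reference competes with cow and pig for every read, rather than human-mapped reads being removed first.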

8.0 years ago
igor 13k

FastQ Screen does exactly what you want:

FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.
