Question: Identifying contaminants (cow, pig) in oral rinse exome sequencing data
AbiB1 (United States) wrote, 3.0 years ago:

I have exome sequencing data for oral rinse samples tested for hereditary cancers, and I am looking for potential non-human (cow or pig) contaminants. What would be the best methodology to detect them? What I have done so far:

1. Align the FASTQs to the human reference (bwa-mem).
2. Use samtools to extract the unmapped reads from the aligned BAM files.
3. Build an index of the unmapped reads, which are assumed to be potential contaminants [samtools view -u -f 12 -F 256] (both mates unmapped).
4. Map the unmapped reads to the cow and pig references (bwa-mem).
5. Extract the mapped reads from this alignment.
6. Confirm which reads map exclusively to one reference.

[Quality checks and coverages are calculated; reads with MQ 8 are considered for analysis.]

Any suggestions are appreciated.
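To make the flag filter and the "exclusive to one reference" check in the steps above concrete, here is a minimal Python sketch. It is an illustration only, not the poster's actual code: the function names, the input dictionaries (read name to MAPQ) and the `min_mapq` parameter are hypothetical, and treating "MQ 8" as a minimum threshold is an assumption.

```python
from typing import Dict, Set

# SAM flag bits relevant to `samtools view -f 12 -F 256`
UNMAPPED = 0x4        # the read itself is unmapped
MATE_UNMAPPED = 0x8   # the read's mate is unmapped
SECONDARY = 0x100     # secondary alignment


def passes_flag_filter(flag: int, require: int = 12, exclude: int = 256) -> bool:
    """Mimic `samtools view -f require -F exclude` for one SAM flag value.

    -f keeps records with ALL required bits set;
    -F drops records with ANY excluded bit set.
    """
    return (flag & require) == require and (flag & exclude) == 0


def assign_exclusive(
    cow: Dict[str, int],
    pig: Dict[str, int],
    min_mapq: int = 8,  # assumption: "MQ 8" read as a minimum MAPQ
) -> Dict[str, Set[str]]:
    """Split read names into cow-only, pig-only and ambiguous sets.

    `cow` and `pig` map read name -> MAPQ of that read's alignment to the
    respective reference. Alignments below `min_mapq` are ignored, so a read
    with a weak cow hit and a confident pig hit counts as pig-only.
    """
    cow_ok = {name for name, mapq in cow.items() if mapq >= min_mapq}
    pig_ok = {name for name, mapq in pig.items() if mapq >= min_mapq}
    return {
        "cow_only": cow_ok - pig_ok,
        "pig_only": pig_ok - cow_ok,
        "ambiguous": cow_ok & pig_ok,
    }


if __name__ == "__main__":
    # Flag 77 = paired (1) + read unmapped (4) + mate unmapped (8) + first in pair (64)
    print(passes_flag_filter(77))
    print(assign_exclusive({"r1": 60, "r3": 40}, {"r3": 30, "r4": 20}))
```

Reads that pass the threshold against both references end up in the "ambiguous" set rather than being assigned to either species, which is one conservative way to interpret step 6.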


An alternative could be to bin the reads that map to pig/cow (or to bin them to human) using BBSplit from BBMap. This tool is designed for this specific application.

written 3.0 years ago by genomax
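As a rough illustration of the BBSplit suggestion, the sketch below only assembles a `bbsplit.sh` command line in Python; it does not run anything. The file names and the helper `bbsplit_command` are hypothetical. The parameters shown (`in1`, `in2`, `ref`, `basename`, `outu1`, `outu2`, `ambiguous2`) follow my understanding of the BBSplit usage text, so check them against `bbsplit.sh --help` for your installed version.

```python
import shlex
from typing import List, Sequence


def bbsplit_command(
    reads1: str,
    reads2: str,
    ref_fastas: Sequence[str],
    ambiguous2: str = "toss",  # assumption: discard reads hitting more than one reference
) -> List[str]:
    """Build (but do not execute) a paired-end bbsplit.sh command."""
    return [
        "bbsplit.sh",
        f"in1={reads1}",
        f"in2={reads2}",
        "ref=" + ",".join(ref_fastas),   # one bin per reference FASTA
        "basename=binned_%_#.fq.gz",     # % = reference name, # = read number
        "outu1=unbinned_1.fq.gz",        # reads matching no reference
        "outu2=unbinned_2.fq.gz",
        f"ambiguous2={ambiguous2}",
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only
    cmd = bbsplit_command(
        "sample_R1.fq.gz",
        "sample_R2.fq.gz",
        ["human.fa", "cow.fa", "pig.fa"],
    )
    print(shlex.join(cmd))
```

Including the human reference alongside cow and pig lets the tool decide between species in a single pass instead of relying on a prior human-only alignment.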
igor (United States) wrote, 3.0 years ago:

FastQ Screen does exactly what you want:

FastQ Screen allows you to screen a library of sequences in FastQ format against a set of sequence databases so you can see if the composition of the library matches with what you expect.
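For orientation, a FastQ Screen run needs a configuration file listing the databases to screen against. The Python sketch below writes such a file as text and assembles a matching command line without executing it. The index paths, database labels and helper names are hypothetical; the `DATABASE` and `THREADS` keywords and the `--aligner` and `--conf` options reflect my understanding of the FastQ Screen documentation and should be verified against your installed version.

```python
from typing import Dict, List


def fastq_screen_conf(databases: Dict[str, str], threads: int = 4) -> str:
    """Render a minimal FastQ Screen configuration as text.

    `databases` maps a label (e.g. "Human") to the aligner index basename.
    """
    lines = [f"THREADS\t{threads}"]
    for name, index in databases.items():
        lines.append(f"DATABASE\t{name}\t{index}")
    return "\n".join(lines) + "\n"


def fastq_screen_command(conf_path: str, fastq: str, aligner: str = "bwa") -> List[str]:
    """Build (but do not execute) a fastq_screen command line."""
    return ["fastq_screen", "--aligner", aligner, "--conf", conf_path, fastq]


if __name__ == "__main__":
    # Hypothetical index locations, for illustration only
    conf = fastq_screen_conf(
        {
            "Human": "/refs/human/genome",
            "Cow": "/refs/cow/genome",
            "Pig": "/refs/pig/genome",
        }
    )
    print(conf)
    print(" ".join(fastq_screen_command("fastq_screen.conf", "sample_R1.fastq.gz")))
```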
