C. elegans WGS: large portion of reads flagged as unmapped with SAM flags 133 and 165
2.7 years ago

Hello,

I recently ran a whole genome sequencing (PE150) experiment on 5 strains of C. elegans. I aligned everything using SNAP, and after processing the data I saw that across all samples only 33% of the reads mapped to the reference. I also tried mapping to E. coli to check whether the remaining reads came from contamination, but only ~3% mapped there. I then looked at the unmapped reads using:

samtools view -f 4 bamfile.bam

All of the unmapped reads I looked at have either SAM flag 133 (read paired (0x1), read unmapped (0x4), second in pair (0x80)) or SAM flag 165 (read paired (0x1), read unmapped (0x4), mate reverse strand (0x20), second in pair (0x80)).
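As a quick sanity check on the flag breakdown above, here is a minimal Python sketch that decodes a SAM flag into its component bits. The bit meanings follow the SAM specification; the names `SAM_FLAG_BITS` and `decode_sam_flag` are made up for this illustration and are not part of samtools or SNAP.

```python
# SAM flag bits as defined in the SAM specification.
# SAM_FLAG_BITS and decode_sam_flag are hypothetical names for this sketch.
SAM_FLAG_BITS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_sam_flag(flag):
    """Return the descriptions of every bit that is set in a SAM flag."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


for flag in (133, 165):
    print(flag, decode_sam_flag(flag))
```

Neither 133 nor 165 has the 0x8 (mate unmapped) bit set, so for these records the first read of the pair appears to have been aligned while the second read was not.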

If I BLAST the read sequences, some have no match, but many have perfect or near-perfect matches to the C. elegans genome that cover only part of the read, roughly 100-145 of the 150 bases. Is there a reason I would have so many reads with these flags in my data, and is there anything I can do at this point to recover these unmapped reads?

EDIT: I've looked more closely at the BLAST results for the reads and found many cases where half of the sequence matches a C. elegans sequence perfectly and the best match for the other half is Diabrotica undecimpunctata virus 1, Asarum shuttleworthii chloroplast or Cyprinus carpio DNA (these have come up several times). In other cases, 120-140 bases of the read match a segment of C. elegans DNA and the remainder matches a nearby sequence, but in the opposite orientation.

Thank you,

Tyler

sam WGS flag 133 165 unmapped • 568 views

Are you sure that you have properly filtered the data to remove adapter sequences and low-quality bases? If so, to get a quick idea of what contamination is present in your sample, you can upload your reads to the Kaiju web server.
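One rough way to check for leftover adapter before re-trimming is to count how many reads still contain the start of an adapter sequence. The sketch below assumes a TruSeq-style Illumina adapter beginning with AGATCGGAAGAGC, which may not match the library prep actually used here; `adapter_fraction` and `example_reads` are hypothetical names and toy data, not real reads from this experiment.

```python
# Assumption: a TruSeq-style Illumina adapter prefix. Replace it with the
# adapter that matches the actual library prep kit.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def adapter_fraction(reads, adapter=ADAPTER_PREFIX):
    """Return the fraction of read sequences that contain the adapter prefix."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(1 for seq in reads if adapter in seq.upper())
    return hits / len(reads)


# Toy sequences for illustration only.
example_reads = [
    "ACGTACGTAGATCGGAAGAGCACACGT",
    "ACGTACGTACGTACGTACGTACGTACG",
    "TTTTAGATCGGAAGAGCGTCGTGTAGG",
    "GGGGCCCCAAAATTTTGGGGCCCCAAA",
]
print(adapter_fraction(example_reads))
```

This exact-match check only gives a lower bound, since it misses adapters with sequencing errors or adapters truncated at the read end. A dedicated trimming tool is the better choice for the actual cleanup.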
