How to know which reads were not aligned to a reference genome?
6.6 years ago
Charles Yin ▴ 180

Paired-end FASTQ reads from a bacterial strain can be aligned to a reference genome with BWA. With bedtools, the coverage of the reads on the reference genome can be computed using the 'coverageBed' tool. As I understand it, coverage means how many reads were aligned to each position of the reference genome. I have a new question about the coverage analysis: how do we know which reads were not aligned to the reference genome? Because the sequenced genome may be larger than the reference genome, or may have rearrangements or duplicated regions, some reads may not find corresponding regions in the reference. Are there any tools that can find unaligned reads? Thanks!

sequence SNP alignment • 1.9k views
6.6 years ago
bk11 ★ 2.3k

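You can use samtools to find unaligned reads: -f 4 keeps only reads with the unmapped flag (0x4) set, and -F 4 keeps only reads without it (the mapped reads).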

https://davetang.org/wiki/tiki-index.php?page=SAMTools

samtools view -b -f 4 file.bam > unmapped.bam

samtools view -b -F 4 file.bam > mapped.bam
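For paired-end data the 0x4 bit is set per read, so it can help to separate pairs where both mates are unmapped from pairs where only one mate is. The following is a minimal sketch, not a definitive recipe. The helper names flag_matches and split_unmapped, and the output file names, are made up for illustration. flag_matches only mimics the -f/-F filtering rule of samtools view on a single FLAG value (all bits of -f must be set, no bit of -F may be set), so the logic can be checked without a BAM file.

```shell
#!/bin/sh
# flag_matches FLAG REQUIRED EXCLUDED
# Mimics the samtools view -f/-F rule for one SAM FLAG value:
# succeeds if every bit in REQUIRED is set and no bit in EXCLUDED is set.
flag_matches() {
    flag=$1
    required=$2
    excluded=$3
    [ $((flag & required)) -eq "$required" ] && [ $((flag & excluded)) -eq 0 ]
}

# split_unmapped BAM (hypothetical helper; output names are assumptions)
# 4 = read unmapped, 8 = mate unmapped, 12 = both bits together.
split_unmapped() {
    bam=$1
    # Pairs where both mates are unmapped.
    samtools view -b -f 12 "$bam" > both_unmapped.bam
    # This read unmapped, but its mate is mapped.
    samtools view -b -f 4 -F 8 "$bam" > read_unmapped_mate_mapped.bam
    # Print the total number of unmapped reads (-c counts instead of writing records).
    samtools view -c -f 4 "$bam"
}
```

For example, a read with FLAG 77 (paired, read unmapped, mate unmapped, first in pair) passes "-f 12", while FLAG 69 (paired, read unmapped, first in pair) passes "-f 4 -F 8" but not "-f 12". Calling split_unmapped file.bam needs samtools on the PATH.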


Great, that is it! Thank you!
