Why are all the reads in these BAM files unmapped?
2.4 years ago
Wang • 0
ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR358/ERR3580717/scrEXT030_hg19_S11_L001.bam
ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR358/ERR3580718/scrEXT030_hg19_S11_L002.bam
ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR358/ERR3580719/scrEXT030_hg19_S11_L003.bam
ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR358/ERR3580720/scrEXT030_hg19_S11_L004.bam
ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR358/ERR3580721/scrEXT030_hg19_S11_L005.bam
ftp://ftp.sra.ebi.ac.uk/vol1/run/ERR358/ERR3580722/scrEXT030_hg19_S11_L006.bam

For these BAM files, samtools reports all reads as unmapped. When I checked the files, I found that the flag of every read was 4.

samtools view -f 4 scrEXT030_hg19_S15_L002.bam | cut -f1 > S15_L002_unmapped_reads.txt

Why are all the reads in these BAM files unmapped?

samtools single-cell sequencing • 2.1k views
2.4 years ago

The reads are probably in uBAM (unmapped BAM) format. It is a way to store raw reads that is sometimes preferred to FASTQ because it lets you attach metadata to the reads.

Thank you for your timely reply. How can we extract the unmapped reads from those uBAM files, or from the FASTQ files derived from them?

Use samtools fastq. Make sure the BAM files are collated or name-sorted before running the conversion.
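
A minimal sketch of that conversion, assuming the reads are paired-end (the thread does not confirm this). The function name ubam_to_fastq and the output file names are placeholders, not anything from the original files; the samtools options follow the collate-then-fastq pattern shown in the samtools manual.

```shell
# Hypothetical helper: convert one unaligned BAM to paired FASTQ files.
#   $1 = input uBAM file, $2 = prefix for the output files (both placeholders)
ubam_to_fastq() {
    in_bam=$1
    prefix=$2
    # collate: group reads with the same name together
    #   -u = uncompressed output, -O = write to stdout
    # fastq: -1/-2 = files for first/second mates,
    #   -0 = reads with both or neither mate flag set, -s = singletons,
    #   -n = leave read names as they are (no /1 or /2 suffix)
    samtools collate -u -O "$in_bam" |
        samtools fastq \
            -1 "${prefix}_R1.fastq.gz" -2 "${prefix}_R2.fastq.gz" \
            -0 /dev/null -s /dev/null -n -
}
```

For example, `ubam_to_fastq scrEXT030_hg19_S11_L001.bam S11_L001` would write S11_L001_R1.fastq.gz and S11_L001_R2.fastq.gz. If the library turns out to be single-end, the reads would land in the -0 output instead, so point -0 at a real file rather than /dev/null.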

My aim is to extract the unmapped reads, i.e. the reads that do not come from the host. So there is no way to extract them directly from the uBAM files? Is the only option to convert these uBAMs to FASTQ and then use bowtie2 to remove the host sequences? Could you kindly add a line of samtools fastq code to clarify your point?

Correct. If no alignments are included (as seems to be the case), you will need to do the alignments yourself to extract the reads. One way to do that would be to bin the reads using bbsplit.sh from the BBMap suite (see the example BBSplit syntax for generating builds of the reference genomes and for calling the different builds; replace the genomes with your own).

samtools fastq is described in the samtools manual page.
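
If you would rather stay with bowtie2 and samtools than use BBSplit, the align-then-filter step could look roughly like the sketch below. It assumes paired FASTQ files named <prefix>_R1.fastq.gz and <prefix>_R2.fastq.gz already exist and that a bowtie2 index of the host genome has been built; the function name, the index prefix and all file names are placeholders.

```shell
# Hypothetical helper: keep only read pairs where neither mate aligns to the host.
#   $1 = bowtie2 index prefix of the host genome (placeholder)
#   $2 = prefix of the input/output FASTQ files (placeholder)
extract_nonhost_pairs() {
    host_index=$1
    prefix=$2
    # bowtie2 writes SAM to stdout when -S is not given.
    # samtools view: -u = uncompressed BAM, -f 12 = read AND mate unmapped,
    #   -F 256 = drop secondary alignments.
    # samtools fastq: write the surviving pairs back out as FASTQ.
    bowtie2 -x "$host_index" \
        -1 "${prefix}_R1.fastq.gz" -2 "${prefix}_R2.fastq.gz" |
        samtools view -u -f 12 -F 256 - |
        samtools fastq \
            -1 "${prefix}_nonhost_R1.fastq.gz" -2 "${prefix}_nonhost_R2.fastq.gz" \
            -0 /dev/null -s /dev/null -n -
}
```

Requiring both mates to be unmapped (-f 12) is the strict choice; pairs where only one mate hits the host are discarded along with the host reads. Whether that is what you want depends on how conservative the host removal should be.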

They should really name them .ubam. Calling them .bam only leads to confusion.

2.4 years ago
ATpoint 82k

These could well be unmapped BAM files, meaning the reads were stored in unaligned BAM format rather than FASTQ. That is not unusual. You will need to check whether any information about how these files were processed is available. The hg19 in the file names does suggest some alignment, but file names are usually poor sources of information.
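
One quick way to check is to look at the BAM itself rather than the file name. The two helper functions below are only a sketch (their names are made up for illustration): the first counts reads that are not flagged as unmapped, and the second prints the header lines that record which programs touched the file (@PG) and which reference sequences it was aligned against (@SQ).

```shell
# Hypothetical helper: count reads that do NOT carry the "unmapped" flag.
#   -c = print only the number of records, -F 4 = exclude unmapped reads
count_mapped_reads() {
    samtools view -c -F 4 "$1"
}

# Hypothetical helper: show program (@PG) and reference (@SQ) header lines.
#   -H = print the header only
show_bam_provenance() {
    samtools view -H "$1" | grep -E '^@(PG|SQ)'
}
```

A mapped-read count of 0 together with a header that has no @SQ lines would be consistent with an unaligned BAM.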

Thank you. Do you know how we can extract the unmapped reads from those uBAM files, or from the FASTQ files derived from them?
