Extracting paired unmapped reads into fastq gives empty file
0
0
Entering edit mode
8.1 years ago
vjmorley ▴ 30

I am working with paired-end Illumina Hiseq data. My goal is to create a new set of fastq files containing only paired reads that do not map to an E. coli reference.

I created an alignment to an E. coli reference, then extracted unmapped paired reads to a new file:

samtools view -u -f 12 ecoli_sam.sam > unmapped.bam

I sorted to maintain the order of paired reads

samtools sort -n unmapped.bam sorted_unmapped

And converted to fastq

bamToFastq -useTags -bam sorted_unmapped.bam -fq1 unmappedR1.fastq -fq2 unmappedR2.fastq

When I view the resulting fastq files, the first file looks normal (unmappedR1.fastq), but the paired file (unmappedR2.fastq) has only the read headers but not sequence. An example is below.

@HISEQ:242:H32LYADXX:2:1101:1272:81236/2

+

@HISEQ:242:H32LYADXX:2:1101:1273:64750/2

+

Can someone tell me where I'm going wrong?

next-gen alignment • 2.8k views
ADD COMMENT
0
Entering edit mode

could you run the following:

samtools view unmapped.bam | grep "HISEQ:242:H32LYADXX:2:1101:1272:81236"

and post the results?

ADD REPLY
1
Entering edit mode

Sure, here's the output:

$ samtools view unmapped.bam | grep "HISEQ:242:H32LYADXX:2:1101:1272:81236"
HISEQ:242:H32LYADXX:2:1101:1272:81236   77  *   0   0   *   TATTNTCAAAAGATTTATTCTTTTTGACCTTTATACCAAACTGTTCAGCATATTCAGCTAATTTAGCC    @@@D#2ABFDB?BGHIGIIB@?<4C:EADFEGIHGEEEIFII>DFGGEEHIGC?9D<DGHBHCEE>CH
HISEQ:242:H32LYADXX:2:1101:1272:81236   141 *   0   0   *   GAAAATCTGTTATTGGGAATTGTTCTGGTGCATTAAGTTGCGGGGCAAGGCGGGCTTGGTAGCGATAGTTAAGAGN    (0;5@#######################################################################
ADD REPLY
0
Entering edit mode

That is weird. You are supposed to have at least 11 fields: https://samtools.github.io/hts-specs/SAMv1.pdf

ADD REPLY
0
Entering edit mode

Try using 13 as the SAM flag, this specifies that the read and its mate are unmapped and that they are paired.

ADD REPLY
0
Entering edit mode

I just gave this a try, and unfortunately using the 13 flag still gives the same incomplete fastq file for the paired reads.

ADD REPLY
0
Entering edit mode

Maybe the error is that you are passing in a SAM file to tools which require a BAM file? Your first filtering command will produce SAM output, but you've given it a BAM extension. Maybe try filtering again using the -b flag to ensure a BAM format is output?

ADD REPLY
0
Entering edit mode

Hm, still doesn't seem to be solving the problem. I tried using a BAM file as input rather than a SAM file and changing the -u flag to a -b flag, but I'm still getting the same result.

ADD REPLY
0
Entering edit mode

EDIT: my mistake these are unmapped reads, ignore this comment

Have you done any type of filtering beforehand, such as by mapping quality? If so you'll end up with paired reads where one of the mates has been removed, but the mates flag is still set to being paired. Run the samtools fixmate command on your file and then try your steps again.

ADD REPLY

Login before adding your answer.

Traffic: 2441 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6