Question: Extracting paired unmapped reads into fastq gives empty file
0
gravatar for vjmorley
3.1 years ago by
vjmorley30
United States
vjmorley30 wrote:

I am working with paired-end Illumina Hiseq data. My goal is to create a new set of fastq files containing only paired reads that do not map to an E. coli reference.

I created an alignment to an E. coli reference, then extracted unmapped paired reads to a new file:

samtools view -u -f 12 ecoli_sam.sam > unmapped.bam

I sorted to maintain the order of paired reads

samtools sort -n unmapped.bam sorted_unmapped

And converted to fastq

bamToFastq -useTags -bam sorted_unmapped.bam -fq1 unmappedR1.fastq -fq2 unmappedR2.fastq

When I view the resulting fastq files, the first file looks normal (unmappedR1.fastq), but the paired file (unmappedR2.fastq) has only the read headers but not sequence. An example is below.

@HISEQ:242:H32LYADXX:2:1101:1272:81236/2

+

@HISEQ:242:H32LYADXX:2:1101:1273:64750/2

+

Can someone tell me where I'm going wrong?

alignment next-gen • 1.1k views
ADD COMMENTlink modified 3.0 years ago by Biostar ♦♦ 20 • written 3.1 years ago by vjmorley30

could you run the following:

samtools view unmapped.bam | grep "HISEQ:242:H32LYADXX:2:1101:1272:81236"

and post the results?

ADD REPLYlink written 3.1 years ago by Gabriel R.2.6k
1

Sure, here's the output:

$ samtools view unmapped.bam | grep "HISEQ:242:H32LYADXX:2:1101:1272:81236"
HISEQ:242:H32LYADXX:2:1101:1272:81236   77  *   0   0   *   TATTNTCAAAAGATTTATTCTTTTTGACCTTTATACCAAACTGTTCAGCATATTCAGCTAATTTAGCC    @@@D#2ABFDB?BGHIGIIB@?<4C:EADFEGIHGEEEIFII>DFGGEEHIGC?9D<DGHBHCEE>CH
HISEQ:242:H32LYADXX:2:1101:1272:81236   141 *   0   0   *   GAAAATCTGTTATTGGGAATTGTTCTGGTGCATTAAGTTGCGGGGCAAGGCGGGCTTGGTAGCGATAGTTAAGAGN    (0;5@#######################################################################
ADD REPLYlink written 3.1 years ago by vjmorley30

That is weird. You are supposed to have at least 11 fields: https://samtools.github.io/hts-specs/SAMv1.pdf

ADD REPLYlink written 3.1 years ago by Gabriel R.2.6k

Try using 13 as the SAM flag, this specifies that the read and its mate are unmapped and that they are paired.

ADD REPLYlink written 3.1 years ago by James Ashmore2.6k

I just gave this a try, and unfortunately using the 13 flag still gives the same incomplete fastq file for the paired reads.

ADD REPLYlink written 3.1 years ago by vjmorley30

Maybe the error is that you are passing in a SAM file to tools which require a BAM file? Your first filtering command will produce SAM output, but you've given it a BAM extension. Maybe try filtering again using the -b flag to ensure a BAM format is output?

ADD REPLYlink written 3.1 years ago by James Ashmore2.6k

Hm, still doesn't seem to be solving the problem. I tried using a BAM file as input rather than a SAM file and changing the -u flag to a -b flag, but I'm still getting the same result.

ADD REPLYlink written 3.1 years ago by vjmorley30

EDIT: my mistake these are unmapped reads, ignore this comment

Have you done any type of filtering beforehand, such as by mapping quality? If so you'll end up with paired reads where one of the mates has been removed, but the mates flag is still set to being paired. Run the samtools fixmate command on your file and then try your steps again.

ADD REPLYlink modified 3.1 years ago • written 3.1 years ago by James Ashmore2.6k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 904 users visited in the last hour