Question: Collect Read Pairs Where At Least One Read Is Mapped
2
gravatar for fridhackery
5.5 years ago by
fridhackery140
fridhackery140 wrote:

I might word my initial question like another post, but I really have the opposite meaning, i think: Filtering multiple flags with SAMtools

I am trying to remove paired-end reads from a .SAM file where neither segment is mapped

But by "remove" I don't mean collect. I want all the read pairs where the forward read OR the reverse read OR both reads are mapped then I will use bam2fastq to get the reads and assemble.

I think there are pieces missing from my reference. I will use all these reads to try to assemble a better reference. So if one read maps to the reference, but its pair does not, that is a good read for me; the reverse read is possibly part of the sequence that is missing from my reference.

What SAM flags should I set?

paired-end samtools sam • 4.3k views
ADD COMMENTlink modified 5.5 years ago • written 5.5 years ago by fridhackery140
2
gravatar for fridhackery
5.5 years ago by
fridhackery140
fridhackery140 wrote:

matted almost got it. This is what worked:

# exactly one end mapped
samtools view -bh -F 4 -f 8 in.bam > out1.bam
samtools view -bh -F 8 -f 4 in.bam > out2.bam

# exactly two ends mapped
samtools view -bh -F 12 in.bam > out3.bam

then merge and sort -n

Thank you all!

ADD COMMENTlink written 5.5 years ago by fridhackery140
1
gravatar for Pierre Lindenbaum
5.5 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum115k wrote:

I wrote a java tool filtering SAM/BAm file using javascript (rhino). It's available at: https://github.com/lindenb/jvarkit#samjs-filtering-a-sambam-file-with-javascript )

To get "the reads unmapped or the mate unmapped" you can filter the BAM with:

java -jar dist/samjs.jar I=input.bam \
    VALIDATION_STRINGENCY=SILENT \
    SCRIPT_EXPRESSION="record.readUnmappedFlag || record.mateUnmappedFlag;" \
    OUT=result.bam

and then get back the FASTQs using picard/samToFastq.jar

To get "exactly one end or two ends mapped" you can filter the BAM with:

java -jar dist/samjs.jar I=input.bam \
    VALIDATION_STRINGENCY=SILENT \
    SCRIPT_EXPRESSION="!(record.readUnmappedFlag && record.mateUnmappedFlag);" \
    OUT=result.bam
ADD COMMENTlink modified 5.5 years ago • written 5.5 years ago by Pierre Lindenbaum115k
0
gravatar for Ashutosh Pandey
5.5 years ago by
Philadelphia
Ashutosh Pandey11k wrote:

This command will remove the unmapped reads and will output all the mapped reads into a new.bam file.

samtools view -bh -F 4 original.bam -> new.bam

-F option will skip the alignment that match a given flag and -f option will include those alignments.

ADD COMMENTlink written 5.5 years ago by Ashutosh Pandey11k

This won't do what he wants, since he wants to convert the bam back to fastq (with both end of a read pair present) for use in assembly. This will make a bam with many reads with missing mates.

ADD REPLYlink written 5.5 years ago by matted7.0k

I don't think so. That is what I had tried first, and again upon your suggestion. The bam2fastq output says: [samopen] SAM header is present: 7 sequences. This looks like paired data from lane 198. Output will be in IMmitoJF4_1.fq and IMmitoJF4_2.fq 15962489 sequences in the BAM file 15962489 sequences exported WARNING: 8855375 reads could not be matched to a mate and were not exported

However, I expect every read in the new.bam file to be paired. Since, all pairs in, then select every pair where at least one read is mapped, means all pairs out.

I did get all pairs out when I followed Filtering multiple flags with SAMtools, but they were the unmapped pairs.

ADD REPLYlink written 5.5 years ago by fridhackery140
0
gravatar for matted
5.5 years ago by
matted7.0k
Boston, United States
matted7.0k wrote:

Since it sounds like you want both reads of a pair that meet your conditions, I think you'll have to break it into several steps:

# exactly one end mapped
samtools view -b -F 4 -f 8 in.bam > out1.bam

# exactly two ends mapped
samtools view -b -F 12 in.bam > out2.bam

You can convert out1.bam and out2.bam to paired-end fastq files for use in assembly. Each read present in these bams should also have its mate present in the bam.

See http://picard.sourceforge.net/explain-flags.html for an easy way to play with flag settings. You may need to also filter out something like -F 1792 (no secondary alignments, failed reads, or PCR duplicates), depending on what flags the aligner set in generating in.bam.

ADD COMMENTlink written 5.5 years ago by matted7.0k

I tried this approach but there are still reads that are missing mates.

bam2fastq output: This looks like paired data from lane 198. Output will be in ALLmitoKK_1.fq and ALLmitoKK_2.fq 15962489 sequences in the BAM file 15962489 sequences exported WARNING: 8855375 reads could not be matched to a mate and were not exported

ADD REPLYlink written 5.5 years ago by fridhackery140
1

Try the solution provided by Pierre. I think it should work.

ADD REPLYlink written 5.5 years ago by Ashutosh Pandey11k

Which out bam did you use here? It's suspicious that the error message / numbers you report are identical to the ones you gave for the -F 4 case above.

What aligner made the bam? Are the flags set correctly in the bam? Does bam2fastq expect a certain sort order that you're not giving it?

ADD REPLYlink modified 5.5 years ago • written 5.5 years ago by matted7.0k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1166 users visited in the last hour