Extracting unmapped paired end from an alignment
2
1
Entering edit mode
7.8 years ago
Picasa ▴ 640

Hi,

I have a list of contaminants that I want to filter out from my paired end:

bowtie2 -x contaminants -1 pair1.fastq -2 pair2.fastq -S out.sam

Now I want to extract only unmapped reads from the .sam file but I want to keep only paired end.

So if read1 map somewhere and read2 not, I want to discard these PE.

1) How can I do that ?

2) Is it possible to produce 2 sams file at the end (one for each PE) ?

Thanks

bowtie2 sam • 4.8k views
ADD COMMENT
0
Entering edit mode

Anybody with other solutions ?

ADD REPLY
1
Entering edit mode
7.8 years ago
samtools view -f 4 -bu input.bam | samtools view -f 8 -bu - | java -jar  picard-tools/picard.jar  SamToFastq  I=/dev/stdin F=out1.fq.gz F2=out2.fq.gz  FU=unpaired.fq.gz VALIDATION_STRINGENCY=LENIENT
ADD COMMENT
0
Entering edit mode

Thanks, can you explain this:

-bu - ( the "-" after bu)

and

VALIDATION_STRINGENCY=LENIENT

By the way, is it the same thing for mate pair ? (PE have FR orientation while MP have RF)

ADD REPLY
2
Entering edit mode
  • - - stdin from previous command's output

From samtools view options

  • b - output BAM file
  • u - output uncompressed BAM file

From picard manual, here

Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default value: STRICT. This option can be set to 'null' to clear the default value. Possible values: {STRICT, LENIENT, SILENT}

ADD REPLY
0
Entering edit mode

So it doesn't work in my case:

My output from bowtie2:

15301480 reads; of these: 15301480 (100.00%) were paired; of these: 15140326 (98.95%) aligned concordantly 0 times

With the command you provide, my unpaired.fq.gz is empty and :

grep '@' out1.fastq | wc -l 14798490

ADD REPLY
1
Entering edit mode

enter image description here :-)

ADD REPLY
0
Entering edit mode

Actually I have a lot of flags 77 and 141. Using -f 4 and -f 8 would be the same ?

ADD REPLY
0
Entering edit mode

in both 77 and 141 "read unmapped and mate unmapped", only 1st and 2nd in pairs changes. Of course, the flags 'x'-in-pair are not related to the fact that the reads are mapped or not.

ADD REPLY
0
Entering edit mode

I don't understand the discordant results between bowtie2 and these flag filtering:

5317475 reads; of these:
  15317475 (100.00%) were paired; of these:
    15155323 (98.94%) aligned concordantly 0 times
    155208 (1.01%) aligned concordantly exactly 1 time
    6944 (0.05%) aligned concordantly >1 times
    ----
    15155323 pairs aligned concordantly 0 times; of these:
      95593 (0.63%) aligned discordantly 1 time
    ----
    15059730 pairs aligned 0 times concordantly or discordantly; of these:
      30119460 mates make up the pairs; of these:
        29853788 (99.12%) aligned 0 times
        195884 (0.65%) aligned exactly 1 time
        69788 (0.23%) aligned >1 times
2.55% overall alignment rate

Should I get 15155323 PE at the end ?

ADD REPLY
0
Entering edit mode

would be the same using samtools view -f 12 input.bam? With flag "-f 12" you collapse 4 and 8, right?

ADD REPLY
0
Entering edit mode
7.8 years ago
Sej Modha 5.3k

You can use bam2fastq utility to extract mapped or unmapped reads

ADD COMMENT
0
Entering edit mode

bam2fastq is no longer supported - they recommend using Picard's Sam2fastq (which can handle bam as well): https://gsl.hudsonalpha.org/information/software/bam2fastq

ADD REPLY

Login before adding your answer.

Traffic: 1974 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6