Question: Extracting unmapped paired end from an alignment
1
gravatar for Picasa
2.7 years ago by
Picasa390
Picasa390 wrote:

Hi,

I have a list of contaminants that I want to filter out from my paired end:

bowtie2 -x contaminants -1 pair1.fastq -2 pair2.fastq -S out.sam

Now I want to extract only unmapped reads from the .sam file but I want to keep only paired end.

So if read1 map somewhere and read2 not, I want to discard these PE.

1) How can I do that ?

2) Is it possible to produce 2 sams file at the end (one for each PE) ?

Thanks

sam bowtie2 • 2.2k views
ADD COMMENTlink modified 2.7 years ago • written 2.7 years ago by Picasa390

Anybody with other solutions ?

ADD REPLYlink written 2.7 years ago by Picasa390
1
gravatar for Pierre Lindenbaum
2.7 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum118k wrote:
samtools view -f 4 -bu input.bam | samtools view -f 8 -bu - | java -jar  picard-tools/picard.jar  SamToFastq  I=/dev/stdin F=out1.fq.gz F2=out2.fq.gz  FU=unpaired.fq.gz VALIDATION_STRINGENCY=LENIENT
ADD COMMENTlink written 2.7 years ago by Pierre Lindenbaum118k

Thanks, can you explain this:

-bu - ( the "-" after bu)

and

VALIDATION_STRINGENCY=LENIENT

By the way, is it the same thing for mate pair ? (PE have FR orientation while MP have RF)

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by Picasa390
2
  • - - stdin from previous command's output

From samtools view options

  • b - output BAM file
  • u - output uncompressed BAM file

From picard manual, here

Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default value: STRICT. This option can be set to 'null' to clear the default value. Possible values: {STRICT, LENIENT, SILENT}

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by venu6.1k

So it doesn't work in my case:

My output from bowtie2:

15301480 reads; of these: 15301480 (100.00%) were paired; of these: 15140326 (98.95%) aligned concordantly 0 times

With the command you provide, my unpaired.fq.gz is empty and :

grep '@' out1.fastq | wc -l 14798490

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by Picasa390
1

enter image description here :-)

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by Pierre Lindenbaum118k

Actually I have a lot of flags 77 and 141. Using -f 4 and -f 8 would be the same ?

ADD REPLYlink written 2.7 years ago by Picasa390

in both 77 and 141 "read unmapped and mate unmapped", only 1st and 2nd in pairs changes. Of course, the flags 'x'-in-pair are not related to the fact that the reads are mapped or not.

ADD REPLYlink written 2.7 years ago by Pierre Lindenbaum118k

I don't understand the discordant results between bowtie2 and these flag filtering:

5317475 reads; of these:
  15317475 (100.00%) were paired; of these:
    15155323 (98.94%) aligned concordantly 0 times
    155208 (1.01%) aligned concordantly exactly 1 time
    6944 (0.05%) aligned concordantly >1 times
    ----
    15155323 pairs aligned concordantly 0 times; of these:
      95593 (0.63%) aligned discordantly 1 time
    ----
    15059730 pairs aligned 0 times concordantly or discordantly; of these:
      30119460 mates make up the pairs; of these:
        29853788 (99.12%) aligned 0 times
        195884 (0.65%) aligned exactly 1 time
        69788 (0.23%) aligned >1 times
2.55% overall alignment rate

Should I get 15155323 PE at the end ?

ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by Picasa390

would be the same using samtools view -f 12 input.bam? With flag "-f 12" you collapse 4 and 8, right?

ADD REPLYlink modified 20 months ago • written 20 months ago by NielQC10
0
gravatar for Sej Modha
2.7 years ago by
Sej Modha4.1k
Glasgow, UK
Sej Modha4.1k wrote:

You can use bam2fastq utility to extract mapped or unmapped reads

ADD COMMENTlink modified 2.7 years ago • written 2.7 years ago by Sej Modha4.1k

bam2fastq is no longer supported - they recommend using Picard's Sam2fastq (which can handle bam as well): https://gsl.hudsonalpha.org/information/software/bam2fastq

ADD REPLYlink written 2.5 years ago by Tonor420
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1186 users visited in the last hour