Question: Filter sam/bam files
0
gravatar for ramesh.8v
7 months ago by
ramesh.8v200
United States
ramesh.8v200 wrote:

Hi:

I mapped single-end sequenced GBS reads onto two similar genomes. I have two sam files one for each genome. I'm interested to filter sam/bam files based on these conditions:
1. reads unique to a genome/reads mapped only on one genome
2. reads are not unique/mapped on both genomes.

Are there any existing implementations for this? I want to know before I go ahead and implement this in pysam.

gbs sam samtools bam filter • 324 views
ADD COMMENTlink modified 7 months ago by swbarnes24.8k • written 7 months ago by ramesh.8v200
1

Compare BAM files ?

ADD REPLYlink written 7 months ago by Eric Lim1.2k

Were the reads mapped to each genome separately? I don't think there is a tool that does what you want, but BBSplit from BBTools comes close - you would have to do the mapping again

If you end up coding your own solution, I think it would be wise to sort the sam / bam by name, so parsing both files simultaneously is simpler and uses less memory.

ADD REPLYlink written 7 months ago by h.mon23k

Thanks for the suggestion! Yes, they were mapped separately. I will look into BBTools.

ADD REPLYlink written 7 months ago by ramesh.8v200
3
gravatar for Pierre Lindenbaum
7 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum116k wrote:

your reads are single end ?

samtools view -F 4 in1.bam | cut -f 1 | sort | uniq > r1.txt 
samtools view -F 4 in2.bam | cut -f 1 | sort | uniq > r2.txt

reads mapped only on one genome

comm -3 r1.txt r2.txt

reads are not unique/mapped on both genomes.

comm -12 r1.txt r2.txt
ADD COMMENTlink written 7 months ago by Pierre Lindenbaum116k
1

Thanks! It worked :)

ADD REPLYlink modified 7 months ago • written 7 months ago by ramesh.8v200
1
gravatar for swbarnes2
7 months ago by
swbarnes24.8k
United States
swbarnes24.8k wrote:

I have two sam files one for each genome.

This is going to be over-stringent. Some reads will map perfectly to one genome, and map with discrepancies to the other. You probably want to count those for the first genome, not throw them away becasue the aligner forced them to align to the second genome.

Better to combine the genomes, align to that, and then you can fish out reads that aligned uniquely.

ADD COMMENTlink written 7 months ago by swbarnes24.8k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1606 users visited in the last hour