Question: bedtools_intersect files sorting problem
5 months ago by
victoria_aleks20 wrote:

Dear all, I am running the command: bedtools intersect -sorted -g [GENOME_FILE] -abam [BAM_FILE] -b [BED_FILE] > [OUTPUT_FILE] and am getting an error about wrong sorting. Apparently my input BAM files have a strange sort order, which goes: chr1 chr2 ... Chr9 X Y Chr11 chr12 ...

The BED and GENOME files, on the other hand, are ordered: chr1 chr2 ... chr22 X Y

As I understand it, it is easier to "re-sort" the BED file than the BAM file. Still, I am having trouble doing this :) Could anyone advise on this subject? Thank you!


How did you sort the BAM file?

written 5 months ago by Iñigo Prada

I did not. I sorted the BED and genome files to match the BAM file (or so I thought). Only later did I find out that the BAM file has this strange ordering of the X and Y chromosomes, and, honestly, I don't know how to sort a BAM file :)

modified 5 months ago • written 5 months ago by victoria_aleks
samtools sort [-l level] [-m maxMem] [-o out.bam] [-O format] [-n] [-T tmpprefix] [-@ threads] [in.sam|in.bam|in.cram]

This is what you're looking for if you don't know how to sort a bam file: http://www.htslib.org/doc/samtools.html

written 5 months ago by Macspider
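A note on why sorting alone may not move X and Y: a coordinate-sorted BAM follows the order of the @SQ lines in its own header, so one option is to read the chromosome order from that header and build the genome file from it. The following is only a sketch under assumptions: the header text is a made-up stand-in for what `samtools view -H in.bam` would print (in.bam, the chromosome names, and the lengths are all hypothetical), so that it runs without samtools installed.

```shell
workdir=$(mktemp -d)

# Stand-in for the output of: samtools view -H in.bam  (in.bam is hypothetical).
# Names and lengths are invented for illustration only.
printf '@HD\tVN:1.6\tSO:coordinate\n'  > "$workdir/header.sam"
printf '@SQ\tSN:chr9\tLN:900\n'       >> "$workdir/header.sam"
printf '@SQ\tSN:X\tLN:1500\n'         >> "$workdir/header.sam"
printf '@SQ\tSN:Y\tLN:500\n'          >> "$workdir/header.sam"
printf '@SQ\tSN:chr10\tLN:1300\n'     >> "$workdir/header.sam"

# Keep only the @SQ lines and print "name<TAB>length" in header order.
awk -F'\t' '
  $1 == "@SQ" {
    name = ""; len = ""
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^SN:/) name = substr($i, 4)   # strip the "SN:" prefix
      if ($i ~ /^LN:/) len  = substr($i, 4)   # strip the "LN:" prefix
    }
    print name "\t" len
  }
' "$workdir/header.sam" > "$workdir/genome.txt"

cat "$workdir/genome.txt"
```

The resulting genome.txt lists chromosomes in exactly the order the BAM uses, which is the order the BED file would then need to match.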

I still don't understand how I can change the order of the chromosomes, so that X and Y are not between chr9 and chr10 but at the beginning or at the end...

written 5 months ago by victoria_aleks

Depending on your error, you may need to sort the BAM file. You will need to add more information to the question if you want help.

written 5 months ago by Iñigo Prada

The error is that the genome file and BED file are sorted "1 2 3 4 5 6 7 8 9 10 11 ... x y", while the BAM file is sorted "1 2 3 4 5 6 7 8 9 x y 11 12 ...".

written 5 months ago by victoria_aleks

You need to sort your bed_file and your genome_file appropriately. Bedtools has a 'sort' function that's rather slow, but you can do this to fix the problem:

LC_COLLATE=C sort -k 1,1 -k2,2n a.bed
written 5 months ago by Sinji
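To see what order this kind of sort actually produces, here is a small sketch on a made-up BED file (the file contents and the chrX naming are assumptions for illustration). It uses LC_ALL=C rather than LC_COLLATE=C, because an LC_ALL already set in the environment takes precedence over LC_COLLATE.

```shell
workdir=$(mktemp -d)

# Hypothetical tab-separated BED file: chrom, start, end.
printf 'chr2\t50\t60\nchr10\t5\t9\nchrX\t1\t4\nchr1\t30\t40\nchr1\t10\t20\n' \
  > "$workdir/a.bed"

# Byte-wise sort on the chromosome name, then numeric sort on the start.
LC_ALL=C sort -k1,1 -k2,2n "$workdir/a.bed" > "$workdir/a.sorted.bed"

cat "$workdir/a.sorted.bed"
```

The chromosome column comes out as chr1, chr1, chr10, chr2, chrX: names are compared as text, not as numbers, so the result only matches a BAM whose header happens to use the same lexicographic order.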

I actually sorted the BED file in this manner, but it puts X and Y at the end, while my BAM file has the X and Y chromosomes between the single-digit and double-digit chromosomes (i.e. between chr9 and chr10).

modified 5 months ago • written 5 months ago by victoria_aleks

So you did LC_COLLATE?

written 5 months ago by Sinji

Yes, and the X and Y chromosomes are placed at the end :)

written 5 months ago by victoria_aleks
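One possible way to put X and Y wherever the BAM has them, instead of where a text sort places them, is to rank each BED line by the position of its chromosome in the genome file and sort on that rank. This is a minimal sketch, not a tested pipeline: genome.txt and a.bed below are hypothetical stand-ins with invented names and lengths, and the genome file is assumed to already list chromosomes in the BAM's order. Some bedtools releases also document a -g option for bedtools sort that serves the same purpose; whether your installed version has it is worth checking.

```shell
workdir=$(mktemp -d)
tab=$(printf '\t')

# Hypothetical genome file: chromosomes in the order the BAM uses.
printf 'chr9\t900\nX\t1500\nY\t500\nchr10\t1300\n' > "$workdir/genome.txt"
# Hypothetical unsorted BED file.
printf 'chr10\t5\t10\nX\t1\t4\nchr9\t7\t9\nchr9\t2\t3\n' > "$workdir/a.bed"

# 1) First file (genome.txt): remember the line number of each chromosome.
# 2) Second file (a.bed): prefix every line with the rank of its chromosome.
#    BED lines whose chromosome is missing from genome.txt are dropped.
# 3) Sort by rank, then by start (now field 3), and strip the rank again.
awk -F'\t' -v OFS='\t' '
  NR == FNR    { rank[$1] = FNR; next }
  ($1 in rank) { print rank[$1], $0 }
' "$workdir/genome.txt" "$workdir/a.bed" \
  | sort -t "$tab" -k1,1n -k3,3n \
  | cut -f2- > "$workdir/a.sorted.bed"

cat "$workdir/a.sorted.bed"
```

With the sample data, the BED lines come out in the order chr9, chr9, X, chr10, matching the genome file rather than alphabetical order.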