Question: bedtools_intersect files sorting problem
Asked 5 days ago by victoria_aleks:

Dear all, I am running the command bedtools intersect -sorted -g [GENOME_FILE] -abam [BAM_FILE] -b [BED_FILE] > [OUTPUT_FILE] and am getting an error about incorrect sorting. Apparently my input BAM files have an unusual chromosome order: chr1 chr2 ... chr9 X Y chr10 chr11 chr12 ...

The BED and genome files, on the other hand, are ordered: chr1 chr2 ... chr22 X Y

As I understand it, it is easier to re-sort the BED file than the BAM file. Still, I am having trouble doing this :) Could anyone advise on this? Thank you!
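Since the BED and genome files need to follow the BAM's chromosome order, one option is to take that order straight from the BAM header. Below is a minimal sketch, assuming the @SQ lines of the header list the chromosomes in the same order the coordinate-sorted BAM records use. The function name sam_header_to_genome is hypothetical and not part of samtools or bedtools.

```shell
# Hypothetical helper (not part of samtools or bedtools): read a SAM header on
# stdin and print "name<TAB>length" for every @SQ line, in header order.
sam_header_to_genome() {
  awk -F'\t' -v OFS='\t' '
    $1 == "@SQ" {
      name = ""; len = ""
      # Scan all fields instead of assuming SN: and LN: sit in fixed columns
      for (i = 2; i <= NF; i++) {
        if (substr($i, 1, 3) == "SN:") name = substr($i, 4)
        else if (substr($i, 1, 3) == "LN:") len = substr($i, 4)
      }
      if (name != "" && len != "") print name, len
    }'
}
```

Usage, assuming samtools is on your PATH: samtools view -H in.bam | sam_header_to_genome > genome.txt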


How did you sort the BAM file?

written 5 days ago by Iñigo Prada

I did not. I sorted the BED and genome files to match the BAM (or so I thought). Only later did I find out that the BAM file has this strange order with the X and Y chromosomes, and, honestly, I don't know how to sort a BAM file :)

written 4 days ago by victoria_aleks (modified 4 days ago)

samtools sort [-l level] [-m maxMem] [-o out.bam] [-O format] [-n] [-T tmpprefix] [-@ threads] [in.sam|in.bam|in.cram]

This is what you're looking for if you don't know how to sort a BAM file: http://www.htslib.org/doc/samtools.html

written 4 days ago by Macspider

I still don't understand how I can change the order of the chromosomes so that X and Y are not between chr9 and chr10, but at the beginning or at the end...

written 4 days ago by victoria_aleks
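Rather than moving X and Y to either end, the BED file can be reordered to follow whatever order a genome file lists (for example, one built from the BAM header). A minimal sketch using only awk, sort and cut; the function sort_bed_by_genome and the file names are hypothetical. It ranks each chromosome by its line number in the genome file, sorts on that rank and then on the start coordinate, and drops the helper column again. Lines whose chromosome is absent from the genome file (including track or browser lines) are discarded.

```shell
# Hypothetical helper: sort a BED file by the chromosome order of a genome
# file (first column), then by start position.
sort_bed_by_genome() {
  local genome=$1 bed=$2
  # rank = line number of the chromosome in the genome file; sort on rank,
  # then numerically on the start column, then remove the rank column
  awk -F'\t' -v OFS='\t' '
    NR == FNR { rank[$1] = FNR; next }   # genome file: remember each chromosome position
    ($1 in rank) { print rank[$1], $0 }  # BED file: prefix each line with that rank
  ' "$genome" "$bed" |
    LC_ALL=C sort -t "$(printf '\t')" -k1,1n -k3,3n |
    cut -f2-
}
```

Usage: sort_bed_by_genome genome.txt in.bed > in.sorted.bed. Newer bedtools releases also document bedtools sort -g genome.txt -i in.bed for sorting by the order in a genome file; check bedtools sort -h on your installation before relying on it.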

Depending on your error, you may need to sort the BAM file. You will need to add more information to the question if you want some help.

written 4 days ago by Iñigo Prada

The error is that the genome file and the BED file are sorted as "1 2 3 4 5 6 7 8 9 10 11 ... X Y", while the BAM file is sorted as "1 2 3 4 5 6 7 8 9 X Y 10 11 12 ..."

written 4 days ago by victoria_aleks
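To see exactly where two files disagree, it can help to list the chromosomes of each file in the order they first appear. A minimal sketch; chrom_order is a hypothetical helper name.

```shell
# Hypothetical helper: print each value of column 1 once, in the order it first
# appears in a tab-separated file (BED file, genome file, ...).
chrom_order() {
  awk -F'\t' '!seen[$1]++ { print $1 }' "$1"
}
```

For example, compare chrom_order genome.txt with chrom_order in.bed. A BED file does not have to contain every chromosome, so as far as I understand the -sorted check only needs the BED list to appear in the same relative order as the genome file.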

You need to sort your BED file and your genome file consistently. bedtools has a 'sort' command that's rather slow, but you can do this instead to fix the problem:

LC_COLLATE=C sort -k 1,1 -k2,2n a.bed

written 4 days ago by Sinji
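As far as I understand, LC_COLLATE=C makes sort compare chromosome names byte by byte, so the result is plain lexicographic order: chr10 sorts before chr2, and chrX/chrY land after every numbered chromosome because X and Y come after the digits in ASCII. No lexicographic or numeric rule reproduces an order like chr9, X, Y, chr10, so that order has to be taken from the BAM header instead. A small demonstration; it sets LC_ALL=C because LC_ALL, if set in your environment, takes precedence over LC_COLLATE.

```shell
# Sort a few chromosome names byte by byte (C locale).
# LC_ALL is used because, when set, it overrides LC_COLLATE.
c_sorted=$(printf 'chr9\nchr10\nchrX\nchr2\nchrY\n' | LC_ALL=C sort)
printf '%s\n' "$c_sorted"
# prints chr10, chr2, chr9, chrX, chrY (one per line)
```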

I actually sorted the BED file this way, but it puts X and Y at the end, while my BAM file has the X and Y chromosomes between the single-digit and double-digit chromosomes (i.e., between chr9 and chr10).

written 4 days ago by victoria_aleks (modified 4 days ago)

So you did use LC_COLLATE=C?

written 4 days ago by Sinji

Yes, and the X and Y chromosomes are placed at the end :)

written 4 days ago by victoria_aleks