Question: sorting for bedtools genomecov doesn't work properly
0
gravatar for Assa Yeroslaviz
10 months ago by
Assa Yeroslaviz1.4k
Munich
Assa Yeroslaviz1.4k wrote:

I am trying to covert my bam file into bedgraph using bedtools. For that I sorted the bam file by names, and then run the follwoing commands:

bedtools bamtobed -bedpe -i IgG_H3K4m3.sortedByName.bam > IgG_H3K4m3.bed
awk '$1==$4 && $6-$2 < 1000 {print $0}' IgG_H3K4m3.bed > IgG_H3K4m3.clean.bed
cut -f 1,2,6 IgG_H3K4m3.clean.bed | sort -k1,1 -k2,2n -k3,3n > IgG_H3K4m3.frag.bed
bedtools genomecov -bg -i IgG_H3K4m3.frag.bed -g Mmu.GrCm38.chromSize > IgG_H3K4m3.bedgraph

But with the last command I get the following error

Input error: Chromosome 12 found in non-sequential lines. This suggests that the input file is not sorted correctly.

When I look what the command before last is doing ( the sorting steps) I can see a strange behavior of sort :

cut -f 1,2,6 P193/Mmu.GrCm38/bowtie2/IgG_H3K4m3.clean.bed | sort -k1,1 -k2,2n -k3,3n| cut -f 1 | uniq
10
1
11
12
1
14
15
16
1
18
2
3
5
6
9
MT
X

Somehow it can't sort chromosome 1 correctly. Is there any wat to explain this ( or even better to correct it).

thanks

ADD COMMENTlink modified 10 months ago by Kevin Blighe69k • written 10 months ago by Assa Yeroslaviz1.4k

Output of sort --version?

ADD REPLYlink written 10 months ago by ATpoint44k

sort (GNU coreutils) 8.22

ADD REPLYlink written 10 months ago by Assa Yeroslaviz1.4k
2
gravatar for Kevin Blighe
10 months ago by
Kevin Blighe69k
Republic of Ireland
Kevin Blighe69k wrote:

Hey Assa.

You could try this, and it should work just fine:

sort -V -k1,1 -k2,2n -k3,3n

Example:

cat test.bed 
1   12  32
2   12  76
12  12  765
12  12  54
10  453 987
11  1   34
11  34  88
MT  34  100
MT  22  2222
X   12222   5555665
12  3324    77566
Y   676 89898989

sort -k1,1 -k2,2n -k3,3n test.bed 
1   12  32
10  453 987
11  1   34
11  34  88
12  12  54
12  12  765
12  3324    77566
2   12  76
MT  22  2222
MT  34  100
X   12222   5555665
Y   676 89898989

sort -V -k1,1 -k2,2n -k3,3n test.bed 
1   12  32
2   12  76
10  453 987
11  1   34
11  34  88
12  12  54
12  12  765
12  3324    77566
MT  22  2222
MT  34  100
X   12222   5555665
Y   676 89898989

Kevin

ADD COMMENTlink written 10 months ago by Kevin Blighe69k

Thanks this seems to work, but is there an explanation as to why the other one didn't work int his case?

ADD REPLYlink written 10 months ago by Assa Yeroslaviz1.4k

I cannot confirm but it's something to do with the version of sort that you have.

ADD REPLYlink written 10 months ago by Kevin Blighe69k
1

It is odd, I have the same version as Assa Yeroslaviz (on Mac) and cannot reproduce this behaviour.

ADD REPLYlink written 10 months ago by ATpoint44k
1

I'm working on an ubuntu server with version 16.0.4. Maybe there is a difference. But with the -V option it does what it suppose to do. thanks again

ADD REPLYlink written 10 months ago by Assa Yeroslaviz1.4k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1485 users visited in the last hour
_