Question: bedtools slop set start to -1
0
gravatar for Medhat
26 days ago by
Medhat8.4k
Texas
Medhat8.4k wrote:

Hi!

I am using bedtools slop for padding bed file with 10k.
bedtool version bedtools v2.28.0

Command:

bedtools slop -header -i gene_annotations.bed -g genome_mainchrs.genome -b 10000 > gene_annotations_padding_10000.bed

Input ex:

1   11868   12227   ENSG00000223972  .....
1   11868   14409   ENSG00000223972  .....
1   11868   14409   ENSG00000223972  ....
1   12009   12057   ENSG00000223972  ......

Output:

1   -1  1868    ENSG00000223972 
1   -1  1868    ENSG00000223972
1   -1  1868    ENSG00000223972 
1   -1  2009    ENSG00000223972

According to bedtools slop

Starts will be set to 0 if options would force it below 0.

why there is -1?

and also should not the end increase with the value of 10k ?

Update: The issue was in the genome_mainchrs.genome it contains chr[number] ex chr5 while the bed file contains just [number] 5

bedtools slop bed gtf • 137 views
ADD COMMENTlink modified 26 days ago by Alex Reynolds28k • written 26 days ago by Medhat8.4k

Could be a version mismatch?

ADD REPLYlink written 26 days ago by RamRS22k

Version mismatch of the tool and documentation or what?

ADD REPLYlink modified 26 days ago • written 26 days ago by Medhat8.4k

Sorry, yes - maybe the version you're using doesn't match up to the version the doc refers to.

ADD REPLYlink written 26 days ago by RamRS22k

No, the issue was chromosome naming.

ADD REPLYlink written 26 days ago by Medhat8.4k
2
gravatar for Alex Reynolds
26 days ago by
Alex Reynolds28k
Seattle, WA USA
Alex Reynolds28k wrote:

Here's an alternative method:

$ bedops --range 10000 --everything genes.bed > genes.pad10k.bed

This assumes genes.bed is sorted per sort-bed. If not:

$ bedops --range 10000 --everything <(sort-bed genes.bed) > genes.pad10k.bed

No genome bounds are specified, but that can be fixed by adding a fetchChromSizes command to the one-liner:

$ bedops --range 10000 --everything genes.bed | bedops --element-of 100% - <(fetchChromSizes hg38 | sort-bed - | awk -vOFS="\t" "{ print $1, 0, $2 }") > genes.pad10k.boundedByHg38.bed

This removes any padded intervals that fall outside the bounds of hg38. This can be modified to any genome bounds.

ADD COMMENTlink written 26 days ago by Alex Reynolds28k

Thanks, and I mentioned also why there was an error in my case (chromosome naming), the method you mentioned here better when you do not have .genome file.

ADD REPLYlink written 26 days ago by Medhat8.4k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2017 users visited in the last hour