Question: ERROR: illegal character '.' when running bedtools closest command
Matus10 wrote:

Hello everyone,

I ran into a problem while trying to find the closest TSS to each peak with this command:

bedtools closest -a file_peaks.narrowPeak -b path/genes.tss.bed  > file_closestTSS.txt

The error says: * ERROR: illegal character '.' found in integer conversion of string "3216969.". Exiting...

I generated the genes.tss.bed file from the genes.gtf file, which I found in the Annotation folder of iGenomes mm10:

awk 'BEGIN {FS=OFS="\t"} { if($7=="+"){tss=$4-1} else { tss=$5} print $1,tss, tss+1 ".", ".", $7, $9}' path/genes.gtf > path/genes.tss.bed

Could anyone help me please? Thank you

chip-seq gene • 2.3k views
written 2.9 years ago by Matus • modified 2.7 years ago by Biostar

It seems that a dot '.' is in the wrong place (where an integer is expected).

Show what your BED file looks like; that may make it clear where the problem is.

written 2.9 years ago by Benn

I'm not sure, but don't you need a comma after tss+1?

written 2.9 years ago by PoGibas

What is the output of head path/genes.tss.bed?

written 2.9 years ago by venu
head /home/s1469622/dstore/Reference_genomes/Mus_musculus/UCSC/mm10/Annotation/Archives/archive-2015-07-17-14-33-26/Genes/genes.tss.bed
chr1    3216968 3216969.        .       -       gene_id "Xkr4"; gene_name "Xkr4"; p_id "P15391"; transcript_id "NM_001011874"; tss_id "TSS27105";
chr1    3216024 3216025.        .       -       gene_id "Xkr4"; gene_name "Xkr4"; p_id "P15391"; transcript_id "NM_001011874"; tss_id "TSS27105";
chr1    3216968 3216969.        .       -       gene_id "Xkr4"; gene_name "Xkr4"; p_id "P15391"; transcript_id "NM_001011874"; tss_id "TSS27105";
chr1    3421901 3421902.        .       -       gene_id "Xkr4"; gene_name "Xkr4"; p_id "P15391"; transcript_id "NM_001011874"; tss_id "TSS27105";
chr1    3421901 3421902.        .       -       gene_id "Xkr4"; gene_name "Xkr4"; p_id "P15391"; transcript_id "NM_001011874"; tss_id "TSS27105";
chr1    3671348 3671349.        .       -       gene_id "Xkr4"; gene_name "Xkr4"; p_id "P15391"; transcript_id "NM_001011874"; tss_id "TSS27105";
chr1    3671498 3671499.        .       -       gene_id "Xkr4"; gene_name "Xkr4"; p_id "P15391"; transcript_id "NM_001011874"; tss_id "TSS27105";
chr1    3671348 3671349.        .       -       gene_id "Xkr4"; gene_name "Xkr4"; p_id "P15391"; transcript_id "NM_001011874"; tss_id "TSS27105";
chr1    4293012 4293013.        .       -       gene_id "Rp1"; gene_name "Rp1"; p_id "P17361"; transcript_id "NM_001195662"; tss_id "TSS6138";
chr1    4292983 4292984.        .       -       gene_id "Rp1"; gene_name "Rp1"; p_id "P17361"; transcript_id "NM_001195662"; tss_id "TSS6138";
written 2.9 years ago by Matus

Remove the '.' attached to the end coordinates.

Try the following:

awk 'BEGIN {FS=OFS="\t"} { if($7=="+"){tss=$4-1} else { tss=$5} print $1,tss, tss+1, ".", $7, $9}' path/genes.gtf > path/genes.tss.bed
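
Note that the command above writes a single placeholder column, so the strand lands in column 5 rather than column 6, where BED6 expects it; that matters for strand-aware options such as -s. A minimal sketch that keeps both placeholders, assuming a tab-separated GTF with the strand in column 7 and the attributes in column 9. The function name gtf_to_tss_bed is hypothetical, and the coordinate rule is copied unchanged from the command above:

```shell
# Hypothetical helper: print one BED6-style line per GTF record.
# Assumes GTF columns: 4 = start, 5 = end, 7 = strand, 9 = attributes.
gtf_to_tss_bed() {
  awk 'BEGIN { FS = OFS = "\t" }
       {
         # Same TSS rule as in the thread: start-1 on "+", end otherwise.
         if ($7 == "+") { tss = $4 - 1 } else { tss = $5 }
         # Columns: chrom, start, end, name ".", score ".", strand, attributes.
         print $1, tss, tss + 1, ".", ".", $7, $9
       }' "$1"
}
```

It could be used as gtf_to_tss_bed path/genes.gtf > path/genes.tss.bed.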

written 2.9 years ago by venu

I get this error when I use bedtools afterwards:

Error: Sorted input specified, but the file /home/s1469622/dstore/Reference_genomes/Mus_musculus/UCSC/mm10/Annotation/Archives/archive-2015-07-17-14-33-26/Genes/genes.tss.bed has the following out of order record
chr1    3216024 3216025 .       -       gene_id "Xkr4"; gene_name "Xkr4"; p_id "P15391"; transcript_id "NM_001011874"; tss_id "TSS27105";
written 2.9 years ago by Matus

You need to run bedtools sort on this.
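
If bedtools sort is not convenient, the same ordering (by chromosome, then by numeric start) can be sketched with the standard sort utility. The helper name sort_bed and the *.sorted.* output file names below are hypothetical; the input file names are the ones from the question:

```shell
# Hypothetical helper: sort a BED-like file by chromosome, then by numeric start.
sort_bed() {
  sort -k1,1 -k2,2n "$1"
}

# Example use (requires bedtools to be installed; sort both inputs first):
#   sort_bed file_peaks.narrowPeak > file_peaks.sorted.narrowPeak
#   sort_bed path/genes.tss.bed    > path/genes.tss.sorted.bed
#   bedtools closest -a file_peaks.sorted.narrowPeak -b path/genes.tss.sorted.bed > file_closestTSS.txt
```

Both the -a and the -b file need the same sort order, so it is safest to sort the peak file as well as the TSS file.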

written 2.9 years ago by PoGibas