Question: Bedtools intersect error
1
gravatar for jaqx008
16 months ago by
jaqx00850
jaqx00850 wrote:

Hello All, I am trying to perform intersection between with my bed file and vcf and I keep getting errors.my bed files is a call of all the genes in the ORFs. The following is the command I am using and the errors.

bedtools intersect -a a.vcf -b b.bed > int.txt

Error: Invalid record in file b.bed. Record is scaffold1090size18908_225 18674 18522

This scaffold looks good to me and in the same format as the other scaffolds. what could be the problem? and please offer suggestions. Thanks

ADD COMMENTlink modified 16 months ago by Alex Reynolds28k • written 16 months ago by jaqx00850
0
gravatar for Alex Reynolds
16 months ago by
Alex Reynolds28k
Seattle, WA USA
Alex Reynolds28k wrote:

The BED format requires that the start position be less than the stop position. You could fix bad BED coordinates with awk, and re-sort with BEDOPS sort-bed to make sure the output is sorted correctly:

$ awk '{ if ($2 > $3) { t = $2; $2 = $3; $3 = t; } else if ($2 == $3) { $3 += 1; } print $0; }' b.bed | sort-bed - > b.fixed.bed
ADD COMMENTlink modified 16 months ago • written 16 months ago by Alex Reynolds28k

Thanks alex. I am a beginner, and struggling to understand what you mean. could you throw a bit more light please? or should i just run the awk command you posted?

ADD REPLYlink written 16 months ago by jaqx00850

BED format is

identifier start stop

The number in second column can't be larger than third column. If you run @Alex's code it should fix those records that fail that criteria. So run it as shown.

ADD REPLYlink modified 16 months ago • written 16 months ago by genomax70k

Look at the position fields in the error message in your question. The first position field is the start position (18674). The second position field is the stop position (18522). The BED format requires that the position fields are ordered, so that the start position is strictly less than the stop position. But 18674 > 18522, so that's not going to work.

The awk command walks through every line of your input BED file and will switch the start and stop positions, if they are ordered incorrectly, or increment the stop position, if the start and stop positions are equal. The adjusted result is piped to sort-bed so that you get fixed output that is sorted and ready to use for set operations with BEDOPS or other toolkits.

ADD REPLYlink modified 16 months ago • written 16 months ago by Alex Reynolds28k

Oh Oh Oh. I understand what you are saying now. I will run it now.

ADD REPLYlink written 16 months ago by jaqx00850

$ awk '{ if ($2 > $3) { t = $2; $2 = $3; $3 = t; } else if ($2 == $3) { $3 += 1; } print $0; }' /Volumes/500GB/pavelAssemblyBowtieIndex/genoOrfs.bed | sort-bed - > b.fixed.bed -bash: $: command not found -bash: sort-bed: command not found

I received this error.

ADD REPLYlink written 16 months ago by jaqx00850

Install BEDOPS, which contains sort-bed: http://bedops.readthedocs.io/en/latest/content/installation.html#

ADD REPLYlink written 16 months ago by Alex Reynolds28k

I used brew to install BEDOPS. still gave

-bash: $: command not found

ADD REPLYlink written 16 months ago by jaqx00850

What does your PATH contain? Run echo $PATH.

I think brew puts items in /usr/local/bin, so I would make sure that your PATH contains that directory, at least.

Did you get errors or warning messages when running brew?

Once you have sorted out installation problems, you should be able to run sort-bed --version to verify that it is installed and to get back the installed version.

ADD REPLYlink modified 16 months ago • written 16 months ago by Alex Reynolds28k

Thank you Alex. I was able to do it but biostar forum won't let me post on the thread because I can only post 5 times every 6h. However, to make the awk command work, we had to add a cat in function and then pipe it into awk.

cat a.bed | awk '{ if ($2 > $3) { t = $2; $2 = $3; $3 = t; } else if ($2 == $3) { $3 += 1; } print $0; }' | sort-bed > b.fixed.bed Thanks again.

ADD REPLYlink written 16 months ago by jaqx00850

So I am trying to use the corrected bed file for intersection with my vcf file. The command appears to execute but out-put a zero byte file each time. Does this mean the files do not match?or it didn't output properly? Could you help please?

ADD REPLYlink written 16 months ago by jaqx00850

Given inputs:

$ vcf2bed < a.vcf > a.bed
$ awk '{...}' b.bed | sort-bed - > b.fixed.bed

Try bedops --intersect:

$ bedops --intersect a.bed b.fixed.bed > answer.bed

If answer.bed is a zero-byte file, then it is likely that a.bed and b.fixed.bed do not share common genomic space.

ADD REPLYlink modified 16 months ago • written 16 months ago by Alex Reynolds28k

Ok. I still get zero. maybe they do not share a common genomic space like you mentioned. Thanks lot

ADD REPLYlink written 16 months ago by jaqx00850

If you want to do a "sanity check" on the inputs, use --ec, i.e.:

$ bedops --ec --intersect a.bed b.fixed.bed > answer.bed

This will run slower, but identify potential formatting problems with a.bed or b.fixed.bed.

It's unlikely there are problems with b.fixed.bed if you used sort-bed as described above, as sort-bed will report formatting problems, as well. But you could catch possible issues with a.bed using --ec.

If you do not get any errors, you can (and should) remove --ec to regain performance enhancements in bedops for typical usage.

ADD REPLYlink modified 16 months ago • written 16 months ago by Alex Reynolds28k

Are you actually typing the $ in your shell? Because you shouldn't, that's just used to tell you this is a shell command. Just do the following:

awk '{ if ($2 > $3) { t = $2; $2 = $3; $3 = t; } else if ($2 == $3) { $3 += 1; } print $0; }' /Volumes/500GB/pavelAssemblyBowtieIndex/genoOrfs.bed | sort-bed - > b.fixed.bed -bash
ADD REPLYlink written 16 months ago by WouterDeCoster40k

No I wasn't using the $ sign. However, I was able to resolve the issue by first cat in the file then awk as i explained above. Thanks for your comment

ADD REPLYlink written 16 months ago by jaqx00850
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1722 users visited in the last hour