Bedtools intersect error
1
1
Entering edit mode
6.1 years ago
jaqx008 ▴ 110

Hello All, I am trying to perform intersection between with my bed file and vcf and I keep getting errors.my bed files is a call of all the genes in the ORFs. The following is the command I am using and the errors.

bedtools intersect -a a.vcf -b b.bed > int.txt

Error: Invalid record in file b.bed. Record is 
scaffold1090size18908_225   18674   18522

This scaffold looks good to me and in the same format as the other scaffolds. what could be the problem? and please offer suggestions.

Thanks

bedtools vcf variant-calling GATK bed • 5.4k views
ADD COMMENT
0
Entering edit mode
6.1 years ago

The BED format requires that the start position be less than the stop position. You could fix bad BED coordinates with awk, and re-sort with BEDOPS sort-bed to make sure the output is sorted correctly:

$ awk '{ if ($2 > $3) { t = $2; $2 = $3; $3 = t; } else if ($2 == $3) { $3 += 1; } print $0; }' b.bed | sort-bed - > b.fixed.bed
ADD COMMENT
0
Entering edit mode

Thanks alex. I am a beginner, and struggling to understand what you mean. could you throw a bit more light please? or should i just run the awk command you posted?

ADD REPLY
0
Entering edit mode

BED format is

identifier start stop

The number in second column can't be larger than third column. If you run @Alex's code it should fix those records that fail that criteria. So run it as shown.

ADD REPLY
0
Entering edit mode

Look at the position fields in the error message in your question. The first position field is the start position (18674). The second position field is the stop position (18522). The BED format requires that the position fields are ordered, so that the start position is strictly less than the stop position. But 18674 > 18522, so that's not going to work.

The awk command walks through every line of your input BED file and will switch the start and stop positions, if they are ordered incorrectly, or increment the stop position, if the start and stop positions are equal. The adjusted result is piped to sort-bed so that you get fixed output that is sorted and ready to use for set operations with BEDOPS or other toolkits.

ADD REPLY
0
Entering edit mode

Oh Oh Oh. I understand what you are saying now. I will run it now.

ADD REPLY
0
Entering edit mode

$ awk '{ if ($2 > $3) { t = $2; $2 = $3; $3 = t; } else if ($2 == $3) { $3 += 1; } print $0; }' /Volumes/500GB/pavelAssemblyBowtieIndex/genoOrfs.bed | sort-bed - > b.fixed.bed -bash: $: command not found -bash: sort-bed: command not found

I received this error.

ADD REPLY
0
Entering edit mode
ADD REPLY
0
Entering edit mode

I used brew to install BEDOPS. still gave

-bash: $: command not found

ADD REPLY
0
Entering edit mode

What does your PATH contain? Run echo $PATH.

I think brew puts items in /usr/local/bin, so I would make sure that your PATH contains that directory, at least.

Did you get errors or warning messages when running brew?

Once you have sorted out installation problems, you should be able to run sort-bed --version to verify that it is installed and to get back the installed version.

ADD REPLY
0
Entering edit mode

Thank you Alex. I was able to do it but biostar forum won't let me post on the thread because I can only post 5 times every 6h. However, to make the awk command work, we had to add a cat in function and then pipe it into awk.

cat a.bed | awk '{ if ($2 > $3) { t = $2; $2 = $3; $3 = t; } else if ($2 == $3) { $3 += 1; } print $0; }' | sort-bed > b.fixed.bed Thanks again.

ADD REPLY
0
Entering edit mode

So I am trying to use the corrected bed file for intersection with my vcf file. The command appears to execute but out-put a zero byte file each time. Does this mean the files do not match?or it didn't output properly? Could you help please?

ADD REPLY
0
Entering edit mode

Given inputs:

$ vcf2bed < a.vcf > a.bed
$ awk '{...}' b.bed | sort-bed - > b.fixed.bed

Try bedops --intersect:

$ bedops --intersect a.bed b.fixed.bed > answer.bed

If answer.bed is a zero-byte file, then it is likely that a.bed and b.fixed.bed do not share common genomic space.

ADD REPLY
0
Entering edit mode

Ok. I still get zero. maybe they do not share a common genomic space like you mentioned. Thanks lot

ADD REPLY
0
Entering edit mode

If you want to do a "sanity check" on the inputs, use --ec, i.e.:

$ bedops --ec --intersect a.bed b.fixed.bed > answer.bed

This will run slower, but identify potential formatting problems with a.bed or b.fixed.bed.

It's unlikely there are problems with b.fixed.bed if you used sort-bed as described above, as sort-bed will report formatting problems, as well. But you could catch possible issues with a.bed using --ec.

If you do not get any errors, you can (and should) remove --ec to regain performance enhancements in bedops for typical usage.

ADD REPLY
0
Entering edit mode

Are you actually typing the $ in your shell? Because you shouldn't, that's just used to tell you this is a shell command. Just do the following:

awk '{ if ($2 > $3) { t = $2; $2 = $3; $3 = t; } else if ($2 == $3) { $3 += 1; } print $0; }' /Volumes/500GB/pavelAssemblyBowtieIndex/genoOrfs.bed | sort-bed - > b.fixed.bed -bash
ADD REPLY
0
Entering edit mode

No I wasn't using the $ sign. However, I was able to resolve the issue by first cat in the file then awk as i explained above. Thanks for your comment

ADD REPLY

Login before adding your answer.

Traffic: 2835 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6