Question: intersection, Bed file and VCF.
0
gravatar for jaqx008
2.7 years ago by
jaqx00870
jaqx00870 wrote:

Hello All, I am trying to perform intersection between with my bed file and vcf and I keep getting errors.my bed files is a call of all the genes in the ORFs. The following is the command I am using and the errors.

bedtools intersect -a a.vcf -b b.bed -wa

error

  • Please ensure that your file is TAB delimited (e.g., cat -t FILE).
  • Also ensure that your file has integer chromosome coordinates in the expected columns (e.g., cols 2 and 3 for BED).

I tried to do cat -t a.vcf to TAB it but still get the errors. I will appreciate your suggestions. Thanks in advance.

ADD COMMENTlink modified 2.7 years ago • written 2.7 years ago by jaqx00870
0
gravatar for Alex Reynolds
2.7 years ago by
Alex Reynolds31k
Seattle, WA USA
Alex Reynolds31k wrote:

You might try BEDOPS bedops --ec:

$ bedops --ec --element-of 1 <(vcf2bed < a.vcf) <(sort-bed b.bed) > answer.bed

The use of --ec with bedops can return useful debugging information about malformed BED data, at the cost of a slower runtime. The --ec operator can be removed once the input is debugged.

ADD COMMENTlink written 2.7 years ago by Alex Reynolds31k

Ok let me try this. Thanks

ADD REPLYlink written 2.7 years ago by jaqx00870

bedops --ec < a.vcf < b.bed > answer.bed

Error -bash: bedops: command not found

ADD REPLYlink written 2.7 years ago by jaqx00870

You'll need to install it, probably: http://bedops.readthedocs.io/en/latest/content/installation.html

ADD REPLYlink written 2.7 years ago by Alex Reynolds31k
0
gravatar for dariober
2.7 years ago by
dariober11k
WCIP | Glasgow | UK
dariober11k wrote:

It seems the problem is not related to bedtools but rather with your input files.

cat -t a.vcf doesn't fix the problem, rather it tells you whether your columns are tab delimited. With cat -t a.vcf (or cat -t b.bed) you should see on screen columns separated by ^I. Perhaps your columns are separated by spaces instead? If so, you could fix it with tr ' ' '\t' < a.vcf > A.vcf

Also, as the error message suggests, have a look if you coordinates are actually integers. Try to do sort -k2,2n a.vcf to see if the top rows (excluding the header) have integers in column 2.

ADD COMMENTlink written 2.7 years ago by dariober11k

realized the error was in my bed file. I made sure all tabs are corrected and ran it again. but got complain about the one scaffold.

bedtools intersect -a a.vcf -b b.bed > file.txt Error: Invalid record in file genoOrfs.bed. Record is scaffold1090size18908_225 18674 18522

It seem fine like the rest to me. Don't know why the error

ADD REPLYlink written 2.7 years ago by jaqx00870

...Just in case you left out some space-to-TAB conversion, try to do

grep ' ' genoOrfs.bed # This shouldn't print any line.
grep 'scaffold1090size18908_225' genoOrfs.bed | cat -vet # Print the faulty record showing TABs
ADD REPLYlink modified 2.7 years ago • written 2.7 years ago by dariober11k

I tried it, the first command gave an output with Zero byte second gave

scaffold1090size18908_225^I18674^I18522$

Not sure what they mean..

ADD REPLYlink written 2.7 years ago by jaqx00870
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 952 users visited in the last hour