Question: intersection, Bed file and VCF.
0
gravatar for jaqx008
17 months ago by
jaqx00850
jaqx00850 wrote:

Hello All, I am trying to perform intersection between with my bed file and vcf and I keep getting errors.my bed files is a call of all the genes in the ORFs. The following is the command I am using and the errors.

bedtools intersect -a a.vcf -b b.bed -wa

error

  • Please ensure that your file is TAB delimited (e.g., cat -t FILE).
  • Also ensure that your file has integer chromosome coordinates in the expected columns (e.g., cols 2 and 3 for BED).

I tried to do cat -t a.vcf to TAB it but still get the errors. I will appreciate your suggestions. Thanks in advance.

ADD COMMENTlink modified 17 months ago • written 17 months ago by jaqx00850
0
gravatar for Alex Reynolds
17 months ago by
Alex Reynolds28k
Seattle, WA USA
Alex Reynolds28k wrote:

You might try BEDOPS bedops --ec:

$ bedops --ec --element-of 1 <(vcf2bed < a.vcf) <(sort-bed b.bed) > answer.bed

The use of --ec with bedops can return useful debugging information about malformed BED data, at the cost of a slower runtime. The --ec operator can be removed once the input is debugged.

ADD COMMENTlink written 17 months ago by Alex Reynolds28k

Ok let me try this. Thanks

ADD REPLYlink written 17 months ago by jaqx00850

bedops --ec < a.vcf < b.bed > answer.bed

Error -bash: bedops: command not found

ADD REPLYlink written 17 months ago by jaqx00850

You'll need to install it, probably: http://bedops.readthedocs.io/en/latest/content/installation.html

ADD REPLYlink written 17 months ago by Alex Reynolds28k
0
gravatar for dariober
17 months ago by
dariober10k
WCIP | Glasgow | UK
dariober10k wrote:

It seems the problem is not related to bedtools but rather with your input files.

cat -t a.vcf doesn't fix the problem, rather it tells you whether your columns are tab delimited. With cat -t a.vcf (or cat -t b.bed) you should see on screen columns separated by ^I. Perhaps your columns are separated by spaces instead? If so, you could fix it with tr ' ' '\t' < a.vcf > A.vcf

Also, as the error message suggests, have a look if you coordinates are actually integers. Try to do sort -k2,2n a.vcf to see if the top rows (excluding the header) have integers in column 2.

ADD COMMENTlink written 17 months ago by dariober10k

realized the error was in my bed file. I made sure all tabs are corrected and ran it again. but got complain about the one scaffold.

bedtools intersect -a a.vcf -b b.bed > file.txt Error: Invalid record in file genoOrfs.bed. Record is scaffold1090size18908_225 18674 18522

It seem fine like the rest to me. Don't know why the error

ADD REPLYlink written 17 months ago by jaqx00850

...Just in case you left out some space-to-TAB conversion, try to do

grep ' ' genoOrfs.bed # This shouldn't print any line.
grep 'scaffold1090size18908_225' genoOrfs.bed | cat -vet # Print the faulty record showing TABs
ADD REPLYlink modified 17 months ago • written 17 months ago by dariober10k

I tried it, the first command gave an output with Zero byte second gave

scaffold1090size18908_225^I18674^I18522$

Not sure what they mean..

ADD REPLYlink written 17 months ago by jaqx00850
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 795 users visited in the last hour