Question: intersection, Bed file and VCF.
0
gravatar for jaqx008
12 months ago by
jaqx00840
jaqx00840 wrote:

Hello All, I am trying to perform intersection between with my bed file and vcf and I keep getting errors.my bed files is a call of all the genes in the ORFs. The following is the command I am using and the errors.

bedtools intersect -a a.vcf -b b.bed -wa

error

  • Please ensure that your file is TAB delimited (e.g., cat -t FILE).
  • Also ensure that your file has integer chromosome coordinates in the expected columns (e.g., cols 2 and 3 for BED).

I tried to do cat -t a.vcf to TAB it but still get the errors. I will appreciate your suggestions. Thanks in advance.

ADD COMMENTlink modified 12 months ago • written 12 months ago by jaqx00840
0
gravatar for Alex Reynolds
12 months ago by
Alex Reynolds27k
Seattle, WA USA
Alex Reynolds27k wrote:

You might try BEDOPS bedops --ec:

$ bedops --ec --element-of 1 <(vcf2bed < a.vcf) <(sort-bed b.bed) > answer.bed

The use of --ec with bedops can return useful debugging information about malformed BED data, at the cost of a slower runtime. The --ec operator can be removed once the input is debugged.

ADD COMMENTlink written 12 months ago by Alex Reynolds27k

Ok let me try this. Thanks

ADD REPLYlink written 12 months ago by jaqx00840

bedops --ec < a.vcf < b.bed > answer.bed

Error -bash: bedops: command not found

ADD REPLYlink written 12 months ago by jaqx00840

You'll need to install it, probably: http://bedops.readthedocs.io/en/latest/content/installation.html

ADD REPLYlink written 12 months ago by Alex Reynolds27k
0
gravatar for dariober
12 months ago by
dariober9.9k
WCIP | Glasgow | UK
dariober9.9k wrote:

It seems the problem is not related to bedtools but rather with your input files.

cat -t a.vcf doesn't fix the problem, rather it tells you whether your columns are tab delimited. With cat -t a.vcf (or cat -t b.bed) you should see on screen columns separated by ^I. Perhaps your columns are separated by spaces instead? If so, you could fix it with tr ' ' '\t' < a.vcf > A.vcf

Also, as the error message suggests, have a look if you coordinates are actually integers. Try to do sort -k2,2n a.vcf to see if the top rows (excluding the header) have integers in column 2.

ADD COMMENTlink written 12 months ago by dariober9.9k

realized the error was in my bed file. I made sure all tabs are corrected and ran it again. but got complain about the one scaffold.

bedtools intersect -a a.vcf -b b.bed > file.txt Error: Invalid record in file genoOrfs.bed. Record is scaffold1090size18908_225 18674 18522

It seem fine like the rest to me. Don't know why the error

ADD REPLYlink written 12 months ago by jaqx00840

...Just in case you left out some space-to-TAB conversion, try to do

grep ' ' genoOrfs.bed # This shouldn't print any line.
grep 'scaffold1090size18908_225' genoOrfs.bed | cat -vet # Print the faulty record showing TABs
ADD REPLYlink modified 12 months ago • written 12 months ago by dariober9.9k

I tried it, the first command gave an output with Zero byte second gave

scaffold1090size18908_225^I18674^I18522$

Not sure what they mean..

ADD REPLYlink written 12 months ago by jaqx00840
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1737 users visited in the last hour