Question: Extracting this data frame from a .vcf file
F (3.3k), Iran, wrote 8 days ago:

Hi,

I have one .vcf file from whole genome sequencing of tumour vs. normal samples from 21 patients.

I need a data frame like this as input for a tool that finds driver genes:

> head(mutations)
  sampleID chr      pos ref mut
1 Sample_1   1   871244   G   C
2 Sample_1   1  6648841   C   G
3 Sample_1   1 17557072   G   A
4 Sample_1   1 22838492   G   C
5 Sample_1   1 27097733   G   A
6 Sample_1   1 27333206   G   A

In the separate .vcf files for each patient I have the start, end, chromosome, ref, and variant allele. However, I am not sure how to get such a data frame from this big VCF.

Any help please?

Thank you

R wgs vcf • 206 views
modified 8 days ago • written 8 days ago by F (3.3k)

This is a basic question; please invest some time in reading through the bcftools manual. Or, if you prefer to stay in R, read about the vcfR package.

modified 8 days ago • written 8 days ago by zx8754 (6.5k)

Thank you. I also tried vcfR:

> read.vcfR("trg.snp.pass.vcf")
Error in read.vcfR("trg.snp.pass.vcf") : 
  File: trg.snp.pass.vcf does not appear to be a VCF file.
  First line of file:
 trg.snp.pass.vcf 
  Should begin with:
##fileformat=VCFv 
In addition: Warning message:
In scan(file = file, what = character(), nmax = 1, sep = "\n", quiet = TRUE,  :
  embedded nul(s) found in input
> read.vcfR("trg.snp.pass.vcf.tar")
Error in read.vcfR("trg.snp.pass.vcf.tar") : 
  File: trg.snp.pass.vcf.tar does not appear to be a VCF file.
  First line of file:
 trg.snp.pass.vcf.tar 
  Should begin with:
##fileformat=VCFv 
In addition: Warning message:
In scan(file = file, what = character(), nmax = 1, sep = "\n", quiet = TRUE,  :
  embedded nul(s) found in input
>
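The "embedded nul(s)" warning, and the fact that the first line printed is just the file name, suggest the file may be a tar archive rather than plain-text VCF (a tar header starts with the member name padded with NUL bytes). A minimal sketch for checking this, in Python rather than R; the function names `sniff_format` and `sniff_file` are made up for illustration:

```python
def sniff_format(head: bytes) -> str:
    """Guess the container format from the first bytes of a file."""
    if head.startswith(b"##fileformat=VCF"):
        return "vcf"      # plain-text VCF, readable by vcfR or bcftools
    if head.startswith(b"\x1f\x8b"):
        return "gzip"     # gzip magic number; also covers bgzipped .vcf.gz
    if head[257:262] == b"ustar":
        return "tar"      # POSIX tar archives carry this magic at offset 257
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 512 bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return sniff_format(handle.read(512))
```

If `sniff_file("trg.snp.pass.vcf")` reports "tar", extracting the archive first (for example with `tar -xf`) should give files that start with `##fileformat=VCFv`. Very old tar archives without the `ustar` magic would show up as "unknown" here.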
written 8 days ago by F (3.3k)

The bcftools query and SnpSift plugins in Galaxy also do that.

modified 7 days ago • written 8 days ago by F (3.3k)
zx8754 (6.5k), London, wrote 8 days ago:

Using bcftools:

bcftools query -f '[%SAMPLE %CHROM %POS %REF %ALT %GT\n]' myFile.vcf > myFileLong.txt
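With that format string, the output should contain one space-separated line per sample per variant (sample, chromosome, position, ref, alt, genotype). A rough sketch of reading it back and keeping only samples that carry the variant; the column names, the helper names `parse_long` and `carriers`, and the example lines in the usage note are assumptions for illustration:

```python
from typing import Dict, List

COLUMNS = ["sample", "chrom", "pos", "ref", "alt", "gt"]

# Genotypes treated as "does not carry the variant" (an assumption).
NON_CARRIER = {"0/0", "0|0", "./.", ".|.", "."}


def parse_long(text: str) -> List[Dict[str, str]]:
    """Turn the space-separated query output into a list of row dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue  # skip blank or malformed lines
        rows.append(dict(zip(COLUMNS, fields)))
    return rows


def carriers(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep rows whose genotype is not homozygous reference or missing."""
    return [row for row in rows if row["gt"] not in NON_CARRIER]
```

For example, `carriers(parse_long("S1 1 871244 G C 0/1\nS2 1 871244 G C 0/0\n"))` keeps only the row for S1.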
written 8 days ago by zx8754 (6.5k)

Thank you. It says:

[fi1d18@cyan01 ~]$ [fi1d18@cyan01 ~]$ bcftools query -f '[%SAMPLE %CHROM %POS %REF %ALT %GT\n]' trg.snp.pass.vcf > myFileLong.txt
-bash: [fi1d18@cyan01: command not found
[fi1d18@cyan01 ~]$ Failed to open trg.snp.pass.vcf: unknown file type

And when I tried it on the .vcf for one sample, it says:

[fi1d18@cyan01 ~]$ [fi1d18@cyan01 ~]$ bcftools query -f '[%SAMPLE %CHROM %POS %REF %ALT %GT\n]' LP2000104-DNA_A01_vs_LP2000101-DNA_A01.passed.somatic.indel.vcf > myFileLong.txt
bash: [fi1d18@cyan01: command not found
[fi1d18@cyan01 ~]$ Error: no such tag defined in the VCF header: FORMAT/GT
modified 8 days ago • written 8 days ago by F (3.3k)

bash: [fi1d18@cyan01: command not found

Your command line doesn't start with bcftools. The first thing the shell tries to run is [fi1d18@cyan01. Make sure there are no extra characters before the command you want to run.

fin swimmer

written 8 days ago by finswimmer (9.9k)

Either provide the full path to bcftools or add the directory containing the executable to your $PATH: export PATH=$PATH:/dir_for_bcftools
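To see whether the executable is actually found on your $PATH, a quick check along these lines may help. This is only a sketch using Python's standard `shutil.which`; the helper name `find_tool` is made up:

```python
import shutil
from typing import Optional


def find_tool(name: str = "bcftools") -> Optional[str]:
    """Return the full path of an executable on $PATH, or None if absent."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool()
    if path is None:
        print("bcftools not found on PATH")
    else:
        print("bcftools found at", path)
```

If this prints a path, "command not found" is coming from something else on the command line, not from a missing bcftools.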

modified 8 days ago • written 8 days ago by genomax (62k)

Sorry, bcftools is in my path, but both in Galaxy and on Linux I am getting this error:

Error: no such tag defined in the VCF header: FORMAT/GT

and Galaxy says:

Fatal error: Exit code 255 ()
Error: no such tag defined in the VCF header: INFO/REFt. FORMAT fields must be in square brackets, e.g. "[ REFt]"

The header of my VCF is this:

##bcftools_viewCommand=view -h c.vcf
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  NORMAL  TUMOR

I don't know what is going wrong with my VCF files, though.

written 8 days ago by F (3.3k)

Are you sure that this is the complete header?

bcftools is very strict about the vcf specs. So the first line must be:

##fileformat=VCFv4.1

(Version number can differ)

For each contig you need an entry like this:

##contig=<ID=chr1,length=248956422>

For each key in the INFO and FORMAT columns you need an entry in the header. For GT it looks like this:

##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">

So, are there more entries in the header?
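The checks described above can be sketched roughly like this. It is an illustration in plain Python, not what bcftools does internally; `check_header` and its `wanted` parameter are hypothetical names, and it only looks at the first line and the FORMAT declarations:

```python
import re
from typing import Iterable, List, Sequence

FORMAT_ID = re.compile(r"##FORMAT=<ID=([^,>]+)")


def check_header(lines: Iterable[str], wanted: Sequence[str] = ("GT",)) -> List[str]:
    """Return a list of problems found in the header of a VCF."""
    lines = [line.rstrip("\n") for line in lines]
    problems = []
    if not lines or not lines[0].startswith("##fileformat=VCF"):
        problems.append("first line is not ##fileformat=VCFv...")
    declared, used = set(), set()
    for line in lines:
        match = FORMAT_ID.match(line)
        if match:
            declared.add(match.group(1))      # tag declared in the header
        elif line and not line.startswith("#"):
            fields = line.split("\t")
            if len(fields) > 8:
                used.update(fields[8].split(":"))  # tags in the FORMAT column
    for key in sorted(used - declared):
        problems.append(f"FORMAT/{key} is used but not declared in the header")
    for key in wanted:
        if key not in declared:
            problems.append(f"FORMAT/{key} is not declared, so a query for it will fail")
    return problems
```

Run on the two header lines posted above, this would report both a missing `##fileformat` line and an undeclared FORMAT/GT.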

fin swimmer

written 8 days ago by finswimmer (9.9k)
F (3.3k), Iran, wrote 7 days ago:
bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%ID]\n' c.vcf

This solved the error.
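For the data frame in the question (sampleID, chr, pos, ref, mut), a sketch like the following could build the table from one per-patient VCF. It assumes each per-patient file holds only that patient's somatic calls, so the sample ID is supplied by the caller (for example, derived from the file name); `vcf_to_mutations` and `to_table` are made-up helper names:

```python
from typing import Iterable, List, Tuple

Mutation = Tuple[str, str, int, str, str]


def vcf_to_mutations(lines: Iterable[str], sample_id: str) -> List[Mutation]:
    """Collect (sampleID, chr, pos, ref, mut) rows from VCF text lines."""
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):  # one row per alternate allele
            if alt != ".":
                rows.append((sample_id, chrom, pos, ref, alt))
    return rows


def to_table(rows: List[Mutation]) -> str:
    """Format the rows as a tab-delimited table with a header line."""
    header = "sampleID\tchr\tpos\tref\tmut"
    body = ["\t".join(str(value) for value in row) for row in rows]
    return "\n".join([header] + body) + "\n"
```

Concatenating the rows from all 21 per-patient files and writing `to_table(rows)` to disk should give a file that R can load with `read.delim`.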

written 7 days ago by F (3.3k)

Great that it worked out; accept it if this was the solution.

written 7 days ago by zx8754 (6.5k)

Adding %END next to %POS would return both the start and the end position.

modified 6 days ago • written 6 days ago by F (3.3k)
andrew.j.skelton73 (5.5k), London, wrote 8 days ago:

GATK has a tool for that, see VariantsToTable

written 8 days ago by andrew.j.skelton73 (5.5k)