Question: Extracting this data frame from a .vcf file
F (3.4k, Iran) wrote 6 months ago:

Hi,

I have one .vcf file from whole genome sequencing of tumour vs. normal samples from 21 patients.

I need a data frame like this as input for a tool for finding driver genes:

> head(mutations)
  sampleID chr      pos ref mut
1 Sample_1   1   871244   G   C
2 Sample_1   1  6648841   C   G
3 Sample_1   1 17557072   G   A
4 Sample_1   1 22838492   G   C
5 Sample_1   1 27097733   G   A
6 Sample_1   1 27333206   G   A

In the separate .vcf files for each patient I have start, end, chromosome, ref, and variant allele. However, I am not sure how to get such a data frame from this big VCF.

Any help please?

Thank you

R wgs vcf • 424 views
modified 6 months ago • written 6 months ago by F (3.4k)

This is a basic question; please invest some time in reading through the bcftools manual. Or, if you choose to stay in R, read about the vcfR package.

modified 6 months ago • written 6 months ago by zx8754 (7.9k)

Thank you. I also tried vcfR:

> read.vcfR("trg.snp.pass.vcf")
Error in read.vcfR("trg.snp.pass.vcf") : 
  File: trg.snp.pass.vcf does not appear to be a VCF file.
  First line of file:
 trg.snp.pass.vcf 
  Should begin with:
##fileformat=VCFv 
In addition: Warning message:
In scan(file = file, what = character(), nmax = 1, sep = "\n", quiet = TRUE,  :
  embedded nul(s) found in input
> read.vcfR("trg.snp.pass.vcf.tar")
Error in read.vcfR("trg.snp.pass.vcf.tar") : 
  File: trg.snp.pass.vcf.tar does not appear to be a VCF file.
  First line of file:
 trg.snp.pass.vcf.tar 
  Should begin with:
##fileformat=VCFv 
In addition: Warning message:
In scan(file = file, what = character(), nmax = 1, sep = "\n", quiet = TRUE,  :
  embedded nul(s) found in input
>
written 6 months ago by F (3.4k)
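
The "embedded nul(s)" warning above suggests that the file is binary rather than plain text, for example a tar or gzip archive (the existence of trg.snp.pass.vcf.tar hints at this). The following is a minimal sketch for checking what a file really contains, assuming a POSIX-like shell with head, od, dd, gzip and tar available. sniff_vcf is a hypothetical helper name, not part of vcfR or bcftools, and the demo files are made up.

```shell
#!/bin/sh
# Hypothetical helper (not part of vcfR or bcftools): guess what a file
# really contains by looking at a few of its bytes.
sniff_vcf() {
    # gzip and bgzip output both start with the bytes 1f 8b.
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    # tar archives carry the string "ustar" at byte offset 257.
    tarmark=$(dd if="$1" bs=1 skip=257 count=5 2>/dev/null | tr -d '\000')
    # A plain-text VCF must start with the ##fileformat line.
    first=$(head -c 16 "$1" | tr -d '\000')
    if [ "$magic" = "1f8b" ]; then
        echo "gzip"
    elif [ "$tarmark" = "ustar" ]; then
        echo "tar"
    elif [ "$first" = "##fileformat=VCF" ]; then
        echo "vcf"
    else
        echo "unknown"
    fi
}

# Made-up demo files, only to exercise the helper.
tmp=$(mktemp -d)
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > "$tmp/demo.vcf"
gzip -c "$tmp/demo.vcf" > "$tmp/demo.vcf.gz"
tar -cf "$tmp/demo.vcf.tar" -C "$tmp" demo.vcf

for f in demo.vcf demo.vcf.gz demo.vcf.tar; do
    printf '%s\t%s\n' "$f" "$(sniff_vcf "$tmp/$f")"
done
```

If a file is reported as tar, extracting it first (for example with tar -xf) should yield the actual VCF files, which read.vcfR can then be pointed at.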

The bcftools query plugin and the SnpSift plugin in Galaxy also do that.

modified 6 months ago • written 6 months ago by F (3.4k)

zx8754 (7.9k, London) wrote 6 months ago:

Using bcftools:

bcftools query -f '[%SAMPLE %CHROM %POS %REF %ALT %GT\n]' myFile.vcf > myFileLong.txt
written 6 months ago by zx8754 (7.9k)
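
In the command above, the square brackets make bcftools repeat the enclosed fields once per sample, which turns a wide multi-sample VCF into one line per sample and variant. To illustrate the shape of that output without needing bcftools installed, here is a rough awk stand-in. It is not bcftools, the function name vcf_to_long is hypothetical, and the genotypes and depths in the demo file are invented.

```shell
#!/bin/sh
# awk stand-in for the per-sample loop; this is NOT bcftools itself.
vcf_to_long() {
    awk -F '\t' '
        /^##/     { next }                       # skip meta lines
        /^#CHROM/ {                              # remember sample names
            for (i = 10; i <= NF; i++) name[i] = $i
            nsamp = NF
            next
        }
        {
            # Find the position of GT inside the FORMAT column (0 if absent).
            k = split($9, fmt, ":"); gt = 0
            for (j = 1; j <= k; j++) if (fmt[j] == "GT") gt = j
            # One output line per sample: SAMPLE CHROM POS REF ALT GT
            for (i = 10; i <= nsamp; i++) {
                split($i, val, ":")
                print name[i], $1, $2, $4, $5, (gt ? val[gt] : ".")
            }
        }' "$1"
}

tmp=$(mktemp -d)
# Made-up two-record demo file; genotypes and depths are invented.
{
    printf '##fileformat=VCFv4.1\n'
    printf '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">\n'
    printf '##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth">\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL\tTUMOR\n'
    printf '1\t871244\t.\tG\tC\t.\tPASS\t.\tGT:DP\t0/0:30\t0/1:28\n'
    printf '1\t6648841\t.\tC\tG\t.\tPASS\t.\tGT:DP\t0/0:25\t0/1:31\n'
} > "$tmp/demo.vcf"

vcf_to_long "$tmp/demo.vcf"
```

Unlike this sketch, which prints "." when a record has no GT entry, bcftools stops with an error if a requested FORMAT tag is not declared in the header.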

Thank you,

It says:

[fi1d18@cyan01 ~]$ [fi1d18@cyan01 ~]$ bcftools query -f '[%SAMPLE %CHROM %POS %REF %ALT %GT\n]' trg.snp.pass.vcf > myFileLong.txt
-bash: [fi1d18@cyan01: command not found
[fi1d18@cyan01 ~]$ Failed to open trg.snp.pass.vcf: unknown file type

And when I tried it on the .vcf for one sample, it says:

[fi1d18@cyan01 ~]$ [fi1d18@cyan01 ~]$ bcftools query -f '[%SAMPLE %CHROM %POS %REF %ALT %GT\n]' LP2000104-DNA_A01_vs_LP2000101-DNA_A01.passed.somatic.indel.vcf > myFileLong.txt
bash: [fi1d18@cyan01: command not found
[fi1d18@cyan01 ~]$ Error: no such tag defined in the VCF header: FORMAT/GT
modified 6 months ago • written 6 months ago by F (3.4k)

bash: [fi1d18@cyan01: command not found

Your command line doesn't start with bcftools. The first thing the shell tries to run is [fi1d18@cyan01. Make sure there are no extra characters before the command you want to run.

fin swimmer

written 6 months ago by finswimmer (12k)

Either provide the full path to bcftools or add the directory containing that executable to your $PATH:

export PATH=$PATH:/dir_for_bcftools

modified 6 months ago • written 6 months ago by genomax (70k)

Sorry, bcftools is in my path, but both in Galaxy and on Linux I am getting this error:

Error: no such tag defined in the VCF header: FORMAT/GT

and Galaxy says:

Fatal error: Exit code 255 ()
Error: no such tag defined in the VCF header: INFO/REFt. FORMAT fields must be in square brackets, e.g. "[ REFt]"

The header of my VCF is this:

##bcftools_viewCommand=view -h c.vcf
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  NORMAL  TUMOR

I don't know what is wrong with my VCF files, though.

written 6 months ago by F (3.4k)

Are you sure that this is the complete header?

bcftools is very strict about the VCF specification. So the first line must be:

##fileformat=VCFv4.1

(Version number can differ)

For each contig you need an entry like this:

##contig=<ID=chr1,length=248956422>

For each key in the INFO and FORMAT columns you need an entry in the header. For GT it looks like this:

##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">

So, are there more entries in the header?

fin swimmer

written 6 months ago by finswimmer (12k)
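
One way to check which keys a header actually declares is to pull the IDs out of the ##FORMAT and ##INFO lines. The sketch below uses only sed and grep. declared_ids is a hypothetical helper name, and the demo header (which declares DP and AD but no GT) is made up to mimic the situation that would produce the "no such tag defined in the VCF header: FORMAT/GT" error.

```shell
#!/bin/sh
# Hypothetical helper: list the IDs declared for one header section
# (INFO or FORMAT) using only sed.
declared_ids() {  # usage: declared_ids FORMAT file.vcf
    sed -n "s/^##$1=<ID=\([^,>]*\).*/\1/p" "$2"
}

tmp=$(mktemp -d)
# Made-up header that declares DP and AD but no GT.
{
    printf '##fileformat=VCFv4.1\n'
    printf '##contig=<ID=chr1,length=248956422>\n'
    printf '##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth">\n'
    printf '##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths">\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNORMAL\tTUMOR\n'
} > "$tmp/header.vcf"

# Print every declared FORMAT key, one per line.
declared_ids FORMAT "$tmp/header.vcf"

# Check for one specific key before asking bcftools for it.
if declared_ids FORMAT "$tmp/header.vcf" | grep -qx 'GT'; then
    echo "GT is declared"
else
    echo "GT is not declared, so %GT cannot be queried"
fi
```

On a real file, bcftools view -h prints the complete header, so the same check can be run on its output.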

F (3.4k, Iran) wrote 6 months ago:

bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%ID]\n' c.vcf

This solved the error.

The bcftools query plugin and the SnpSift plugin in Galaxy also do that.

modified 4 months ago • written 6 months ago by F (3.4k)
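
The query above prints chromosome, position, ref and alt, but not the sampleID column from the data frame in the question. A possible way to add it, sketched below, is to prepend an ID to each line with awk. This assumes the sample ID is something you supply yourself (for example, derived from each per-patient file name). add_sample_id is a hypothetical helper, and the two input lines stand in for real bcftools output.

```shell
#!/bin/sh
# Hypothetical helper: prepend a sample ID to tab-separated
# CHROM, POS, REF, ALT lines read from standard input.
# Any columns after the fourth are dropped.
add_sample_id() {  # usage: ... | add_sample_id Sample_1
    awk -F '\t' -v OFS='\t' -v id="$1" '{ print id, $1, $2, $3, $4 }'
}

tmp=$(mktemp -d)
{
    # Header row matching the data frame in the question.
    printf 'sampleID\tchr\tpos\tref\tmut\n'
    # Stand-in for the output of the bcftools query command; with real data:
    #   bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\n' c.vcf | add_sample_id Sample_1
    printf '1\t871244\tG\tC\n1\t6648841\tC\tG\n' | add_sample_id Sample_1
} > "$tmp/mutations.tsv"

cat "$tmp/mutations.tsv"
```

The resulting tab-separated file could then be loaded in R with read.delim(). With one VCF per patient, the same pipeline can be looped over the files and the outputs concatenated.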

Great that it worked out; please accept it if this was the solution.

written 6 months ago by zx8754 (7.9k)

%END would return the end position, so %POS and %END together give the start and end.

modified 6 months ago • written 6 months ago by F (3.4k)

andrew.j.skelton73 (5.8k, London) wrote 6 months ago:

GATK has a tool for that, see VariantsToTable

written 6 months ago by andrew.j.skelton73 (5.8k)