Plot number of mutations in each cancer type in VCF files?
2
0
4.4 years ago
DanielC ▴ 150

Dear Friends,

I am trying to plot the "number of variants in each cancer type" from VCF files. Could you please let me know how to do this using R, Python, or bash? I have a text file listing the samples and the cancer type associated with each, like this:

Samples                         Cancer Type
TCGA-XXX.barcode                ACC
.
.


I am new to this and learning. Thanks much!

DK

vcf SNP plot • 2.4k views
0

It is unclear which data you have - be specific, e.g. number of samples - and which type of plot you aim to obtain. Please elaborate and show an example.

0

Thanks! It is TCGA cancer data in VCF files. These are merged VCF files of about 10000 samples for each chromosome. I do not know the name of the plot, but what I am looking for is a plot of the "number of variants in each cancer type in the VCF files". Please let me know if I am being clear and what you think could be done to obtain this. Thanks much!

0

Can you perhaps draw the plot you have in mind on a piece of paper, take a picture and show us? How should the 10k samples be summarized?

0

Thanks for your reply! This is the type of plot I am looking to generate from VCF files: https://www.dropbox.com/s/rfvyw8b8v62lhuz/example.jpg?dl=0

Please let me know how to generate this plot from vcf files. Thanks

0

How do you link the sample identifiers in the vcf to the cancer types?

0

Thanks! I will remember that next time.

To link the identifiers to the cancer type I have a text file like this:

Sample                           Cancer Type
TCGA-XXX-barcode      ACC


Now I am trying to figure out how to use the file above, together with the merged VCF file for 10000 samples, to plot the "number of variants for each cancer type". Please let me know if I am missing any info here; I will provide it. Thanks.

1

Also that file is important information which should have been part of your first post. We have now wasted 11 hours until we found all required information.

I can solve this in Python, but not in R.

0

Thanks, I have updated the question. Could you please share how to make such a plot with the available info? I would really appreciate it. Thanks.

4
4.4 years ago

I wrote a script which should be able to handle this.

The script below takes two arguments:

• --samples: your list of samples with their cancer type. No header, just a space-separated file with two columns.
• --vcf: your VCF file; every sample name in it must be present in the file given by --samples
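
For reference, reading the --samples file described above could look roughly like this. It is a minimal sketch, not the script's actual code: the function name read_sample_types is made up for illustration, and it assumes every non-empty line holds exactly two whitespace-separated columns (sample, cancer type) with no header.

```python
def read_sample_types(path):
    """Return {sample_name: cancer_type} from a two-column, headerless file."""
    sample_to_type = {}
    with open(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            fields = line.split()  # splits on spaces or tabs
            if not fields:
                continue  # skip blank lines
            if len(fields) != 2:
                raise ValueError(
                    f"{path}:{line_number}: expected 2 columns, got {len(fields)}"
                )
            sample, cancer_type = fields
            sample_to_type[sample] = cancer_type
    return sample_to_type
```

Calling read_sample_types("samples.txt") returns a dict; passing its values to collections.Counter gives the number of samples per cancer type, which is a quick sanity check before touching the VCF.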

This script requires cyvcf2 and matplotlib, which you can install with pip:

pip install -U cyvcf2
pip install -U matplotlib


Save the code, e.g. as samples_to_hist.py, and run it as follows (fill in the proper files):

python samples_to_hist.py --samples samples.txt --vcf variants.vcf


Please let me know how it goes.
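
In case it helps to see the idea, here is a minimal sketch of how such a count-and-plot step could work. It is not the script referred to above: it parses the VCF text with the standard library instead of cyvcf2, and the names open_vcf, carries_alt, count_variants_per_type and plot_counts are made up for illustration. It assumes GT is the first FORMAT field, counts one variant per sample for each record where the genotype contains a non-reference allele, sums those per cancer type, and skips samples that have no cancer type in the mapping.

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text (checks the magic bytes)."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == b"\x1f\x8b"
    return gzip.open(path, "rt") if is_gzip else open(path, "rt")


def carries_alt(genotype_field):
    """True if the GT value (first colon-separated subfield) has a non-reference allele."""
    gt = genotype_field.split(":", 1)[0]
    alleles = gt.replace("|", "/").split("/")
    return any(allele not in ("0", ".", "") for allele in alleles)


def count_variants_per_type(vcf_path, sample_to_type):
    """Sum, per cancer type, the records in which each mapped sample carries an ALT allele."""
    counts = Counter()
    columns = {}  # column index -> cancer type, filled from the #CHROM line
    with open_vcf(vcf_path) as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # meta-information lines
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#CHROM"):
                # Sample columns follow the 9 fixed VCF columns.
                # Assumption of this sketch: unmapped samples are skipped.
                columns = {
                    i: sample_to_type[name]
                    for i, name in enumerate(fields)
                    if i >= 9 and name in sample_to_type
                }
                continue
            for i, cancer_type in columns.items():
                if carries_alt(fields[i]):
                    counts[cancer_type] += 1
    return counts


def plot_counts(counts, out_png):
    """Draw one bar per cancer type and save it; matplotlib is imported lazily."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    labels = sorted(counts)
    fig, ax = plt.subplots()
    ax.bar(labels, [counts[label] for label in labels])
    ax.set_xlabel("Cancer type")
    ax.set_ylabel("Variant calls")
    ax.tick_params(axis="x", labelrotation=90)
    fig.tight_layout()
    fig.savefig(out_png)
```

Given a dict mapping sample names to cancer types, counts = count_variants_per_type("variants.vcf", mapping) followed by plot_counts(counts, "variants_per_type.png") writes the bar chart. Cancer types with more samples accumulate more calls, so dividing each total by the number of samples of that type may give a fairer comparison.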

0

Thanks much! Please clarify these queries:

a) My samples.txt lists 8000 samples and the VCF file has 10389 samples. Will the program run in this scenario?

b) Can the program be made to run on vcf.gz files?

Thanks much!

1

a) No, it will explicitly fail because it encounters samples in the VCF which are not in samples.txt (function test_vcf, line 29).

b) It already supports vcf.gz files ;-)

0

Thanks! Is it possible to modify the program for scenario a), where samples.txt has fewer samples than the VCF file? :-)

0

Yeah, that's possible, but I don't know when I'll have time to adapt it. Or you could adapt your input file.

0

Thanks for the reply! Since the VCF file has 10389 samples, I would have to find the ones missing from samples.txt and then delete those columns from the VCF file, which is huge, so it would take a lot of time. If possible, I would really appreciate it if the script could be modified. :-) Thanks.
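
Finding the unmatched samples does not require reading the whole VCF, since the sample names sit on the #CHROM header line. A minimal sketch under that assumption follows; the function names vcf_sample_names, split_samples and write_list are hypothetical, and the code only produces the lists, leaving the actual subsetting of the VCF to another tool.

```python
import gzip


def vcf_sample_names(vcf_path):
    """Return the sample names from the #CHROM header line of a plain or gzipped VCF."""
    with open(vcf_path, "rb") as probe:
        is_gzip = probe.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzip else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # Sample columns follow the 9 fixed VCF columns.
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # reached the records without seeing a header line
    raise ValueError(f"no #CHROM header line found in {vcf_path}")


def split_samples(vcf_path, samples_path):
    """Return (matched, unmatched) VCF sample names relative to the samples file."""
    with open(samples_path) as handle:
        known = {line.split()[0] for line in handle if line.strip()}
    matched, unmatched = [], []
    for name in vcf_sample_names(vcf_path):
        (matched if name in known else unmatched).append(name)
    return matched, unmatched


def write_list(names, out_path):
    """Write one sample name per line."""
    with open(out_path, "w") as out:
        out.writelines(f"{name}\n" for name in names)
```

Writing the matched names with write_list(matched, "keep.txt") gives a one-name-per-line file, which is the format that sample-subsetting tools such as bcftools view (option -S/--samples-file) accept.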

0

bcftools can do what you are looking for

0

Great! I am learning useful stuff here. Could you please point me to documentation on how to do this using bcftools? Thanks much!

0

Have you gone through the manual?

0

Yes, please let me know if this is right:

bcftools -S samples-to-remove.txt XX.vcf.gz > filtered-vcf.vcf

samples-to-remove.txt:
^TCGA...1
^TCGA...2
.
.

4

Whether that command works right should be really easy to confirm on your own. I think I did enough here - time for you to show some effort too.

0
10 days ago
cocchi.e89 ▴ 230

Take a look at this: plot-VCF

It allows you to plot any flag or sample, separate cases from controls, run gene-level analysis, etc. (it is well documented on the GitHub page).