Search for specific SNPs in VCF files of patients.
4 months ago
iarmir ▴ 10

I have 490 genomes from 490 patients in VCF format, and I merged them into a single multi-sample VCF file. I want to find two mutations (Y215C and G325R) in these patients, count the number of patients who carry these SNPs, and compare the counts with clinical information. Is there a standard pipeline for this kind of task?

Should I annotate first, or can I start searching by coordinates right away? Do I need to create two .bed files with the coordinates of each SNP, like this?

17 63481679 63481680 (chromosome number, beginning and end of SNP)

Next, should I filter the multi-sample VCF with vcftools (or bcftools) using the .bed file, keeping only the positions listed in it? If so, how? Can you point me to an existing standard pipeline for this task, or describe what the pipeline should look like? Unfortunately, I could not find an existing answer to this question.

ANNOVAR vcftools bcftools GATK VCF • 786 views
Is this an assignment or an academic exercise of some sort?

An assignment

What did you try by yourself before you asked the question here?

Searching manually after annotation with ANNOVAR, but this is stupid

It's not stupid, it's a start. By "searching manually", do you mean looking ("eyeballing") or using a "Find and Replace" style tool/command line utility?

You could either keep doing what you've been doing (annotate and then look up), or you could do the reverse: find the chromosomal change (CHR, POS, REF and ALT) linked to each of your variants of interest, create a BED file with those positions, and perform some sort of intersect operation.
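One detail matters for the BED route: BED intervals are 0-based and half-open, while VCF POS is 1-based, so a SNP at POS p becomes the interval from p-1 to p. Below is a minimal sketch of that conversion. The helper name `vcf_pos_to_bed` is made up for illustration, and the example assumes the SNP sits at 1-based position 63481680, which would match the interval written in the question. The chromosome name must be spelled exactly as in the VCF (17 vs chr17).

```shell
# Hypothetical helper: print one BED line for a single-base variant.
# $1 = chromosome name as it appears in the VCF, $2 = 1-based VCF POS
vcf_pos_to_bed() {
    # BED start is 0-based (POS - 1); BED end is exclusive (POS)
    printf '%s\t%d\t%d\n' "$1" "$(( $2 - 1 ))" "$2"
}

# Example (assumed position, see above):
#   vcf_pos_to_bed 17 63481680 > snps.bed
```

A file built this way could then be handed to a region filter such as bcftools view -R snps.bed merged.vcf.gz, which needs a working index on the merged file. As far as I know, bcftools only treats a regions file as BED when it has a .bed suffix; otherwise it reads the coordinates as 1-based.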

Thank you. Okay, but I have more questions. I tried to use bcftools to search for these mutations. The Y215C mutation is at 17:63480412, A>G (rs3730025).

I can't even normalize the merged VCF file: "No BGZF EOF marker; file 'merged.vcf.gz' may be truncated", followed by "The sequence "chr1" was not found" (so strange).

I also had to force the indexing (bcftools index -f) because of "input probably truncated"; after forcing it, I got "EOF marker is absent".

Then I tried bcftools view -r 17:63480412-63480412 -Ou merged.vcf.gz | bcftools view -c1 -Ou - | bcftools view -c2 -Ov > Y215C.vcf

And I got this error again: "No BGZF EOF marker; file 'merged.vcf.gz' may be truncated".

With bcftools view -r 17:63480412-63480412 merged.vcf.gz > Y215C.vcf the result is the same.

I remember that when I merged my VCFs with bcftools, it printed several warnings that the indexes were older than the files, because the VCF files had been overwritten at some point. But in theory, this shouldn't be a problem (?).

What do you think about using grep instead?

zcat merged.vcf.gz | grep -E '^17\s+63480412\s' > Y215C.vcf
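On the grep idea: it would pull out the record, but the output has no header (so it is not a valid VCF), and it does not yet answer "how many patients carry the variant". Below is a rough awk sketch that counts carriers directly from the genotype columns. The function name `count_carriers` is made up for illustration. It assumes GT is the first FORMAT field, treats any non-reference allele as "carrier" (so it does not tell apart different ALT alleles at a multi-allelic site), and needs the chromosome name spelled exactly as in the file.

```shell
# Hypothetical helper: read uncompressed VCF text on stdin and print,
# for each record at the given site, the number of samples whose GT
# contains at least one non-reference allele.
# $1 = chromosome name, $2 = 1-based position
count_carriers() {
    awk -F '\t' -v chrom="$1" -v pos="$2" '
        /^#/ { next }                      # skip header lines
        $1 == chrom && $2 == pos {
            n = 0
            for (i = 10; i <= NF; i++) {   # sample columns start at 10
                split($i, f, ":")          # f[1] is GT (assumed first)
                if (f[1] ~ /[1-9]/) n++    # any non-zero allele index
            }
            print n
        }'
}

# Example:
#   gzip -dc merged.vcf.gz | count_carriers 17 63480412
```

Missing genotypes (./.) and homozygous-reference calls (0/0) are not counted. If the merged file really is truncated, the count only covers the part of the file that can still be decompressed.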

Are you sure you bgzipped the VCF and did not just gzip it? What does file /full/path/to/merged.vcf.gz tell you?
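If file does not give a clear answer, the bytes can be checked by hand. This sketch assumes the BGZF layout described in the SAM/BAM specification: each block starts with the gzip magic bytes plus the extra-field flag (1f 8b 08 04) and carries a "BC" subfield at bytes 12-13, and a complete file ends with a fixed 28-byte empty block. The helper names `looks_like_bgzf` and `has_bgzf_eof` are made up for illustration.

```shell
# Hypothetical helper: does the file start with a BGZF block header?
looks_like_bgzf() {
    header=$(head -c 16 "$1" | od -An -tx1 | tr -d '[:space:]')
    # bytes 0-3: gzip magic, deflate, FEXTRA flag; bytes 12-13: 'B' 'C'
    [ "$(printf '%s' "$header" | cut -c1-8)" = "1f8b0804" ] &&
    [ "$(printf '%s' "$header" | cut -c25-28)" = "4243" ]
}

# Hypothetical helper: does the file end with the 28-byte BGZF EOF block?
has_bgzf_eof() {
    expected="1f8b08040000000000ff0600424302001b0003000000000000000000"
    actual=$(tail -c 28 "$1" | od -An -tx1 | tr -d '[:space:]')
    [ "$actual" = "$expected" ]
}

# Example:
#   looks_like_bgzf merged.vcf.gz && echo "BGZF header" || echo "plain gzip or other"
#   has_bgzf_eof merged.vcf.gz && echo "EOF block present" || echo "EOF block missing"
```

A file that passes the first check but fails the second was probably written by bgzip or bcftools and then cut short, for example by an interrupted merge or a full disk. In that case the merge most likely has to be rerun. A file that fails the first check was probably compressed with plain gzip and would need to be decompressed and recompressed with bgzip before indexing.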
