vcf to MAF to fasta
6.8 years ago
natasha.sernova ★ 3.8k

Dear all,

A strange idea came to my mind recently.

There is no direct way to convert vcf into fasta. But I've read that through Galaxy I can convert vcf to maf (multiple alignment), and then it may be possible to convert maf to fasta.

How much information from the original vcf-file will I lose this way, and is it possible to avoid the loss?

THANK YOU!

Natasha

Galaxy vcf • 4.0k views
6.8 years ago

"There is no direct way to convert vcf into fasta": there is. GATK provides a tool named FastaAlternateReferenceMaker: http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_fasta_FastaAlternateReferenceMaker.html
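To make concrete what such a conversion involves, here is a minimal Python sketch that substitutes SNP alleles from VCF records into a reference sequence and prints the result as FASTA. It is not GATK's implementation; the function names `apply_snps` and `to_fasta`, the choice to use only the first ALT allele, and the decision to skip indels are all assumptions made for illustration.

```python
def apply_snps(reference, vcf_lines, chrom):
    """Substitute the first ALT allele of each SNP on `chrom` into `reference`."""
    seq = list(reference)
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        fields = line.rstrip("\n").split("\t")
        rec_chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        alt = fields[4].split(",")[0]  # assumption: use only the first ALT allele
        if rec_chrom != chrom:
            continue
        if len(ref) != 1 or len(alt) != 1 or alt in {".", "*"}:
            continue  # this sketch ignores indels and missing alleles
        idx = pos - 1  # VCF positions are 1-based
        if seq[idx].upper() != ref.upper():
            raise ValueError(f"REF mismatch at {rec_chrom}:{pos}")
        seq[idx] = alt
    return "".join(seq)


def to_fasta(name, sequence, width=60):
    """Format one sequence as a FASTA record with fixed-width lines."""
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return f">{name}\n" + "\n".join(chunks) + "\n"


if __name__ == "__main__":
    vcf = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t2\t.\tC\tT\t.\tPASS\t.",
        "chr1\t5\t.\tA\tAGG\t.\tPASS\t.",  # insertion, skipped by this sketch
    ]
    print(to_fasta("chr1_alt", apply_snps("ACGTACGT", vcf, "chr1")), end="")
```

Note that a sequence built this way keeps only the substituted bases: genotypes, qualities, filters and per-sample columns from the VCF are not carried into the FASTA.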

Yes, that's true. I've even used it once.

But this option defeated me:

-L input.intervals \

I still don't know, for a particular vcf-file, how and where to find the corresponding interval coordinates.

You definitely know. PLEASE, help me!

Thank you very much indeed!

Natasha


Is -L required?


Dear Pierre,

This is an optional parameter. But when I omit it, I just get the full fasta file for the corresponding chromosome, nothing else. I thought that GATK would allow me to cut out the fragment corresponding to the vcf-file. But I have to give it "one or more genomic intervals over which to operate". It seems much more complicated. How do I find these genomic intervals?

I have vcf-files, refs and chromosome sequences. I made the fai and dict files, but that still doesn't get me any closer to the fasta alignment. What else should be done?

Many thanks for your help!

Natasha

 


"one or more genomic intervals over which to operate": I don't understand your problem. You can provide a BED file or even a VCF file: http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_CommandLineGATK.html#--intervals
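If you want to see the interval coordinates explicitly, they can be read straight off the VCF records. The following is a rough Python sketch, not part of GATK; the function name `vcf_to_bed` is made up here, and it assumes each interval should span the REF allele of one record, written in BED's 0-based, half-open convention.

```python
def vcf_to_bed(vcf_lines):
    """Return (chrom, start, end) tuples in BED coordinates, one per VCF record."""
    intervals = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and VCF header lines
        chrom, pos, _id, ref = line.rstrip("\n").split("\t")[:4]
        start = int(pos) - 1    # VCF POS is 1-based, BED start is 0-based
        end = start + len(ref)  # BED end is exclusive; span the REF allele
        intervals.append((chrom, start, end))
    return intervals


if __name__ == "__main__":
    vcf = [
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "chr1\t100\t.\tA\tG\t.\tPASS\t.",
        "chr1\t200\t.\tACG\tA\t.\tPASS\t.",
    ]
    for chrom, start, end in vcf_to_bed(vcf):
        print(f"{chrom}\t{start}\t{end}")
```

Since -L accepts a VCF file directly, this conversion is optional; it only shows where the coordinates come from.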


Great. And this will work? "In order to perform the analysis at specific positions based on the records present in the file (e.g. -L file.vcf)": this is exactly what I need. I will try it; that's a miracle.

I was very poor at reading the manual... THANK YOU, Pierre!


I guess I missed the goal of the OP's question; I just was not sure what they wanted in the fasta file. I presumed that they didn't have a preexisting fasta file with their reference sequences.
 

6.8 years ago
pld 4.9k

MAF is Mutation Annotation Format, not a multiple alignment: https://wiki.nci.nih.gov/display/TCGA/Mutation+Annotation+Format+%28MAF%29+Specification+-+v2.4

Fasta is really a flat sequence data format. You might be able to store the variants found in a sequence in the fasta header, but I'm not sure that what you're asking for makes sense.


The Multiple Alignment Format (MAF) has been used by UCSC/EnsEMBL etc. for ten years or so. It is one of the most widely used multiple-alignment formats. The first spec for the Mutation Annotation Format was only released two years ago.
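For the maf-to-fasta half of the original question, here is a minimal Python sketch that turns the "s" (sequence) lines of a Multiple Alignment Format file into gapped FASTA records. The function name `maf_to_fasta` and the header layout are assumptions made for illustration; the sketch reads only "a" and "s" lines and ignores everything else in the file.

```python
def maf_to_fasta(maf_lines):
    """Convert the 's' lines of a multiple-alignment MAF into gapped FASTA text."""
    records = []
    block = 0
    for line in maf_lines:
        fields = line.split()
        if not fields:
            continue  # blank line between alignment blocks
        if fields[0] == "a":
            block += 1  # an 'a' line opens a new alignment block
        elif fields[0] == "s":
            # s <src> <start> <size> <strand> <srcSize> <aligned text>
            _, src, start, size, strand, _src_size, text = fields
            header = f">{src}:{start}:{size}:{strand} block={block}"
            records.append(f"{header}\n{text}")
    return "\n".join(records) + "\n" if records else ""


if __name__ == "__main__":
    maf = [
        "##maf version=1",
        "a score=10.0",
        "s hg18.chr7 100 4 + 1000 AC-GT",
        "s panTro1.chr6 200 5 + 2000 ACAGT",
        "",
    ]
    print(maf_to_fasta(maf), end="")
```

Each alignment row becomes one FASTA record, so block scores and any annotation lines in the MAF are dropped by this approach.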


https://wiki.nci.nih.gov/dosearchsite.action?queryString=Multiple+alignment+format
So my question doesn't make sense at all. OK, thank you.

 


That link just leads me to Mutation Annotation Format.
