Convert FASTQ file to VCF or Excel file
4
0
4.5 years ago
MCC ▴ 10

Hi! Could someone tell me how to convert a FASTQ file to a VCF or Excel file? I need to use the raw data from an NGS run.

Thanks!

sequencing • 4.0k views
4

I'll go ahead and assume you have no experience at all with analyzing NGS data?
Well, answering your question is a rather long story. You'll first need some quality control, adapter trimming, read mapping and variant calling to get a VCF file.
Also, you didn't specify the technology, the organism, or what result you want to obtain.

1

Start by reading the beginning chapters in this book (and then chapter 6) to get a grasp of the task that you have described in one sentence above (like @Wouter, I am assuming some things). Then come back and flesh out the question with additional details so people can point you in the right direction.

You may be a computer scientist well versed in the command line or a biologist who is a novice at it. Depending on which category you are in, the subsequent ride can be wild or a bit less so.

1
4.5 years ago
EagleEye 7.0k
1
4.5 years ago
igor 12k

In addition to what everyone else has already said, here is a more visual workflow example from the GATK best practices (it may not be appropriate in your specific situation): https://software.broadinstitute.org/gatk/best-practices/bp_3step.php?case=GermShortWGS

1
4.5 years ago
MCC ▴ 10

The question is very easy. I got a FASTQ file from Illumina technology. We have the software to analyze this file, but we want the raw data from it. We need some kind of free software that would let us use that information in Excel. We have experience with NGS; I just want to know if there's an easy way to convert this kind of file. Thanks!

3

Not sure if trolling or just serious.

What, in your opinion, is the raw data? The rawest would be the image data: the images created by the sequencer's camera at each cycle. But getting those into Excel is obviously completely pointless.

The next level of data is FASTQ (as written by @igor). That you already have, and it is also pointless to view in Excel.

2

+1 , because I laughed. Thanks :-)

0

You should use ADD COMMENT/ADD REPLY to respond to existing posts rather than posting an "answer" to keep threads logically organized.

To address your question, FASTQ is raw data. FASTQs contain the raw sequencing reads. Data that comes from a FASTQ file is processed data.
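To make "FASTQ contains the raw reads" concrete, here is a minimal Python sketch that flattens a FASTQ file into a CSV that Excel can open. It assumes the common four-lines-per-record layout and Phred+33 quality encoding; the function names (`read_fastq`, `mean_phred`, `fastq_to_csv`) and the output columns are made up for illustration and do not come from any tool mentioned in this thread.

```python
import csv
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a four-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file (or a trailing blank line)
        sequence = handle.readline().rstrip("\r\n")
        handle.readline()  # the '+' separator line carries no data we need
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@"):
            raise ValueError(f"unexpected FASTQ header: {header!r}")
        yield header[1:], sequence, quality


def mean_phred(quality: str, offset: int = 33) -> float:
    """Average base quality, assuming Phred+33 encoding (an assumption)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def fastq_to_csv(fastq: TextIO, out: TextIO) -> int:
    """Write one CSV row per read and return the number of reads written."""
    writer = csv.writer(out)
    writer.writerow(["read_id", "sequence", "length", "mean_quality"])
    count = 0
    for read_id, sequence, quality in read_fastq(fastq):
        writer.writerow(
            [read_id, sequence, len(sequence), round(mean_phred(quality), 2)]
        )
        count += 1
    return count


if __name__ == "__main__":
    # Tiny in-memory example; for a real file, pass open("reads.fastq")
    # and open("reads.csv", "w", newline="") instead.
    demo = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!!!\n")
    result = io.StringIO()
    fastq_to_csv(demo, result)
    print(result.getvalue())
```

Note that the resulting table holds only read names, bases and qualities. There are no variants in it, and since an Excel worksheet tops out at 1,048,576 rows, a typical sequencing run will not fit anyway.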

0

Sorry for not using ADD COMMENT/ADD REPLY before... it's my first time asking questions this way. I know FASTQ is a format for that data, but we can't open it and see the information. Maybe I wasn't clear. If I'm not wrong, Fastq contains all variants and all information from the sequencer; we want to visualize that information in order to apply our own filters. We used https://usegalaxy.org/, but I need to know if there's any desktop software or another way. Thanks once again.

3

"We have experience with NGS"

Maybe it's better to be honest. You don't have the faintest clue what you are doing or asking for.

"Fastq contains all variants"

No, FASTQ contains read information. First you need to map the reads, then you need to identify variants in the alignments. You can't do either of those things in Excel.
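For a rough idea of what "map, then call variants" looks like in practice, here is a small Python sketch that only assembles the shell commands for one common open-source route: BWA for mapping, samtools for sorting and indexing, bcftools for calling. The tool choice, the single-end input, and the file-naming scheme are all assumptions for illustration; `build_pipeline` is a hypothetical helper, and the right tools and settings depend on your organism and experiment.

```python
import shlex
from typing import List


def build_pipeline(fastq: str, reference: str, sample: str) -> List[str]:
    """Return shell commands for index -> map + sort -> index -> call.

    Assumes single-end reads and a reference genome in FASTA format.
    Output names are derived from `sample` (a made-up convention).
    """
    q = shlex.quote  # protect paths that contain spaces
    bam = f"{sample}.sorted.bam"
    vcf = f"{sample}.vcf.gz"
    return [
        # 1. Index the reference once so BWA can search it.
        f"bwa index {q(reference)}",
        # 2. Map reads to the reference and sort the alignments into a BAM.
        f"bwa mem {q(reference)} {q(fastq)} | samtools sort -o {q(bam)} -",
        # 3. Index the sorted BAM for random access.
        f"samtools index {q(bam)}",
        # 4. Pile up the alignments and call variants into a compressed VCF.
        f"bcftools mpileup -f {q(reference)} {q(bam)}"
        f" | bcftools call -mv -Oz -o {q(vcf)}",
    ]


if __name__ == "__main__":
    for command in build_pipeline("reads.fastq", "reference.fa", "sample1"):
        print(command)
```

The script prints the commands rather than running them, so you can read each step before executing anything. Quality control and adapter trimming would come before step 2 and are left out here.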

2

"Fastq contains all variants" shows how much experience you have with NGS. Based on your post, I suggest you consult a bioinformatician at your institute (please be honest, at least in public forums, so that people know your level of understanding of the field and can help accordingly).

0

If you used Galaxy, then Galaxy analyzed the data for you. Galaxy is a graphical browser-based tool to run various command-line tools. Every step in Galaxy gives you a description that includes information about which tool was used. You can then download that tool and use it without needing Galaxy.

0
8 months ago

FASTQ is your raw data. If you want to "visualize" it (I think you mean you want to do something called variant calling, with annotation afterwards), you need adapter trimming first. Then the reads in the FASTQ file are aligned against a reference genome (e.g. with BWA), so your input is a FASTQ file and your output is a BAM file. The BAM file is then used by whatever software you choose for a process called variant calling, whose output is a VCF file (VCF stands for Variant Call Format). Got it now? This file comes at the very end of the process. Then get yourself a variant filtering and annotation tool where you can simply upload the VCF file and set the filters you want, e.g. variant allele frequency >5%, or population frequency thresholds from ExAC, gnomAD, etc. You then end up with a set of variants that you can annotate and "visualize"!
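If you would rather apply such a filter yourself and look at the result in Excel, a minimal Python sketch could look like the following. It assumes an uncompressed, tab-separated VCF whose INFO column carries an `AF` value; that is an assumption, since many callers report per-sample allele fraction in the FORMAT columns instead, so check your own file first. The function names and the CSV columns are invented for illustration.

```python
import csv
import io
from typing import Dict, Iterator, TextIO


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO string such as 'DP=30;AF=0.4' into a dict."""
    fields: Dict[str, str] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, _, value = item.partition("=")  # flag entries get an empty value
        fields[key] = value
    return fields


def iter_variants(vcf: TextIO) -> Iterator[dict]:
    """Yield one dict per data line, skipping '#' header lines."""
    for line in vcf:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\r\n").split("\t")
        chrom, pos, _id, ref, alt, qual, _filter, info = cols[:8]
        yield {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt,
               "qual": qual, "info": parse_info(info)}


def filter_by_af(vcf: TextIO, out: TextIO, min_af: float = 0.05,
                 af_key: str = "AF") -> int:
    """Write variants with allele frequency above min_af as CSV rows."""
    writer = csv.writer(out)
    writer.writerow(["chrom", "pos", "ref", "alt", "qual", "af"])
    kept = 0
    for variant in iter_variants(vcf):
        raw = variant["info"].get(af_key, "")
        try:
            # Multi-allelic sites list one value per ALT allele; use the largest.
            af = max(float(value) for value in raw.split(","))
        except ValueError:
            continue  # no usable AF value on this line
        if af > min_af:
            writer.writerow([variant["chrom"], variant["pos"], variant["ref"],
                             variant["alt"], variant["qual"], af])
            kept += 1
    return kept


if __name__ == "__main__":
    demo = io.StringIO(
        "##fileformat=VCFv4.2\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
        "chr1\t100\t.\tA\tG\t50\tPASS\tDP=30;AF=0.40\n"
        "chr1\t200\t.\tC\tT\t12\tPASS\tDP=80;AF=0.02\n"
    )
    result = io.StringIO()
    filter_by_af(demo, result)
    print(result.getvalue())
```

Unlike a FASTQ, a filtered VCF is usually small enough for a spreadsheet, which is why this is the stage where Excel starts to make sense.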

Hope that helps a little bit... actually, your post is 4 years old, so I hope you learned something in that time!

(Slightly redacted by WouterDeCoster to remove inappropriate language)