Question: Convert FASTQ file to VCF or Excel file
0
3.8 years ago by
MCC10
MCC10 wrote:

Hi! Could someone tell me how to convert a FASTQ file to a VCF or Excel file? I need to use the raw data from an NGS sequencing run.

Thanks!

sequencing • 3.2k views
ADD COMMENTlink modified 15 days ago by pathological0 • written 3.8 years ago by MCC10
4

I'll go ahead and assume you have no experience at all with analyzing NGS data?
Well, your question is a rather long story. You'll first need some quality control, adapter trimming, read mapping and variant calling to get a VCF file.
Also, you didn't specify which technology or which organism you are working with, or what result you want to obtain.

Is there someone in your institute who can help you?

ADD REPLYlink written 3.8 years ago by WouterDeCoster44k
1

Start by reading the beginning chapters of this book (and then chapter 6) to get a grasp of the task that you have described in one sentence above (like @Wouter, I am assuming some things). Then come back and flesh out the question with additional details so people can give you directional guidance.

You may be a computer scientist well versed in the command line, or a biologist who is a novice at it. Depending on which category you are in, the subsequent ride can be wild or a bit less so.

ADD REPLYlink modified 3.8 years ago • written 3.8 years ago by GenoMax92k
1
3.8 years ago by
EagleEye6.7k
Sweden
EagleEye6.7k wrote:

A: FASTQs to the VCF

ADD COMMENTlink written 3.8 years ago by EagleEye6.7k
1
3.8 years ago by
igor11k
United States
igor11k wrote:

In addition to what everyone else has already said, here is a more visual workflow example (it may not be appropriate in your specific situation):

[image]

From: https://software.broadinstitute.org/gatk/best-practices/bp_3step.php?case=GermShortWGS

ADD COMMENTlink written 3.8 years ago by igor11k
1
3.8 years ago by
MCC10
MCC10 wrote:

The question is very easy. I got a FASTQ file from Illumina technology; we have the software to analyze this file, but we want the raw data from it. We need some free software that would let us use that information in Excel. We have experience with NGS; I just want to know if there's an easy way to convert this kind of file. Thanks!

ADD COMMENTlink modified 3.8 years ago • written 3.8 years ago by MCC10
3

Not sure if trolling or just serious.

What, in your opinion, is the raw data? The rawest would be the image data: the images created by the sequencer's camera in each cycle. But getting those into Excel is obviously completely pointless.

The next level of data is FASTQ (as written by @igor). That you already have, and it is also pointless to view in Excel.

ADD REPLYlink written 3.8 years ago by WouterDeCoster44k
2

+1 , because I laughed. Thanks :-)

ADD REPLYlink written 3.8 years ago by Pierre Lindenbaum131k

You should use ADD COMMENT/ADD REPLY to respond to existing posts rather than posting an "answer" to keep threads logically organized.

To address your question: FASTQ is the raw data. FASTQ files contain the raw sequencing reads; anything derived from a FASTQ file is processed data.
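
For what it's worth, a FASTQ file is plain text with four lines per read, so if the goal is only to look at reads in a spreadsheet it can be flattened to CSV. Below is a minimal Python sketch, assuming uncompressed, unwrapped 4-line records and Phred+33 quality encoding; the function names and the column layout are made up for illustration. Note that this gives you reads, not variants, and that an Excel sheet holds at most 1,048,576 rows, so it is only practical for a small subset of a real run.

```python
import csv
import io
import sys


def read_fastq(handle):
    """Yield (read_id, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"Malformed FASTQ header: {header!r}")
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header[1:].strip(), sequence, quality


def mean_phred(quality, offset=33):
    """Average Phred score of a quality string (assumes Phred+33 by default)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def fastq_to_csv(fastq_handle, csv_handle):
    """Write one CSV row per read: id, sequence, length, mean quality."""
    writer = csv.writer(csv_handle)
    writer.writerow(["read_id", "sequence", "length", "mean_quality"])
    for read_id, sequence, quality in read_fastq(fastq_handle):
        writer.writerow(
            [read_id, sequence, len(sequence), round(mean_phred(quality), 1)]
        )


if __name__ == "__main__":
    # Tiny in-memory example; replace with open("reads.fastq") for a real file.
    demo = io.StringIO("@read1\nACGT\n+\nIIII\n")
    fastq_to_csv(demo, sys.stdout)
```

The resulting CSV opens directly in Excel, but it will not contain any variant information.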

ADD REPLYlink modified 3.8 years ago • written 3.8 years ago by igor11k

Sorry for not using ADD COMMENT/ADD REPLY before; this is my first time asking questions this way. I know FASTQ is a format for that data, but we can't open it and see the information. Maybe I wasn't clear. If I'm not wrong, Fastq contains all variants and all information from the sequencer, and we want to visualize that information in order to apply our own filters. We used https://usegalaxy.org/, but I need to know if there's any desktop software or another way. Thanks once again.

ADD REPLYlink written 3.8 years ago by MCC10
3

We have experience with NGS

Maybe it's better to be honest. You haven't the faintest clue what you are doing or asking for.

Fastq contains all variants

No, FASTQ contains read information. First you need to map the reads, then you need to identify variants in them. You can't do either of those things in Excel.
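
To make the "map, then call variants" order concrete, here is a rough Python sketch that only builds the command lines for one possible toolchain. BWA, samtools and bcftools are my assumption for illustration, not something the original poster mentioned; the file names and the `build_pipeline` helper are hypothetical, and quality control and adapter trimming are left out. The script prints the commands rather than running them.

```python
import shlex
from pathlib import Path


def build_pipeline(fastq, reference, out_dir="results"):
    """Return the commands, in order, that take reads to a VCF.

    Nothing is executed here; out_dir is assumed to exist before
    the commands are actually run.
    """
    out = Path(out_dir)
    sam = str(out / "aligned.sam")
    bam = str(out / "aligned.sorted.bam")
    bcf = str(out / "pileup.bcf")
    vcf = str(out / "calls.vcf")
    return [
        # 1. Index the reference genome (needed once per reference).
        ["bwa", "index", reference],
        # 2. Map the reads: FASTQ in, SAM alignments out.
        ["bwa", "mem", "-o", sam, reference, fastq],
        # 3. Sort the alignments by coordinate and write BAM, then index it.
        ["samtools", "sort", "-o", bam, sam],
        ["samtools", "index", bam],
        # 4. Pile up the reads against the reference, then call variants.
        ["bcftools", "mpileup", "-f", reference, "-Ou", "-o", bcf, bam],
        ["bcftools", "call", "-mv", "-Ov", "-o", vcf, bcf],
    ]


if __name__ == "__main__":
    for command in build_pipeline("reads.fastq", "reference.fa"):
        print(shlex.join(command))
```

Each step consumes the output of the previous one, which is why a VCF cannot be produced from a FASTQ file in a single conversion.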

ADD REPLYlink written 3.8 years ago by WouterDeCoster44k
2

"Fastq contains all variants" shows how much experience you have with NGS. From your post, I suggest you consult a bioinformatician at your institute (please be honest, at least in public forums, so that people know your level of understanding in the particular field and can help accordingly).

ADD REPLYlink modified 3.8 years ago • written 3.8 years ago by EagleEye6.7k

If you used Galaxy, then Galaxy analyzed the data for you. Galaxy is a graphical browser-based tool to run various command-line tools. Every step in Galaxy gives you a description that includes information about which tool was used. You can then download that tool and use it without needing Galaxy.

ADD REPLYlink written 3.8 years ago by igor11k
0
15 days ago by
pathological0 wrote:

FASTQ is your raw data... if you want to "visualize" it (I think you mean you want to do something called variant calling, with annotation afterwards), you first need adapter trimming; then the reads in your FASTQ file are aligned against a reference genome (e.g. with BWA), so your input is a FASTQ file and your output is a BAM file. The BAM file is then used by whatever software you choose to perform variant calling, and the output here is a VCF file (VCF stands for Variant Call Format). Got it now? This file comes at the very end of your process. Then get yourself a variant filtering and annotation tool where you can simply upload the VCF file and set the filters you want, e.g. variant allele frequency >5%, or frequency thresholds for ExAC, gnomAD etc. You then end up with a set of variants that you can annotate and "visualize"!
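
As an illustration of the filtering step, here is a minimal Python sketch that keeps variants above an allele frequency threshold and writes them to CSV, which Excel can open. It assumes the VCF carries an `AF` key in the INFO column, which depends on the variant caller (some report it per sample in the FORMAT fields instead); the function names and the default 5% cutoff are just for the example.

```python
import csv
import io
import sys


def parse_info(info_field):
    """Turn 'DP=100;AF=0.12;DB' into a dict; flag-only keys map to True."""
    info = {}
    for item in info_field.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def filter_vcf(vcf_handle, min_af=0.05):
    """Yield (chrom, pos, ref, alt, af) for alleles with AF above min_af."""
    for line in vcf_handle:
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt, _qual, _filter, info_field = fields[:8]
        af_raw = parse_info(info_field).get("AF")
        if not isinstance(af_raw, str):
            continue  # no AF value in INFO (assumption of this sketch)
        # Multi-allelic records list one AF per ALT allele, comma-separated.
        for allele, af_text in zip(alt.split(","), af_raw.split(",")):
            if af_text == ".":
                continue  # missing value
            af = float(af_text)
            if af > min_af:
                yield chrom, int(pos), ref, allele, af


def vcf_to_csv(vcf_handle, csv_handle, min_af=0.05):
    """Write the filtered variants as CSV rows."""
    writer = csv.writer(csv_handle)
    writer.writerow(["chrom", "pos", "ref", "alt", "af"])
    for row in filter_vcf(vcf_handle, min_af):
        writer.writerow(row)


if __name__ == "__main__":
    demo = io.StringIO("chr1\t100\t.\tA\tG\t50\tPASS\tDP=80;AF=0.12\n")
    vcf_to_csv(demo, sys.stdout)
```

Population frequencies such as ExAC or gnomAD are not in a raw VCF; they have to be added by an annotation tool before they can be filtered on.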

Hope that helps you a little bit... actually, your post is 4 years old, so I hope you have learned something in that time!

(Slightly redacted by WouterDeCoster to remove inappropriate language)

ADD COMMENTlink modified 14 days ago by WouterDeCoster44k • written 15 days ago by pathological0