Question: New to the field: how do I work with .SRA files?
3.1 years ago, student5567 wrote:

Hi all. So far I have figured out how to convert a large .SRA file obtained from NCBI to FASTQ format. I have the SRA Toolkit installed on my Mac, and from what I've found online, I think I could just as easily convert this file to ABI SOLiD native, FASTA, SFF, SAM, or Illumina native format using similar commands in the Mac Terminal.
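For reference, the conversion step can be scripted. Below is a minimal Python sketch, assuming the SRA Toolkit's fasterq-dump is on your PATH (older toolkit releases only ship fastq-dump). The function names are hypothetical and SRR000001 is only a placeholder accession.

```python
import shutil
import subprocess
from typing import List


def fasterq_dump_cmd(accession: str, outdir: str = ".", threads: int = 4) -> List[str]:
    """Build a fasterq-dump command line for one SRA run accession.

    By default fasterq-dump splits paired-end runs into
    <accession>_1.fastq and <accession>_2.fastq.
    """
    return ["fasterq-dump", accession, "--outdir", outdir, "--threads", str(threads)]


def convert(accession: str, outdir: str = ".") -> None:
    # Fail early with a clear message if the SRA Toolkit is not on PATH.
    if shutil.which("fasterq-dump") is None:
        raise RuntimeError("fasterq-dump not found; is the SRA Toolkit installed?")
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(fasterq_dump_cmd(accession, outdir), check=True)
```

Usage: convert("SRR000001", "fastq_out") writes the FASTQ file(s) into fastq_out.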

My question: what is the best program or approach to go from .SRA data to being able to search for specific SNPs by their rs number? Is Illumina's GenomeStudio a useful program? Are there other programs or approaches that would be better?

Tags: snp
(Question modified 3.0 years ago by charco.)

In case you run into trouble with the SRA Toolkit (which you eventually will), here is a way to avoid it altogether.

For most SRA accessions (except very recent ones, which ENA eventually catches up on), you can find the FASTQ files directly by searching EBI-ENA for the accession number.
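The ENA lookup can also be done programmatically. This minimal Python sketch assumes ENA's Portal API "filereport" endpoint with result=read_run and the fastq_ftp field, which returns a tab-separated report whose fastq_ftp column lists ';'-separated file paths without a URL scheme. The helper names are hypothetical; check the ENA Portal API documentation if the endpoint has changed.

```python
import urllib.parse
import urllib.request
from typing import List

# Assumed ENA Portal API endpoint for per-run file reports.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession: str) -> str:
    """URL asking ENA for the FASTQ locations of one run accession."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "fastq_ftp",
        "format": "tsv",
    }
    return ENA_FILEREPORT + "?" + urllib.parse.urlencode(params)


def parse_fastq_ftp(tsv_text: str) -> List[str]:
    """Turn the TSV report into downloadable FTP URLs.

    The fastq_ftp column is located by header name, and paired-end runs
    have one ';'-separated path per file.
    """
    lines = [ln for ln in tsv_text.splitlines() if ln.strip()]
    header = lines[0].split("\t")
    col = header.index("fastq_ftp")
    urls: List[str] = []
    for row in lines[1:]:
        cells = row.split("\t")
        if col < len(cells) and cells[col]:
            urls.extend("ftp://" + path for path in cells[col].split(";"))
    return urls


def fetch_fastq_urls(accession: str) -> List[str]:
    # Network call; returns an empty list if ENA has no FASTQ files yet.
    with urllib.request.urlopen(filereport_url(accession)) as resp:
        return parse_fastq_ftp(resp.read().decode("utf-8"))
```

The resulting URLs can be downloaded with any FTP-capable tool, which skips the SRA Toolkit entirely.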

(Comment written 3.1 years ago by genomax.)
3.1 years ago, igor (United States) wrote:

Illumina GenomeStudio is for analyzing microarray data, so it does not apply to sequencing data like yours.

FASTQ format contains the raw sequences. You then need to align them to a reference genome, call variants to find positions that are different from the reference, then annotate the variants with dbSNP info to get rs identifiers.
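Once the variants are annotated, the rs identifiers sit in the ID column (the third column) of the VCF file, so searching by rs number becomes a simple scan. Here is a minimal Python sketch, assuming a tab-separated VCF (plain or gzip-compressed) that has already been annotated with dbSNP. The function names are hypothetical and the records in the test are toy data. Keep in mind that an rs ID missing from the output only means no variant was called there: the sample may match the reference, or the site may lack coverage.

```python
import gzip
from typing import Dict, Iterable, List, Set


def find_rsids(vcf_lines: Iterable[str], wanted: Set[str]) -> Dict[str, List[str]]:
    """Return {rsID: [CHROM, POS, REF, ALT]} for wanted IDs present in the VCF.

    VCF data lines are tab-separated: CHROM, POS, ID, REF, ALT, ...
    The ID column may hold several IDs joined by ';', or '.' if unknown.
    """
    hits: Dict[str, List[str]] = {}
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        for rsid in fields[2].split(";"):
            if rsid in wanted:
                hits[rsid] = fields[0:2] + fields[3:5]
    return hits


def find_rsids_in_file(path: str, wanted: Set[str]) -> Dict[str, List[str]]:
    # Open .vcf.gz files transparently; "rt" yields text lines either way.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return find_rsids(fh, wanted)
```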

Check GATK Best Practices for the recommended workflow for this type of analysis: https://software.broadinstitute.org/gatk/best-practices/
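To make the three steps concrete, here is a deliberately simplified Python sketch that only builds the shell commands (align with bwa, sort and index with samtools, call with GATK HaplotypeCaller, whose --dbsnp option fills the VCF ID column with rs numbers). It is not the Best Practices workflow: it omits duplicate marking, base quality recalibration and variant filtering. It assumes bwa, samtools and GATK 4 are installed, the reference is already indexed for all three tools, and the dbSNP VCF matches your reference build. The function name and file names are hypothetical.

```python
import shlex
from typing import List


def variant_pipeline(ref: str, r1: str, r2: str, sample: str, dbsnp: str) -> List[str]:
    """Return bash commands for a simplified align -> call -> annotate run.

    Omits duplicate marking, BQSR and filtering from the GATK Best Practices.
    """
    q = shlex.quote
    bam = f"{sample}.sorted.bam"
    # GATK requires read groups; bwa expands the literal "\t" into tabs.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}"
    return [
        # 1. align paired reads and sort the alignments by coordinate
        f"bwa mem -R {q(read_group)} {q(ref)} {q(r1)} {q(r2)} | samtools sort -o {q(bam)} -",
        f"samtools index {q(bam)}",
        # 2 + 3. call variants and copy rs IDs from dbSNP into the ID column
        f"gatk HaplotypeCaller -R {q(ref)} -I {q(bam)} --dbsnp {q(dbsnp)} -O {q(sample + '.vcf.gz')}",
    ]
```

Each string is meant for bash (the first one uses a pipe), for example via subprocess.run(cmd, shell=True, check=True, executable="/bin/bash").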

Following it yourself can be a lot of work, but the documentation is worth reading so you know what each step does. Alternatively, you can try a platform with a graphical interface, such as Galaxy, Illumina BaseSpace, or Seven Bridges.

3.0 years ago, charco wrote:

If you are only interested in SNPs, there are so-called "reference-free" methods that do not require aligning reads to a reference genome. They are typically faster than a full alignment and SNP-calling workflow, but may be less accurate.
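As a toy illustration of the k-mer idea behind alignment-free approaches (not the methods in the papers below), the Python sketch here checks one known SNP by counting reads that contain the k-mer spanning each allele, on either strand. It assumes you already know the SNP's flanking sequence (for example from its dbSNP record), so it is closer to alignment-free genotyping of known sites than to true de novo discovery. The function names and sequences are hypothetical.

```python
from typing import Dict, Iterable

_COMP = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMP)[::-1]


def allele_read_counts(reads: Iterable[str], left: str, ref: str,
                       alt: str, right: str) -> Dict[str, int]:
    """Count reads supporting each allele of one known SNP, without alignment.

    Each allele is represented by the k-mer left + allele + right; a read
    supports an allele if it contains that k-mer on either strand.
    """
    probes = {"ref": left + ref + right, "alt": left + alt + right}
    counts = {"ref": 0, "alt": 0}
    for read in reads:
        read = read.upper()
        for allele, kmer in probes.items():
            if kmer in read or revcomp(kmer) in read:
                counts[allele] += 1
    return counts
```

Longer flanks make the probe k-mers more specific but require longer reads to span them.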

Reading:
http://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-15-S4-S10
http://nar.oxfordjournals.org/content/early/2014/11/16/nar.gku1187.full.pdf
