Question

New to the field...how to work with .SRA files?

7.6 years ago

student5567 • 0

Hi all. So far I have figured out how to convert a large .SRA file obtained from NCBI to FASTQ format. I have the SRA Toolkit installed on my Mac, and from what I've found online, I think I could just as easily convert this file to ABI SOLiD native, FASTA, SFF, SAM, or Illumina native format using similar commands in the Mac Terminal.

My question: what is the best program or approach to go from .SRA data to being able to search for specific SNPs by their rs number? Is Illumina's GenomeStudio a useful program? Are there other programs or approaches that would be better?

SNP • 2.6k views

Updated 7.6 years ago by charco ▴ 50 • written 7.6 years ago by student5567 • 0

In case you run into trouble with the SRA Toolkit (which you eventually will), here is a way to avoid it altogether.

For most SRA accessions (except very recent ones, which EBI-ENA will eventually catch up on), you can find the FASTQ files directly by searching EBI-ENA with the SRA accession number.
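As a rough sketch of what that lookup could look like from a script rather than the website: ENA exposes a "filereport" endpoint in its Portal API that returns a tab-separated table for a run accession. The endpoint URL and the `run_accession` / `fastq_ftp` field names below reflect my understanding of that API and should be checked against the current ENA documentation; the function names are my own, and `SRR0000000` is a placeholder, not a real run.

```python
import csv
import io
from urllib.parse import urlencode

# Assumed ENA Portal API endpoint; verify against the ENA docs before relying on it.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def ena_filereport_url(accession):
    """Build the query URL asking ENA for the FASTQ locations of one accession."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",  # assumed field names
        "format": "tsv",
    }
    return ENA_FILEREPORT + "?" + urlencode(params)


def parse_fastq_links(tsv_text):
    """Map each run accession to its list of FASTQ locations.

    The fastq_ftp column is expected to hold one or more locations
    separated by ';' (e.g. two files for paired-end runs).
    """
    links = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        urls = [u for u in (row.get("fastq_ftp") or "").split(";") if u]
        links[row["run_accession"]] = urls
    return links
```

To use it, fetch the URL (for example with `urllib.request.urlopen(ena_filereport_url("SRR0000000")).read().decode()`), pass the text to `parse_fastq_links`, and download the listed files with your tool of choice.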

7.6 years ago by GenoMax 142k

score 1 · Answer 1 · 2016-10-02

Illumina GenomeStudio is for analyzing array data, which is not applicable to your case.

FASTQ format contains the raw sequences. You then need to align them to a reference genome, call variants to find positions that are different from the reference, then annotate the variants with dbSNP info to get rs identifiers.
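Once those steps have produced a VCF whose ID column carries dbSNP identifiers, the final lookup by rs number is simple. Here is a minimal sketch in plain Python, assuming a standard tab-separated VCF (columns CHROM, POS, ID, REF, ALT, ...); the function name `find_rsids` and the rs numbers in the usage note are hypothetical.

```python
def find_rsids(vcf_lines, wanted):
    """Return {rsid: (chrom, pos, ref, alt)} for each wanted rs ID found.

    vcf_lines: any iterable of text lines from an annotated VCF.
    wanted:    iterable of rs identifiers to look for.
    """
    wanted = set(wanted)
    hits = {}
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip malformed or truncated records
        chrom, pos, ids, ref, alt = fields[:5]
        # The ID column may list several identifiers separated by ';'.
        for rsid in ids.split(";"):
            if rsid in wanted:
                hits[rsid] = (chrom, int(pos), ref, alt)
    return hits
```

For example, `find_rsids(open("annotated.vcf"), ["rs111", "rs222"])` would scan an uncompressed file; for a `.vcf.gz`, open it with `gzip.open(path, "rt")` instead. Keep in mind that a typical variant-only VCF lists just the positions where the sample differs from the reference, so an rs number that is missing from the output may mean the sample matches the reference there, or that the position had no usable coverage.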

Check GATK Best Practices for the recommended workflow for this type of analysis: https://software.broadinstitute.org/gatk/best-practices/

Following the full workflow might be a lot of work, but it's worth reading so that you know which steps are involved. You can also try a graphical platform such as Galaxy, Illumina BaseSpace, or Seven Bridges.

score 0 · Answer 2 · 2016-10-03

If you are really only interested in SNPs, there are so-called 'reference-free' methods that don't require aligning to a reference genome. They will be faster than a full alignment and SNP-calling workflow, but their accuracy may be lower.

Reading:
http://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-15-S4-S10
http://nar.oxfordjournals.org/content/early/2014/11/16/nar.gku1187.full.pdf