Question: Shortcuts for retrieving information on specific genomic loci from big SRA files
4.2 years ago, Anima Mundi (2.4k) wrote:


Recently I have been trying to get some information about the gene structure of a few mouse genes. All I have is a raw SRA file derived from an RNA-Seq experiment. I used fastq-dump from the SRA Toolkit to generate a FASTQ file from it, and then ran tophat (TopHat2/Bowtie2) to align the FASTQ reads to the genomic reference. Unfortunately, tophat turned out to be extremely slow on the machine I am using. Since all I actually need is to check a few loci, I would like to know whether there are alternative solutions. For example, could I use single FASTA files (retrieved from Ensembl for each locus) to generate small, locus-specific indexes?


bowtie rna-seq tophat sra genome • 1.2k views
3.0 years ago, i.sudbery (5.2k), Sheffield, UK, wrote:

No. That would probably be a bad idea, because a read might be mapped with mismatches to your small index when it would have mapped perfectly somewhere else in the genome.
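To make the problem concrete, here is a toy sketch in Python. It is not a real aligner: the sequences are made up, and `best_hit` is a hypothetical helper that simply tries every ungapped placement and keeps the one with the fewest mismatches. The read really comes from a paralog, but with only the locus in the index it is still reported as a hit on the locus.

```python
def mismatches(read, ref, pos):
    """Count mismatches between the read and the reference at offset pos."""
    return sum(a != b for a, b in zip(read, ref[pos:pos + len(read)]))


def best_hit(read, references, max_mismatches=2):
    """Return (name, position, mismatches) of the best ungapped placement, or None."""
    best = None
    for name, seq in references.items():
        for pos in range(len(seq) - len(read) + 1):
            mm = mismatches(read, seq, pos)
            if mm <= max_mismatches and (best is None or mm < best[2]):
                best = (name, pos, mm)
    return best


# Made-up sequences: the locus of interest and a paralog that differs by one base.
locus = "TTGACCGATTACAGGCTTAACGT"
paralog = "CCGACCGATTACTGGCTTAAGGA"
read = "CCGATTACTGGCTTAA"  # taken verbatim from the paralog

small_index = {"locus": locus}
full_reference = {"locus": locus, "paralog": paralog}

hit_small = best_hit(read, small_index)    # forced onto the locus, with a mismatch
hit_full = best_hit(read, full_reference)  # perfect match on the paralog instead
print(hit_small, hit_full)
```

With the small index, the read lands on the locus with one mismatch and would be counted as evidence for that gene. With the full reference, it goes to its true origin.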

Much of the recent SRA data is reference-compressed, so if you are lucky, you might be able to retrieve just the reads from your region of interest. Failing that, bear in mind that TopHat2 is one of the slower RNA-seq mappers. Try something faster, such as HISAT2 or STAR.
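As a rough sketch of both routes, the Python snippet below only builds the command lines. It assumes the SRA Toolkit's `sam-dump` with its `--aligned-region` option and HISAT2's `-x`/`-U`/`-S`/`-p` options. The accession, coordinates, index prefix and file names are placeholders, not values from this thread. The region query can only return reads if the run was deposited with alignments, and the chromosome name has to match whatever the submitter used (for example "chr11" versus "11").

```python
import shlex


def region_dump_cmd(accession, chrom, start, end):
    """Build a sam-dump call restricted to one interval (1-based, inclusive)."""
    return ["sam-dump", "--aligned-region", f"{chrom}:{start}-{end}", accession]


def hisat2_cmd(index_prefix, fastq, out_sam, threads=4):
    """Build a single-end HISAT2 call against a whole-genome index."""
    return ["hisat2", "-p", str(threads), "-x", index_prefix,
            "-U", fastq, "-S", out_sam]


# Placeholder accession, coordinates and file names, for illustration only.
dump = region_dump_cmd("SRRXXXXXXX", "chr11", 1000000, 1050000)
align = hisat2_cmd("mouse_genome_index", "reads.fastq", "aligned.sam")

# Print the commands so they can be inspected before running them,
# e.g. with subprocess.run(dump, check=True).
print(shlex.join(dump))
print(shlex.join(align))
```

If the region query comes back empty, the run was probably stored unaligned, and mapping against the whole genome with the faster aligner is the fallback.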


Thanks. That time I did not manage to extract the information I wanted with the computing power I had. Next time I will try my luck with faster algorithms like the ones you suggest (or with better machines!).

Reply written 3.0 years ago by Anima Mundi (2.4k)