Extracting Subset Of Sequences From Finished Assembly
10.1 years ago
erinspice1 • 0


I am trying to submit a subset of sequences (n=10), produced through transcriptome assembly, to GenBank. I've been using these sequences for qPCR, and I need to get them into a database before submitting my manuscript. Apparently I have to do this through the Transcriptome Shotgun Assembly Database and Sequence Read Archive. I'm having trouble getting these sequences into a format that would be acceptable to the SRA (see here: http://www.ncbi.nlm.nih.gov/books/NBK47537/). Right now all I have are the .fasta files.

I have access to our server where the assembly is stored, but our lab's bioinformatics person is unavailable, and I have no computer science background. Everything I've Googled is way over my head. I can run a script if you tell me "this part does this" and "put your filename here", but that's all.

I think that the assembly is a SAM file? There's no extension, so I can't be sure. At any rate, how do I get my subset of sequences out of that assembly and into an acceptable format? I know we have Ruby and samtools, and my (limited) previous work with this assembly has been done through PuTTY.

Can anyone provide me with some really, really basic and dumbed down instructions? Thanks in advance!

samtools assembly • 2.0k views

The SRA is for the raw data, i.e. the huge .fastq files you got from the (likely) Illumina instrument. The "10 assembled transcripts" you mention would not be sent to the SRA; they would just be sent to GenBank or ENA like any other derived sequences.
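
On the extraction itself: if the assembly file turns out to be plain FASTA, pulling out ten sequences by name only needs a short script. You can check by looking at the first line (`head -n 1 yourfile`): FASTA records start with `>`, while SAM header lines start with `@`. Below is a minimal sketch in Python 3, assuming Python is available on your server (the question only mentions Ruby and samtools). The file names `assembly.fasta`, `ids.txt` and `subset.fasta`, the script name `extract_subset.py`, and all function names are made up for illustration; nothing here is specific to your assembler.

```python
import sys


def guess_format(first_line):
    """Rough guess at the file type from its first line (a heuristic, not a validator)."""
    if first_line.startswith(">"):
        return "fasta"
    if first_line.startswith("@"):
        # SAM header lines (and FASTQ records) begin with "@".
        return "sam-or-fastq"
    return "unknown"


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts: hand back the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def extract_subset(lines, wanted_ids):
    """Yield only the records whose ID (first word of the header) is wanted."""
    wanted = set(wanted_ids)
    for header, seq in read_fasta(lines):
        parts = header.split()
        seq_id = parts[0] if parts else ""
        if seq_id in wanted:
            yield header, seq


def main(assembly_path, ids_path, out_path):
    # ids_path: a text file with one sequence name per line, without the ">".
    with open(ids_path) as fh:
        wanted = [ln.strip() for ln in fh if ln.strip()]
    with open(assembly_path) as src, open(out_path, "w") as out:
        for header, seq in extract_subset(src, wanted):
            out.write(">" + header + "\n" + seq + "\n")


if __name__ == "__main__":
    if len(sys.argv) == 4:
        main(sys.argv[1], sys.argv[2], sys.argv[3])
    else:
        print("usage: extract_subset.py ASSEMBLY IDS OUTPUT", file=sys.stderr)
```

To use it, put the ten sequence names in `ids.txt`, one per line, then run `python3 extract_subset.py assembly.fasta ids.txt subset.fasta`. Names are matched against the first word of each header line, so anything after the first space in the header is ignored when matching but kept in the output.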

