Question: Extracting Subset Of Sequences From Finished Assembly
Written 5.0 years ago by erinspice10 (Canada):

Hi,

I am trying to submit a subset of sequences (n=10), produced through transcriptome assembly, to GenBank. I've been using these sequences for qPCR, and I need to get them into a database before submitting my manuscript. Apparently I have to do this through the Transcriptome Shotgun Assembly Database and the Sequence Read Archive. I'm having trouble getting these sequences into a format that would be acceptable to the SRA (see here: http://www.ncbi.nlm.nih.gov/books/NBK47537/). Right now all I have are the .fasta files.

I have access to our server where the assembly is stored, but our lab's bioinformatics person is unavailable, and I have no computer science background. Everything I've Googled is way over my head. I can run a script if you tell me "this part does this" and "put your filename here", but that's all.

I think that the assembly is a SAM file? There's no extension, so I can't be sure. At any rate, how do I get my subset of sequences out of that assembly and into an acceptable format? I know we have Ruby and samtools, and my (limited) previous work with this assembly has been done through PuTTY.

Can anyone provide me with some really, really basic and dumbed down instructions? Thanks in advance!

assembly samtools • 1.3k views

The SRA is for the raw data, i.e. the huge .fastq files you got from the (likely) Illumina instrument. The "10 assembled transcripts" you mention would not be sent to the SRA; they would just be sent to GenBank or ENA like any other derived sequences.

Written 5.0 years ago by Torst900
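
For the "how do I get my subset out" part of the question, here is a minimal sketch in Python (standard library only). It assumes the sequences already exist as FASTA records, as the question says, and that you know the IDs of the 10 transcripts you want. The function names `guess_format` and `extract_subset` are made up for this example and are not part of samtools or any other tool. The first function is only a rough check of what kind of file the extension-less assembly might be; the second copies the chosen records into a new FASTA file. It does not prepare a GenBank submission, it only pulls out the records.

```python
def guess_format(path):
    """Rough guess at a file's format from its first few bytes.

    Heuristic only: '>' starts a FASTA header, '@' starts either a SAM
    header or a FASTQ record, and the bytes 1f 8b mark a gzip file
    (BAM files are gzip-compatible, so they start the same way).
    """
    with open(path, "rb") as handle:
        head = handle.read(2)
    if head.startswith(b">"):
        return "fasta"
    if head.startswith(b"\x1f\x8b"):
        return "gzip-compressed (possibly BAM)"
    if head.startswith(b"@"):
        return "sam-or-fastq"
    return "unknown"


def extract_subset(fasta_path, wanted_ids, out_path):
    """Copy the FASTA records named in wanted_ids to out_path.

    A record's ID is taken to be the first word after '>' on its
    header line. Returns the set of IDs that were actually found, so
    you can check that none of your sequences are missing.
    """
    wanted = set(wanted_ids)
    found = set()
    keep = False  # True while we are inside a record we want
    with open(fasta_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                fields = line[1:].split()
                seq_id = fields[0] if fields else ""
                keep = seq_id in wanted
                if keep:
                    found.add(seq_id)
            if keep:
                dst.write(line)
    return found
```

To use it, put your own filenames and IDs in a call such as `extract_subset("assembly.fasta", ["contig_12", "contig_87"], "subset.fasta")` (those names are placeholders), then compare the returned set against your list of IDs to confirm that all 10 were found.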