Question

De novo assembly of transcriptome

8.3 years ago

rakeshmbb • 0

Hi everyone. I want to find microsatellites in publicly available transcriptome data, using sequencing data from the NCBI SRA. I am using CLC Bio Workbench to process the data, and I am stuck at the very first step: are the reads in an SRA file already adapter-trimmed or not? I also ran into a second problem with Illumina paired-end data: fastq-dump from the SRA Toolkit does not split the data into two files.

Assembly • 2.8k views

ADD COMMENT • link updated 5.6 years ago by Biostar 20 • written 8.3 years ago by rakeshmbb • 0

To check whether the sequences are adapter-trimmed, you can use the FastQC tool: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
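As a rough complement to FastQC, a few lines of Python can estimate how many reads still contain adapter sequence. This is only a sketch: it assumes the library was built with a standard Illumina TruSeq-style adapter (read-through starting with AGATCGGAAGAGC), and the names ADAPTER_PREFIX, fastq_sequences and adapter_fraction are made up for illustration. A different kit would need a different sequence.

```python
from typing import Iterable, Iterator

# Assumption: standard Illumina TruSeq-style adapter; the read-through
# sequence at the 3' end of a read starts with these 13 bases.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # second line of every record holds the bases
            yield line.strip()


def adapter_fraction(lines: Iterable[str], adapter: str = ADAPTER_PREFIX) -> float:
    """Return the fraction of reads that contain the adapter prefix."""
    total = 0
    hits = 0
    for seq in fastq_sequences(lines):
        total += 1
        if adapter in seq.upper():
            hits += 1
    return hits / total if total else 0.0
```

Passing an open file handle for the downloaded fastq file as `lines` should work. A fraction close to zero suggests the reads were already trimmed (or that the inserts are long enough that there is no read-through); it does not replace a proper FastQC report.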

ADD REPLY • link 8.3 years ago by iraun 6.2k

Passing the data through a trimming program is not a bad idea. If the data was already trimmed, it should come through unchanged. The only thing you have to invest is some time. This should apply to the trimming tool in CLC as well.

As for fastq-dump, you need to use the --split-files option to get the two reads in two separate files. While you are at it, you may as well use the --origfmt option to recover the original fastq file headers.
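If you want to script the download, the two options can be combined along these lines. This is a minimal sketch, assuming the SRA Toolkit is installed and fastq-dump is on your PATH; the helper names fastq_dump_command and run_fastq_dump are hypothetical and not part of the toolkit.

```python
import shutil
import subprocess
from typing import List


def fastq_dump_command(accession: str, split_files: bool = True,
                       origfmt: bool = True) -> List[str]:
    """Build the fastq-dump argument list for one run accession."""
    cmd = ["fastq-dump"]
    if split_files:
        cmd.append("--split-files")  # write each read of a spot to its own file
    if origfmt:
        cmd.append("--origfmt")      # keep only the original read names in headers
    cmd.append(accession)
    return cmd


def run_fastq_dump(accession: str) -> None:
    """Run fastq-dump for the accession, failing early if the tool is missing."""
    if shutil.which("fastq-dump") is None:
        raise RuntimeError("fastq-dump not found on PATH; install the SRA Toolkit first")
    subprocess.run(fastq_dump_command(accession), check=True)
```

Calling run_fastq_dump with your run accession is equivalent to typing the command by hand; building the argument list separately just makes it easy to inspect before anything is downloaded.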

ADD REPLY • link updated 4.3 years ago by Ram 43k • written 8.3 years ago by GenoMax 141k

Hi genomax2. Thank you. When I use fastq-dump --split-files, it says it rejected 2567436 reads because of filtering out non-biological reads, and it gives just one fastq file. What could be the problem?

ADD REPLY • link 8.3 years ago by rakeshmbb • 0

What SRA # are you looking at?

ADD REPLY • link 8.3 years ago by GenoMax 141k

Hi genomax, it is SRR2163549. It is paired-layout data.

ADD REPLY • link 8.3 years ago by rakeshmbb • 0

Get the fastq files from EBI: http://www.ebi.ac.uk/ena/data/view/ERR1203908

ADD REPLY • link 8.3 years ago by GenoMax 141k

Did you check the project metadata? Information on whether the reads have been trimmed may be available there. Also, maybe fastq-dump is not splitting the files because it is a single-end run?

ADD REPLY • link 8.3 years ago by h.mon 35k

Hi h.mon. The SRA file has a paired layout. Can you tell me how to retrieve the project metadata?

ADD REPLY • link 8.3 years ago by rakeshmbb • 0

score 0 · Answer 1 · 2016-01-22

I look at project (and run, sample, etc.) metadata on the respective SRA pages, e.g. here for this run. Which version of fastq-dump are you using? Running:

fastq-dump --split-files SRR2163549

gave me just one fastq file, but also the following output:

Rejected 6856409 READS because READLEN < 1
Read 6856409 spots for SRR2163549
Written 6856409 spots for SRR2163549

So I suspect that either read 2 was trimmed for some reason, the metadata is incorrect, or this record is corrupted.
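One way to see what actually ended up in the single fastq file is to tally the read lengths. The sketch below assumes a plain four-line-per-record FASTQ file; read_length_counts is a name made up for this example.

```python
from collections import Counter
from typing import Iterable


def read_length_counts(lines: Iterable[str]) -> Counter:
    """Count reads by sequence length in a 4-line-per-record FASTQ stream."""
    counts: Counter = Counter()
    for i, line in enumerate(lines):
        if i % 4 == 1:  # sequence line of each record
            counts[len(line.rstrip("\r\n"))] += 1
    return counts
```

If every length sits at or below the single-read length for the run, the file most likely holds only one mate per spot, which would fit the "READLEN < 1" rejections above. It cannot tell you why the second read is empty.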