Question: DeNovo assembly of transcriptome
3.2 years ago by
rakeshmbb0
India
rakeshmbb0 wrote:

Hi everyone. I am interested in finding microsatellites in publicly available transcriptome data. I want to use sequencing data from the NCBI SRA, and I am using CLC Bio Workbench to process it. I am already having trouble at the very first step: are the sequence reads in an SRA file adapter-trimmed or not? Another problem is that, with Illumina paired-end data, fastq-dump from the SRA Toolkit does not split the reads into two files.

assembly • 1.6k views
modified 7 months ago by Biostar ♦♦ 20 • written 3.2 years ago by rakeshmbb0

To check whether the sequences are adapter-trimmed, you can use the FastQC tool: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
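As a rough complement to FastQC, and not a replacement for it, one could also count how many reads contain the start of a common Illumina adapter. The sketch below is only illustrative: it assumes plain four-line FASTQ records, it uses the widely used Illumina adapter prefix AGATCGGAAGAGC, and the function name `adapter_fraction` is made up for this example. A different library prep may use a different adapter.

```python
from itertools import islice

# Assumption: the common Illumina adapter prefix; other kits may differ.
ILLUMINA_ADAPTER_PREFIX = "AGATCGGAAGAGC"


def iter_sequences(handle):
    """Yield the sequence line of each record, assuming 4 lines per record."""
    for index, line in enumerate(handle):
        if index % 4 == 1:
            yield line.strip()


def adapter_fraction(handle, adapter=ILLUMINA_ADAPTER_PREFIX, max_reads=100000):
    """Return the fraction of the first max_reads reads containing the adapter."""
    total = 0
    hits = 0
    for sequence in islice(iter_sequences(handle), max_reads):
        total += 1
        if adapter in sequence:
            hits += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    import sys

    # Usage: python adapter_check.py reads.fastq
    with open(sys.argv[1], "rt") as fastq:
        print(f"{adapter_fraction(fastq):.2%} of sampled reads contain the adapter")
```

A fraction close to zero suggests, but does not prove, that the reads were already trimmed; an exact substring match misses adapters that contain sequencing errors.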

written 3.2 years ago by iraun3.5k

Passing the data through a trimming program is not a bad idea. If the data was already trimmed, it should come through unchanged; the only thing you have to invest is some time. This should apply to the trimming tool in CLC as well.

As for fastq-dump, you need to use the --split-files option to get the two reads in two separate files. While you are at it, you may as well use the --origfmt option to recover the original fastq file headers.
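For anyone scripting this, here is a minimal sketch of how the two options could be combined in one call. It assumes fastq-dump from the SRA Toolkit is on the PATH; the helper names `fastq_dump_command` and `mate_files` are hypothetical, and the accession is the one mentioned later in this thread.

```python
import subprocess
from pathlib import Path


def fastq_dump_command(accession, outdir=".", keep_original_headers=True):
    """Build the fastq-dump argument list for a paired-end download."""
    command = ["fastq-dump", "--split-files"]
    if keep_original_headers:
        command.append("--origfmt")
    command += ["--outdir", str(outdir), accession]
    return command


def mate_files(accession, outdir="."):
    """Return whichever of <accession>_1.fastq and <accession>_2.fastq exist."""
    candidates = (Path(outdir) / f"{accession}_{mate}.fastq" for mate in (1, 2))
    return [path for path in candidates if path.exists()]


if __name__ == "__main__":
    accession = "SRR2163549"
    # Requires the SRA Toolkit; raises CalledProcessError if fastq-dump fails.
    subprocess.run(fastq_dump_command(accession), check=True)
    print("Files written:", [path.name for path in mate_files(accession)])
```

If only one file is listed for a run that is supposed to be paired, the problem is more likely in the record itself than in the command.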
 

written 3.2 years ago by genomax65k
Hi genomax2. Thank you. When I use fastq-dump --split-files, it says it rejected 2567436 reads because of filtering out non-biological reads, and it gives just one fastq file. What could be the problem?
written 3.2 years ago by rakeshmbb0

What SRA # are you looking at?

written 3.2 years ago by genomax65k
Hi genomax, it is SRR2163549. It is paired-layout data.
written 3.2 years ago by rakeshmbb0

Get the fastq files from EBI: http://www.ebi.ac.uk/ena/data/view/ERR1203908

written 3.2 years ago by genomax65k

Did you check the project metadata? Information on whether the reads have been trimmed may be available there. Also, maybe fastq-dump is not splitting the files because it is a single-end run?

written 3.2 years ago by h.mon24k
Hi h.mon. The SRA file has a paired layout. Can you tell me how to retrieve the project metadata?
written 3.2 years ago by rakeshmbb0
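Besides browsing the run page, one possible way to pull run-level metadata programmatically is the ENA Portal API file report, which mirrors SRA runs. The sketch below is an assumption-laden illustration: the endpoint and the field names (`run_accession`, `library_layout`, `read_count`, `fastq_ftp`) are written as I understand the ENA API and should be checked against its documentation, and the helper names are hypothetical.

```python
import csv
import io
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: ENA Portal API file report endpoint.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"
DEFAULT_FIELDS = ("run_accession", "library_layout", "read_count", "fastq_ftp")


def filereport_url(accession, fields=DEFAULT_FIELDS):
    """Build the file report URL for a run, experiment, or study accession."""
    query = urlencode({
        "accession": accession,
        "result": "read_run",
        "fields": ",".join(fields),
        "format": "tsv",
    })
    return f"{ENA_FILEREPORT}?{query}"


def parse_filereport(tsv_text):
    """Turn the tab-separated report into a list of dicts, one per run."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def fetch_run_metadata(accession):
    """Download and parse the report (needs network access)."""
    with urlopen(filereport_url(accession)) as response:
        return parse_filereport(response.read().decode("utf-8"))


if __name__ == "__main__":
    for run in fetch_run_metadata("SRR2163549"):
        print(run)
```

The declared `library_layout` only reflects what the submitter entered, so it is worth comparing it with the number of fastq files actually listed for the run.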
3.2 years ago by
h.mon24k
Brazil
h.mon24k wrote:

I look up project (and run, sample, etc.) metadata on the respective SRA pages, e.g. here for this run. Which version of fastq-dump are you using? Running:

fastq-dump --split-files SRR2163549

gave me just one fastq file, but also the following output:

Rejected 6856409 READS because READLEN < 1
Read 6856409 spots for SRR2163549
Written 6856409 spots for SRR2163549

So I suspect that either read 2 was trimmed for some reason, the metadata is incorrect, or this record is corrupted.
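To make that kind of check repeatable across runs, the summary lines printed by fastq-dump could be parsed and compared. The following is a small sketch that assumes the summary has the same shape as the three lines quoted above; the function names are hypothetical, and the interpretation (one empty read per spot) is a heuristic rather than something fastq-dump states itself.

```python
import re


def parse_fastq_dump_summary(text):
    """Extract rejected-read and spot counts from fastq-dump summary lines."""
    summary = {"rejected": 0, "reasons": [], "read": None, "written": None}
    for line in text.splitlines():
        line = line.strip()
        rejected = re.match(r"Rejected (\d+) READS because (.+)", line)
        if rejected:
            summary["rejected"] += int(rejected.group(1))
            summary["reasons"].append(rejected.group(2))
            continue
        spots = re.match(r"(Read|Written) (\d+) spots", line)
        if spots:
            summary[spots.group(1).lower()] = int(spots.group(2))
    return summary


def every_spot_lost_one_read(summary):
    """True when rejected reads equal spots read, i.e. one read lost per spot."""
    return summary["read"] is not None and summary["rejected"] == summary["read"]


if __name__ == "__main__":
    log = (
        "Rejected 6856409 READS because READLEN < 1\n"
        "Read 6856409 spots for SRR2163549\n"
        "Written 6856409 spots for SRR2163549\n"
    )
    parsed = parse_fastq_dump_summary(log)
    print(parsed, every_spot_lost_one_read(parsed))
```

When the number of rejected reads equals the number of spots, as it does here, it is consistent with one mate being empty in every spot, which would explain why only a single fastq file is written.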

written 3.2 years ago by h.mon24k