Hi everyone. I am interested in finding microsatellites in publicly available transcriptome data. I want to use sequencing data from the NCBI SRA, and I am using CLC Bio Workbench to process the data. I am having trouble at the very first step: are the sequence reads in an SRA file adapter-trimmed or not? Another problem I ran into with Illumina paired-end data is that fastq-dump from the SRA Toolkit does not split the data into two files.
I look at the project (and run, sample, etc.) metadata on the respective SRA pages, e.g. here for this run. Which version of fastq-dump are you using? Running:
fastq-dump --split-files SRR2163549
gave me just one fastq file, but also the following output:
Rejected 6856409 READS because READLEN < 1
Read 6856409 spots for SRR2163549
Written 6856409 spots for SRR2163549
So I suspect that either read 2 was trimmed for some reason, the metadata is incorrect, or this record is corrupted.
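If the metadata does not settle the adapter question, one rough way to check is to inspect the dumped FASTQ directly. The sketch below is only an illustration: the function name `fastq_summary` is made up here, it assumes plain four-line FASTQ records, and it only looks for the first 13 bases of the common Illumina adapter (AGATCGGAAGAGC), so other adapters would go undetected. It reports the read count, the shortest and longest read, and how many reads contain that adapter prefix.

```shell
# Summarize a FASTQ file: read count, min/max read length, and the number of
# reads containing the start of the common Illumina adapter (AGATCGGAAGAGC).
# Assumptions: fastq_summary is a hypothetical helper name, and the input
# uses unwrapped four-line FASTQ records.
fastq_summary() {
    awk -v adapter="AGATCGGAAGAGC" '
        NR % 4 == 2 {                       # sequence line of each record
            n++
            len = length($0)
            if (n == 1 || len < min) min = len
            if (len > max) max = len
            if (index($0, adapter) > 0) hits++
        }
        END {
            printf "reads=%d min_len=%d max_len=%d adapter_hits=%d\n", n, min, max, hits
        }
    ' "$1"
}
```

Calling `fastq_summary` on the file that fastq-dump wrote prints a single summary line. Many adapter hits or widely varying read lengths would hint that the reads were not (or only partly) trimmed, while zero hits is consistent with trimmed reads but does not prove it.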