6.9 years ago
seta
Hi everybody,
I'm working on a genome-guided transcriptome assembly of human Illumina data. I used STAR to map the reads to hg19 and Cufflinks for the transcriptome assembly. I ran the analysis separately on two independent datasets (one single-end, 36 bp; the other paired-end, 100 bp). After converting the "transcripts.gtf" file produced by Cufflinks to a FASTA file, I noticed that the number of sequences in the FASTA files is the same for both datasets. Is this normal, or is something wrong?
Thanks in advance
No, it is not normal.
But as we can easily see from the command line you provided, you ran it on the same dataset both times.
No, I did not run it on the same dataset twice. I checked all the commands again. What should I do?
Actually, the second dataset is the one in which the two read files of a single paired-end sample had different read lengths. I asked about it in this post, and you kindly suggested removing ftl=20 ftr=90 from the bbduk command used for read trimming, which I did. The mapping rate was fairly good, about 82-84% for all samples. How can I check the accuracy of the results?
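One quick sanity check, sketched here as an assumption rather than an established procedure, is to compare the two FASTA files directly instead of only comparing their sequence counts. If the sequences themselves are identical, both runs most likely produced the same transcript set. The function names and file names below are hypothetical, and the parser assumes plain, uncompressed FASTA.

```python
def read_fasta(path):
    """Return a list of (header, sequence) tuples from a plain FASTA file."""
    records = []
    header = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if header is not None:
                    records.append((header, "".join(chunks)))
                header = line[1:]
                chunks = []
            else:
                chunks.append(line)
    # Flush the last record in the file.
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def compare_fasta(path_a, path_b):
    """Compare two FASTA files by record count and by sequence content."""
    records_a = read_fasta(path_a)
    records_b = read_fasta(path_b)
    # Compare sequences case-insensitively and ignore the headers,
    # since transcript IDs may differ between runs.
    seqs_a = {seq.upper() for _, seq in records_a}
    seqs_b = {seq.upper() for _, seq in records_b}
    return {
        "count_a": len(records_a),
        "count_b": len(records_b),
        "shared_sequences": len(seqs_a & seqs_b),
        "identical": seqs_a == seqs_b,
    }
```

It could be called as `compare_fasta("dataset1_transcripts.fa", "dataset2_transcripts.fa")`, where both file names are placeholders. Equal counts with `identical` set to `True` would suggest the two outputs contain the same transcripts, whereas equal counts with few shared sequences would point to a coincidence.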