Can someone help me with the following? Alternatively, are there any pipelines you would suggest that I can refer to?
Using: Tophat2, Bowtie2
- I have multiplexed single-end sequencing data with 40 bp reads, and I would like to perform differential expression analysis on it.
- I am using the Bowtie2 index files from the TopHat index and annotation downloads for UCSC mouse mm10.
I use the following TopHat command for the alignment:

    tophat -p 4 -N 3 --no-coverage-search --read-edit-dist 3 --output-dir <path> /Bowtie2Index/genome <input_path>

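Since the data are multiplexed, the same command is presumably run once per demultiplexed sample. The Python sketch below only builds the per-sample argument lists using the options from the command above; the sample names, FASTQ paths and the `tophat_out/<sample>` layout are hypothetical. As far as I know, TopHat2 refuses runs where `--read-edit-dist` is smaller than `-N`, so the helper checks that as well.

```python
from pathlib import Path

# Index prefix and options copied from the TopHat command above.
INDEX_PREFIX = "/Bowtie2Index/genome"
THREADS = 4
READ_MISMATCHES = 3   # -N
READ_EDIT_DIST = 3    # --read-edit-dist, kept >= -N


def tophat_command(fastq: str, out_dir: str,
                   mismatches: int = READ_MISMATCHES,
                   edit_dist: int = READ_EDIT_DIST) -> list[str]:
    """Build the TopHat argument list for one demultiplexed sample."""
    if edit_dist < mismatches:
        raise ValueError("--read-edit-dist should not be smaller than -N")
    return [
        "tophat", "-p", str(THREADS),
        "-N", str(mismatches),
        "--no-coverage-search",
        "--read-edit-dist", str(edit_dist),
        "--output-dir", out_dir,
        INDEX_PREFIX, fastq,
    ]


# Hypothetical sample names and FASTQ locations.
samples = {
    "ctrl_rep1": "fastq/ctrl_rep1.fastq",
    "treat_rep1": "fastq/treat_rep1.fastq",
}

commands = {
    name: tophat_command(fq, str(Path("tophat_out") / name))
    for name, fq in samples.items()
}

for name, cmd in commands.items():
    print(name, " ".join(cmd))
    # To execute instead of printing: subprocess.run(cmd, check=True)
```

Keeping the options in one helper makes it easy to rerun every sample with a different `-N` while guaranteeing that all samples are aligned identically.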
How can I improve the alignment with TopHat? These are the alignment statistics for one sample:

    3208092 reads; of these:
      3208092 (100.00%) were unpaired; of these:
        724883 (22.60%) aligned 0 times
        1845395 (57.52%) aligned exactly 1 time
        637814 (19.88%) aligned >1 times
    77.40% overall alignment rate

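The statistics above follow the layout Bowtie2 prints for single-end reads. To compare the 75-80% range across samples, a small parser like the following Python sketch could collect the counts from each sample's summary. It assumes exactly this single-end Bowtie2 layout; TopHat's own `align_summary.txt` is formatted differently and would likely need other patterns.

```python
import re

SUMMARY = """3208092 reads; of these:
  3208092 (100.00%) were unpaired; of these:
    724883 (22.60%) aligned 0 times
    1845395 (57.52%) aligned exactly 1 time
    637814 (19.88%) aligned >1 times
77.40% overall alignment rate"""


def parse_summary(text: str) -> dict[str, int]:
    """Extract read counts from a Bowtie2-style single-end summary."""
    patterns = {
        "total": r"^(\d+) reads; of these:",
        "unaligned": r"^\s*(\d+) \([\d.]+%\) aligned 0 times",
        "unique": r"^\s*(\d+) \([\d.]+%\) aligned exactly 1 time",
        "multi": r"^\s*(\d+) \([\d.]+%\) aligned >1 times",
    }
    counts = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, text, flags=re.MULTILINE)
        if match is None:
            raise ValueError(f"could not find '{key}' in summary")
        counts[key] = int(match.group(1))
    return counts


counts = parse_summary(SUMMARY)
# Every read falls into exactly one of the three categories.
assert counts["unaligned"] + counts["unique"] + counts["multi"] == counts["total"]

overall = (counts["unique"] + counts["multi"]) / counts["total"]
print(f"overall alignment rate: {overall:.2%}")              # 77.40%
print(f"unique: {counts['unique'] / counts['total']:.2%}")   # 57.52%
```

Reporting the unique rate next to the overall rate may be more informative for DE work, since many counting setups use only uniquely aligned reads.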
The overall alignment rate varies from 75% to 80% across samples. Is this normal? I am wondering how to account for the remaining 20-25% of reads. Increasing the number of allowed mismatches could be one option; any suggestions on that?
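One way to see where the unaligned 20-25% come from is to tally rough categories among those reads. The stdlib-only Python sketch below assumes the unaligned reads have already been written to a FASTQ file (for example extracted from TopHat's `unmapped.bam` with `samtools fastq`). The adapter prefix and all thresholds (more than 2 `N`, one base making up at least 80% of the read, an 8-base adapter match) are illustrative assumptions, not established cut-offs.

```python
import io
from collections import Counter
from typing import Iterator, TextIO

# Assumption: start of the common Illumina adapter; replace it with the
# adapter actually used in your library prep.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def fastq_sequences(handle: TextIO) -> Iterator[str]:
    """Yield read sequences from a 4-lines-per-record FASTQ stream."""
    for i, line in enumerate(handle):
        if i % 4 == 1:
            yield line.strip().upper()


def classify(seq: str, min_adapter: int = 8) -> str:
    """Assign a read to one rough category that may explain non-alignment."""
    if not seq:
        return "empty"
    if seq.count("N") > 2:
        return "many_N"
    if max(seq.count(base) for base in "ACGT") >= 0.8 * len(seq):
        return "low_complexity"  # e.g. poly-A stretches
    if ADAPTER_PREFIX[:min_adapter] in seq:
        return "adapter"
    return "other"


def tally(handle: TextIO) -> Counter:
    return Counter(classify(seq) for seq in fastq_sequences(handle))


def record(name: str, seq: str) -> str:
    """Format one FASTQ record with dummy qualities (for the demo only)."""
    return f"@{name}\n{seq}\n+\n{'I' * len(seq)}\n"


# Tiny hypothetical example; in practice, open the FASTQ of unaligned reads.
example = io.StringIO(
    record("r1", "A" * 40)
    + record("r2", "ACGTNNNNN" + "ACGT" * 7 + "ACG")
    + record("r3", "TTGCCATGCA" + "AGATCGGAAGAGC" + "ACACGTCTGAACTCCAG")
    + record("r4", "GATTACA" * 5 + "CCGTT")
)
result = tally(example)
print(result)
```

If most unaligned reads land in "other", mismatches (or reads from sequences absent from mm10) are more plausible explanations than adapters or low-quality bases.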
Note: I also checked alignment against the PhiX control with Bowtie; less than 0.15% of reads align to it in all samples.
Could you also comment on the sequencing depth (number of reads) required per sample library for a good DE gene analysis, say with 3 biological replicates and 2 technical replicates for each condition?
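Whatever per-library target is chosen, the arithmetic linking it to raw sequencing depth is simple. The Python sketch below assumes a two-condition design, takes the unique alignment rate from the summary of the one sample shown, and uses a hypothetical target of 10 million uniquely aligned reads per library; that number is a placeholder, not a recommendation. Counting only uniquely aligned reads is also an assumption that depends on the counting tool.

```python
import math

CONDITIONS = 2                       # assumption: two-condition comparison
BIO_REPS = 3                         # biological replicates per condition
TECH_REPS = 2                        # technical replicates per biological sample
UNIQUE_RATE = 1845395 / 3208092      # "aligned exactly 1 time" from the summary
TARGET_UNIQUE = 10_000_000           # hypothetical target, not a recommendation

libraries = CONDITIONS * BIO_REPS * TECH_REPS
# Raw reads needed so that the expected uniquely aligned reads reach the target.
raw_per_library = math.ceil(TARGET_UNIQUE / UNIQUE_RATE)
total_raw = raw_per_library * libraries

print(f"{libraries} libraries")
print(f"{raw_per_library:,} raw reads per library")
print(f"{total_raw:,} raw reads in total")
```

Since the unique rate differs between samples, using the lowest observed rate would give a more conservative estimate.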
Is there anything else I am missing or should consider?
Thank you very much!