The following is a beginner thinking out loud.
We have a custom library preparation protocol that uses a 3' poly-T tail primer and a 5' random primer to enrich for 3' ends, eventually adds the Illumina adapters at the end, and is followed by Illumina HTSEq single-end sequencing. The rationale is that the group is not interested in splice variants but only in quantifying expression for differential analysis (organism: mouse).
I was wondering what pipeline one would use in this case.
The reads are 95 bases long. After I trim away the barcodes and adapters, a clean stretch of 35 bases remains; mapped with bowtie, it usually yields 70% of total reads aligned to the transcriptome.
bowtie -v 3 --best -t ~/RNAseq/RefLibIndex/mouse -p 10 -f <inputfile.fasta> <output.map>
My output file contains entries in the following format with refseqID.
I extract the raw read counts from the bowtie output file with a custom script, map each RefSeq ID to a gene symbol, take the average/max over the RefSeq IDs of each gene symbol as its raw count, and then perform a DESeq differential analysis.
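To make the counting step concrete, here is a minimal sketch of what such a script might look like. It is not my actual script: it assumes the default tab-separated bowtie output with the reference name in the third column, that this name is a plain RefSeq ID, and that a RefSeq-to-symbol lookup is already loaded as a dictionary. The function names and the example IDs are made up for illustration.

```python
from collections import Counter, defaultdict


def count_reads_per_transcript(lines):
    """Count alignments per reference name (3rd tab-separated column)."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        counts[fields[2]] += 1
    return counts


def aggregate_to_genes(transcript_counts, refseq_to_symbol, how="max"):
    """Collapse per-transcript counts to one value per gene symbol."""
    per_gene = defaultdict(list)
    for refseq_id, n in transcript_counts.items():
        symbol = refseq_to_symbol.get(refseq_id)
        if symbol is not None:  # transcripts without a symbol are dropped
            per_gene[symbol].append(n)
    if how == "max":
        return {g: max(v) for g, v in per_gene.items()}
    if how == "mean":
        return {g: sum(v) / len(v) for g, v in per_gene.items()}
    raise ValueError("how must be 'max' or 'mean'")


# Made-up alignments and IDs, for illustration only.
example_lines = [
    "read1\t+\tNM_000001\t960\tACGT\tIIII\t0\t",
    "read2\t+\tNM_000001\t955\tACGT\tIIII\t0\t",
    "read3\t-\tNM_000002\t100\tACGT\tIIII\t0\t",
]
example_map = {"NM_000001": "GeneA", "NM_000002": "GeneA"}
transcript_counts = count_reads_per_transcript(example_lines)
gene_counts = aggregate_to_genes(transcript_counts, example_map, how="max")
```

One thing to keep in mind with the "mean" option: DESeq expects integer counts, so averaged values would need rounding before they go in.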
I would like to know:
- Is this approach valid? Do you see any loopholes here?
- I would like to visualise this data in a genome browser to check for 3' enrichment. However, there is no chromosome or index information in the file. I tried to follow "Visualizing Bowtie Output In Genome Browser" but could not get it to work. Any suggestions would be great; I am aiming for something like an IGV view.
- Is there any reason why only 70% of the reads map to the transcriptome? Do you think this is normal for this approach, and should I also try aligning the sequences to the genome?
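On the 3' enrichment question, one rough check that might work without a genome browser is to look at how far each alignment ends from the 3' end of its transcript. The sketch below assumes the default bowtie output (reference name in column 3, 0-based leftmost offset in column 4, read sequence in column 5) and a dictionary of transcript lengths, which I would have to build from the reference FASTA. The function name, IDs and lengths are hypothetical.

```python
from collections import Counter


def three_prime_distance_histogram(lines, transcript_lengths, bin_size=100):
    """Bin alignments by distance from alignment end to transcript 3' end."""
    hist = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        ref, offset, seq = fields[2], int(fields[3]), fields[4]
        length = transcript_lengths.get(ref)
        if length is None:
            continue  # no length known for this reference
        # The alignment covers [offset, offset + len(seq)) on the transcript.
        distance = length - (offset + len(seq))
        hist[max(distance, 0) // bin_size] += 1
    return hist


# Made-up alignments and transcript length, for illustration only.
read_seq = "A" * 35
example_lines = [
    "read1\t+\tNM_000001\t960\t" + read_seq + "\t" + "I" * 35 + "\t0\t",
    "read2\t+\tNM_000001\t955\t" + read_seq + "\t" + "I" * 35 + "\t0\t",
    "read3\t+\tNM_000001\t100\t" + read_seq + "\t" + "I" * 35 + "\t0\t",
]
example_lengths = {"NM_000001": 1000}
hist = three_prime_distance_histogram(example_lines, example_lengths)
```

If the library is 3' enriched, most alignments should land in the lowest bins. This only summarises positions along the transcript; it does not replace a browser view.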