I'm new to RNA-seq data analysis and the related bioinformatics pipelines. I just aligned my paired-end reads to the genome using TopHat2:
tophat2 -p 4 -G ../Arabidopsis_thaliana.TAIR10.31.gtf ../Arabidopsis_thaliana.TAIR10.31.dna.genome PE.reads.1.fastq.gz PE.reads.2.fastq.gz
I then got the default output files (accepted_hits.bam etc.) and this alignment summary:
Left reads:
    Input     : 37579106
    Mapped    : 35638921 (94.8% of input)
    of these  :   703952 ( 2.0%) have multiple alignments (23 have >20)
Right reads:
    Input     : 37579106
    Mapped    : 32462213 (86.4% of input)
    of these  :   614654 ( 1.9%) have multiple alignments (23 have >20)
90.6% overall read mapping rate.

Aligned pairs : 31803489
    of these  :   601459 ( 1.9%) have multiple alignments
                    5799 ( 0.0%) are discordant alignments
84.6% concordant pair alignment rate.
For some downstream processing steps I need SAM instead of BAM, so I wanted to use samtools to convert the BAM file to SAM (as already described in some threads here). I know that TopHat2 can also write SAM output directly with "--no-convert-bam", but I forgot to add this option to my tophat2 run.
So this is what I ran:
samtools view -h -o accepted_hits.sam accepted_hits.bam
It failed with:
samtools view: writing to "accepted_hits_sam" failed: File too large
samtools view: error closing "accepted_hits.sam": -1
I don't understand why the file would be too large. Has anyone run into this problem before? Thanks in advance!
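In case it helps with diagnosing this: "File too large" is the standard message for errno EFBIG. That usually points to a per-process file size limit or to a filesystem that caps the size of a single file (FAT32, for example, caps files at 4 GiB), rather than to a lack of free disk space. A minimal sketch of the checks, assuming a Linux shell with GNU coreutils and the file names from the commands above:

```shell
# Diagnostic sketch only; assumes it is run in the directory where
# accepted_hits.sam was supposed to be written.

# Per-process file size limit of the current shell:
# prints "unlimited" or a number of blocks.
fsize_limit=$(ulimit -f)
echo "file size limit (ulimit -f): ${fsize_limit}"

# Filesystem type and free space for the current directory.
# -T (show filesystem type) is a GNU df option; fall back to plain df -h.
df -hT . 2>/dev/null || df -h .

# Size of the BAM file, if present. The uncompressed SAM is typically
# several times larger than the BAM it comes from.
if [ -f accepted_hits.bam ]; then
    ls -lh accepted_hits.bam
fi
```

A finite value from ulimit -f, or a FAT-formatted target drive, would each be enough to explain the error for an output of this size.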