I have just started to analyse my data. As it seems it's not that easy to start bioinfomatics with this kind of data. I have been analysing methylation reads but for some reason samtools is giving me this error:
[samopen] SAM header is present: 5724 sequences. [sam_read1] reference '256' is recognized as '*'. Parse error at line 5732: unmatched CIGAR operation Aborted (core dumped)
From the beginning: The fastq file has all the reads from the run and I extract only the reads that start with ''CCGG'' because of the MspI restriction enzyme. Please correct me if any of the commands I am using is wrong.
cat bsq_S1_R1_001.fastq | paste - - - - | awk -F '\t' '(substr($2,1,4)=="CCGG")' | tr "\t" "\n" > bsq_tagged.fastq
After this step, every read in the new fastq file starts with CCGG and looks like this:
@NB501461:9:HGFFTBGX2:1:11101:22748:1061 1:N:0:CGATGT CCGGGATCCACCTTTACACACCCTCATCCAAGCATGTCAAACTCGGCGCCCGTCTCTGCCGCGTCCACCAGGGA + AAAAA#EEEEEEEEEEEAE/EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
So I trim this fastq file with trim_galore --rrbs and -a '3' adapter_sequence' to bsq_tagged_trimmed.fq
I align the bsq_tagged_trimmed.fq to the reference genome using bowtie2-2.3.1 and I get the bsq_tagged_trimmed.sam file. The .sam file has 5724 headers and then all the aligned scaffolds with the scaffold positions in a format like this:
@HD VN:1.0 SO:unsorted @SQ SN:Scaffold LN:13623339 . . and so on
Then I use
samtools view -b -S -o out.bam in.sam and I get the error.
I have been using for some weeks now these parameters for alignment and trimming and I never had any problem. Thing changed: Since I update the bowtie2 to version 2.3.1 everything is going bad. I also use samtools 0.1.19 but when I ran the example files from bowtie2 directory, samtools works fine. Could this be compatibility issue of trim_galore and cutadapt with the latest bowtie2 version? How can I see the line 5732 that has unmatched CIGAR operation?