I am looking at the ENCODE ATAC-seq pipeline and its source code https://github.com/dnanexus-rnd/atac-seq-pipeline
I found that the default of Multimapping reads is 4 (which then further translates to 5, after the m+1 as describe in their code https://github.com/ENCODE-DCC/atac-seq-pipeline/blob/master/src/encode_task_bowtie2.py)
I am also checking the bcbio code, I think they used no multimapping (which should be the default for bowtie2).
I wonder whether we should set the multimapping to 0 instead?
================================
The second question is about the necessity of trimming the adapter prior to the alignment using bowtie2. Someone suggested that these aligners can do soft clipping, therefore I am not sure should we do the trimming.
Similarly, bcbio's tutorial example ymal file does not include any trimming step.
If we are going to need the trimming of the adapter what adapter sequence should we supply? Is it the index sequence?
What is the best practice when analyzing the atac-seq data (I am mapping to mm10 and hg38)?
Thank you!
2 follow-up questions are:
You suggested using
CTGTCTCTTATACACATCT
as the adapter. But when I look at the "NexteraPE-PE.fa provided by Trimmomatic" in the the discussion you direct me to, I find that the adapter sequence is actually in the NexteraPE-PE.fa:Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTCCGAGCCCACGAGAC' Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTGACGCTGCCGACGA'
Should we use this "long version" of the adapter sequence? Or purely
CTGTCTCTTATACACATCT
will do the job?You mentioned discarding multimappers. Do you mean you set the bowtie2 -k to 0? Or, you suggest discarding them using samtools? While I am looking at the bowtie2 manual after posting the original question, it seems that we can only know there is multimapping happening when we use the -k option (a number larger than 1). It is because bowtie2 will randomly report 1 aligned region when there is a tie (though, I am not sure whether the XS:i: will still present in the default mode).
Thanks!