                    5.1 years ago
        MatthewP
        
    
I have a FASTQ file from an ATAC-seq experiment, but many reads start with a single "N" base. I wonder what the reason could be.
Example reads:
@A00838:273:HCV7KDSXY:4:1101:1506:1000 1:N:0:AGGCAGAA+TATCCTCT
NATCCAGAAAAAAAAAAAATCATGACCAAGCTTACCGTCCCCACTTAAAT
+
#FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFF
@A00838:273:HCV7KDSXY:4:1101:1687:1000 1:N:0:AGGCAGAA+TATCCTCT
NCATAGATCACATTAAGTACAAATATAAACAGTATTATTTCTTTACAATTGGATGTGTTGGAGACTTACTGATGT
+
#FFFFFF,FFFFFFFFF:FFFFFFFFFF:FFFFFFFFF:FFFF:FFF:FFFFF:FFFF,F,FFFFFFFFFFFFF:
Counts of such reads:
$ zcat ATAC_R1.fq.gz | grep -e "^N" | wc -l
15150
$ zcat ATAC_R2.fq.gz | grep -e "^N" | wc -l
28
Are 15150 and 28 reads out of several million really many? I would not even care about that tiny percentage, to be honest. Just proceed with the analysis. The aligner will typically soft-clip that one base or count it as a single mismatch.
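To put those counts in context, here is a minimal Python sketch (not from the original posts) that reports how many reads start with "N" alongside the total number of reads. The function name `count_leading_n` is hypothetical, and the code assumes standard four-line FASTQ records with unwrapped sequences. It only inspects sequence lines, whereas `grep "^N"` would also match any quality line that happens to start with "N".

```python
import gzip


def count_leading_n(fastq_gz_path):
    """Return (reads_starting_with_N, total_reads) for a gzipped FASTQ file.

    Assumes each record spans exactly four lines:
    header, sequence, '+' separator, quality string.
    """
    leading_n = 0
    total = 0
    with gzip.open(fastq_gz_path, "rt") as handle:
        for line_index, line in enumerate(handle):
            # The sequence is the second line of every four-line record.
            if line_index % 4 == 1:
                total += 1
                if line.startswith("N"):
                    leading_n += 1
    return leading_n, total
```

Calling `count_leading_n("ATAC_R1.fq.gz")` returns both numbers, so the fraction of affected reads can be computed as `leading_n / total` when `total` is non-zero.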