ATACseq Data Insert Size Plot
0
0
Entering edit mode
3 days ago

Hi,I'm trying to follow the atacseq analysis workshop by NBIS. I directly download the bam file from ENCODE by

wget https://www.encodeproject.org/files/ENCFF045OAB/@@download/ENCFF045OAB.bam

I check the header,the file was produced by

@PG     ID:bowtie2      PN:bowtie2      VN:2.3.4.3      CL:"/software/bowtie2-align-s --wrapper basic-0 -k 5 -X2000 --mm --threads 6 -x GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta -1 R1/ENCFF369XMA.trim.fastq.gz -2 R2/ENCFF997TXI.trim.fastq.gz"

I make a file that fits with workshop's statement "alignments were subset to include chromosome 14 and 1% of reads mapped to chromosomes 1 to 6 and chrM."

$SAMTOOLS view -hb -@ 3 bam/ENCFF045OAB.bam -o subset1/chr14/ENCFF045OAB_chr14.bam chr14

$SAMTOOLS view -hb -@ 3 bam/ENCFF045OAB.bam -o subset1/sample/ENCFF045OAB_other.bam chr1 chr2 chr3 chr4 chr5 chr6 chrM

$SAMTOOLS index subset1/chr14/ENCFF045OAB_chr14.bam
$SAMTOOLS index subset1/sample/ENCFF045OAB_other.bam

$SAMTOOLS view -s 101.01 subset1/sample/ENCFF045OAB_other.bam -b -o subset1/sample/ENCFF045OAB_downsample_1-6_M.bam

$SAMTOOLS merge -o subset1/ENCFF045OAB.chr14.bam subset1/chr14/ENCFF045OAB_chr14.bam subset1/sample/ENCFF045OAB_downsample_1-6_M.bam
$SAMTOOLS index subset1/ENCFF045OAB.chr14.bam

Then I filter the file with following key code,

$BAMUTILS filter subset1/ENCFF045OAB.chr14.bam subset1/ENCFF045OAB_rmblack_chr14.bam -excludebed ./ref/ENCFF356LFX.bed nostrand

$SAMTOOLS index subset1/ENCFF045OAB_rmblack_chr14.bam

$BAMUTILS filter subset1/ENCFF045OAB_rmblack_chr14.bam subset1/ENCFF045OAB_mtRM_rmblack_chr14.bam -excluderef chrM

$SAMTOOLS view -f 2 -q 5 -hb -o subset1/ENCFF045OAB_mapq5_mtRM_rmblack_chr14.bam subset1/ENCFF045OAB_mtRM_rmblack_chr14.bam

$SAMTOOLS index subset1/ENCFF045OAB_mapq5_mtRM_rmblack_chr14.bam

$PICARD MarkDuplicates    I=subset1/ENCFF045OAB_mapq5_mtRM_rmblack_chr14.bam O=processedbam/ENCFF045OAB_rmdup_mapq5_mtRM_rmblack_chr14.bam \
              M=processedbam/ENCFF045OAB.rmdup.metrics \
               REMOVE_DUPLICATES=true \
                  ASO=coordinate

After filtering,when i try to plot insertsize by picard,a blue region ranging below about 100bp occurs.

picard CollectInsertSizeMetrics I= ENCFF045OAB_rmdup_mapq5_mtRM_rmblack_chr14.bam O=test.stats H=test2.pdf M=0.05

enter image description here

It seems that there are "RF"/outward reads in my bam file.I check the "test.stats" and find the "READ_PAIRS" column is "260038 RF " and " 1180113 FR ".Besides, command "samtools flagstats ENCFF045OAB_rmdup_mapq5_mtRM_rmblack_chr14.bam" gets an entry with "1440151 + 0 read2", so picard may use read2 for collect insertsize.Another command agrees with that."721643 + 718508 = 1440151"

$ samtools view ENCFF045OAB_rmdup_mapq5_mtRM_rmblack_chr14.bam |cut -f 2|sort | uniq -c

 721643 147
 718508 163
 103360 323
 118153 339
 118610 355
 103156 371
 103152 387
 118611 403
 118155 419
 103362 435
 718509 83
 721644 99

So the question is that if picard only use 2nd strand and primary alignment(147 +163) for cauculate insertsize why is there blue region? According to the definition,they should all be "FR".

Set "M=0.5" can make the plot beautiful because "RF" reads are removed,but they are still alive.

I have searched the topic and find a same post,but i am still confused how does it happen.

When I try to keep only good pair reads by "-F 256",I get the same blue region as expected(picard still choose flags "147 163" for plot).But if the blue region indicates outward reads,how does picard define it?

$ samtools view -F 256 ENCFF045OAB_rmdup_mapq5_mtRM_rmblack_chr14.bam -o good.bam
$ picard CollectInsertSizeMetrics I=good.bam O=test.stats H=test5_good.pdf M=0.05

enter image description here

InsertSize ATAC-seq Picard • 151 views
ADD COMMENT

Login before adding your answer.

Traffic: 1637 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6