Hi everyone, I'm new to querying public data, so bear with me. I am interested in chip-seq reads mapping to a specific region of the genome and I want to quickly query lots of different data sets for this region.
From googling around this is the method I came up with. I tried it first on a positive control region which should have lots of mapped reads according to the paper. I get zero reads mapped.
Here are my commands:
/#for prdm1 on H3K27ac in day 6 pgc:
sam-dump --aligned-region 10:44170511-44216948 --output-file Prdm1_SRR1539456.sam SRR1539456
/#convert sam to bam:
samtools view -S -b Prdm1_SRR1539456.sam > Prdm1_SRR1539456.bam
/#count the number of mapped reads:
samtools flagstat Prdm1_SRR1539456.bam
When I call flagstat this is what I get: 115516081 + 0 in total (QC-passed reads + QC-failed reads) 0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 mapped (0.00% : N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
The file Prdm1_SRR1539456.bam is 4.2G so it is not empty... what am I doing wrong? I'm really stumped.
Thanks in advance!