Quality check of BAM files
Asked 4.0 years ago by gubrins

Good afternoon, I'm using GATK for SNP calling on some target-capture sequencing data. I'm currently creating the BAM files, and I was wondering which standard quality checks I should apply to them. I know I can mark duplicates, but I don't know how this affects the downstream analyses. Are duplicates removed, or are they just marked, so that I have to do something else? Should I set any score thresholds for my BAM files?

Thank you very much for your help!

Comments:

Check the GATK Best Practices, which address many of your questions.


Like here? I don't fully understand the GATK Best Practices.


They have many different resources. You can also check their events pages; they may not sound relevant, but they post helpful presentations there.

Answer (4.0 years ago):

> Are they removed, or are they just marked and I have to do something else?

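They are just marked: by default MarkDuplicates sets the duplicate flag (SAM FLAG bit 0x400) on those reads but keeps them in the BAM, and each downstream tool decides whether to ignore them. See also the related thread "Removing PCR duplicates - fastq or BAM?".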
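Because marking only sets a bit in each read's FLAG field, the reads stay in the file and a tool skips them only if it checks that bit. A minimal sketch of that check, using the 0x400 bit defined in the SAM specification; the FLAG values are made-up examples, and in practice you would read FLAGs from the BAM with a library such as pysam (not shown here):

```python
# Sketch: test the SAM FLAG bit that duplicate marking sets on a read.
# 0x400 = "PCR or optical duplicate" in the SAM specification.
DUPLICATE_FLAG = 0x400


def is_marked_duplicate(flag: int) -> bool:
    """Return True if the SAM FLAG value has the duplicate bit set."""
    return bool(flag & DUPLICATE_FLAG)


def count_duplicates(flags) -> int:
    """Count how many FLAG values in an iterable are marked as duplicates."""
    return sum(1 for f in flags if is_marked_duplicate(f))


if __name__ == "__main__":
    # Hypothetical FLAGs: 99 and 147 are a normal proper pair;
    # 1123 (99 + 1024) and 1171 (147 + 1024) are the same pair marked as duplicates.
    print(count_duplicates([99, 147, 1123, 1171]))  # 2
```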

Each GATK tool applies a default set of read filters to the BAM. For example, for HaplotypeCaller (https://gatk.broadinstitute.org/hc/en-us/articles/360037225632-HaplotypeCaller):

Read filters

These Read Filters are automatically applied to the data by the Engine before processing by HaplotypeCaller.

- NotSecondaryAlignmentReadFilter
- GoodCigarReadFilter
- NonZeroReferenceLengthAlignmentReadFilter
- PassesVendorQualityCheckReadFilter
- MappedReadFilter
- MappingQualityAvailableReadFilter
- NotDuplicateReadFilter (https://gatk.broadinstitute.org/hc/en-us/articles/360037592051-NotDuplicateReadFilter)
- MappingQualityReadFilter
- WellformedReadFilter
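To make concrete what several of these filters check, here is a rough Python approximation based only on SAM FLAG bits and MAPQ. It is not GATK's implementation: the `Read` class is a hypothetical stand-in for a parsed alignment, only some of the listed filters are mimicked (CIGAR and well-formedness checks are omitted), and `min_mapq=20` is an assumed threshold, so check the tool documentation for the actual default.

```python
from dataclasses import dataclass

# Flag bits from the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100
QC_FAIL = 0x200
DUPLICATE = 0x400


@dataclass
class Read:
    """Hypothetical minimal stand-in for an aligned read."""
    name: str
    flag: int
    mapq: int


def passes_default_like_filters(read: Read, min_mapq: int = 20) -> bool:
    """Approximate a subset of HaplotypeCaller's default read filters."""
    if read.flag & SECONDARY:   # ~ NotSecondaryAlignmentReadFilter
        return False
    if read.flag & QC_FAIL:     # ~ PassesVendorQualityCheckReadFilter
        return False
    if read.flag & UNMAPPED:    # ~ MappedReadFilter
        return False
    if read.mapq == 255:        # ~ MappingQualityAvailableReadFilter (255 = unavailable)
        return False
    if read.flag & DUPLICATE:   # ~ NotDuplicateReadFilter
        return False
    return read.mapq >= min_mapq  # ~ MappingQualityReadFilter (assumed threshold)


if __name__ == "__main__":
    reads = [
        Read("r1", 99, 60),    # normal read: kept
        Read("r2", 1123, 60),  # duplicate-marked: dropped
        Read("r3", 99, 5),     # low MAPQ: dropped
        Read("r4", 355, 60),   # secondary alignment (256 + 99): dropped
        Read("r5", 99, 255),   # MAPQ unavailable: dropped
    ]
    print([r.name for r in reads if passes_default_like_filters(r)])
```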

If needed, any of these filters can be disabled with --disable-read-filter: https://gatk.broadinstitute.org/hc/en-us/articles/360037225632-HaplotypeCaller#--disable-read-filter
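A small sketch that builds such a HaplotypeCaller command line with a filter disabled. The file names are hypothetical placeholders; the resulting list could be passed to `subprocess.run` on a machine with GATK installed. Whether disabling NotDuplicateReadFilter is appropriate depends on your data; it is used here only to show the syntax.

```python
import shlex


def haplotypecaller_cmd(reference, bam, out_vcf, disabled_filters=()):
    """Build a GATK HaplotypeCaller command line, optionally disabling read filters."""
    cmd = ["gatk", "HaplotypeCaller", "-R", reference, "-I", bam, "-O", out_vcf]
    for name in disabled_filters:
        # --disable-read-filter is repeated once per filter to switch off.
        cmd += ["--disable-read-filter", name]
    return cmd


if __name__ == "__main__":
    cmd = haplotypecaller_cmd("ref.fasta", "sample.bam", "sample.vcf.gz",
                              disabled_filters=["NotDuplicateReadFilter"])
    print(shlex.join(cmd))
```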

Answer by bruce.moran (4.0 years ago):

Yes, you should quality-check your BAMs. Use CollectHsMetrics to determine the fraction of on-target reads, which is a good indicator of whether the library preparation worked well. A low on-target fraction indicates that the hybridisation (capture) step failed.
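As a toy illustration of what "on-target" means, the sketch below computes the fraction of reads whose aligned span overlaps at least one target interval. This is a simplified read-level stand-in, not CollectHsMetrics' algorithm: that tool works from bait and target interval lists and reports base-level metrics. The coordinates are hypothetical, 0-based and half-open, and target intervals are assumed sorted and non-overlapping per chromosome.

```python
from bisect import bisect_right


def on_target_fraction(reads, targets):
    """Fraction of reads overlapping at least one target interval.

    reads:   list of (chrom, start, end) tuples, 0-based half-open
    targets: dict chrom -> sorted, non-overlapping list of (start, end)
    """
    if not reads:
        return 0.0
    # Precompute interval start positions per chromosome for binary search.
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in targets.items()}
    hits = 0
    for chrom, start, end in reads:
        ivs = targets.get(chrom, [])
        # Last target whose start lies before the read's end; with sorted,
        # non-overlapping targets it is the only candidate for an overlap.
        i = bisect_right(starts.get(chrom, []), end - 1) - 1
        if i >= 0 and ivs[i][1] > start:
            hits += 1
    return hits / len(reads)


if __name__ == "__main__":
    targets = {"chr1": [(100, 200), (500, 600)]}
    reads = [("chr1", 150, 250), ("chr1", 300, 400),
             ("chr1", 590, 700), ("chr2", 100, 200)]
    print(on_target_fraction(reads, targets))  # 0.5
```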

Picard (now bundled with GATK) has many other tools you can use for QC of DNA-seq data, e.g. CollectAlignmentSummaryMetrics, CollectMultipleMetrics, CollectSequencingArtifactMetrics and CollectInsertSizeMetrics.
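For intuition about one of these metrics, a rough sketch of a median insert size computed from SAM TLEN values. CollectInsertSizeMetrics does considerably more (pair orientation, filtering, histograms), so treat this only as an approximation; the TLEN values are made up and are assumed to come from properly paired reads.

```python
from statistics import median


def median_insert_size(tlens):
    """Median insert size from SAM TLEN values.

    Each pair reports TLEN twice (positive on one mate, negative on the other),
    so only positive values are kept to count each fragment once. TLEN 0
    (e.g. unpaired reads or mates on different chromosomes) is ignored.
    """
    sizes = [t for t in tlens if t > 0]
    if not sizes:
        raise ValueError("no positive TLEN values")
    return median(sizes)


if __name__ == "__main__":
    print(median_insert_size([250, -250, 300, -300, 275, -275, 0]))  # 275
```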

It is also a good idea to run fastp or FastQC on the FASTQ files before alignment to assess the initial data quality.
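Tools like fastp and FastQC summarise, among other things, the per-base Phred quality scores stored in FASTQ quality strings. A minimal sketch of decoding one such string, assuming the common Phred+33 encoding; the quality string in the example is made up.

```python
def phred_scores(quality: str, offset: int = 33):
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def mean_quality(quality: str, offset: int = 33) -> float:
    """Mean Phred score of one read's quality string."""
    scores = phred_scores(quality, offset)
    return sum(scores) / len(scores)


def fraction_at_least(quality: str, threshold: int = 30, offset: int = 33) -> float:
    """Fraction of bases with Phred score >= threshold (e.g. the common Q30 metric)."""
    scores = phred_scores(quality, offset)
    return sum(s >= threshold for s in scores) / len(scores)


if __name__ == "__main__":
    q = "II5+"  # decodes to [40, 40, 20, 10]
    print(mean_quality(q), fraction_at_least(q))  # 27.5 0.5
```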

Finally, MultiQC is a great tool for gathering all of these outputs into a single report.
