Question: Quality check of BAM files
5 months ago by
gabri.mochales (30) wrote:

Good afternoon, I'm working with GATK to do SNP calling on some target capture sequencing data. Right now I'm creating the BAM files, and I was wondering which standard quality measures I should apply to them. I'm aware that I can mark duplicates, but I don't know how this affects the subsequent analyses. Are the duplicates removed, or are they just marked so that I have to do something else? Should I specify any score thresholds for my BAM files?

Thank you very much for your help!

sam bam quality gatk • 355 views

Check the GATK Best Practices, which address many of your questions.

Reply written 5 months ago by igor (11k)

Like in here? I don't fully get the GATK Best Practices.

Reply written 5 months ago by gabri.mochales (30)

They have a lot of different resources. You can also check their events pages: the name does not sound relevant, but helpful presentations are posted there.

Reply written 5 months ago by igor (11k)

5 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum (131k) wrote:

"Are they removed or they are just marked and I have to do something else?"

They are just marked. See also: Removing PCR duplicates - fastq or BAM?
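As a minimal sketch of what "just marked" means: in the SAM format a duplicate read stays in the file and only has the 0x400 bit set in its FLAG field. The helper names and the two example records below are made up for illustration; the flag values come from the SAM specification.

```python
# SAM FLAG bits as defined in the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_QC_FAIL = 0x200
FLAG_DUPLICATE = 0x400


def is_duplicate(flag: int) -> bool:
    """Return True if the duplicate bit (0x400) is set in a SAM FLAG value."""
    return bool(flag & FLAG_DUPLICATE)


def count_duplicates(sam_lines):
    """Return (total, duplicates) for an iterable of SAM text lines.

    Header lines (starting with '@') and empty lines are skipped.
    """
    total = 0
    dups = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        total += 1
        if is_duplicate(flag):
            dups += 1
    return total, dups


# Two hypothetical records: the second has the duplicate bit set (99 + 1024).
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",
    "read2\t1123\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",
]
print(count_duplicates(example))
```

The same function could be fed the text output of `samtools view -h` on a duplicate-marked BAM, assuming samtools is available.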

Each tool in GATK has a default set of read filters for BAM input. For example, HaplotypeCaller: https://gatk.broadinstitute.org/hc/en-us/articles/360037225632-HaplotypeCaller

Read filters

These Read Filters are automatically applied to the data by the Engine before processing by HaplotypeCaller.

NotSecondaryAlignmentReadFilter
GoodCigarReadFilter
NonZeroReferenceLengthAlignmentReadFilter
PassesVendorQualityCheckReadFilter
MappedReadFilter
MappingQualityAvailableReadFilter
NotDuplicateReadFilter => https://gatk.broadinstitute.org/hc/en-us/articles/360037592051-NotDuplicateReadFilter
MappingQualityReadFilter
WellformedReadFilter

If needed, any filter can be disabled: https://gatk.broadinstitute.org/hc/en-us/articles/360037225632-HaplotypeCaller#--disable-read-filter
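A rough sketch of how disabling a filter might look on the command line, built here as a Python argument list. The function name and the file names (ref.fasta, sample.bam, sample.vcf.gz) are placeholders, not anything from this thread; whether disabling NotDuplicateReadFilter is appropriate depends on your assay.

```python
import shlex


def haplotypecaller_cmd(reference, bam, out_vcf, disabled_filters=()):
    """Build a `gatk HaplotypeCaller` argument list.

    Each name in `disabled_filters` is passed with --disable-read-filter.
    """
    cmd = ["gatk", "HaplotypeCaller", "-R", reference, "-I", bam, "-O", out_vcf]
    for name in disabled_filters:
        cmd += ["--disable-read-filter", name]
    return cmd


# Placeholder file names, for illustration only.
cmd = haplotypecaller_cmd(
    "ref.fasta", "sample.bam", "sample.vcf.gz",
    disabled_filters=["NotDuplicateReadFilter"],
)
print(shlex.join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` if GATK is installed and on the PATH.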

5 months ago by
Ireland
bruce.moran (860) wrote:

Yes, you should quality check your BAMs using CollectHsMetrics to determine the fraction of on-target reads, which is a good indicator of whether the library preparation worked well. A low on-target fraction suggests that the hybridisation capture failed.
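One way to pull such a value out of the CollectHsMetrics output is sketched below. It assumes the usual Picard metrics layout (a "## METRICS CLASS" line followed by a tab-separated header row and a value row). The example text is heavily abbreviated and its values are invented; PCT_SELECTED_BASES is used here as a proxy for "on-target", which is a judgment call, and no pass/fail threshold is implied.

```python
def parse_picard_metrics(text):
    """Return the first metrics row of a Picard metrics file as a dict of strings."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            header = lines[i + 1].split("\t")  # column names
            values = lines[i + 2].split("\t")  # first data row
            return dict(zip(header, values))
    raise ValueError("no METRICS CLASS section found")


def selected_fraction(metrics):
    """Fraction of bases on or near a bait, as reported in PCT_SELECTED_BASES."""
    return float(metrics["PCT_SELECTED_BASES"])


# Abbreviated, invented example; a real file has many more columns.
example_text = (
    "## htsjdk.samtools.metrics.StringHeader\n"
    "# CollectHsMetrics (arguments omitted)\n"
    "\n"
    "## METRICS CLASS\tpicard.analysis.directed.HsMetrics\n"
    "BAIT_SET\tPCT_SELECTED_BASES\tMEAN_TARGET_COVERAGE\n"
    "my_panel\t0.82\t150.5\n"
    "\n"
    "## HISTOGRAM\tjava.lang.Integer\n"
)

metrics = parse_picard_metrics(example_text)
print(metrics["BAIT_SET"], selected_fraction(metrics))
```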

Picard Tools (now part of GATK) has many other tools you can use to QC DNA-seq data (e.g. CollectAlignmentSummaryMetrics, CollectMultipleMetrics, CollectSequencingArtifactMetrics, CollectInsertSizeMetrics).

It is also a good idea to run fastp or FastQC on the FASTQ files before alignment to assess the initial quality of the data.

Finally, MultiQC is a great tool for gathering all of the output into one place.
