Question: Quality check of BAM files
Asked 8 weeks ago by gabri.mochales:

Good afternoon, I'm working with GATK to do SNP calling on some target capture sequencing data. Right now I'm creating the BAM files, and I was wondering which standard quality measures I should apply to them. I know I can mark duplicates, but I don't know how this affects the downstream analyses. Are duplicates removed, or are they just marked and I have to do something else? Should I set some score thresholds for my BAM files?

Thank you very much for your help!

Tags: sam, bam, quality, gatk

Check the GATK Best Practices, which address many of your questions.

(reply by igor, 8 weeks ago)

Like here? I don't fully understand the GATK Best Practices.

(reply by gabri.mochales, 8 weeks ago)

They have many different resources. You can also check their events pages; they may not sound relevant, but they post helpful presentations there.

(reply by igor, 8 weeks ago)
Answer by Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087), 8 weeks ago:

You asked: "Are they removed, or are they just marked and I have to do something else?"

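They are just marked. By default MarkDuplicates sets the duplicate bit (0x400) in the SAM FLAG of each duplicate read and keeps the record; reads are only dropped if you enable its REMOVE_DUPLICATES option. See also the related thread "Removing PCR duplicates - fastq or BAM?".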
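To make "marked vs. removed" concrete, here is a minimal Python sketch that works directly on SAM FLAG integers. Reads are modelled as hypothetical (name, flag) tuples rather than real BAM records: marking only sets the 0x400 bit, while removal is a separate step that drops the flagged records.

```python
# Minimal sketch: duplicate marking vs. removal at the level of the SAM FLAG field.
# Assumption: reads are plain (name, flag) tuples, not records from a BAM library.
from typing import List, Tuple

DUPLICATE_FLAG = 0x400  # SAM spec bit: "PCR or optical duplicate"


def is_duplicate(flag: int) -> bool:
    """True if the duplicate bit is set in a SAM FLAG value."""
    return bool(flag & DUPLICATE_FLAG)


def mark_duplicate(flag: int) -> int:
    """What marking does: set the bit and keep the record."""
    return flag | DUPLICATE_FLAG


def remove_duplicates(reads: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """What removal does: drop every record whose duplicate bit is set."""
    return [(name, flag) for name, flag in reads if not is_duplicate(flag)]


if __name__ == "__main__":
    reads = [("r1", 99), ("r2", 147), ("r3", mark_duplicate(99))]
    print([name for name, flag in reads if is_duplicate(flag)])  # ['r3']
    print(len(remove_duplicates(reads)))  # 2
```

Because marked reads stay in the file, each downstream tool decides whether to ignore them, which is what GATK's read filters do.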

Each GATK tool applies a default set of read filters to its BAM input. For example, HaplotypeCaller (https://gatk.broadinstitute.org/hc/en-us/articles/360037225632-HaplotypeCaller):

Read filters

These Read Filters are automatically applied to the data by the Engine before processing by HaplotypeCaller.

NotSecondaryAlignmentReadFilter
GoodCigarReadFilter
NonZeroReferenceLengthAlignmentReadFilter
PassesVendorQualityCheckReadFilter
MappedReadFilter
MappingQualityAvailableReadFilter
NotDuplicateReadFilter (https://gatk.broadinstitute.org/hc/en-us/articles/360037592051-NotDuplicateReadFilter)
MappingQualityReadFilter
WellformedReadFilter

If needed, any of these filters can be disabled with --disable-read-filter: https://gatk.broadinstitute.org/hc/en-us/articles/360037225632-HaplotypeCaller#--disable-read-filter
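One way to picture the default filter set is as a list of predicates that every read must pass, with --disable-read-filter removing one predicate by name. The sketch below is a hypothetical Python model, not GATK code: it covers only the FLAG- and MAPQ-based filters from the list, and the minimum MAPQ of 20 is an assumption you should check against the HaplotypeCaller documentation.

```python
from typing import AbstractSet, Callable, Dict, NamedTuple


class Read(NamedTuple):
    name: str
    flag: int  # SAM FLAG field
    mapq: int  # SAM MAPQ field (255 = unavailable)


MIN_MAPQ = 20  # assumed minimum mapping quality; verify against the tool docs

# Hypothetical model of the FLAG/MAPQ-based filters listed above, keyed by the
# GATK filter names. CIGAR and well-formedness checks are not modelled.
FILTERS: Dict[str, Callable[[Read], bool]] = {
    "MappedReadFilter": lambda r: not (r.flag & 0x4),
    "NotSecondaryAlignmentReadFilter": lambda r: not (r.flag & 0x100),
    "PassesVendorQualityCheckReadFilter": lambda r: not (r.flag & 0x200),
    "NotDuplicateReadFilter": lambda r: not (r.flag & 0x400),
    "MappingQualityAvailableReadFilter": lambda r: r.mapq != 255,
    "MappingQualityReadFilter": lambda r: r.mapq >= MIN_MAPQ,
}


def passes(read: Read, disabled: AbstractSet[str] = frozenset()) -> bool:
    """Keep a read only if every filter not listed in `disabled` accepts it."""
    return all(check(read) for name, check in FILTERS.items() if name not in disabled)


if __name__ == "__main__":
    dup = Read("r1", 99 | 0x400, 60)
    print(passes(dup))  # False: rejected by NotDuplicateReadFilter
    print(passes(dup, {"NotDuplicateReadFilter"}))  # True
```

The `disabled` set plays the role of --disable-read-filter here; in practice you would simply leave the defaults in place unless you have a specific reason to keep, for example, duplicate reads.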

Answer by bruce.moran (Ireland), 8 weeks ago:

Yes, you should quality-check your BAMs. Use CollectHsMetrics to determine the fraction of on-target reads, which is a good indicator of whether the library preparation worked. A low on-target fraction indicates that hybridisation failed.
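CollectHsMetrics reports base-level statistics against bait and target interval lists. As a rough, read-level illustration of the idea behind "on-target", the sketch below counts the fraction of aligned reads that overlap at least one target interval. It is a simplified stand-in, not CollectHsMetrics' algorithm; reads and targets are assumed to be plain (contig, start, end) coordinates, 0-based and half-open, and the helper names are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def build_index(targets: Dict[str, List[Interval]]) -> Dict[str, List[Interval]]:
    """Sort and merge target intervals per contig so they can be searched with bisect."""
    index: Dict[str, List[Interval]] = {}
    for contig, intervals in targets.items():
        merged: List[Interval] = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                # Overlapping or touching interval: extend the previous one.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        index[contig] = merged
    return index


def overlaps_target(index: Dict[str, List[Interval]], contig: str, start: int, end: int) -> bool:
    """True if [start, end) overlaps any merged target on the contig."""
    intervals = index.get(contig, [])
    # Last target whose start is < end; merged targets are disjoint and sorted,
    # so it also has the largest end among the candidates.
    i = bisect_right(intervals, (end,)) - 1
    return i >= 0 and intervals[i][1] > start


def on_target_fraction(reads: Iterable[Tuple[str, int, int]], index: Dict[str, List[Interval]]) -> float:
    """Fraction of aligned reads (contig, start, end) that touch at least one target."""
    total = hits = 0
    for contig, start, end in reads:
        total += 1
        hits += overlaps_target(index, contig, start, end)
    return hits / total if total else 0.0
```

For real data, CollectHsMetrics (with your bait and target interval lists) is the tool to use; the sketch only shows why a failed hybridisation shows up as a low fraction: most reads then land outside the captured regions.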

Picard Tools (now part of GATK) has many other tools you can use to QC DNA-seq data, e.g. CollectAlignmentSummaryMetrics, CollectMultipleMetrics, CollectSequencingArtifactMetrics and CollectInsertSizeMetrics.
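Picard metrics files are tab-separated text: header lines starting with #, then a "## METRICS CLASS" line followed by a column header row and one or more value rows, and sometimes a "## HISTOGRAM" section afterwards. Assuming that layout, a small parser like this one (a hypothetical helper using only the standard library) turns the metrics table into dictionaries so values can be compared across samples.

```python
from typing import Dict, Iterable, List


def parse_picard_metrics(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return the rows of the '## METRICS CLASS' table as dicts (values kept as strings)."""
    rows: List[Dict[str, str]] = []
    header: List[str] = []
    in_metrics = False
    for raw in lines:
        # Strip only the newline so trailing empty tab-separated fields survive.
        line = raw.rstrip("\r\n")
        if line.startswith("## METRICS CLASS"):
            in_metrics = True
            continue
        if not in_metrics:
            continue
        if not line.strip() or line.startswith("#"):
            if header:
                break  # a blank line or the next '##' section ends the table
            continue
        fields = line.split("\t")
        if not header:
            header = fields
        else:
            fields += [""] * (len(header) - len(fields))  # pad short rows
            rows.append(dict(zip(header, fields)))
    return rows
```

Passing an open metrics file (for example from CollectAlignmentSummaryMetrics or CollectHsMetrics) to `parse_picard_metrics` gives one dict per row; values stay strings, so convert with `float()` as needed. Column names differ between tools, so check them in your own output.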

It is a good idea to run fastp or FastQC on the FASTQ files before alignment to assess the initial quality of the data.
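Among other things, fastp and FastQC report base qualities. As a toy illustration of what those numbers mean, the sketch below decodes Phred+33 quality strings (the encoding used by current Illumina pipelines; check that it applies to your data) and computes the mean base quality and the fraction of bases at or above Q30. It is not how either tool is implemented, and it assumes well-formed, uncompressed 4-line FASTQ records.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def quality_lines(fastq_lines: Iterable[str]) -> Iterator[str]:
    """Yield the quality string (4th line) of each 4-line FASTQ record."""
    for line in islice(fastq_lines, 3, None, 4):
        yield line.rstrip("\r\n")


def phred33(qual: str) -> List[int]:
    """Decode a Phred+33 quality string into integer scores."""
    return [ord(c) - 33 for c in qual]


def summarize(quals: Iterable[str], threshold: int = 30) -> Tuple[float, float]:
    """Return (mean base quality, fraction of bases with quality >= threshold)."""
    total = count = high = 0
    for qual in quals:
        scores = phred33(qual)
        total += sum(scores)
        count += len(scores)
        high += sum(1 for s in scores if s >= threshold)
    if count == 0:
        return 0.0, 0.0
    return total / count, high / count


if __name__ == "__main__":
    fastq = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGT", "+", "####"]
    print(summarize(quality_lines(fastq)))  # (21.0, 0.5)
```

With a real file you would pass an open file handle instead of the list (wrapping it with `gzip.open(path, "rt")` for .fastq.gz input).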

Finally, MultiQC is a great tool for gathering all of this output in one place.
