WES data analysis with UMI barcode
2.6 years ago
Nickier ▴ 20

Hi, I have a WES data set in which each sample has three FASTQ files: 1.fastq.gz, 2.fastq.gz and *umi.fastq.gz, as shown in the figure below. I know the workflow for WES data, but I don't know what the UMI file is for. Is there a related tutorial? Thank you.

enter image description here

UMI WES • 1.6k views
2.6 years ago

A standard pipeline would be:

  1. Attach the UMIs from *umi.fastq.gz to the read 1 and read 2 FASTQ files (umi_tools extract can do this)
  2. Map with your favourite read aligner
  3. Sort and index the BAM file
  4. Deduplicate with a UMI-aware deduplicator (e.g. umi_tools dedup)
  5. Follow the same pipeline as for non-UMI WES data (e.g. BQSR and HaplotypeCaller)
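
To make step 1 concrete, here is a minimal Python sketch of what "attaching" a UMI means: the UMI sequence is appended to the read ID with an underscore, which is the read-name convention umi_tools dedup looks for. This is not the umi_tools implementation. The helper names `read_fastq` and `attach_umis` are hypothetical, and the sketch assumes the read file and the UMI file list the same reads in the same order, with identical IDs before the first space.

```python
import io
from itertools import zip_longest


def read_fastq(handle):
    """Yield (name, sequence, quality) from an open text-mode FASTQ handle."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header[1:], seq, qual  # drop the leading '@'


def attach_umis(reads, umis):
    """Yield FASTQ records whose read ID carries the UMI as an '_UMI' suffix."""
    for read, umi in zip_longest(reads, umis):
        if read is None or umi is None:
            raise ValueError("read file and UMI file differ in length")
        name, seq, qual = read
        umi_name, umi_seq, _ = umi
        read_id, _, comment = name.partition(" ")
        # Assumption: both files use the same ID before the first space.
        if read_id != umi_name.partition(" ")[0]:
            raise ValueError(f"ID mismatch: {read_id} vs {umi_name}")
        new_name = f"{read_id}_{umi_seq}"
        if comment:
            new_name += " " + comment
        yield f"@{new_name}\n{seq}\n+\n{qual}\n"


# Tiny in-memory example standing in for 1.fastq.gz and *umi.fastq.gz.
reads = io.StringIO("@r1 1:N:0\nACGT\n+\nIIII\n@r2 1:N:0\nTTGA\n+\nIIII\n")
umis = io.StringIO("@r1 2:N:0\nAACC\n+\nIIII\n@r2 2:N:0\nGGTT\n+\nIIII\n")
tagged = list(attach_umis(read_fastq(reads), read_fastq(umis)))
print("".join(tagged), end="")
```

For real data the handles would come from `gzip.open(path, "rt")`, and the same function would be run once for read 1 and once for read 2 so that both mates carry the UMI. In practice umi_tools extract is the safer choice, since it handles details this sketch ignores.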

See http://umi_tools.readthedocs.org

Thanks a lot. I will try this based on your answer.

Thank you very much. I ran the pipeline you described and got the VCF result. Compared with the results obtained without the UMI FASTQ file, there were fewer mutations, and inspection in IGV showed that most of the mutations that disappeared were supported by duplicated reads.

2.6 years ago

Depending on the library prep, two reads that share the same UMI and the same mapping position are likely to be duplicates of the same original molecule and shouldn't be counted twice. You'll want software that can read your BAM, identify duplicates with the help of the UMI, and remove the excess reads.
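
As a rough illustration of that idea, the sketch below groups reads by (chromosome, position, strand, UMI) and keeps one read per group, the one with the highest mapping quality. It is a simplified exact-match version: real tools such as umi_tools dedup also merge UMIs that differ by sequencing errors, which this does not attempt. The `Read` tuple and `dedup_exact` are hypothetical names, and the UMI is assumed to be the part of the read name after the last underscore.

```python
from collections import namedtuple

# Hypothetical minimal alignment record; a real pipeline would read these from a BAM.
Read = namedtuple("Read", "name chrom pos strand mapq")


def umi_of(read):
    """Return the UMI, assumed to follow the last underscore in the read name."""
    return read.name.rsplit("_", 1)[1]


def dedup_exact(reads):
    """Keep one read per (chrom, pos, strand, UMI), preferring higher MAPQ."""
    best = {}
    for read in reads:
        key = (read.chrom, read.pos, read.strand, umi_of(read))
        kept = best.get(key)
        if kept is None or read.mapq > kept.mapq:
            best[key] = read
    return list(best.values())


reads = [
    Read("r1_AACC", "chr1", 100, "+", 60),
    Read("r2_AACC", "chr1", 100, "+", 30),  # same position and UMI as r1: duplicate
    Read("r3_GGTT", "chr1", 100, "+", 60),  # same position, different UMI: kept
    Read("r4_AACC", "chr1", 250, "+", 60),  # same UMI, different position: kept
]
kept_names = [r.name for r in dedup_exact(reads)]
print(kept_names)
```

The point of the example is r3: a position-only duplicate marker would discard it along with r2, whereas the UMI shows it came from a different original molecule.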

Thank you for your reply. I understand the principle of UMIs, and I also know the non-UMI WES workflow (BWA, MarkDuplicates, BQSR, variant calling), but I don't know how to analyze UMI WES data, for example how to mark duplicates. Also, the second base is N in sample_1.fastq.gz. Why? Could it be related to the UMI?
