Undo alignment using bamtofastq, after dedup
9.6 years ago

I'm a newcomer to the sequencing world. From another lab, I have a large (full human genome) mate-pair BAM file produced from the following steps:

1. Trimming of reads to 30bp for PHRED scores under 20 (software unknown).

2. Alignment against GRCh37 with BWA 0.5.8a.

3. De-duplication with GATK 1.0.4.

4. Local realignment around known indels and base score recalibration with GATK 1.0.4.

5. Picard's FixMateInformation (version unknown).

I want to realign the reads against GRCh38 using newer software; in other words, I want to undo steps 1–5, or at least 2–5.

Will SamTools bamtofastq handle this correctly? Specifically, it seems that de-duplication using an alignment against GRCh37 (step 3) permanently changed the BAM by removing reads that might be aligned differently against GRCh38. Since the only command from GATK I could find for de-duplication is MarkDuplicates, which doesn't delete any reads, I will assume this was used. Are there any other steps that would be an issue, and is bamtofastq the right way to do this?

I understand these steps algorithmically but don't know how the data in BAM format is actually altered.

Thanks!

next-gen genome • 2.5k views

Did they remove duplicates or just mark them?


Thanks! I updated my post.

9.6 years ago

I assume you mean samtools bam2fq. It will write out reads that are marked as duplicates. It only ignores supplementary and secondary alignments.
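
To make the flag logic concrete, here is a minimal Python sketch of the filtering described above. It only models the behavior stated in this answer: the exclusion mask of secondary plus supplementary (0x900) is an assumption taken from that description, and `is_written_to_fastq` is a hypothetical helper, not part of samtools. The bit values themselves come from the SAM specification.

```python
# SAM FLAG bits (values from the SAM specification)
PAIRED = 0x1
PROPER_PAIR = 0x2
FIRST_IN_PAIR = 0x40
SECONDARY = 0x100
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800

# Assumption: only secondary and supplementary records are skipped.
DEFAULT_EXCLUDE = SECONDARY | SUPPLEMENTARY  # 0x900


def is_written_to_fastq(flag: int, exclude_mask: int = DEFAULT_EXCLUDE) -> bool:
    """Return True if a record with this FLAG passes the exclusion mask."""
    return (flag & exclude_mask) == 0


base = PAIRED | PROPER_PAIR | FIRST_IN_PAIR
examples = {
    "primary alignment": base,
    "marked duplicate": base | DUPLICATE,
    "secondary alignment": base | SECONDARY,
    "supplementary alignment": base | SUPPLEMENTARY,
}

for label, flag in examples.items():
    print(f"{label:24s} flag={flag:5d} written={is_written_to_fastq(flag)}")
```

The duplicate bit (0x400) is not in the mask, so a read that was only marked as a duplicate still ends up in the FASTQ. A read that was physically removed from the BAM cannot be recovered this way.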

9.6 years ago

Thanks—that answers my question.

For anyone else interested in this question, the Broad Institute has a guide:

http://gatkforums.broadinstitute.org/discussion/2908/howto-revert-a-bam-file-to-fastq-format
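
As a rough illustration of what reverting a single record involves, here is a small Python sketch. It rests on two assumptions that are not confirmed for this particular file: that reverse-strand reads are stored reverse-complemented (the SAM convention), and that the recalibration step kept the pre-recalibration qualities in an OQ tag. `original_read` is a hypothetical helper for illustration, not a function from GATK, Picard, or samtools.

```python
from typing import Optional, Tuple

REVERSE = 0x10  # SAM FLAG bit: SEQ and QUAL are stored reverse-complemented

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def original_read(
    seq: str, qual: str, flag: int, oq: Optional[str] = None
) -> Tuple[str, str]:
    """Return (sequence, qualities) as they would have appeared in the FASTQ.

    oq is the value of the OQ tag if the record has one, else None.
    """
    # Prefer the original qualities over the recalibrated ones when available.
    if oq is not None:
        qual = oq
    # Undo the orientation change applied to reverse-strand alignments.
    if flag & REVERSE:
        seq = seq.translate(_COMPLEMENT)[::-1]
        qual = qual[::-1]
    return seq, qual


# Forward-strand read: returned unchanged.
print(original_read("AACG", "IIH#", flag=0))
# Reverse-strand read with an OQ tag: reverse-complemented, OQ restored.
print(original_read("AACG", "IIII", flag=REVERSE, oq="ABCD"))
```

This does not undo step 1: bases that were trimmed before alignment are not in the BAM, so no conversion can bring them back.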
