Quality control for bam files
1
1
Entering edit mode
6.1 years ago
Gene_MMP8 ▴ 240

I have a tumor sample where the per base sequence quality is very low for both reads (using FATSQC). So this is what I did:

  1. Converted the tumor.bam to fastq. (Read1.fq and Read2.fq)

    bedtools bamtofq -i tumor.bam -fq Read1.fq -fq2 Read2.fq
    
  2. Now the FASTQC report was very poor. So I decided trimming. Here's what I used:

    sickle pe -f Read1.fq -r Read2.fq -t sanger -o qtrim1.fq -p qtrim2.fq -s single.fq -q 20
    
  3. Now I have three files: qtrim1.fq , qtrim2.fq and single.fq. All of good quality. Now I merge them.

    cat qtrim1.fq qtrim2.fq single,fq > merged.fq
    
  4. Finally convert to bam:

    java -jar picard.jar FastqToSam merged.fq O=trimmed_bam.bam SM=tumor
    

The resulting bam file is very small(~460Mb) as compared to the original(~80GB). I am losing a hell lot of information doing this quality trimming. Any suggestions to get past this?

Kindly go through the steps and suggest anything wrong or something you would have done differently.

Number of reads in original bam: 1034838478
Number of reads in processed bam : 8053958

next-gen sequencing • 4.8k views
ADD COMMENT
1
Entering edit mode

Was the original BAM file mapped? FastqToSam doesn't do anything useful, you need to remap the results. Do not merge the files like you showed, align the pairs together and the singletons by themselves.

ADD REPLY
0
Entering edit mode

Thanks for replying. I would try your suggestion. Just one question. Converting the bam file to fastq and checking individual read quality through fastqc or directly giving the .bam file as input to fastqc and checking its quality, seems equivalent? The reason for asking this is I am getting very poor quality when I am converting to fastq while the quality seems perfectly fine when I am directly giving the bam input. Here quality means : Per base sequence quality in the fastqc report

ADD REPLY
0
Entering edit mode

Converted the tumor.bam to fastq.

AFAIK, fastqc reads BAM files

ADD REPLY
0
Entering edit mode

True. But I wanted to check the individual read qualities. Can I achieve this task by using the bam file in fastqc software?

ADD REPLY
0
Entering edit mode

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time. Formatting bar

ADD REPLY
0
Entering edit mode

Thanks. will definitely do it the next time

ADD REPLY
0
Entering edit mode

Hello,

why do you try to convert bam to fastq? Didn't you have the original fastq files? The read informations you get may already have been processed in some way before.

Why do you want to do quality trimming? Do you think your mapping/alignment is bad or there are to much false positives?

As Devon said FastqToSam didn't map and align your reads. This must be done again.

fin swimmer

ADD REPLY
0
Entering edit mode

I have just the aligned bam files from EGA. No fastq files were there. The per base sequence quality is very poor in the fastqc report.

ADD REPLY
1
Entering edit mode

Instead of throwing away these information you can first try to make the best out of it.

If the insertsizes are small enough, so you have a lot of overlap between the paired read, you could try the Overlap-based error-correction of bbmerge.

Also GATKs BQSR might have a good impact on your data.

fin swimmer

ADD REPLY
1
Entering edit mode
6.1 years ago

Are you really losing a "hell of of lot information" if you are removing bad data? I would say bad data is not information, it is the opposite of information.

Perhaps the your question is:

Is my filtered data really that bad that it warrants removing it?

Answer: Usually yes, but in this case, it might be worth seeing if you are the lucky case when it is not.

Try to process it for the desired information, just the bad data alone. Align it, assemble it etc. See how it goes.

ADD COMMENT
0
Entering edit mode

Thanks for your reply. Didn't think it this way. Will try to use this downstream and check for myself

ADD REPLY

Login before adding your answer.

Traffic: 2340 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6