Question: Quality control for bam files
0
gravatar for banerjeeshayantan
4 months ago by
banerjeeshayantan70 wrote:

I have a tumor sample where the per base sequence quality is very low for both reads (using FATSQC). So this is what I did:

  1. Converted the tumor.bam to fastq. (Read1.fq and Read2.fq)

    bedtools bamtofq -i tumor.bam -fq Read1.fq -fq2 Read2.fq
    
  2. Now the FASTQC report was very poor. So I decided trimming. Here's what I used:

    sickle pe -f Read1.fq -r Read2.fq -t sanger -o qtrim1.fq -p qtrim2.fq -s single.fq -q 20
    
  3. Now I have three files: qtrim1.fq , qtrim2.fq and single.fq. All of good quality. Now I merge them.

    cat qtrim1.fq qtrim2.fq single,fq > merged.fq
    
  4. Finally convert to bam:

    java -jar picard.jar FastqToSam merged.fq O=trimmed_bam.bam SM=tumor
    

The resulting bam file is very small(~460Mb) as compared to the original(~80GB). I am losing a hell lot of information doing this quality trimming. Any suggestions to get past this?

Kindly go through the steps and suggest anything wrong or something you would have done differently.

Number of reads in original bam: 1034838478
Number of reads in processed bam : 8053958

sequencing next-gen • 366 views
ADD COMMENTlink modified 4 months ago by Ram17k • written 4 months ago by banerjeeshayantan70
1

Was the original BAM file mapped? FastqToSam doesn't do anything useful, you need to remap the results. Do not merge the files like you showed, align the pairs together and the singletons by themselves.

ADD REPLYlink written 4 months ago by Devon Ryan82k

Thanks for replying. I would try your suggestion. Just one question. Converting the bam file to fastq and checking individual read quality through fastqc or directly giving the .bam file as input to fastqc and checking its quality, seems equivalent? The reason for asking this is I am getting very poor quality when I am converting to fastq while the quality seems perfectly fine when I am directly giving the bam input. Here quality means : Per base sequence quality in the fastqc report

ADD REPLYlink written 4 months ago by banerjeeshayantan70

Converted the tumor.bam to fastq.

AFAIK, fastqc reads BAM files

ADD REPLYlink written 4 months ago by Pierre Lindenbaum111k

True. But I wanted to check the individual read qualities. Can I achieve this task by using the bam file in fastqc software?

ADD REPLYlink written 4 months ago by banerjeeshayantan70

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time. Formatting bar

ADD REPLYlink written 4 months ago by Ram17k

Thanks. will definitely do it the next time

ADD REPLYlink written 4 months ago by banerjeeshayantan70

Hello,

why do you try to convert bam to fastq? Didn't you have the original fastq files? The read informations you get may already have been processed in some way before.

Why do you want to do quality trimming? Do you think your mapping/alignment is bad or there are to much false positives?

As Devon said FastqToSam didn't map and align your reads. This must be done again.

fin swimmer

ADD REPLYlink written 4 months ago by finswimmer4.5k

I have just the aligned bam files from EGA. No fastq files were there. The per base sequence quality is very poor in the fastqc report.

ADD REPLYlink written 4 months ago by banerjeeshayantan70
1

Instead of throwing away these information you can first try to make the best out of it.

If the insertsizes are small enough, so you have a lot of overlap between the paired read, you could try the Overlap-based error-correction of bbmerge.

Also GATKs BQSR might have a good impact on your data.

fin swimmer

ADD REPLYlink written 4 months ago by finswimmer4.5k
1
gravatar for Istvan Albert
4 months ago by
Istvan Albert ♦♦ 77k
University Park, USA
Istvan Albert ♦♦ 77k wrote:

Are you really losing a "hell of of lot information" if you are removing bad data? I would say bad data is not information, it is the opposite of information.

Perhaps the your question is:

Is my filtered data really that bad that it warrants removing it?

Answer: Usually yes, but in this case, it might be worth seeing if you are the lucky case when it is not.

Try to process it for the desired information, just the bad data alone. Align it, assemble it etc. See how it goes.

ADD COMMENTlink modified 4 months ago • written 4 months ago by Istvan Albert ♦♦ 77k

Thanks for your reply. Didn't think it this way. Will try to use this downstream and check for myself

ADD REPLYlink written 4 months ago by banerjeeshayantan70
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1462 users visited in the last hour