I have a tumor sample where the per base sequence quality is very low for both reads (using FATSQC). So this is what I did:
Converted the tumor.bam to fastq. (Read1.fq and Read2.fq)
bedtools bamtofq -i tumor.bam -fq Read1.fq -fq2 Read2.fq
Now the FASTQC report was very poor. So I decided trimming. Here's what I used:
sickle pe -f Read1.fq -r Read2.fq -t sanger -o qtrim1.fq -p qtrim2.fq -s single.fq -q 20
Now I have three files: qtrim1.fq , qtrim2.fq and single.fq. All of good quality. Now I merge them.
cat qtrim1.fq qtrim2.fq single,fq > merged.fq
Finally convert to bam:
java -jar picard.jar FastqToSam merged.fq O=trimmed_bam.bam SM=tumor
The resulting bam file is very small(~460Mb) as compared to the original(~80GB). I am losing a hell lot of information doing this quality trimming. Any suggestions to get past this?
Kindly go through the steps and suggest anything wrong or something you would have done differently.
Number of reads in original bam: 1034838478
Number of reads in processed bam : 8053958
Was the original BAM file mapped?
FastqToSam
doesn't do anything useful, you need to remap the results. Do not merge the files like you showed, align the pairs together and the singletons by themselves.Thanks for replying. I would try your suggestion. Just one question. Converting the bam file to fastq and checking individual read quality through fastqc or directly giving the .bam file as input to fastqc and checking its quality, seems equivalent? The reason for asking this is I am getting very poor quality when I am converting to fastq while the quality seems perfectly fine when I am directly giving the bam input. Here quality means : Per base sequence quality in the fastqc report
AFAIK, fastqc reads BAM files
True. But I wanted to check the individual read qualities. Can I achieve this task by using the bam file in fastqc software?
Please use the formatting bar (especially the
code
option) to present your post better. I've done it for you this time.Thanks. will definitely do it the next time
Hello,
why do you try to convert bam to fastq? Didn't you have the original fastq files? The read informations you get may already have been processed in some way before.
Why do you want to do quality trimming? Do you think your mapping/alignment is bad or there are to much false positives?
As Devon said FastqToSam didn't map and align your reads. This must be done again.
fin swimmer
I have just the aligned bam files from EGA. No fastq files were there. The per base sequence quality is very poor in the fastqc report.
Instead of throwing away these information you can first try to make the best out of it.
If the insertsizes are small enough, so you have a lot of overlap between the paired read, you could try the Overlap-based error-correction of bbmerge.
Also GATKs BQSR might have a good impact on your data.
fin swimmer