Question

BBmerge output question

0

Entering edit mode

3.7 years ago

ocortes • 0

Hi all,

I am using for the first time bbmerge and I do not know exactly how to understand my results. I would be very greatfully if someone can help me. I have paired end reads and I am running two commands to merge the reads in a single fastq file, first repair.sh and then bbmerge.sh. This is an example of one of the results:

This is a good merging result or If I understand the output well, I am losing 67% of the reads? Can I do something to improve the merging results?

Thanks in advance

BBmerge sequencing • 2.4k views

ADD COMMENT • link 3.7 years ago by ocortes • 0

1

Entering edit mode

repair.sh is only needed if your reads were out of sync in R1/R2 files. With merging it is best to start with original files (even before doing any trimming). If you did not do that then I suggest that you run bbmerge.sh on your original data files.

Merging will only work if your reads were designed to do that i.e. length of sequencing was longer than the size of the insert.

ADD REPLY • link 3.7 years ago by GenoMax 141k

0

Entering edit mode

Thanks for your quick answer. I run first bbmerge.sh but I got this message:

Writing mergable reads merged
STarted output threads
java.lang.AssertionError:
There appear to be different numbers of reads in the paired input files.
The pairing may have been corrupted by an upstream process. It may be fixable by running repair.sh
at stream.ConcurrentGenericReadInputStream.pair (ConcurrentGenericReadInputStream.java:497)
at stream.ConcurrentGenericReadInputStream.readlist (ConcurrentGenericReadInputStream.java:362)
at stream.ConcurrentGenericReadInputStream.run0(ConcurrentGenericReadInputStream.java:206)
at stream.ConcurrentGenericReadInputStream.run((ConcurrentGenericReadInputStream.java:182)

So I run first repair.sh and with the ouput files I run bbmerge.sh, this makes sense?

Thanks!

ADD REPLY • link updated 3.7 years ago by GenoMax 141k • written 3.7 years ago by ocortes • 0

1

Entering edit mode

This is the reason why I suggested that you should run bbmerge.sh on original data files.

If your files were out-of-sync then repair.sh will indeed fix that part up.

ADD REPLY • link 3.7 years ago by GenoMax 141k

0

Entering edit mode

Sorry but I am not sure that yuo can see the image inserted, this are the results:

Pairs:                    3534265          
Joined:                 1149304           32.519%  
Ambiguous:           2383651          67.444%  
No solution:          1310                 0.037%           
Too short:             0                       0.000%


AVG Insert                    146.7   
Standard Deviation:      24.5  
Mode:                            164

Insert Range:  36-579  
90th percentile: 176  
75th percentile: 165  
50th percentile: 148  
25th percentile: 129  
10th percentile: 114

ADD REPLY • link updated 3.7 years ago by GenoMax 141k • written 3.7 years ago by ocortes • 0

1

Entering edit mode

Based on these results you can see that there are ambiguous overlaps where a large % of reads that BBMerge is not able to confidently resolve. You can see that there seems to be a wide range of insert sizes in your data.

What kind of an experiment is this (16S amplicons)?

ADD REPLY • link 3.7 years ago by GenoMax 141k

0

Entering edit mode

There are bovine genomic sequences. I have a .bam and .bai file for each chromosome and each file has sequences of 42 animals First I run: picard.jar SamToFastq INPUT=chr6.bam OUTPUT_PER_RG=true RG_TAG=ID OUTPUT_DIR=dir_location VALIDATION_STRINGENCY=LENIENT

and then with the fastq files of each aniaml (animal1_1.fastq; animal1_2_fastq, and so on until 42 aniamls) I run repair.sh and then bbmerge.sh

Thanks!

P.D. Sorry for my questions....perhaps are so dumbs...but I am begining with genome sequence analysis

ADD REPLY • link 3.7 years ago by ocortes • 0

1

Entering edit mode

Not sure why you are trying to merge the reads if these are plain genomic sequences. Merging reads only makes sense when there is a specific reason to do that. For example in 16S amplicon sequencing.

What it is that you want to do with this data? Sounds like data is already aligned since you have SAM files?

ADD REPLY • link 3.7 years ago by GenoMax 141k

0

Entering edit mode

SNP calling among the 42 genomic sequences using HaplotypeCaller As I have paired end reads (forward and reverse) I assume that the first step it is to merge both reads in a single one but just this afternoon I have read that it is not neccesary, sorry....

But when I do reference mapping the sequences using bwa mem the sam files that I get are empty:
bwa mem -M CHR6.fa file_1.fastq file_2.fastq > file_2R.sam

SO know I am trying:
bwa aln -I -m 100000 -o 1 -n 0.01 -l 200 -e 12 -d 12 -t 2 CHR6.fa file_1.fastq > file_1.sai
bwa aln -I -m 100000 -o 1 -n 0.01 -l 200 -e 12 -d 12 -t 2 CHR6.fa file_2.fastq > file}_2.sai
bwa sampe CHR6.fa file_1.sai file_2.sai file_1.fastq file_2.fastq > prueba${i}_2R.sam

Thanks for any suggestion!!

ADD REPLY • link 3.7 years ago by ocortes • 0

1

Entering edit mode

If you already have aligned data files why not use them directly for SNP calling?

ADD REPLY • link 3.7 years ago by GenoMax 141k

0

Entering edit mode

Thaks for your comments you are helping me a lot. I am new in this kind of analysis and probably. So it is no neccesary to merge both reads and I as I have .bam files I could directly run SNP calling pipielines.

I am runing this pipieline:
picard.jar SortSam
picard.jar MarkDuplicates
picard.jar AddOrReplaceReadGroups
picard.jar BuildBamIndex
GenomeAnalysisTK.jar RealignerTargetCreator
GenomeAnalysisTK.jar IndelRealigner
samtools calmd picard.jar BuildBamIndex
GenomeAnalysisTK.jar HaplotypeCaller

Then I run GenomeAnalysisTK.jar SelectVariants and VariantFiltration for SNPs and INDELS and finally VariantRecalibrator and ApplyRecalibration

Regards!!!!!

ADD REPLY • link 3.7 years ago by ocortes • 0

0

Entering edit mode

Are you following a tutorial? A ot of these steps are unnecessary or outdated.

ADD REPLY • link 3.7 years ago by WouterDeCoster 47k