Question: When splitting a file in two, why do we add 1 to the number of lines before dividing by 2?
8 days ago, salamandra wrote:

In this tutorial there is a step that shuffles the lines of a file and then splits the file in two. I do not understand why, instead of simply dividing the number of lines by two, they add 1 first and then divide by two. Why is that?

That part of the code is:

nlines=$(samtools view merged.bam | wc -l ) # Number of reads in the BAM file
nlines=$(( (nlines + 1) / 2 )) # half that number
# This will shuffle the lines in the file and split it into two SAM files
samtools view ${tmpDir}/${NAME1}_${NAME2}_merged.bam | shuf - | split -d -l ${nlines} - "${tmpDir}/${EXPT}"
cat ${tmpDir}/${EXPT}_header.sam ${tmpDir}/${EXPT}00 | samtools view -bS - > ${outputDir}/${EXPT}00.bam
cat ${tmpDir}/${EXPT}_header.sam ${tmpDir}/${EXPT}01 | samtools view -bS - > ${outputDir}/${EXPT}01.bam

You can try it for yourself: run the relevant part of the tutorial once with nlines=$(( (nlines + 1) / 2 )) and once with nlines=$(( (nlines ) / 2 )), and see if it makes a difference.

You may also open an issue at the repo with your question. But please first try for yourself and check for differences, if any.

written 8 days ago by h.mon
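
The experiment suggested above can also be tried on a toy input rather than the real BAM. The following is only a sketch: `count_pieces` is a hypothetical helper (not part of the tutorial), `seq` stands in for `samtools view`, and it assumes a `split` that supports `-d`, as the tutorial's own command does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split <total> numbered lines into chunks of <chunk>
# lines and print how many output files split created.
count_pieces() {
    local total=$1 chunk=$2 dir
    dir=$(mktemp -d)
    # seq stands in for "samtools view ... | shuf -" in the tutorial
    seq "$total" | split -d -l "$chunk" - "$dir/part"
    ls "$dir" | wc -l | tr -d ' '
    rm -rf "$dir"
}

n=3  # an odd number of lines
echo "chunk = n / 2       -> $(count_pieces "$n" $(( n / 2 ))) files"
echo "chunk = (n + 1) / 2 -> $(count_pieces "$n" $(( (n + 1) / 2 ))) files"
```

With an odd line count the two formulas should give a different number of output files; with an even count they should agree.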

Is there an odd number of lines in the BAM?

written 8 days ago by jrj.healey
8 days ago, salamandra wrote:

Think I know now:

Bash arithmetic is integer-only, so a division keeps just the integer part of the result. Say the original file has 3 lines and we want to split it in half: we want one file with 2 lines and the other with 1. But 3/2 = 1.5, which bash truncates to 1, so split would produce 3 files with one line each. If we add 1 before dividing by 2, (3+1)/2 = 2, so split puts two lines into the first file and the remaining line into the second, as we wanted.

If the file has an even number of lines, e.g. 4, then (4+1)/2 = 2.5, which bash truncates to 2, so the file is split into two files of 2 lines each. Adding 1 therefore changes nothing for even line counts and still does what we want.
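
The arithmetic can be checked directly in bash. This is a small sketch; the function names `half_floor` and `half_ceil` are made up for illustration and do not appear in the tutorial.

```shell
#!/usr/bin/env bash
# Plain integer division: the fractional part is discarded.
half_floor() { echo $(( $1 / 2 )); }
# Add 1 first, so odd counts round up instead of down.
half_ceil()  { echo $(( ($1 + 1) / 2 )); }

for n in 3 4 5 6; do
    echo "n=$n  n/2=$(half_floor "$n")  (n+1)/2=$(half_ceil "$n")"
done
```

The two results differ only for odd n, which is exactly the case where a chunk size of n/2 would leave one line over for a third file.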
