Question: ADTEx coverage issue
0
gravatar for 2nelly
18 months ago by
2nelly120
Athens
2nelly120 wrote:

Hi all,

I am trying to use ADTEx on my exome sequencing data following the user guide instructions. The problem is that despite the fact I am not getting any error during running the command, the calculated coverage files for sample and normal bam files are endless;what I mean is that ADTEx keeps writing coverage file like forever and the coverage files are a lot of GBs (>700) and eventually my server crashes due to lack of free space. I also tried to obtain the depth of coverage using bedtools but I am still facing the same problem. What I suspect is that ADTEx and bedtools don t use my target bed file to calculate the specific regions coverage. Any ideas what s happening? My target.bed seems normal and I successfully used it in other software (chromosome names are same both in bam and target). Please see the ADTEx and bedtools commands below.

python ADTEx.py --normal ~/Desktop/processed_reads/mouse_3/m3_Liver_processed_realigned.bam --tumor ~/Desktop/processed_reads/mouse_3/m3_T1Bm1_processed_realigned.bam --bed ~/Desktop/mm10_bed_files/mm10_targets.bed --ploidy 2 --plot --minReadDepth 50 --out ~/Desktop/m3T1Bm1


coverageBed -abam ~/Desktop/processed_reads/mouse__T1Bm1_processed_realigned.bam -b ~/Desktop/mm10_bed_files/mm10_targets.bed -d > ~/Desktop/coverage.txt

Thank you in advance!

ADD COMMENTlink modified 18 months ago by Amitm1.3k • written 18 months ago by 2nelly120
1

A general suggestion. Perhaps you have done this already but when this sort of thing happens it is always better to check the command with a reduced dataset/one sample.

ADD REPLYlink modified 18 months ago • written 18 months ago by genomax37k

You mean like trying to run it for one or two chromosomes?

ADD REPLYlink written 18 months ago by 2nelly120

What ever is easy to do. Are you working with just 2 files and the output becomes hundreds of GB or are there multiple samples? You could trim the target bed down to a smaller chromosome/region (chr22) and then do a run to check the output to see if it matches you expectation.

ADD REPLYlink written 18 months ago by genomax37k

So, I have 2 bam files, one normal and one tumor and a target bed file containing all the exons coordinates. The calculated coverage file for each of 2 bam files becomes hundreds of GBs. Actually I just did that. I trimmed the bed file keeping only exons coordinates for a gene, but I am still facing the same issue. So, it seems that the target bed file is not taken into account by ADTEx and bedtools. I suppose this should be a command's line argument issue that s why i posted above. Any ideas are welcome.

ADD REPLYlink written 18 months ago by 2nelly120

For coverageBed see this important note. What version of bedtools are you using?

As of version 2.24.0, the coverage tool has changed such that the coverage is computed for the A file, not the B file. This changes the command line interface to be consistent with the other tools. Also, the coverage tool can accept multiple files for the -b option. This allows one to measure coverage between a single query (-a) file and multiple database files (-b) at once!

ADD REPLYlink written 18 months ago by genomax37k

Yep, I have 2.25.0, so I think my command is correct.right?

ADD REPLYlink written 18 months ago by 2nelly120

Your target areas are in bed file so that should be first option (-a) for coverageBed.

ADD REPLYlink written 18 months ago by genomax37k

Unfortunately not. I am afraid you are wrong. I tried your suggestion and I got a blank file.

ADD REPLYlink written 18 months ago by 2nelly120

Based on the discussion below it looks certain that there is a mismatch between your bed file and BAM's. Are the chromosome names matching in both?

Can you show us the output of

$ samtools view -H your_bam
ADD REPLYlink written 18 months ago by genomax37k

Yep this is the first thing I checked

@SQ SN:mchr1    LN:195471971
@SQ SN:mchr2    LN:182113224
@SQ SN:mchr3    LN:160039680
@SQ SN:mchr4    LN:156508116
@SQ SN:mchr5    LN:151834684
@SQ SN:mchr6    LN:149736546
@SQ SN:mchr7    LN:145441459
@SQ SN:mchr8    LN:129401213
@SQ SN:mchr9    LN:124595110
@SQ SN:mchr10   LN:130694993
@SQ SN:mchr11   LN:122082543
@SQ SN:mchr12   LN:120129022
@SQ SN:mchr13   LN:120421639
@SQ SN:mchr14   LN:124902244
@SQ SN:mchr15   LN:104043685
@SQ SN:mchr16   LN:98207768
@SQ SN:mchr17   LN:94987271
@SQ SN:mchr18   LN:90702639
@SQ SN:mchr19   LN:61431566
@SQ SN:mchrX    LN:171031299
@SQ SN:mchrY    LN:91744698
@SQ SN:mchrM    LN:16299
@PG ID:bwa  PN:bwa  VN:0.7.12-r1039 CL:/apps/bwa-0.7.12/bwa mem -t30 /apps/gendatabases/mouse_712/allmchr_masked.fasta m3_T1Bm1_R1.fastq m3_T1Bm1_R2.fastq
ADD REPLYlink modified 18 months ago • written 18 months ago by 2nelly120

The names are matching your bed file. Is your bed file tab delimited? Have you tried keeping just the first three columns of your bed file to see if that helps?

ADD REPLYlink written 18 months ago by genomax37k

Yep, it s tab delimited and I also tried without annotation! I know it s weird!!!

ADD REPLYlink written 18 months ago by 2nelly120
0
gravatar for Amitm
18 months ago by
Amitm1.3k
UK
Amitm1.3k wrote:

hi 2nelly, In case this info helps, I use the same command setup for bedtools to generate the coverage file (using target region bed file). On an average I get a coverage file of size 1- 1.5Gb for WES with mean coverage of 60-100x. How much is the mean cov. of your BAM?

I don't use ADTEX directly on the BAMs, but on the coverage file produced by bedtools coverageBed (v. 2.20.1).

This is the command I have been using -

python \
/home/amandal/softwares/ADTEx.v.2.0/ADTEx.py \
--normal Normal_samp_EachPos-level_Cov_ROI.txt \
--tumor Tumor_samp_EachPos-level_Cov_ROI.txt \
--bed targets.bed \
--out ADTEX_result_$dateToday \
--DOC \
--minReadDepth 35 \
--plot

Once, I have the bedtools cov. file ready, the above ADTEX cmd completes within half hr. The --minReadDepth value is something which I have arrived at for moderate depth WES. The time taken isn't affected if default is used.

ADD COMMENTlink written 18 months ago by Amitm1.3k

Thanks for your answer Amitm. Since I am facing this issue by using directly the BAM files on ADTEx, I tried also to produce the coverage files using bedtools in order to run the ADTEx with DOC files as input, but I still have to deal with the same issue. The coverage files are endless. The coverage of my BAM files is 100-140x. What I have notice in ADTEx manual, is that you have to use -d argument in bedtools which reads every single base in every read and reports it. Did u also use this argument to produce coverage files?

ADD REPLYlink written 18 months ago by 2nelly120

hi, Yes, I used the -d param in coverageBed. Notice that hence I also name the input files (from bedtools) as _EachPos-level_Cov_.

ADD REPLYlink written 18 months ago by Amitm1.3k

yep, so I am 100% sure that something is wrong with the bed file, because bedtools doesn t look for coverage in bed file coordinates. See below my bed file. Note that mchr refers to (mouse chr) and the nomenclature is exactly the same in bam files.

mchr1   4495446 4495694 Sox17
mchr1   4495694 4495942 Sox17
mchr1   4496290 4496556 Sox17
mchr1   4496556 4496822 Sox17
mchr1   4496822 4497088 Sox17
mchr1   4497088 4497354 Sox17
mchr1   4773199 4773462 Mrpl15
mchr1   4773462 4773725 Mrpl15
mchr1   4773725 4773989 Mrpl15
mchr1   4773989 4774252 Mrpl15
mchr1   4774252 4774516 Mrpl15
mchr1   4774516 4774769 Mrpl15
mchr1   4774769 4775023 Mrpl15
mchr1   4775023 4775277 Mrpl15
mchr1   4775277 4775531 Mrpl15
mchr1   4775531 4775785 Mrpl15
mchr1   4775785 4776039 Mrpl15
mchr1   4776039 4776293 Mrpl15
mchr1   4776293 4776547 Mrpl15
mchr1   4776547 4776801 Mrpl15
mchr1   4777524 4777648 Mrpl15
mchr1   4782567 4782733 Mrpl15
mchr1   4783950 4784105 Mrpl15
mchr1   4785572 4785726 Mrpl15
mchr1   4807892 4807982 Lypla1
mchr1   4808454 4808486 Lypla1
mchr1   4828583 4828649 Lypla1
mchr1   4830267 4830315 Lypla1
ADD REPLYlink modified 18 months ago • written 18 months ago by 2nelly120

hi, I too suspect the same as the tools itself work fine. Some checks which might help - 1) The target file I use is sorted by coordinates. Sorted bed would probably make it run faster. Also, my target file is from Agilent SureSelect and hence contains only those regions where probes are designed for. Is your bed too limited to captured regions? 2) The other thing I note is that the coords in your bed are often tailing the previous; contiguous I mean (like the 1st two). I am not sure if this could cause coverageBed to behave unexpectedly.

Maybe generate a fresh bed (for exonic regions) from BioMart or UCSC for mm10 and try. In my hands, coverageBed takes about an hour to generate the coverage file.

ADD REPLYlink written 18 months ago by Amitm1.3k

This is exactly what I did Amitm. These coordinates are coming from UCSC table browser for mm10. Regarding the second issue you mentioned, i don t know why UCSC split that way the exon. But in any case I can agree with you that this issue is practically impossible to cause this behaviour. I am wondering, if there was a mistake in bed file I would probably receive an error while running bedtools. right? I am confused.Any ideas are welcome!

ADD REPLYlink written 18 months ago by 2nelly120

I can't remember now the use case, but I noted that bedtools wasn't helpful with an error message in case of wrong syntax/ bed file. If you want to debug you could just take one coord. entry from your bed file and pass it to coverageBed and see if you get what result. To avoid spending time on debugging, - 1) you could use any other tool for CNA calling (not dependent on bedtools input). 2) Or, you could download the bed from Agilent site (requires free registration) which I know works with bedtools. This is not an efficient solution either as the coords you get are from mm9. I have used LiftOver to get the mm10 coords. Sorry, can't seem to put finger on a proper solution.

ADD REPLYlink written 18 months ago by Amitm1.3k

It s ok. In any case thank you for your help

ADD REPLYlink written 18 months ago by 2nelly120

Hi 2nelly ,

I am facing exactly the same problem as mentioned in this blog.

I am trying to use ADTEx and when the direct use of bam files did not wok, I resorted to trying to create the per-base coverage files. I have debugged every step you have discussed here, but I am usable to use the "target" file..

The output from coverage bed trying to create coverage for all possible positions (even if I use a very small target file for a specific chromosome) and hence this will be too big (> 700 GB) to handle.

Did you end up getting a solution to this problem?

Thanks in advance,

regards,

NPD

ADD REPLYlink written 5 weeks ago by deshpandenandan19750
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1360 users visited in the last hour