Question: Has anybody converted exome BAM files to .tdf files for visualization in IGV?
ivivek_ngs (4.8k), Seattle, WA, USA, wrote 5.0 years ago:

Dear All,

 

I am trying to view my exome BAM files in IGV, but the files are very large (20 GB each), which makes them hard to load. I have read that for visualizing large datasets it is useful to convert them to a count format, preferably .tdf. The documentation says this is usually done for RNA-Seq and ChIP-Seq, and I now want to use it for exome-seq as well. My data is paired-end with 100 bp reads, and the coverage is 70X for the samples. How should I set the -e parameter when generating the .tdf files for my samples?

igvtools count -z 5 -w 25 -e 250 input.bam out.bam.tdf  hg19

 

Can anyone give me suggestions?

sequencing snp alignment next-gen • 4.4k views
Modified 24 months ago by predeus (1.1k) • written 5.0 years ago by ivivek_ngs (4.8k)

Have you tried just sorting and indexing? There's usually no need to make a tdf file.
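
For reference, sorting and indexing could look roughly like the sketch below. The file names are placeholders rather than the poster's actual files, and the commands assume samtools 1.x. The script only prints the commands unless RUN=1 is set.

```shell
# Dry run by default: commands are printed, not executed.
# Set RUN=1 to actually run them (requires samtools 1.x on PATH).
bam="input.bam"             # hypothetical unsorted BAM
sorted="input.sorted.bam"   # hypothetical output name

# Coordinate-sort the BAM, then build the .bai index that IGV needs.
sort_cmd="samtools sort -o ${sorted} ${bam}"
index_cmd="samtools index ${sorted}"

for cmd in "$sort_cmd" "$index_cmd"; do
  if [ "${RUN:-0}" = "1" ]; then
    $cmd
  else
    echo "$cmd"
  fi
done
```

With the index in place, IGV reads only the region currently in view, which is why a sorted and indexed BAM is often enough on its own.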

Written 5.0 years ago by Devon Ryan (90k)
predeus (1.1k), Russia, wrote 24 months ago:

Basically -e adds extra coverage to your reads, which is annoying and can be misleading. If the sequencing is paired-end, you'll see both reads, so there's no real reason to extend the coverage past what's actually seen in the reads.

I would say just set it to 0 for all applications.

Go ahead and make three TDF files, with -e of 0, 100, and 200, you'll see what I'm talking about.
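
A minimal sketch of that comparison, reusing the -z and -w values and the hg19 genome from the question. The input and output file names are placeholders, and the loop only prints the commands unless RUN=1 is set:

```shell
# Dry run by default: commands are printed, not executed.
# Set RUN=1 to actually run them (requires igvtools on PATH).
bam="input.bam"   # hypothetical sorted, indexed BAM
genome="hg19"
outputs=""

# One TDF per extension value so the tracks can be loaded side by side in IGV.
for e in 0 100 200; do
  out="out.e${e}.tdf"
  cmd="igvtools count -z 5 -w 25 -e ${e} ${bam} ${out} ${genome}"
  if [ "${RUN:-0}" = "1" ]; then
    $cmd
  else
    echo "$cmd"
  fi
  outputs="${outputs}${out} "
done
```

Loading the three resulting tracks together should make the effect of the extension visible at exon boundaries.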

What's more important/tricky is marking duplicates before you make the TDF. You should mark them (with Picard MarkDuplicates, not samtools) for WES, WGS, and ChIP-seq, and should NOT mark them for RNA-seq and amplicon sequencing.
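
As a rough sketch for the WES case, marking duplicates and then building the TDF could look like this. The jar path and file names are placeholders, the I=/O=/M= arguments are the classic Picard syntax, and the script only prints the commands unless RUN=1 is set. As far as I know, igvtools count leaves out reads flagged as duplicates unless told otherwise, which is why the marking step matters here.

```shell
# Dry run by default: commands are printed, not executed.
# Set RUN=1 to actually run them (requires java, Picard and igvtools).
picard_jar="picard.jar"                  # hypothetical path to the Picard jar
in_bam="input.sorted.bam"                # hypothetical coordinate-sorted BAM
out_bam="input.markdup.bam"              # hypothetical duplicate-marked output
metrics="input.markdup_metrics.txt"      # hypothetical metrics file

# Step 1: flag duplicates (reads are marked, not removed).
markdup_cmd="java -jar ${picard_jar} MarkDuplicates I=${in_bam} O=${out_bam} M=${metrics}"
# Step 2: build the TDF from the marked BAM, with -e 0 as suggested above.
count_cmd="igvtools count -z 5 -w 25 -e 0 ${out_bam} ${out_bam}.tdf hg19"

for cmd in "$markdup_cmd" "$count_cmd"; do
  if [ "${RUN:-0}" = "1" ]; then
    $cmd
  else
    echo "$cmd"
  fi
done
```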

Written 24 months ago by predeus (1.1k)
Martombo (2.4k), Seville, ES, wrote 5.0 years ago:

The manual (http://www.broadinstitute.org/igv/igvtools_commandline) states that the -e option should be set to the average fragment length of the library minus the average read length.
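
As a worked example of that rule, here is a small sketch. The 350 bp fragment length is an assumed value, not a measured one, and 100 bp is the read length given in the question:

```shell
fragment_len=350   # assumed average library fragment length (bp)
read_len=100       # read length from the question (bp)

# -e = average fragment length minus average read length
extend=$(( fragment_len - read_len ))

# Print the resulting command; file names and genome are taken from the question.
echo "igvtools count -z 5 -w 25 -e ${extend} input.bam out.bam.tdf hg19"
```

With these numbers the extension comes out at 250, which matches the -e value in the original command.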

Modified 5.0 years ago • written 5.0 years ago by Martombo (2.4k)

Yes, I already did that, but it is usually used for RNA-Seq and ChIP-Seq studies, and I did not find anything on exome-seq studies. I am currently using the command listed above, since I know my read length is 100 bp and I am assuming an average library fragment length of around 350 bp (for Illumina HiSeq the library is usually between 250 and 500 bp). Let's see what the results look like. Thanks.

Written 5.0 years ago by ivivek_ngs (4.8k)

You can probably tune this parameter to get a different resolution for the counts. If you want a higher resolution you can decrease the value, at the cost of a larger file.

Written 5.0 years ago by Martombo (2.4k)