Question: Produce A Bed Or Position List File Containing A List Of Regions For Mpileup
0
gravatar for rob234king
5.7 years ago by
rob234king580
UK/Harpenden/Rothamsted Research
rob234king580 wrote:

I mapped reads for SNP calling to a full 10Gb reference. I've reduced this reference to only those contigs that have SNPs of interest mapped to. I would now like to reduce the bam file so I can easily visualise in IGV.

From samtools the method of doing this seems to do samtools view on the bam file with the list of locations. My question is how to generate this file? Could I adapt a samtools faidx of my reduced contig reference file which has output like below to a bed file format?

scaff1 3970 29 79 80 scaff2 8501 4079 79 80

The first column is the name of the scaffold, then length of contig?, then start position? If this is the case could I not just move the third column to column position 2 and add column 2 and 3 for the end position column for a bed format file?

samtools mpileup • 2.6k views
ADD COMMENTlink modified 5.7 years ago by Jorge Amigo11k • written 5.7 years ago by rob234king580

About the fai file Can you please tell me where I find information about .fai file format?

It's not clear what you want to do with this file . You can reduce the calling of mpileup with the option -l file.bed; Can also reduce the size of the BAM with samtools view -L region.bed your.bam

ADD REPLYlink modified 4 months ago by RamRS22k • written 5.7 years ago by Pierre Lindenbaum121k

Yes sorry it was samtools view, realised this in a lecture. how do you create the bed file?

ADD REPLYlink modified 4 months ago by RamRS22k • written 5.7 years ago by rob234king580

Looks like the -t setting can use the index.fai file, so I'll give that a try.

ADD REPLYlink written 5.7 years ago by rob234king580

The -t option will replace the header in the output when you view the file, which is probably not what you want.

ADD REPLYlink written 5.7 years ago by Devon Ryan90k

I try this:

samtools view -L rk1.bed -h A6_genome.sort.rmdup.bam > A6_genome.sort.rmdup_filtered.sam
samtools view -bS A6_genome.sort.rmdup_filtered.sam > A6_genome.sort.rmdup_filtered.bam

but the file is the same size as what it started so don't think it is working.

The only thing I can think of is my bed file is wrong?

(name_of_scaffold) (0) (length)

ADD REPLYlink modified 4 months ago by RamRS22k • written 5.7 years ago by rob234king580

How many scaffolds are in the bed file you're passing to samtools and how many are in the assembly? The bed file should have a much smaller number of them.

ADD REPLYlink modified 4 months ago by RamRS22k • written 5.7 years ago by Devon Ryan90k

approx 600 scaffolds in bed file and 10 million in reference that was used to map to bam file.

ADD REPLYlink written 5.7 years ago by rob234king580

That's odd. Does the same thing happen if your bed file contains just 1 or 2 contigs? BTW, your two commands could be merged to samtools view -L rk1.bed -b -o A6_genome.sort.rmdup_filtered.bam A6_genome.sort.rmdup.bam, which will also probably be a bit faster.

ADD REPLYlink written 5.7 years ago by Devon Ryan90k

I got round it by converting the bam file to sam and using tablet to visualise with my reduced reference which worked fine. I'll keep going trying to figure it out because I can't see why it doesn't work.

ADD REPLYlink modified 4 months ago by RamRS22k • written 5.7 years ago by rob234king580

Thanks, please do report back here in the comments if you find out why that's not producing the expected behaviour.

ADD REPLYlink written 5.7 years ago by Devon Ryan90k

If you just want to be able to visualize the alignments in IGV, then just sort and index the BAM file. IGV won't need to load the whole thing into memory.

ADD REPLYlink modified 4 months ago by RamRS22k • written 5.7 years ago by Devon Ryan90k

Doesn't load, not sure if bam file is too big but only 1.5 gb.

ADD REPLYlink modified 4 months ago by RamRS22k • written 5.7 years ago by rob234king580

Did it give an error message (start IGV from the command line if you don't normally)?

ADD REPLYlink written 5.7 years ago by Devon Ryan90k
0
gravatar for Jorge Amigo
5.7 years ago by
Jorge Amigo11k
Santiago de Compostela, Spain
Jorge Amigo11k wrote:

You can visualize in IGV any .bam file, independently of its size, as long as it is indexed and the reference genome that was used for the alignment is selected in advance. if your .bam file is not indexed (a .bai extension with the same name of the .bam file is what you would have to look for) samtools index yourbamfile would do, and if your genome of interest is not listed in the top left combobox on IGV interface you can enter it manually as .fasta or .genome on IGV's "genome" menu. both .bam/.bai and genome file can be locally or remotely stored served by http. that's all you need to load your .bam file into IGV.

ADD COMMENTlink modified 4 months ago by RamRS22k • written 5.7 years ago by Jorge Amigo11k

I think it's just a memory issue, the computer I'm working from has only 4gb of ram so may explain why IGV stalls.

ADD REPLYlink written 5.7 years ago by rob234king580
1

4GB should be enough for running IGV with the 1.2GB option from the web, and even with the 750MB. it may be the case that you could have hugely covered regions, and so you could set IGV to downsample your visualization in order not to load each and every read. this can be done through [view]>[preferences]>[alignments]>[downsample reads], and has helped us when looking at really enriched target resequencing regions where we have more than 1000x.

ADD REPLYlink written 5.7 years ago by Jorge Amigo11k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1782 users visited in the last hour