Can't Extract Information From BAM File
11.4 years ago
alexjironkin ▴ 10

Hi everyone,

I have been struggling with this for a while. I got some published data, and it appears to be in the map format from bowtie. I converted it to a SAM file using the bowtie2sam.pl script from samtools' misc directory.

I then convert to BAM, and everything seems to be fine, using:

samtools view -bT arabidopsis.fa in.sam > out.bam

Except that I get a message along the lines of: Char1 treated as '*'

However, when I try to load the data into IGV, for example, to view it along the genome, it loads but nothing shows up at any zoom level.

I have also tried loading just the SAM file into IGV, and that works fine, but the BAM file generated from it does not.

I tried to validate the file with both samtools and other published tools and it returns:

22774604 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates 
22774604 + 0 mapped (100.00%:nan%)
0 + 0 paired in sequencing 
0 + 0 read1 
0 + 0 read2 
0 + 0 properly paired (nan%:nan%) 
0 + 0 with itself and mate mapped 
0 + 0 singletons (nan%:nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

The only thing I really care about is getting total read counts out of the data over a specified region, but I haven't got to that yet. So any ideas on how to extract read counts, and on this BAM issue, would be great.

Thanks

Alex

bam • 3.7k views

Check your sequence names/IDs; I think you have them wrong.

As everyone else said in this thread, make sure you use the same reference for mapping and for generating BAM.

11.4 years ago
matted 7.8k

It sounds like the reference genome you use in converting the SAM (arabidopsis.fa) doesn't match the one used to generate the alignments. Check it to be sure. The chromosome names in in.sam should be present and exactly match those given in the reference FASTA file that you use later.
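
A minimal sketch of that check, assuming a headerless SAM file and a plain FASTA reference. The function names are hypothetical helpers, not part of samtools; they only use awk, sort and comm.

```shell
# List the reference names that alignments point to (SAM column 3).
sam_ref_names() {
    awk -F'\t' '!/^@/ && $3 != "*" { print $3 }' "$1" | LC_ALL=C sort -u
}

# List the sequence names in a FASTA file (text after '>' up to the first space).
fasta_ref_names() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1" | LC_ALL=C sort -u
}

# Print names that occur in the SAM file but not in the FASTA file.
# Any output here means the two files do not agree on naming.
missing_in_fasta() {
    tmp_sam=$(mktemp)
    tmp_fa=$(mktemp)
    sam_ref_names "$1" > "$tmp_sam"
    fasta_ref_names "$2" > "$tmp_fa"
    LC_ALL=C comm -23 "$tmp_sam" "$tmp_fa"
    rm -f "$tmp_sam" "$tmp_fa"
}
```

Running `missing_in_fasta in.sam arabidopsis.fa` should print nothing when every name used in the alignments is present in the reference.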

Thanks! The problem was that chromosomes were identified simply by number in the reference file (e.g. 1, 2, 3, 4, 5, M and C), but the mapping file had Chr1, Chr2, Chr3, Chr4, Chr5, ChrM and ChrC. Once I changed the genome reference to match, it worked :) Thanks a lot.
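
One way such a rename could be scripted, as a sketch only: it assumes the FASTA headers begin directly with the bare name (`>1`, `>M`, ...) and that a `Chr` prefix is all that differs. `add_chr_prefix` and the output file name below are hypothetical.

```shell
# Write a copy of a FASTA file whose header lines carry a "Chr" prefix.
# Headers that already start with ">Chr" are left unchanged.
add_chr_prefix() {
    sed -E 's/^>(Chr)?/>Chr/' "$1" > "$2"
}
```

For example, `add_chr_prefix arabidopsis.fa arabidopsis.chr.fa`, followed by `samtools faidx arabidopsis.chr.fa` to rebuild the index before converting the SAM file again.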

11.4 years ago

Are you sure you need to run samtools view with the -T option? But yeah, double-check that the chromosome names in your reference file match the chromosome names in your .bam. It also wouldn't hurt to double-check that the reference index was made properly by remaking it with samtools faidx and seeing whether that throws an error.

I read that if the SAM file doesn't have headers, you should use the '-bT' option with the reference genome when converting to BAM. No '@' lines are present at the beginning of the SAM file.

11.4 years ago
AsoInfo ▴ 300

To extract the read count information, you can refer to the following article:

http://vallandingham.me/RNA_seq_differential_expression.html
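
For a quick count over one region, the usual route on a coordinate-sorted, indexed BAM is `samtools view -c out.bam Chr1:1000-2000`. As a rough sketch that works on the SAM file directly, the hypothetical helper below counts alignments overlapping a region with awk, assuming 1-based inclusive coordinates and a standard CIGAR string in column 6.

```shell
# Count alignments in a SAM file that overlap ref:start-end (1-based, inclusive).
# Usage: count_reads_in_region in.sam Chr1 1000 2000
count_reads_in_region() {
    awk -F'\t' -v chr="$2" -v s="$3" -v e="$4" '
        /^@/ { next }          # skip header lines, if any
        $3 != chr { next }     # skip other references and unmapped reads
        {
            # Reference span = sum of the M, D, N, = and X operation lengths.
            cigar = $6
            span = 0
            while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
                oplen = substr(cigar, 1, RLENGTH - 1) + 0
                op = substr(cigar, RLENGTH, 1)
                if (op ~ /[MDN=X]/) span += oplen
                cigar = substr(cigar, RLENGTH + 1)
            }
            if (span == 0) span = 1   # CIGAR missing ("*"): treat as one base
            endpos = $4 + span - 1
            if ($4 <= e && endpos >= s) n++
        }
        END { print n + 0 }
    ' "$1"
}
```

This scans the whole file on every call, so it is only reasonable for occasional checks; for many regions an indexed BAM is the better choice.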

11.4 years ago
Marvin ▴ 890

Sounds as if your SAM file doesn't have the required @SQ headers, which you attempted to fix using the -T option. What you actually need is the lowercase -t option:

samtools view -bt arabidopsis.fa.fai in.sam > out.bam

The .fai file is made by running

samtools faidx arabidopsis.fa
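
To illustrate what the .fai file contributes here: its first two tab-separated columns are the sequence name and length, which is the information an @SQ header line carries. The sketch below, with a hypothetical helper name, prints the corresponding header lines so they can be compared against the names used in the SAM file.

```shell
# Print one @SQ header line per sequence listed in a .fai index.
# Column 1 of the .fai is the sequence name, column 2 its length.
fai_to_sq_header() {
    awk -F'\t' '{ printf "@SQ\tSN:%s\tLN:%s\n", $1, $2 }' "$1"
}
```

For example, `fai_to_sq_header arabidopsis.fa.fai` lists every SN: name the BAM header would contain; alignments whose reference name is not among them cannot be placed.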

My reference had pure numbers and the SAM files had the Chr prefix. The *.fai file also doesn't have entries for the mitochondrial and cytoplasmic(?) sequences, so do all the reads from them go unmapped? I have updated my local copy of the reference to have Chr prefixes and it works. Thanks though.

Oh, my bad: the mitochondrial reference is not in the genome reference. I guess I would still have to change it to have Chr prefixes.
