Question: Issue with naming convention in bedtools
3
gravatar for A. Domingues
3.6 years ago by
A. Domingues1.6k
Mainz, Germany
A. Domingues1.6k wrote:

While intersecting a bam file and bed to obtain the total number of reads that map to a class of genes, bedtools outputs the following error:

bedtools intersect -a some.bam -b LINE.bed -sorted -s -wa -bed -f 1.0 -nonamecheck | wc -l

ERROR: File some.bam has inconsistent naming convention for record:
      -1      -1      HWI-ST558:319:HJHNYADXX:1:1101:10003:66276 1:N:0:GATCAG 0       +
137933

I poked around the internet and this appears to occour when chromosome naming conventions do not match. However, the stated solution, option `-nonamecheck` did not work. The culprit is (at least) this alignment:

samtools view some.bam | grep 'HWI-ST558:319:HJHNYADXX:2:2104:11288:74456'
HWI-ST558:319:HJHNYADXX:2:2104:11288:74456      0       chr4    61813677        0       27M     *       0       0       GACAAATCAAGAAAACCATTAAACAGC     FFFFHHDHFHJJJGIGFFFHGIG@GDH      XA:i:0  MD:Z:27 RG:Z:1  NM:i:0  XM:i:2

Which to me does not appear to be any different from other alignments in the file:

samtools view some.bam | head
HWI-ST558:319:HJHNYADXX:1:1110:11857:4127       0       chr1    5571    255     29M     *       0       0       GCCACATGTGTGGACCAGGTCAACACATA   FFFFHHHHHJJJJJJJJJJIIJJJJJIJJ    XA:i:2  MD:Z:5G12A10    RG:Z:1  NM:i:2
HWI-ST558:319:HJHNYADXX:1:1210:19976:51309      16      chr1    8405    255     27M     *       0       0       GACTCAGTTAGTTTGATCAGTGTGTTT     HCHHGIJJJIJIIIGIIJHGHHHFFDF      XA:i:0  MD:Z:27 RG:Z:1  NM:i:0
HWI-ST558:319:HJHNYADXX:1:2101:16679:94808      0       chr1    13822   255     31M     *       0       0       TTAATTTGCCCTCATAAACTCTGATCAAAGT FFFFHHHHHJJJJJJJJIHJIJJJJJJJJJJ  XA:i:0  MD:Z:31 RG:Z:1  NM:i:0
HWI-ST558:319:HJHNYADXX:1:2209:11460:28507      0       chr1    19815   0       24M     *       0       0       TGACGTTGTGTGGACATTACCACT        DDFFHHHHHJJJJJJJJJJJJJJJXA:i:0   MD:Z:24 RG:Z:1  NM:i:0  XM:i:2
HWI-ST558:319:HJHNYADXX:2:2206:8659:12479       0       chr1    22434   0       27M     *       0       0       TAACCGGCACGGGCTCATTCTGAAAGC     FEFFHDAFHJJDEGBHIIJ>GGCHIJJ      XA:i:3  MD:Z:5A0A3A16   RG:Z:1  NM:i:3  XM:i:2
HWI-ST558:319:HJHNYADXX:2:1116:12702:23989      0       chr1    24671   0       30M     *       0       0       ATAGATATTATCAGACACACTGTGAACATT  FFFFHFHHHJIJIJJJJJJIJJJJIJJFHI   XA:i:1  MD:Z:3A26       RG:Z:1  NM:i:1  XM:i:2
HWI-ST558:319:HJHNYADXX:2:2112:15091:10459      0       chr1    37645   0       29M     *       0       0       CTGTGCAGAGCTGCAGCCCTCAAGGAGTT   FFADDFFDHEGBDD@GIEIGHEGHC8:FG    XA:i:1  MD:Z:21C7       RG:Z:1  NM:i:1  XM:i:2
HWI-ST558:319:HJHNYADXX:1:2101:17286:97920      0       chr1    43096   0       26M     *       0       0       CGGTTTGACTGTACAAGTGAAATACC      FFFDHHHHHJJIJJJJIHIHEIIIJJ       XA:i:0  MD:Z:26 RG:Z:1  NM:i:0  XM:i:2
HWI-ST558:319:HJHNYADXX:2:1108:5430:68222       0       chr1    43273   0       24M     *       0       0       TCGTTGAGAAGAGTAGGGACGGTA        FFFFHHHHHJJJJGHIJJJJIIGIXA:i:0   MD:Z:24 RG:Z:1  NM:i:0  XM:i:2
HWI-ST558:319:HJHNYADXX:2:2207:2908:11814       16      chr1    43750   255     29M     *       0       0       TACGCTTCAGTACGATCACTGATGTTGAT   JIJIJJIJJJJJJJJJJIJJHHHHHFFFF    XA:i:1  MD:Z:9T19       RG:Z:1  NM:i:1

 

This happens for all 12 of my bam files (mapped to Danrer7, bowtie-0.12.8*). I am a bit at a lost of what is happening here. Bedtools Version: v2.23.0

sam bedtools • 5.0k views
ADD COMMENTlink modified 3.5 years ago • written 3.6 years ago by A. Domingues1.6k
2
gravatar for A. Domingues
3.5 years ago by
A. Domingues1.6k
Mainz, Germany
A. Domingues1.6k wrote:

I have spent some more time finding the source of the issue, and it turns to be something simple:
Mapping was done against Zv9 and the bed file contained coordinates from Zv10 - UCSC now has this version as default and I obviously did not pay enough attention (I hang my head in shame). When replacing with the Zv9 bed file the issue is gone.

I would expect that in such a case, the error message would something like '-a and -b coordinates not matching' but the fact that it gives an error is already re-assuring.

 

ADD COMMENTlink written 3.5 years ago by A. Domingues1.6k
5
gravatar for Pierre Lindenbaum
3.6 years ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum114k wrote:

I think it's more because the read name contains a space:

 

 HWI-ST558:319:HJHNYADXX:1:1101:10003:66276 1:N:0:GATCAG

 

this name should have been trimmed to:

HWI-ST558:319:HJHNYADXX:1:1101:10003:66276

 

which aligner did you use ?

ADD COMMENTlink written 3.6 years ago by Pierre Lindenbaum114k

which aligner did you use ?

bowtie-0.12.8. It is a very old version, but I am using it to be consistent with another set of results that someone else produced.

 

I think it's more because the read name contains a space:

But this seems to be produced by bedtools intersect, rather than by bowtie, because the space is not present in the read name from the bam. Also, bedToBam returns a proper read name:

bamToBed -i some.bam | grep 'HWI-ST558:319:HJHNYADXX:2:2104:11288:74456'
chr4    61813676        61813703        HWI-ST558:319:HJHNYADXX:2:2104:11288:74456      0       +

Actually, I just tested converting bam to bed before intersecting and it does not throw out any error:

bamToBed -i some.bam | bedtools intersect -a - -b LINE.bed -sorted -s -wa -bed -f 1.0 | wc -l

If I am reading this test correctly it might be a bug in intersectBed. I will wait to see if Aaron Quinlan shows up here, otherwise I report it soonish.

ADD REPLYlink written 3.6 years ago by A. Domingues1.6k

But this seems to be produced by bedtools intersect : no because ''1:N:0:GATCAG" is typically produced by a sequencer see: http://en.wikipedia.org/wiki/FASTQ_format

 

@EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG
EAS139 the unique instrument name
136 the run id
FC706VJ the flowcell id
2 flowcell lane
2104 tile number within the flowcell lane
15343 'x'-coordinate of the cluster within the tile
197393 'y'-coordinate of the cluster within the tile
1 the member of a pair, 1 or 2 (paired-end or mate-pair reads only)
Y Y if the read is filtered, N otherwise
18 0 when none of the control bits are on, otherwise it is an even number
ATCACG index sequence
ADD REPLYlink written 3.6 years ago by Pierre Lindenbaum114k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1736 users visited in the last hour