Duplicate chromosome location columns in a BED file cause problems when importing it into R
21 months ago
minoo ▴ 10

I converted a pair of FASTQ files to a BED file using the code below:

bowtie2 --end-to-end --very-sensitive --no-mixed --no-discordant --phred33 -I 10 -X 700 -p ${cores} -x ${ref} -1 ${fastQR1} -2 ${fastQR2} -S ${samFile} &> ${txtFile}

samtools view -bS -F 0x04 $proj/a.sam >$proj/a.bam

bedtools bamtobed -i $proj/a.bam -bedpe >$proj/a.bed

The head of my BED file now looks like this:

chr1    242251375   242251525   chr1    242251390   242251540   NS500442:247:HKWKNAFX2:1:11101:3060:1048    17  +   -
chr9    41169424    41169574    chr9    41169494    41169644    NS500442:247:HKWKNAFX2:1:11101:14485:1058   1   +   -

It has 10 columns, but I have no idea why the chromosome location columns are duplicated, or how I can import this into R as a GRanges object. Any ideas?

21 months ago
cmdcolin ★ 3.8k

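Your BED file is actually a BEDPE file, because you passed the -bedpe flag to bedtools bamtobed. It has two sets of chrom, start, and end columns because it stores each read pair on a single line: columns 1-3 describe the alignment of one mate and columns 4-6 describe the alignment of the other. The remaining four columns are the read name, a score, and the strand of each mate.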
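If you want one range per read pair in R, one option is to collapse each BEDPE line into a single fragment interval before importing. Below is a minimal shell/awk sketch that assumes the 10-column layout shown in the question (chrom1, start1, end1, chrom2, start2, end2, name, score, strand1, strand2) and that every line is a concordant pair whose fragment runs from the leftmost mate start to the rightmost mate end. The function name bedpe_to_fragments is hypothetical and is not part of bedtools.

```shell
# Hypothetical helper: collapse each BEDPE line (one read pair) into one
# BED6 fragment interval. Assumes the 10-column layout written by
# bedtools bamtobed -bedpe:
#   chrom1 start1 end1 chrom2 start2 end2 name score strand1 strand2
# Reads BEDPE on stdin and writes BED6 on stdout.
bedpe_to_fragments() {
  awk 'BEGIN { OFS = "\t" }
       # Skip pairs whose mates map to different chromosomes;
       # concordant pairs should never reach this rule.
       $1 != $4 { next }
       {
         start = ($2 < $5) ? $2 : $5   # leftmost start of the two mates
         end   = ($3 > $6) ? $3 : $6   # rightmost end of the two mates
         # Keep read name and score; a fragment has no single strand, so use "."
         print $1, start, end, $7, $8, "."
       }'
}
```

Usage with the paths from the question (the output file name is hypothetical): bedpe_to_fragments < $proj/a.bed > $proj/a.fragments.bed. The strand is set to "." because a fragment covers both mates; keep $9 instead if you need the mate-1 strand. The result is ordinary BED6, so it should be readable by a standard BED importer in R such as rtracklayer's import(), though that step is an assumption and is not shown here.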

Here is some more info:

https://bedtools.readthedocs.io/en/latest/content/general-usage.html#bedpe-format

https://thesequencingcenter.com/knowledge-base/what-are-paired-end-reads/
