Problem when combine two sam file toghther
0
0
Entering edit mode
7.5 years ago

Dear all,

I am trying to combine two sam file together and subject to bedtools to convert it into bedpe file.

The two sam file are generated by mapping a pair-end read separately to the genome using bowtie2(-U -local).

Here are the code that I use:

# For read 1
samtools view -@ 16 -bT hg19.fa  R1.sam > R1.bam
samtools sort -@ 16 -n R1.bam R1_sorted

# Do the same thing with read 2
...

# Merge two bam file
samtools merge -@ 16 -n R1R2_sorted_merged.bam R1_sorted.bam R2_sorted.bam
samtools sort -@ 16 -n R1R2_sorted_merged.bam R1R2_sorted_merged.sorted
samtools fixmate R1R2_sorted_merged.sorted.bam R1R2_sorted_merged.sorted.fixed.bam

# Convert bam to bedpe
# Command 1:
samtools view -@ 16 -bf 0x2 R1R2_sorted_merged.sorted.fixed.bam | bedtools bamtobed -i stdin -bedpe > R1R2_sorted_merged.sorted.fixed.bedpe
# Command 2:
bedtools bamtobed -i R1R2_sorted_merged.sorted.fixed.bam -bedpe > R1R2_sorted_merged.sorted.fixed.bedpe

Merging two bam file is inspired by James.

The Command 1 is from bedtools usage page which can "only convert properly-paired (FLAG == 0x2) reads".

But the output of bedpe file of Command is actually empty. Then I went double check the output of samtools view -@ 16 -bf 0x2 R1R2_sorted_merged.sorted.fixed.bam. It turned out that it is actually empty, which indicate that there are no "properly-pared" reads in the .sorted.fixed.bam file.

Then I tried Command 2, the ouput bedpe file is not empty but the number of lines(< 0.5million) is quite small compared to the raw reads. There is no replicate too, which is not possible.

Another downstream I am gonna do is to remove duplicate from bedpe file and convert the bedpe file into bed file. Is there any way to convert bedpe to bed?

Anyone got a hint on what might be the problem?

FYI: You might also refer to my previous post on this question

Thanks a lot!!!

EDIT:

I think the problem might lie in the bam file. R1.bam samtools flagstat output:

299344083 + 0 in total (QC-passed reads + QC-failed reads)

0 + 0 duplicates

295354894 + 0 mapped (98.67%:-nan%)

0 + 0 paired in sequencing

0 + 0 read1

0 + 0 read2

0 + 0 properly paired (-nan%:-nan%)

0 + 0 with itself and mate mapped

0 + 0 singletons (-nan%:-nan%)

0 + 0 with mate mapped to a different chr

0 + 0 with mate mapped to a different chr (mapQ>=5)

R1R2_sorted_merged.bam samtools flagstat output:

603889264 + 0 in total (QC-passed reads + QC-failed reads)

0 + 0 duplicates

590620124 + 0 mapped (97.80%:-nan%)

0 + 0 paired in sequencing

0 + 0 read1

0 + 0 read2

0 + 0 properly paired (-nan%:-nan%)

0 + 0 with itself and mate mapped

0 + 0 singletons (-nan%:-nan%)

0 + 0 with mate mapped to a different chr

0 + 0 with mate mapped to a different chr (mapQ>=5)

R1R2_sorted_merged.sorted.bam samtools flagstat output:

603889264 + 0 in total (QC-passed reads + QC-failed reads)

0 + 0 duplicates

590620124 + 0 mapped (97.80%:-nan%)

0 + 0 paired in sequencing

0 + 0 read1

0 + 0 read2

0 + 0 properly paired (-nan%:-nan%)

0 + 0 with itself and mate mapped

0 + 0 singletons (-nan%:-nan%)

0 + 0 with mate mapped to a different chr

0 + 0 with mate mapped to a different chr (mapQ>=5)

R1R2_sorted_merged.sorted.fixed.bam samtools flagstat output:

603889264 + 0 in total (QC-passed reads + QC-failed reads)

0 + 0 duplicates

590620124 + 0 mapped (97.80%:-nan%)

0 + 0 paired in sequencing

0 + 0 read1

0 + 0 read2

0 + 0 properly paired (-nan%:-nan%)

0 + 0 with itself and mate mapped

0 + 0 singletons (-nan%:-nan%)

0 + 0 with mate mapped to a different chr

0 + 0 with mate mapped to a different chr (mapQ>=5)

Still don't know what is going on. :(

sequencing • 2.2k views
ADD COMMENT
0
Entering edit mode

Your mapping file is not recognizing the paired-end reads itself, can you check if they are the correct files and if the naming of reads is proper. Also mapping each-end of the read seperately and merging them into a single file doesn't make them paired together (since I see a R1-mapped file in your example). You need to map the paired-end data in the same run itself. Fixmate doesn't fix the flags, refer here

ADD REPLY
0
Entering edit mode

The problem is that these are ChIA-PET reads, which the two tags might come from different chromosomes. If I map as paired-end, this kind of interaction would be neglected. Also, for non-paired-end mapping, is the flagstat output looks like what I got for R1-mapped file above?

ADD REPLY

Login before adding your answer.

Traffic: 2468 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6