Question: remove duplicates with markdup
0
gravatar for dimitrischat
8 weeks ago by
dimitrischat60
dimitrischat60 wrote:

Hello all. I am trying to do a paired-end analysis. Ran the 2 fastq files with hisat2:

hisat2 -p 10 -x '..index_hg19/indexed' -1 R1_001.fastq.gz -2 R2_001.fastq.gz -S hisat2output.sam

Then :

samtools view -@ 10 -bS hisat2output.sam > hisat2.bam
samtools sort -@ 12 -n hisat2.bam > testSort.bam
samtools fixmate -m -@ 12 testSort.bam testSort_fixmate.bam

then i try to remove duplicates :

samtools markdup -r -@ 12 testSort_fixmate.bam > testSort_fixmate-markdup.bam

but returns problem :

ERROR: queryname sorted, must be sorted by coordinate.

WHEN i don't use -n option in samtools sort , samtools markdup returns problem:

ERROR: Coordinate sorted, require grouped/sorted by queryname.

I cant find where is the problem. Any help ?

rna-seq samtools • 201 views
ADD COMMENTlink modified 8 weeks ago by h.mon24k • written 8 weeks ago by dimitrischat60
1

I formatted your post (again) with code / quote markdown, to improve readability. Please use the formatting buttons in the future.

ADD REPLYlink written 8 weeks ago by h.mon24k

What does samtools view -H testSort.bam | head -5 show?

ADD REPLYlink modified 8 weeks ago • written 8 weeks ago by genomax65k
@HD VN:1.0  SO:queryname
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
@SQ SN:chr8 LN:146364022
@SQ SN:chr9 LN:141213431
@SQ SN:chr10    LN:135534747
@SQ SN:chr11    LN:135006516
@SQ SN:chr12    LN:133851895
@SQ SN:chr13    LN:115169878
@SQ SN:chr14    LN:107349540
@SQ SN:chr15    LN:102531392
@SQ SN:chr16    LN:90354753
@SQ SN:chr17    LN:81195210
@SQ SN:chr18    LN:78077248
@SQ SN:chr19    LN:59128983
@SQ SN:chr20    LN:63025520
@SQ SN:chr21    LN:48129895
@SQ SN:chr22    LN:51304566
@SQ SN:chrM LN:16571
@SQ SN:chrX LN:155270560
@SQ SN:chrY LN:59373566
@PG ID:hisat2   PN:hisat2   VN:2.1.0    CL:"/home/app/Desktop/apps/hisat2-2.1.0/hisat2-align-s --wrapper basic-0 -p 10 -x /home/app/Desktop/jim/indexes/hisat2/index_hg19/indexed -S hisat2output.sam -1 /tmp/32541.inpipe1 -2 /tmp/32541.inpipe2"
ADD REPLYlink written 8 weeks ago by dimitrischat60

Likely not the solution to your problem, but you can make these commands a lot shorter and avoid intermediate files:

hisat2 -p 10 -x '..index_hg19/indexed' -1 R1_001.fastq.gz -2 R2_001.fastq.gz -S hisat2output.sam

Then :

samtools view -@ 10 -bS hisat2output.sam > hisat2.bam

samtools sort -@ 12 -n hisat2.bam > testSort.bam

By using pipes and directly going to samtools sort:

hisat2 -p 10 -x '..index_hg19/indexed' -1 R1_001.fastq.gz -2 R2_001.fastq.gz | samtools sort -n -o testSort.bam

This requires a reasonably recent version of samtools.

ADD REPLYlink written 8 weeks ago by WouterDeCoster38k

yes i do use them like you suggested. but now i am trying to figure out where the problem was. thats why i was running the commands 1 by 1

ADD REPLYlink written 8 weeks ago by dimitrischat60
1
gravatar for h.mon
8 weeks ago by
h.mon24k
Brazil
h.mon24k wrote:

edited answer:

You have to name-sort for samtools fixmate, and coordinate-sort for samtools markdup. Here is the correct order of operations, from samtools man file:

# The first sort can be omitted if the file is already name ordered
samtools sort -n -o namesort.bam example.bam

# Add ms and MC tags for markdup to use later
samtools fixmate -m namesort.bam fixmate.bam

# Markdup needs position order
samtools sort -o positionsort.bam fixmate.bam

# Finally mark duplicates
samtools markdup positionsort.bam markdup.bam

original answer

The problem is you are sorting by name (samtools sort -n), and you should sort by coordinate:

samtools sort -@ 12 hisat2.bam > testSort.bam
ADD COMMENTlink modified 8 weeks ago • written 8 weeks ago by h.mon24k

when i sort by coordinate it returns this problem :

[bam_mating_core] ERROR: Coordinate sorted, require grouped/sorted by queryname.

ADD REPLYlink written 8 weeks ago by dimitrischat60
1

I edited my answer in view of this comment. Probably one error stems from samtools fixmate, and the other, from samtools markdup.

ADD REPLYlink written 8 weeks ago by h.mon24k

that worked. But could you tell me why my other pipeline was wrong? Do i need to do the one you suggested when its paired-end ?

ADD REPLYlink written 8 weeks ago by dimitrischat60
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1897 users visited in the last hour