Getting uniquely aligned properly paired reads
1
0
Entering edit mode
2.6 years ago

Hi all,

I have bam files obtained from mapping 10x genomics data. My goal is to get the VCF file. To do the variant calling I want to get rid of the unaligned, multi mapping and other cases. Goal is to get uniquely aligned properly paired reads.

My initial statistics of mapping are (without filtering)

788172985 + 0 in total (QC-passed reads + QC-failed reads)
42854167 + 0 secondary
0 + 0 supplementary
50143481 + 0 duplicates
741131280 + 0 mapped (94.03% : N/A)
745318818 + 0 paired in sequencing
508751213 + 0 properly paired (68.26% : N/A)
677983578 + 0 with itself and mate mapped
20293535 + 0 singletons (2.72% : N/A)
90544338 + 0 with mate mapped to a different chr
77234446 + 0 with mate mapped to a different chr (mapQ>=5)


To get uniquely aligned read, I am using these filters as suggested by many online platforms and Samtools flagstats.

samtools view -h -q 20 -F 3340 -f 3 -b \$path > unique_realigned_Semlo_properlypaired.bam


Results:

420866696 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
420866696 + 0 mapped (100.00% : N/A)
420866696 + 0 paired in sequencing
420866696 + 0 properly paired (100.00% : N/A)
420866563 + 0 with itself and mate mapped
133 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)


Problem: by applying the aforementioned filters, I am getting unique reads but the number of reads in read1 (r1) are not equal to read2(r2). So is there any other way to get uniquely aligned reads (aligned in proper pair)?

Thanks

10XGenomics Properly paired unique reads • 1.5k views
0
Entering edit mode

If this is 10x genomics data is the BAM file generated by longranger?

0
Entering edit mode

And if so, what is the problem with the VCF that it generates?

0
Entering edit mode

Hi Igor and genomax,

I hope you can still see this message. I am just going to start using supernova to assembly the genome of geckos. I am having a hard time using it. Can you please suggest articles, websites or anything that can help me. I have installed the software at my cluster and created a script. however, I am stuck here. I have never did this before and I honestly don't know the following steps. Can you please help me? What are the steps that O should follow? what I have to know before starting. Please any help would be greatly appreciated

Thank you so much in advance.

0
Entering edit mode

Yes I am using longranger for alignment. Well I am only aligning the reads using -- longranger align parameter so I am not getting any vcf file out of it. But before getting VCF,I want to filter the bam file. Then pre-processing bam file --local realignment using GATK --- variant calling using freebayes.

I am not using --longranger wgs because this parameter is used when you want to use gatk as well along-with the longranger. (this is more suitable for the human genomes where ploidy is diploid).

1
Entering edit mode

Long Ranger can use GATK or Freebayes.

0
Entering edit mode

Yes it uses both but the parameters are tuned according to diploid genome. In my case I have tetraploid samples.

0
Entering edit mode

You mean something is wrong with the 10x genomics sequenced file or with the alignment file?

Sure I will try that.

0
Entering edit mode
2.6 years ago

Thats weird... looks like something is broken in your file because:

• There are also 133 singletons yet you used the samtools -f 3 filter so you should only have paired reads.

I would suggest to try to run samtools fixmate before your samtools view -h -q 20 -F 3340 -f 3 -b and see if that fixes it.