Question: Getting uniquely aligned properly paired reads
0
gravatar for nadiabeg.comsats
4 weeks ago by
nadiabeg.comsats0 wrote:

Hi all,

I have bam files obtained from mapping 10x genomics data. My goal is to get the VCF file. To do the variant calling I want to get rid of the unaligned, multi mapping and other cases. Goal is to get uniquely aligned properly paired reads.

My initial statistics of mapping are (without filtering)

788172985 + 0 in total (QC-passed reads + QC-failed reads)
42854167 + 0 secondary
0 + 0 supplementary
50143481 + 0 duplicates
741131280 + 0 mapped (94.03% : N/A)
745318818 + 0 paired in sequencing
372659409 + 0 read1
372659409 + 0 read2
508751213 + 0 properly paired (68.26% : N/A)
677983578 + 0 with itself and mate mapped
20293535 + 0 singletons (2.72% : N/A)
90544338 + 0 with mate mapped to a different chr
77234446 + 0 with mate mapped to a different chr (mapQ>=5)

To get uniquely aligned read, I am using these filters as suggested by many online platforms and Samtools flagstats.

samtools view -h -q 20 -F 3340 -f 3 -b $path > unique_realigned_Semlo_properlypaired.bam

Results:

420866696 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
420866696 + 0 mapped (100.00% : N/A)
420866696 + 0 paired in sequencing
210157890 + 0 read1
210708806 + 0 read2
420866696 + 0 properly paired (100.00% : N/A)
420866563 + 0 with itself and mate mapped
133 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

Problem: by applying the aforementioned filters, I am getting unique reads but the number of reads in read1 (r1) are not equal to read2(r2). So is there any other way to get uniquely aligned reads (aligned in proper pair)?

Thanks

ADD COMMENTlink modified 4 weeks ago • written 4 weeks ago by nadiabeg.comsats0

If this is 10x genomics data is the BAM file generated by longranger?

ADD REPLYlink written 4 weeks ago by genomax75k

And if so, what is the problem with the VCF that it generates?

ADD REPLYlink written 4 weeks ago by igor8.9k

Hi Igor and genomax,

I hope you can still see this message. I am just going to start using supernova to assembly the genome of geckos. I am having a hard time using it. Can you please suggest articles, websites or anything that can help me. I have installed the software at my cluster and created a script. however, I am stuck here. I have never did this before and I honestly don't know the following steps. Can you please help me? What are the steps that O should follow? what I have to know before starting. Please any help would be greatly appreciated

Thank you so much in advance.

ADD REPLYlink written 4 weeks ago by samnioue0

Yes I am using longranger for alignment. Well I am only aligning the reads using -- longranger align parameter so I am not getting any vcf file out of it. But before getting VCF,I want to filter the bam file. Then pre-processing bam file --local realignment using GATK --- variant calling using freebayes.

I am not using --longranger wgs because this parameter is used when you want to use gatk as well along-with the longranger. (this is more suitable for the human genomes where ploidy is diploid).

ADD REPLYlink modified 4 weeks ago • written 4 weeks ago by nadiabeg.comsats0
1

Long Ranger can use GATK or Freebayes.

ADD REPLYlink written 4 weeks ago by igor8.9k

Yes it uses both but the parameters are tuned according to diploid genome. In my case I have tetraploid samples.

ADD REPLYlink written 29 days ago by nadiabeg.comsats0

You mean something is wrong with the 10x genomics sequenced file or with the alignment file?

Sure I will try that.

ADD REPLYlink written 4 weeks ago by nadiabeg.comsats0
0
gravatar for Carlo Yague
4 weeks ago by
Carlo Yague4.8k
Canada
Carlo Yague4.8k wrote:

Thats weird... looks like something is broken in your file because:

  • total reads (420866696) = properly paired (420866696) = Read 1 + Read 2; but Read 1 /= Read2 when there are no secondary/supplementary alignments.
  • There are also 133 singletons yet you used the samtools -f 3 filter so you should only have paired reads.

I would suggest to try to run samtools fixmate before your samtools view -h -q 20 -F 3340 -f 3 -b and see if that fixes it.

ADD COMMENTlink modified 4 weeks ago • written 4 weeks ago by Carlo Yague4.8k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 988 users visited in the last hour