Question

Confused by the output of the STAR aligner for paired-end reads, and need advice on SNP analysis of closely related breeds

0

18 months ago

mohsamir2016 ▴ 30

Dear all,

I am aligning paired-end reads from two closely related chicken breeds against the chicken reference genome. In each breed, I have 5 individuals (samples), and each individual has two files (R1 and R2).

I ran the alignment with STAR from the directory containing all the fastq files of the 5 samples, using a for loop with this command:

STAR --runMode alignReads --genomeDir IndexRef/ --outSAMtype BAM SortedByCoordinate --readFilesIn ${file} --outFileNamePrefix mapped/L10/${file} --runThreadN 12

For each sample it produced two files, ending in R1.Aligned.sortedByCoord.out.bam and R2.Aligned.sortedByCoord.out.bam. I know that these two files are BAM files, each with its own statistics on % mapping, etc. I am confused about which of the two is considered the final alignment file for the sample. Do the two files get combined afterwards when running samtools on them? I assume there should be a single BAM file per sample that is treated as the alignment to be analyzed and visualized in a genome browser?

Another question: my two breeds are closely related, so sensitivity is important for picking up the SNPs that differ between them. Do you think the above command gives a highly sensitive alignment, or do I need to add more options?

Thanks

RNA-seq • 2.3k views

ADD COMMENT • link updated 15 months ago by Ram 43k • written 18 months ago by mohsamir2016 ▴ 30

0

I actually tried to run the 5 samples (each paired-end) in one go using a bash script. My script was:

#!/bin/bash
STAR --runMode alignReads --genomeDir IndexRef/ --outSAMtype BAM SortedByCoordinate --readFilesIn R0629-S0001_L10AU1_A56592_1_HGFCJDSX2_TTACCGAC-CGTATTCG_L003_R1_trimmed.fastq R0629-S0001_L10AU1_A56592_1_HGFCJDSX2_TTACCGAC-CGTATTCG_L003_R2_trimmed.fastq --outFileNamePrefix mapped/L10/BAM_L10 --runThreadN 12
STAR --runMode alignReads --genomeDir IndexRef/ --outSAMtype BAM SortedByCoordinate --readFilesIn R0629-S0005_L10BU1_A56596_1_HGFCJDSX2_AAGACCGT-CAATCGAC_L003_R1_trimmed.fastq R0629-S0005_L10BU1_A56596_1_HGFCJDSX2_AAGACCGT-CAATCGAC_L003_R2_trimmed.fastq --outFileNamePrefix mapped/L10/BAM_L10 --runThreadN 12
STAR --runMode alignReads --genomeDir IndexRef/ --outSAMtype BAM SortedByCoordinate --readFilesIn R0629-S0009_L10CU1_A56600_1_HGFCJDSX2_CAGGTTCA-GGCGTTAT_L003_R1_trimmed.fastq R0629-S0009_L10CU1_A56600_1_HGFCJDSX2_CAGGTTCA-GGCGTTAT_L003_R2_trimmed.fastq --outFileNamePrefix mapped/L10/BAM_L10 --runThreadN 12
STAR --runMode alignReads --genomeDir IndexRef/ --outSAMtype BAM SortedByCoordinate --readFilesIn R0629-S0014_L10DU1_A56605_1_HGFCJDSX2_AGCCTATC-GTTACGCA_L003_R1_trimmed.fastq R0629-S0014_L10DU1_A56605_1_HGFCJDSX2_AGCCTATC-GTTACGCA_L003_R2_trimmed.fastq --outFileNamePrefix mapped/L10/BAM_L10 --runThreadN 12
STAR --runMode alignReads --genomeDir IndexRef/ --outSAMtype BAM SortedByCoordinate --readFilesIn R0629-S0017_L10EU1_A56608_1_HGFCJDSX2_TTGCGAGA-GTGCCATA_L003_R1_trimmed.fastq R0629-S0017_L10EU1_A56608_1_HGFCJDSX2_TTGCGAGA-GTGCCATA_L003_R2_trimmed.fastq --outFileNamePrefix mapped/L10/BAM_L10 --runThreadN 12

This actually produced just one BAM file! I was expecting 5 BAM files, one for each of the 5 samples. Any comments on the code? Should I leave a space after each command?

Thanks

ADD REPLY • link updated 15 months ago by Ram 43k • written 18 months ago by mohsamir2016 ▴ 30

1

As noted by @swbarnes2, you need to use a unique name in --outFileNamePrefix mapped/L10/UNIQUE_NAME_HERE in each command, so that the five sets of result files end up with unique names.
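A minimal sketch of one way to get a unique prefix per sample without typing each command by hand. It assumes every sample's files are named <sample>_R1_trimmed.fastq and <sample>_R2_trimmed.fastq (as in the script above) and that mapped/L10/ already exists. The helpers mate_of and prefix_of are hypothetical names introduced here for illustration. The loop only prints the commands (a dry run), so they can be checked before anything is overwritten:

```shell
#!/bin/bash
# Dry-run sketch: prints one STAR command per sample instead of running it.
# Assumption: files are named <sample>_R1_trimmed.fastq / <sample>_R2_trimmed.fastq.

# Hypothetical helper: derive the R2 file name from the R1 file name.
mate_of() {
    printf '%s\n' "${1%_R1_trimmed.fastq}_R2_trimmed.fastq"
}

# Hypothetical helper: derive a unique per-sample output prefix.
prefix_of() {
    printf 'mapped/L10/%s_\n' "${1%_R1_trimmed.fastq}"
}

for r1 in *_R1_trimmed.fastq; do
    [ -e "$r1" ] || continue   # skip if the glob matched nothing
    # R1 and its matching R2 go together, separated by a space.
    echo STAR --runMode alignReads --genomeDir IndexRef/ \
        --outSAMtype BAM SortedByCoordinate \
        --readFilesIn "$r1" "$(mate_of "$r1")" \
        --outFileNamePrefix "$(prefix_of "$r1")" --runThreadN 12
done
```

If the printed commands look right, removing the echo would run them one after another, each writing its own <sample>_Aligned.sortedByCoord.out.bam.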

ADD REPLY • link 18 months ago by GenoMax 142k

0

Can you explain exactly what part of the code you think tells the software to make 5 different BAMs, instead of overwriting the same one over and over again?

ADD REPLY • link 18 months ago by swbarnes2 14k

Ram · Answer 1 · 2022-10-23

1

18 months ago

swbarnes2 14k

You need to redo the alignments. STAR needs R1 and the matching R2 together as input.

ADD COMMENT • link 18 months ago by swbarnes2 14k

0

Dear @swbarnes2,

Thanks for the answer. So you are telling me that in each alignment job I need to supply only two fastq files, R1 and R2 of the same sample? Then what should I do if I have 5 samples and need to align all of them at once? That is why I made the loop.

Thanks

ADD REPLY • link updated 15 months ago by Ram 43k • written 18 months ago by mohsamir2016 ▴ 30

1

The STAR manual has this relevant section:

Multiple samples can be mapped in one job. For single-end reads use a comma separated list (no spaces around commas), e.g. --readFilesIn sample1.fq,sample2.fq,sample3.fq. For paired-end reads, use comma separated list for read1 /space/ comma separated list for read2, e.g.: --readFilesIn sample1read1.fq,sample2read1.fq,sample3read1.fq sample1read2.fq,sample2read2.fq,sample3read2.fq
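As a rough illustration of the comma-separated form quoted above, the sketch below builds the two lists and prints the resulting command as a dry run. The file names are the placeholder names from the manual excerpt, join_commas is a hypothetical helper, and IndexRef/ is taken from the question. Note that a single job like this writes a single set of output files, so for per-individual SNP comparison, one job per sample with a unique prefix may be the simpler route. The manual also describes --outSAMattrRGline for tagging reads by input file, which is worth checking if you do combine samples.

```shell
#!/bin/bash
# Dry-run sketch: builds the comma-separated lists and prints the command.

# Hypothetical helper: join its arguments with commas (no spaces).
join_commas() {
    out=""
    for f in "$@"; do
        out="${out:+$out,}$f"   # prepend "previous," only when out is non-empty
    done
    printf '%s\n' "$out"
}

# Order matters: the Nth read1 file must pair with the Nth read2 file.
r1_list=$(join_commas sample1read1.fq sample2read1.fq sample3read1.fq)
r2_list=$(join_commas sample1read2.fq sample2read2.fq sample3read2.fq)

# One space between the read1 list and the read2 list.
echo STAR --runMode alignReads --genomeDir IndexRef/ \
    --readFilesIn "$r1_list" "$r2_list"
```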