Is there any set threshold for hisat2 alignment and featurecount assignment to select samples further?
0
0
Entering edit mode
4 weeks ago
sukeshinik5 ▴ 40

Is there any set threshold for hisat2 alignment and featurecount assignment to select samples further? Please find the script for Hisat2,

samples_dir="/xx321/samples/data"
aligned_dir="/xx321/samples/Aligned_trimmed_data"
hisat2_index="/xx321/samples/HISAT2/grch38/genome"
log_file="/xx321/samples/Trimm_alignment_log.log"
splice_sites_file= "/xx321/samples/Splice_Sites/gencode.v38.splicesites.txt"

# Loop through multiple samples
for R1_samples in ${samples_dir}/*_R1.fastq.gz; do
    sample_name=$(basename "$R1_samples" _R1.fastq.gz)
    R2_samples="${R1_samples/_R1/_R2}"

# HISAT2 alignment command with trimming options and capturing alignment statistics
    hisat2 -p 15 \
        -x "$hisat2_index" \
        --trim5 5 --trim3 5 \
        --ss "$splice_sites_file" \
        -1 "$R1_samples" \
        -2 "$R2_samples" \
        -S /dev/stdout 2> "${aligned_dir}/${sample_name}_hisat2.log" | \
    samtools view -bS - > "${aligned_dir}/${sample_name}_trimmed.aligned.bam"

Following is the output percent for fall in between 85% to 97%. However, when below featureCount command is used the assignment fall in between 65% to 89%. only one sample shows 55% assignment. how to improve the assignment.

annotation_file="/xx321/samples/Annotation_files/gencode.v38.annotation.gtf"
output_file="/xx321/samples/sample.trimmed.count.txt"
input_directory="/xx321/samples/Aligned_trimmed_data"
Splice_site_file="/xx321/samples/Splice_Sites/splicesites.bed"
log_file="/xx321/samples/featureCounts.log"

# Run featureCounts for all BAM files and save output to log file

featureCounts -p -T 14 -a "$annotation_file" -o "$output_file" --largestOverlap -t exon -g gene_id -M -O -J -s 0 "$input_directory"/*_trimmed.sorted.bam > "$log_file" 2>&1

Sorry for such naive questions I am new to the RNA sequencing analysis. it would be great if you could go through the script and let me know if anything needs improvement. Also, please find the script used to create a splice_site_file which was converted to .bed for including in featureCount as -J (hopefully I am doing right).

awk '$3 == "exon"' "${gtf_file}" | awk '{if ($7 == "+") print $1, $5, $4+1; else if ($7 == "-") print $1, $4, $5+1}' | sort -k1,1 -k2,2n | uniq > "${output_file}"

Is there a way to improve assignment in featurecount? also, on what basis we should have samples included in the analysis?

Thank you,

hisat2 alignment featurecount • 463 views
ADD COMMENT
1
Entering edit mode

I would not go by these lowlevel stats but see how data look in a PCA. If outliers match with samples that have low assignment percentages then maybe consider removing them. If they cluster reasonably, maybe keep. There is no "fixed" threshold.

ADD REPLY
0
Entering edit mode

Thank you for such prompt response. Is there a way to improve the assignment in featureCount?

ADD REPLY
0
Entering edit mode

I've never bothered with hisat2 but use STAR HISAT2 V.S. STAR

ADD REPLY
0
Entering edit mode

Sure, i will try using that Thankyou .

ADD REPLY

Login before adding your answer.

Traffic: 2655 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6