HTseq low counts
0
0
Entering edit mode
2.8 years ago
aka ▴ 10

Hello community,

I'm coming to you because I'm a bit lost. I made a snakemake pipeline to process RNAseq data which are Truseq Stranded Illumina.

I have done the classic processing FASTQC, TRIMMOMATIC, FASTQC2, BWAmem2, and it is for the count part that I have done with HTseq and with salmon where I have a problem.

I have very few counts, barely 20-30%.

For Htseq I tried sorting with samtools sort by name, position, without sorting and I always get around 26% except once I made a mistake and I sorted with samtools by position and counted with HTseq count by name and then I got 50% mapping. I guess it must be wrong...

Here is my command:

rule count_matrix:
    input:
        gff="../resources/reference/gff_gtf/25K_for_greg.gff",
        alg_file= "../results/sam_file/{sample}_sort_mapping.sam"
    output:
        inter= "../results/stat/Intersec/{sample}_count_matrix_position.csv",
    message:
        """
        --- Count genes per reads on {wildcards.sample} in process ---
        """
    shell:
        "htseq-count -m intersection-nonempty -f sam -t CDS -r pos -s reverse -i gene_id {input.alg_file} {input.gff} > {output.inter} "

For salmon I have this command:

rule count_matrix:
    input:
        alg_file="../results/mapped_reads/bwa_mem/{sample}_sort_mapping.bam",
        index = "../resources/reference/Qrob_H2.3_Genes_v2.2_20161004.CDS_nuc.fsa",
    output:
       temp("../results/stat/salmon/{sample}_mock.txt")
    params:
        out2= directory("../results/stat/salmon/{sample}_A"),
    threads: 10
    resources:
        mem_mb=25000
    message:
        """
        --- Count genes per reads {wildcards.sample} with salmon in process ---
        """
    shell:
        "salmon quant -t {input.index} -l A -a {input.alg_file} -o {params.out2} -p {threads} && touch {output} {log}"

I get with the auto library the ISR library 19% of mapping....

To check, I have generated my flagstat with this command:

rule align:
    priority: 0
    input:
        reference=config["genome_ref"],
        reads=["../results/trimmed/{sample}_R1_trimmed.fastq.gz","../results/trimmed/{sample}_R2_trimmed.fastq.gz"],
        faidx=config["genome_ref_faidx"]
    output:
        final_bam="../results/mapped_reads/bwa_mem/{sample}_sort_mapping.bam",
        flag= "../results/mapped_reads/bwa_mem/flagstat/{sample}_sort_mapping.bam.flagstat"
    message:
        """
        Mapping with BWA mem2 : {wildcards.sample} on ref
        """
    threads: 10
    resources:
        mem_mb=25000
    shell:
        "bwa-mem2 mem -M -t {threads} -v 2 {input.reference} {input.reads} | samtools view -q 20 -f 2 -F 3840 --threads {threads} -Sb -> {output.final_bam} && samtools flagstat {output.final_bam} > {output.flag} "

The flagstat gives me a good alignment 50%: The flagstat is only the proper pairs. On R1+R2 in total after trimming I had 171095182 reads and on the flagstat I have 91366309 .

91366309 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
91366309 + 0 mapped (100.00% : N/A)
91366309 + 0 paired in sequencing
45695163 + 0 read1
45671146 + 0 read2
91366309 + 0 properly paired (100.00% : N/A)
91366309 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)

My BAM file is like this

@HD VN:1.6 SO:coordinate
@SQ SN:Qrob_P0000010.2 LN:693
@SQ SN:Qrob_P0000020.2 LN:1188
@SQ SN:Qrob_P0000030.2 LN:660
@SQ SN:Qrob_P0000290.2 LN:204
@SQ SN:Qrob_P0000440.2 LN:561
@SQ SN:Qrob_P0000460.2 LN:696

I really paid attention to the option for the sorting of the BAM, the direction: reverse in my case ect...

I confess I don't understand, if someone could enlighten me

Thanks in advance and have a nice day,

Aka

snakemake RNASeq htseq • 554 views
ADD COMMENT

Login before adding your answer.

Traffic: 1909 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6