Featurecount mapping is less
0
0
Entering edit mode
3 months ago

Hi,

I used STAR for alignment and featurecounts for quantification. The featurecount successfully assigned fragment seems to be less (around 25%). It will be really helpful if I find the reason why the matching percentage is low. It's a paired end sequencing - 83 x 83bp reads

STAR - generating indices

STAR --runThreadN 48 \
--genomeDir /scratch/vvenugo9/First_trial/indices \
--runMode genomeGenerate \
--genomeFastaFiles /scratch/vvenugo9/First_trial/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa \
--sjdbGTFfile /scratch/vvenugo9/First_trial/Homo_sapiens.GRCh37.75.gtf \
--sjdbOverhang 82

STAR alignment

STAR --readFilesCommand zcat \
--genomeDir /scratch/vvenugo9/First_trial/indices \
--readFilesIn /scratch/vvenugo9/First_trial/Choroid_C9_ALS1_TAGCTT_R1_001_out.fastq.gz /scratch/vvenugo9/First_trial/Choroid_C9_ALS1_TAGCTT_R2_001_out.fastq.gz \
--runThreadN 48 \
--sjdbGTFfile /scratch/vvenugo9/First_trial/Homo_sapiens.GRCh37.75.gtf \
--runMode alignReads \
--outSAMtype BAM Unsorted \
--outSAMmode Full \
--outSAMstrandField intronMotif \
--outFilterType BySJout \
--outSAMunmapped Within \
--outSAMmapqUnique 255 \
--outFilterMultimapNmax 20 \
--outFilterMismatchNmax 999 \
--outFilterMismatchNoverLmax 0.1 \
--alignMatesGapMax 1000000 \
--seedSearchStartLmax 5 \
--alignIntronMin 20 \
--alignIntronMax 1000000 \
--alignSJoverhangMin 18 \
--alignSJDBoverhangMin 18 \
--chimSegmentMin 18 \
--chimJunctionOverhangMin 18 \
--outSJfilterOverhangMin 18 18 18 18 \
--alignTranscriptsPerReadNmax 50000 \

Featurecount

featureCounts -p -T 48 -t exon -g gene_id -a /scratch/vvenugo9/First_trial/Homo_sapiens.GRCh37.75.gtf -s 1 -o fc_CON1_C9_updated_index_and_star.txt /home/vvenugo9/ondemand/data/sys/myjobs/projects/default/11/Aligned.out.bam

Also the QC looks fine, but there's lot of warning in STAR while generating genome indices. Is that something I should worry about?

WARNING: while processing sjdbGTFfile=/scratch/vvenugo9/First_trial/Homo_sapiens.GRCh37.75.gtf: chromosome 'HG1287_PATCH' not found in Genome fasta files for line:
HG1287_PATCH    rRNA    exon    144126020   144126129   .   +   .   gene_id "ENSG00000252830"; transcript_id "ENST00000517021"; exon_number "1"; gene_name "RNA5SP59"; gene_source "ensembl"; gene_biotype "rRNA"; transcript_name "RNA5SP59-202"; transcript_source "ensembl"; exon_id "ENSE00002089298";`
Featurecount STAR RNAseq • 282 views
ADD COMMENT
1
Entering edit mode

As an aside, FeatureCounts is quite obsolete for RNA-seq, as it throws out multimappers. You'd be better off using a newer method that accounts for multimapping reads more intelligently, e.g. salmon, RSEM, etc.

ADD REPLY
0
Entering edit mode

Aren't those really old genome versions?

ADD REPLY
0
Entering edit mode

I'm trying to replicate an analysis for publishing purposes

ADD REPLY

Login before adding your answer.

Traffic: 1594 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6