QC Report


general
Report generated at: 2020-02-05 10:24:57
Title: Engrafment (old)
Description: ATAC-seq Engrafment (old)
Pipeline version: v1.6.0.1
Pipeline type: atac
Genome: mm10
Aligner: bowtie2
Sequencing endedness: {'rep1': {'paired_end': True}, 'rep2': {'paired_end': True}, 'rep3': {'paired_end': True}}
Peak caller: macs2

Alignment quality metrics


SAMstat (raw unfiltered BAM)

                                    rep1       rep2       rep3
Total Reads                         103191668  117623518  113431192
Total Reads (QC-failed)             0          0          0
Duplicate Reads                     0          0          0
Duplicate Reads (QC-failed)         0          0          0
Mapped Reads                        100751744  114630557  110603504
Mapped Reads (QC-failed)            0          0          0
% Mapped Reads                      97.6       97.5       97.5
Paired Reads                        103191668  117623518  113431192
Paired Reads (QC-failed)            0          0          0
Read1                               51595834   58811759   56715596
Read1 (QC-failed)                   0          0          0
Read2                               51595834   58811759   56715596
Read2 (QC-failed)                   0          0          0
Properly Paired Reads               98450146   112091028  107884334
Properly Paired Reads (QC-failed)   0          0          0
% Properly Paired Reads             95.4       95.3       95.1
With itself                         99273578   113091628  108733744
With itself (QC-failed)             0          0          0
Singletons                          1478166    1538929    1869760
Singletons (QC-failed)              0          0          0
% Singleton                         1.4        1.3        1.6
Diff. Chroms                        71299      85405      77901
Diff. Chroms (QC-failed)            0          0          0

Marking duplicates (filtered BAM)

                                    rep1       rep2       rep3
Unpaired Reads                      0          0          0
Paired Reads                        36281880   42786949   39097409
Unmapped Reads                      0          0          0
Unpaired Duplicate Reads            0          0          0
Paired Duplicate Reads              14490765   18803833   16568740
Paired Optical Duplicate Reads      1195821    1490862    1408158
% Duplicate Reads                   39.9394    43.9476    42.3781

Filtered out (samtools view -F 1804):
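The value 1804 passed to `samtools view -F` is a bitwise OR of SAM flag bits, and reads with any of those bits set are excluded. The following is a minimal sketch that decodes such a value; the flag descriptions follow the SAM specification, while the helper name `decode_flag` is a hypothetical name chosen for illustration and is not part of the pipeline.

```python
from typing import List

# SAM FLAG bits and their meanings, per the SAM specification.
SAM_FLAGS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails platform/vendor quality checks",
    0x400: "read is PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag: int) -> List[str]:
    """Return the descriptions of all SAM flag bits set in `flag` (hypothetical helper)."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


if __name__ == "__main__":
    # 1804 = 4 + 8 + 256 + 512 + 1024
    for description in decode_flag(1804):
        print(description)
```

Under this decoding, `-F 1804` drops unmapped reads, reads with an unmapped mate, non-primary alignments, QC-failed reads, and duplicates.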


Fraction of mitochondrial reads (unfiltered BAM)

                                              rep1                  rep2                  rep3
Rn = Number of Non-mitochondrial Reads        100405134             114020894             110167974
Rm = Number of Mitochondrial Reads            926718                1686381               1189806
Rm/(Rn+Rm) = Frac. of mitochondrial reads     0.00914537711202594   0.01457454598252357   0.01068453412056167
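The mitochondrial fraction is the plain ratio Rm/(Rn+Rm) given in the row label. A minimal sketch of that arithmetic, using the rep1 counts from the table; the function name `mito_fraction` is a hypothetical name for illustration only.

```python
def mito_fraction(rn: int, rm: int) -> float:
    """Fraction of mitochondrial reads, Rm / (Rn + Rm).

    rn: number of non-mitochondrial reads
    rm: number of mitochondrial reads
    """
    total = rn + rm
    if total <= 0:
        raise ValueError("Rn + Rm must be positive")
    return rm / total


if __name__ == "__main__":
    # rep1 counts as reported in the table above.
    print(mito_fraction(100405134, 926718))
```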

[Figure: Preseq yield prediction plots for rep1, rep2, rep3]

Preseq performs a yield prediction by subsampling the reads, calculating the number of distinct reads, and then extrapolating out to see where the expected number of distinct reads no longer increases. The confidence interval gives a gauge as to the validity of the yield predictions.

SAMstat (filtered/deduped BAM)

                                    rep1       rep2       rep3
Total Reads                         43233094   47471984   44663354
Total Reads (QC-failed)             0          0          0
Duplicate Reads                     0          0          0
Duplicate Reads (QC-failed)         0          0          0
Mapped Reads                        43233094   47471984   44663354
Mapped Reads (QC-failed)            0          0          0
% Mapped Reads                      100.0      100.0      100.0
Paired Reads                        43233094   47471984   44663354
Paired Reads (QC-failed)            0          0          0
Read1                               21616547   23735992   22331677
Read1 (QC-failed)                   0          0          0
Read2                               21616547   23735992   22331677
Read2 (QC-failed)                   0          0          0
Properly Paired Reads               43233094   47471984   44663354
Properly Paired Reads (QC-failed)   0          0          0
% Properly Paired Reads             100.0      100.0      100.0
With itself                         43233094   47471984   44663354
With itself (QC-failed)             0          0          0
Singletons                          0          0          0
Singletons (QC-failed)              0          0          0
% Singleton                         0.0        0.0        0.0
Diff. Chroms                        0          0          0
Diff. Chroms (QC-failed)            0          0          0

Filtered and duplicates removed


Fragment length statistics (filtered/deduped BAM)

                                        rep1                      rep2                      rep3
Fraction of reads in NFR                0.4201124188445341        0.40703579997780504       0.33920748821447316
Fraction of reads in NFR (QC pass)      True                      True                      False
Fraction of reads in NFR (QC reason)    OK                        OK                        out of range [0.4, inf]
NFR / mono-nuc reads                    1.4133775603956116        1.275601178690659         1.0741825974573367
NFR / mono-nuc reads (QC pass)          False                     False                     False
NFR / mono-nuc reads (QC reason)        out of range [2.5, inf]   out of range [2.5, inf]   out of range [2.5, inf]
Presence of NFR peak                    True                      True                      True
Presence of Mono-Nuc peak               True                      True                      True
Presence of Di-Nuc peak                 False                     True                      False

[Figure: fragment length distribution plots for rep1, rep2, rep3]

Open chromatin assays show distinct fragment length enrichments, as the cut sites are only in open chromatin and not in nucleosomes. As such, peaks representing different n-nucleosomal (e.g., mono-nucleosomal, di-nucleosomal) fragment lengths will arise. Good libraries will show these peaks in a fragment length distribution and will show specific peak ratios.



Sequence quality metrics (filtered/deduped BAM)

[Figure: GC bias plots for rep1, rep2, rep3]

Open chromatin assays are known to have significant GC bias. Please take this into consideration as necessary.


Library complexity quality metrics


Library complexity (filtered non-mito BAM)

                             rep1       rep2       rep3
Total Fragments              35896109   42088536   38604077
Distinct Fragments           21624096   23749070   22339148
Positions with Two Read      4974174    5615727    5223428
NRF = Distinct/Total         0.602408   0.564265   0.578673
PBC1 = OneRead/Distinct      0.613597   0.576371   0.591068
PBC2 = OneRead/TwoRead       2.667473   2.43749    2.527832

Mitochondrial reads are filtered out by default. The non-redundant fraction (NRF) is the fraction of non-redundant mapped reads in a dataset; it is the ratio between the number of positions in the genome that uniquely mapped reads map to and the total number of uniquely mappable reads. The NRF should be > 0.8. The PBC1 is the ratio of genomic locations with EXACTLY one read pair over the genomic locations with AT LEAST one read pair. PBC1 is the primary measure, and the PBC1 should be close to 1. Provisionally 0-0.5 is severe bottlenecking, 0.5-0.8 is moderate bottlenecking, 0.8-0.9 is mild bottlenecking, and 0.9-1.0 is no bottlenecking. The PBC2 is the ratio of genomic locations with EXACTLY one read pair over the genomic locations with EXACTLY two read pairs. The PBC2 should be significantly greater than 1.
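The three complexity metrics described above are simple ratios of counts, and the provisional PBC1 bands can be expressed as a small lookup. The sketch below is illustrative only: the function names are hypothetical, and the handling of values that fall exactly on a band boundary (0.5, 0.8, 0.9) is an assumption, since the text does not say which band they belong to.

```python
from typing import Tuple


def complexity_metrics(total: int, distinct: int,
                       one_read: int, two_read: int) -> Tuple[float, float, float]:
    """Return (NRF, PBC1, PBC2) from fragment and position counts.

    NRF  = distinct / total
    PBC1 = one_read / distinct
    PBC2 = one_read / two_read
    """
    if min(total, distinct, two_read) <= 0:
        raise ValueError("total, distinct and two_read must be positive")
    return distinct / total, one_read / distinct, one_read / two_read


def bottleneck_level(pbc1: float) -> str:
    """Map PBC1 to the provisional bottlenecking bands.

    Assumption: each band includes its lower bound and excludes its upper bound.
    """
    if not 0.0 <= pbc1 <= 1.0:
        raise ValueError("PBC1 must lie in [0, 1]")
    if pbc1 < 0.5:
        return "severe"
    if pbc1 < 0.8:
        return "moderate"
    if pbc1 < 0.9:
        return "mild"
    return "none"


if __name__ == "__main__":
    # rep1: NRF recomputed from Distinct Fragments / Total Fragments.
    print(round(21624096 / 35896109, 6))
    # rep1: reported PBC1 of 0.613597 falls in the 0.5-0.8 band.
    print(bottleneck_level(0.613597))
```

The report does not list the OneRead count directly, so only NRF can be recomputed from the table; PBC1 and PBC2 are taken as reported.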


NRF (non redundant fraction)
PBC1 (PCR Bottleneck coefficient 1)
PBC2 (PCR Bottleneck coefficient 2)


                                          rep1         rep2         rep3
Estimated library size by Picard tools    51306387.0   52261294.0   52860515.0

Replication quality metrics


IDR (Irreproducible Discovery Rate) plots

[Figure: IDR plots for rep1_vs_rep2, rep1_vs_rep3, rep2_vs_rep3, rep1-pr1_vs_rep1-pr2, rep2-pr1_vs_rep2-pr2, rep3-pr1_vs_rep3-pr2, pooled-pr1_vs_pooled-pr2]

Reproducibility QC and peak detection statistics

                          overlap                    idr
Nt                        42349                      3734
N1                        5856                       412
N2                        55689                      33284
N3                        10074                      612
Np                        70058                      29570
N optimal                 70058                      29570
N conservative            42349                      3734
Optimal Set               pooled-pr1_vs_pooled-pr2   pooled-pr1_vs_pooled-pr2
Conservative Set          rep2_vs_rep3               rep2_vs_rep3
Rescue Ratio              1.654301164136107          7.919121585431173
Self Consistency Ratio    9.509733606557377          80.7864077669903
Reproducibility Test      borderline                 fail
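The two ratios in the table can be recomputed from the peak counts. The sketch below assumes the usual ENCODE-style definitions, which the report itself does not spell out: rescue ratio = max(Np, Nt) / min(Np, Nt), self-consistency ratio = max(N1, N2, N3) / min(N1, N2, N3), and a cutoff of 2 for each ratio (pass if both are at or below it, borderline if one exceeds it, fail if both do). These assumptions are consistent with the reported values, but the function name and the cutoff constant are illustrative.

```python
from typing import Sequence, Tuple

RATIO_THRESHOLD = 2.0  # assumed cutoff; not stated in this report


def reproducibility(n_t: int, n_p: int,
                    n_self: Sequence[int]) -> Tuple[float, float, str]:
    """Return (rescue ratio, self-consistency ratio, verdict).

    n_t:    peaks from the best true-replicate comparison (Nt)
    n_p:    peaks from the pooled pseudo-replicate comparison (Np)
    n_self: per-replicate self pseudo-replicate peak counts (N1, N2, ...)
    """
    if min(n_t, n_p, *n_self) <= 0:
        raise ValueError("all peak counts must be positive")
    rescue = max(n_p, n_t) / min(n_p, n_t)
    self_consistency = max(n_self) / min(n_self)
    # Count how many of the two ratios exceed the assumed cutoff.
    exceeded = sum(r > RATIO_THRESHOLD for r in (rescue, self_consistency))
    verdict = ("pass", "borderline", "fail")[exceeded]
    return rescue, self_consistency, verdict


if __name__ == "__main__":
    print("overlap:", reproducibility(42349, 70058, [5856, 55689, 10074]))
    print("idr:    ", reproducibility(3734, 29570, [412, 33284, 612]))
```

With the counts from the table, the overlap column has only the self-consistency ratio above 2 and the IDR column has both ratios above 2, which matches the reported borderline and fail results.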

Reproducibility QC


Number of raw peaks

                   rep1     rep2     rep3
Number of peaks    288126   186525   299145

Top 300000 raw peaks from macs2 with p-val threshold 0.01

Peak calling statistics


Peak region size

                          rep1                 rep2                 rep3                 idr_opt             overlap_opt
Min size                  73.0                 73.0                 73.0                 73.0                73.0
25 percentile             73.0                 98.0                 73.0                 521.0               227.0
50 percentile (median)    107.0                161.0                118.0                731.0               413.0
75 percentile             150.0                336.0                174.0                952.0               714.0
Max size                  1644.0               2921.0               3044.0               2768.0              2768.0
Mean                      123.97736754058988   299.51371665996516   142.03646726503868   749.1535339871491   501.4484855405521

[Figure: peak region size distribution plots for rep1, rep2, rep3, idr_opt, overlap_opt]

Enrichment / Signal-to-noise ratio


Strand cross-correlation measures (filtered BAM)

                                                    rep1                rep2                rep3
Number of Subsampled Reads                          25000000            25000000            25000000
Estimated Fragment Length                           0                   0                   0
Cross-correlation at Estimated Fragment Length      0.191316493579889   0.196109422830772   0.191027316759835
Phantom Peak                                        145                 145                 145
Cross-correlation at Phantom Peak                   0.1710109           0.1741907           0.1681984
Argmin of Cross-correlation                         1500                1500                1500
Minimum of Cross-correlation                        0.1687805           0.1684871           0.1659469
NSC (Normalized Strand Cross-correlation coeff.)    1.133522            1.163943            1.151135
RSC (Relative Strand Cross-correlation coeff.)      10.10419            4.842958            11.13934


Performed on subsampled (25000000) reads. Such FASTQ trimming is for cross-correlation analysis only.
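NSC and RSC can be derived from three cross-correlation values in the table. The sketch below assumes the commonly used definitions, which the report does not state explicitly: NSC = CC(fragment length) / min(CC), and RSC = (CC(fragment length) - min(CC)) / (CC(phantom peak) - min(CC)). The function names are hypothetical.

```python
def nsc(cc_fragment: float, cc_min: float) -> float:
    """Normalized strand cross-correlation: CC at fragment length over minimum CC."""
    if cc_min <= 0:
        raise ValueError("minimum cross-correlation must be positive")
    return cc_fragment / cc_min


def rsc(cc_fragment: float, cc_phantom: float, cc_min: float) -> float:
    """Relative strand cross-correlation.

    (CC at fragment length - min CC) / (CC at phantom peak - min CC)
    """
    denominator = cc_phantom - cc_min
    if denominator == 0:
        raise ValueError("phantom-peak and minimum cross-correlation are equal")
    return (cc_fragment - cc_min) / denominator


if __name__ == "__main__":
    # rep1 values as displayed in the table (phantom and minimum are rounded there).
    print(nsc(0.191316493579889, 0.1687805))
    print(rsc(0.191316493579889, 0.1710109, 0.1687805))
```

Because the table shows the phantom-peak and minimum values rounded to seven decimals, the recomputed RSC agrees with the reported figure only approximately.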


[Figure: strand cross-correlation plots for rep1, rep2, rep3]

TSS enrichment (filtered/deduped BAM)

                  rep1                rep2                rep3
TSS enrichment    2.151712769672642   7.673244587898115   2.645575351127997

[Figure: TSS enrichment plots for rep1, rep2, rep3]

Open chromatin assays should show enrichment in open chromatin sites, such as TSSs. An average TSS enrichment in human (hg19) is above 6. A strong TSS enrichment is above 10. For other references please see https://www.encodeproject.org/atac-seq/


Jensen-Shannon distance (filtered/deduped BAM)

                           rep1                  rep2                  rep3
AUC                        0.2939009061104606    0.2786092403384647    0.29447058560694483
Synthetic AUC              0.4954328533324549    0.4956766938789375    0.49562498909435554
X-intercept                0.14184049717658728   0.14519180010557803   0.13796530321693087
Synthetic X-intercept      0.0                   0.0                   0.0
Elbow Point                0.5195179402684241    0.5451446099211364    0.5216874890022876
Synthetic Elbow Point      0.4988443358215734    0.5024962976934453    0.5026287760685217
Synthetic JS Distance      0.23756391970389742   0.26616864400227414   0.23925791633341062

Peak enrichment


Fraction of reads in peaks (FRiP)

FRiP for macs2 raw peaks

Peak set      Fraction of Reads in Peaks
rep1          0.08558545451315605
rep2          0.08867984114588512
rep3          0.0945070314244649
rep1-pr1      0.05663790536768405
rep2-pr1      0.0954512455177774
rep3-pr1      0.05253877473963219
rep1-pr2      0.05329061358831332
rep2-pr2      0.09778533797955442
rep3-pr2      0.06269260757678913
pooled        0.07690329899071299
pooled-pr1    0.08749009998165304
pooled-pr2    0.08114197795072275

FRiP for overlap peaks

Peak set                    Fraction of Reads in Peaks
rep1_vs_rep2                0.01220396052160817
rep1_vs_rep3                0.012022130831802794
rep2_vs_rep3                0.016718528585748856
rep1-pr1_vs_rep1-pr2        0.0035597035918826445
rep2-pr1_vs_rep2-pr2        0.044300549983333326
rep3-pr1_vs_rep3-pr2        0.007362904272706434
pooled-pr1_vs_pooled-pr2    0.025427996388404647

FRiP for IDR peaks

Peak set                    Fraction of Reads in Peaks
rep1_vs_rep2                0.0004898557146617462
rep1_vs_rep3                0.0008711410648532887
rep2_vs_rep3                0.0026659612929549187
rep1-pr1_vs_rep1-pr2        0.0006166109693652738
rep2-pr1_vs_rep2-pr2        0.03049404886890761
rep3-pr1_vs_rep3-pr2        0.0020084698520402206
pooled-pr1_vs_pooled-pr2    0.013349988422707001

For macs2 raw peaks:


For overlap/IDR peaks:

Annotated genomic region enrichment

                                              rep1                    rep2                   rep3
Fraction of Reads in universal DHS regions    0.38575055488742027     0.42670542271837636    0.3827142045803367
Fraction of Reads in blacklist regions        0.0026358973984142798   0.003536043490408996   0.0026451215464024487
Fraction of Reads in promoter regions         0.0216920861597368      0.05128188870302956    0.023350037706527816
Fraction of Reads in enhancer regions         0.3606073856291664      0.371091568450141      0.3559101942948575

Signal to noise can be assessed by considering whether reads fall into known open regions (such as DHS regions) or not. A high fraction of reads should fall into the universal (across cell type) DHS set. A small fraction should fall into the blacklist regions. A high fraction (though not all) should fall into the promoter regions, and a high fraction (though not all) should fall into the enhancer regions. The promoter regions should not take up all reads, as open chromatin assays are known to be biased toward promoters.


Other quality metrics


Comparison to Roadmap DNase

[Figure: Roadmap DNase correlation bar charts for rep1, rep2, rep3]

This bar chart shows the correlation between the Roadmap DNase samples and your sample when the signal in the universal DNase peak region sets is compared. The closer a Roadmap sample's signal distribution in those regions is to that of your sample, the higher the correlation.