QC Report


general
Report generated at: 2020-01-31 04:37:39
Title: Engrafment (yng)
Description: ATAC-seq Engrafment (yng)
Pipeline version: v1.6.0.1
Pipeline type: atac
Genome: mm10
Aligner: bowtie2
Sequencing endedness: {'rep1': {'paired_end': True}, 'rep2': {'paired_end': True}, 'rep3': {'paired_end': True}}
Peak caller: macs2

Alignment quality metrics


SAMstat (raw unfiltered BAM)

                                    rep1        rep2        rep3
Total Reads                         75384550    100913816   99443840
Total Reads (QC-failed)             0           0           0
Duplicate Reads                     0           0           0
Duplicate Reads (QC-failed)         0           0           0
Mapped Reads                        73762290    97355093    97005112
Mapped Reads (QC-failed)            0           0           0
% Mapped Reads                      97.8        96.5        97.5
Paired Reads                        75384550    100913816   99443840
Paired Reads (QC-failed)            0           0           0
Read1                               37692275    50456908    49721920
Read1 (QC-failed)                   0           0           0
Read2                               37692275    50456908    49721920
Read2 (QC-failed)                   0           0           0
Properly Paired Reads               72193162    94526490    94607560
Properly Paired Reads (QC-failed)   0           0           0
% Properly Paired Reads             95.8        93.7        95.1
With itself                         72824146    95263904    95440786
With itself (QC-failed)             0           0           0
Singletons                          938144      2091189     1564326
Singletons (QC-failed)              0           0           0
% Singleton                         1.2         2.1         1.6
Diff. Chroms                        53831       70126       70521
Diff. Chroms (QC-failed)            0           0           0

Marking duplicates (filtered BAM)

                                    rep1        rep2        rep3
Unpaired Reads                      0           0           0
Paired Reads                        26364228    35674120    36543113
Unmapped Reads                      0           0           0
Unpaired Duplicate Reads            0           0           0
Paired Duplicate Reads              9063666     16578060    13481582
Paired Optical Duplicate Reads      760411      1485779     1249474
% Duplicate Reads                   34.3787     46.4708     36.8923

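Filtered out (samtools view -F 1804): unmapped reads, reads with an unmapped mate, non-primary alignments, reads failing platform/vendor quality checks, and PCR or optical duplicates.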
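The value 1804 passed to `samtools view -F` is a bitmask of SAM flags. The following minimal sketch decodes such a mask into its flag names; the bit meanings follow the SAM format specification, while the function name `decode_flag_mask` is illustrative and not part of the pipeline.

```python
# Standard SAM flag bits as defined in the SAM format specification.
SAM_FLAGS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails platform/vendor quality checks",
    0x400: "read is PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag_mask(mask):
    """Return the names of all SAM flag bits set in `mask` (hypothetical helper)."""
    return [name for bit, name in sorted(SAM_FLAGS.items()) if mask & bit]


# `samtools view -F MASK` drops any read that has at least one of these bits set.
for name in decode_flag_mask(1804):
    print(name)
```

Reads carrying any of the printed flags are excluded from the filtered BAM.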


Fraction of mitochondrial reads (unfiltered BAM)

                                            rep1                   rep2                   rep3
Rn = Number of Non-mitochondrial Reads      73553542               96595416               96587216
Rm = Number of Mitochondrial Reads          578851                 2014439                1157083
Rm/(Rn+Rm) = Frac. of mitochondrial reads   0.007808340950224014   0.020428374020020616   0.011837856650851832
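As a quick check of the Rm/(Rn+Rm) formula, the sketch below recomputes the mitochondrial fraction from the read counts reported for each replicate. The function name `mito_fraction` is an illustrative choice, not a pipeline identifier.

```python
def mito_fraction(rn, rm):
    """Fraction of mitochondrial reads: Rm / (Rn + Rm)."""
    return rm / (rn + rm)


# (Rn, Rm) read counts per replicate, taken from the report.
counts = {
    "rep1": (73553542, 578851),
    "rep2": (96595416, 2014439),
    "rep3": (96587216, 1157083),
}

for rep, (rn, rm) in counts.items():
    print(f"{rep}: {mito_fraction(rn, rm):.4%}")
```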

[Figure: Preseq yield prediction plots for rep1, rep2, rep3]

Preseq performs a yield prediction by subsampling the reads, calculating the number of distinct reads, and then extrapolating out to see where the expected number of distinct reads no longer increases. The confidence interval gives a gauge as to the validity of the yield predictions.

SAMstat (filtered/deduped BAM)

                                    rep1        rep2        rep3
Total Reads                         34364512    37622996    45714078
Total Reads (QC-failed)             0           0           0
Duplicate Reads                     0           0           0
Duplicate Reads (QC-failed)         0           0           0
Mapped Reads                        34364512    37622996    45714078
Mapped Reads (QC-failed)            0           0           0
% Mapped Reads                      100.0       100.0       100.0
Paired Reads                        34364512    37622996    45714078
Paired Reads (QC-failed)            0           0           0
Read1                               17182256    18811498    22857039
Read1 (QC-failed)                   0           0           0
Read2                               17182256    18811498    22857039
Read2 (QC-failed)                   0           0           0
Properly Paired Reads               34364512    37622996    45714078
Properly Paired Reads (QC-failed)   0           0           0
% Properly Paired Reads             100.0       100.0       100.0
With itself                         34364512    37622996    45714078
With itself (QC-failed)             0           0           0
Singletons                          0           0           0
Singletons (QC-failed)              0           0           0
% Singleton                         0.0         0.0         0.0
Diff. Chroms                        0           0           0
Diff. Chroms (QC-failed)            0           0           0

Filtered and duplicates removed


Fragment length statistics (filtered/deduped BAM)

                                      rep1                      rep2                      rep3
Fraction of reads in NFR              0.47297594987307257       0.3185831865409813        0.35030294130509704
Fraction of reads in NFR (QC pass)    True                      False                     False
Fraction of reads in NFR (QC reason)  OK                        out of range [0.4, inf]   out of range [0.4, inf]
NFR / mono-nuc reads                  1.690775673837563         1.0663566677635306        1.0921757100254188
NFR / mono-nuc reads (QC pass)        False                     False                     False
NFR / mono-nuc reads (QC reason)      out of range [2.5, inf]   out of range [2.5, inf]   out of range [2.5, inf]
Presence of NFR peak                  True                      True                      True
Presence of Mono-Nuc peak             True                      True                      True
Presence of Di-Nuc peak               False                     True                      True

[Figure: fragment length distribution plots for rep1, rep2, rep3]

Open chromatin assays show distinct fragment length enrichments, as the cut sites are only in open chromatin and not in nucleosomes. As such, peaks representing different n-nucleosomal (e.g., mono-nucleosomal, di-nucleosomal) fragment lengths will arise. Good libraries will show these peaks in a fragment length distribution and will show specific peak ratios.
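The pass/fail rows in the fragment length table can be read as simple range checks. The sketch below applies the two ranges quoted in the report's "QC reason" rows ([0.4, inf] for the NFR fraction and [2.5, inf] for the NFR / mono-nuc ratio). The metric keys, the function name `qc_check`, and the treatment of the bounds as inclusive are assumptions made for illustration.

```python
import math

# Acceptable ranges as quoted in the report's "QC reason" rows.
QC_RANGES = {
    "frac_reads_in_nfr": (0.4, math.inf),
    "nfr_over_mono_nuc": (2.5, math.inf),
}


def qc_check(metric, value):
    """Return (passed, reason) for a metric; bounds assumed inclusive."""
    lo, hi = QC_RANGES[metric]
    if lo <= value <= hi:
        return True, "OK"
    return False, f"out of range [{lo}, {hi}]"


# Values reported for rep1 and rep2.
print(qc_check("frac_reads_in_nfr", 0.47297594987307257))
print(qc_check("frac_reads_in_nfr", 0.3185831865409813))
print(qc_check("nfr_over_mono_nuc", 1.690775673837563))
```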



Sequence quality metrics (filtered/deduped BAM)

[Figure: GC bias plots for rep1, rep2, rep3]

Open chromatin assays are known to have significant GC bias. Please take this into consideration as necessary.


Library complexity quality metrics


Library complexity (filtered non-mito BAM)

                             rep1        rep2        rep3
Total Fragments              26126347    34839343    36065741
Distinct Fragments           17188890    18821435    22865316
Positions with Two Read      3783493     4517375     5151083
NRF = Distinct/Total         0.657914    0.540235    0.63399
PBC1 = OneRead/Distinct      0.66343     0.550931    0.641452
PBC2 = OneRead/TwoRead       3.014049    2.295427    2.847363

Mitochondrial reads are filtered out by default. The non-redundant fraction (NRF) is the fraction of non-redundant mapped reads in a dataset; it is the ratio between the number of positions in the genome that uniquely mapped reads map to and the total number of uniquely mappable reads. The NRF should be > 0.8. The PBC1 is the ratio of genomic locations with EXACTLY one read pair over the genomic locations with AT LEAST one read pair. PBC1 is the primary measure, and the PBC1 should be close to 1. Provisionally 0-0.5 is severe bottlenecking, 0.5-0.8 is moderate bottlenecking, 0.8-0.9 is mild bottlenecking, and 0.9-1.0 is no bottlenecking. The PBC2 is the ratio of genomic locations with EXACTLY one read pair over the genomic locations with EXACTLY two read pairs. The PBC2 should be significantly greater than 1.
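The definitions above can be sketched in a few lines. The example computes NRF, PBC1 and PBC2 from toy counts (the report does not list the one-read position count, so none is invented here) and then applies the provisional bottlenecking bands to the PBC1 values reported for the three replicates. The function names and the handling of values that fall exactly on a band boundary are assumptions.

```python
def library_complexity(total, distinct, one_read, two_read):
    """Compute NRF, PBC1 and PBC2 from fragment and position counts."""
    return {
        "NRF": distinct / total,      # Distinct / Total
        "PBC1": one_read / distinct,  # OneRead / Distinct
        "PBC2": one_read / two_read,  # OneRead / TwoRead
    }


def bottleneck_level(pbc1):
    """Map PBC1 to the provisional bands (boundary handling is an assumption)."""
    if pbc1 < 0.5:
        return "severe"
    if pbc1 < 0.8:
        return "moderate"
    if pbc1 < 0.9:
        return "mild"
    return "none"


# Toy counts, for illustration only.
print(library_complexity(total=1000, distinct=800, one_read=700, two_read=80))

# PBC1 values reported for rep1, rep2, rep3.
for rep, pbc1 in [("rep1", 0.66343), ("rep2", 0.550931), ("rep3", 0.641452)]:
    print(rep, bottleneck_level(pbc1))
```

Under these bands, all three replicates fall in the moderate bottlenecking range.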


NRF (non redundant fraction)
PBC1 (PCR Bottleneck coefficient 1)
PBC2 (PCR Bottleneck coefficient 2)


                                         rep1          rep2          rep3
Estimated library size by Picard tools   43541505.0    44741018.0    57671123.0

Replication quality metrics


IDR (Irreproducible Discovery Rate) plots

[Figure: IDR plots for rep1_vs_rep2, rep1_vs_rep3, rep2_vs_rep3, rep1-pr1_vs_rep1-pr2, rep2-pr1_vs_rep2-pr2, rep3-pr1_vs_rep3-pr2, pooled-pr1_vs_pooled-pr2]

Reproducibility QC and peak detection statistics

                         overlap                    idr
Nt                       80333                      30400
N1                       24442                      1051
N2                       61592                      36838
N3                       33090                      11670
Np                       97919                      40241
N optimal                97919                      40241
N conservative           80333                      30400
Optimal Set              pooled-pr1_vs_pooled-pr2   pooled-pr1_vs_pooled-pr2
Conservative Set         rep2_vs_rep3               rep2_vs_rep3
Rescue Ratio             1.2189137714264375         1.323717105263158
Self Consistency Ratio   2.5199247197447017         35.05042816365366
Reproducibility Test     borderline                 borderline
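The two ratios in this table can be reproduced from the peak counts: the rescue ratio compares Np with Nt, and the self-consistency ratio compares the largest and smallest of N1, N2, N3. The sketch below recomputes both. The cutoff of 2 and the pass/borderline/fail rule (no ratio, one ratio, or both ratios above the cutoff) are assumptions based on the commonly cited ENCODE convention and are not stated in this report; the function name is illustrative.

```python
def reproducibility_qc(n_true, n_pooled_pseudo, n_self, cutoff=2.0):
    """Return (rescue_ratio, self_consistency_ratio, verdict).

    n_true: Nt, n_pooled_pseudo: Np, n_self: [N1, N2, ...].
    The cutoff and the verdict rule are assumptions, see the lead-in.
    """
    rescue = max(n_pooled_pseudo, n_true) / min(n_pooled_pseudo, n_true)
    self_consistency = max(n_self) / min(n_self)
    n_over = sum(r > cutoff for r in (rescue, self_consistency))
    verdict = ("pass", "borderline", "fail")[n_over]
    return rescue, self_consistency, verdict


# Counts from the overlap and idr columns of the report.
print(reproducibility_qc(80333, 97919, [24442, 61592, 33090]))
print(reproducibility_qc(30400, 40241, [1051, 36838, 11670]))
```

With the reported counts, both columns have only the self-consistency ratio above 2, which is consistent with the "borderline" result shown in the table.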

Reproducibility QC


Number of raw peaks

                  rep1      rep2      rep3
Number of peaks   181027    208643    299119

Top 300000 raw peaks from macs2 with p-val threshold 0.01

Peak calling statistics


Peak region size

                       rep1                rep2                 rep3                idr_opt             overlap_opt
Min size               73.0                73.0                 73.0                73.0                73.0
25 percentile          73.0                82.0                 82.0                658.0               276.0
50 percentile (median) 119.0               145.0                146.0               886.0               522.0
75 percentile          171.0               257.0                228.0               1121.0              863.0
Max size               3304.0              2448.0               3303.0              2697.0              2697.0
Mean                   143.9944317698465   253.49088634653452   196.0902249606344   902.7222484530703   601.6399881534738

[Figure: peak region size distribution plots for rep1, rep2, rep3, idr_opt, overlap_opt]

Enrichment / Signal-to-noise ratio


Strand cross-correlation measures (filtered BAM)

                                                   rep1               rep2                rep3
Number of Subsampled Reads                         25000000           25000000            25000000
Estimated Fragment Length                          0                  0                   0
Cross-correlation at Estimated Fragment Length     0.18889831324049   0.178813793676478   0.204186010224665
Phantom Peak                                       145                145                 150
Cross-correlation at Phantom Peak                  0.1670618          0.1538705           0.1816818
Argmin of Cross-correlation                        1500               1500                1500
Minimum of Cross-correlation                       0.164304           0.1496019           0.1784186
NSC (Normalized Strand Cross-correlation coeff.)   1.149688           1.195264            1.144421
RSC (Relative Strand Cross-correlation coeff.)     8.918188           6.843544            7.8964


Performed on subsampled (25000000) reads. Such FASTQ trimming is for cross-correlation analysis only.
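NSC and RSC are derived from three cross-correlation values: the value at the estimated fragment length, the value at the phantom peak, and the minimum. The sketch below uses the standard definitions (NSC is the fragment-length value divided by the minimum; RSC is the fragment-length value minus the minimum, divided by the phantom-peak value minus the minimum) and recomputes them for rep1. The function names are illustrative, and small differences from the reported figures are expected because the table shows rounded inputs.

```python
def nsc(cc_fragment, cc_min):
    """Normalized strand cross-correlation coefficient."""
    return cc_fragment / cc_min


def rsc(cc_fragment, cc_phantom, cc_min):
    """Relative strand cross-correlation coefficient."""
    return (cc_fragment - cc_min) / (cc_phantom - cc_min)


# rep1 values from the report.
cc_fragment, cc_phantom, cc_min = 0.18889831324049, 0.1670618, 0.164304
print(f"NSC = {nsc(cc_fragment, cc_min):.4f}")
print(f"RSC = {rsc(cc_fragment, cc_phantom, cc_min):.3f}")
```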


[Figure: strand cross-correlation plots for rep1, rep2, rep3]

TSS enrichment (filtered/deduped BAM)

                 rep1                 rep2                rep3
TSS enrichment   3.2073622172949197   6.788052930603195   4.478746477230261

[Figure: TSS enrichment plots for rep1, rep2, rep3]

Open chromatin assays should show enrichment in open chromatin sites, such as TSSs. An average TSS enrichment in human (hg19) is above 6. A strong TSS enrichment is above 10. For other references, please see https://www.encodeproject.org/atac-seq/


Jensen-Shannon distance (filtered/deduped BAM)

                        rep1                  rep2                  rep3
AUC                     0.2759840167704477    0.2757135283996076    0.2828492621075567
Synthetic AUC           0.4948319237507945    0.4952581080002051    0.49567017925184886
X-intercept             0.1614911179185296    0.14465475457301222   0.14371895670535165
Synthetic X-intercept   0.0                   0.0                   0.0
Elbow Point             0.5411311156790133    0.5527566045734121    0.5374499108192631
Synthetic Elbow Point   0.5015983853336383    0.5003454058976142    0.503880209595529
Synthetic JS Distance   0.2544993777234141    0.2688620593213536    0.25695992049397

Peak enrichment


Fraction of reads in peaks (FRiP)

FRiP for macs2 raw peaks

             Fraction of Reads in Peaks
rep1         0.07066391630994207
rep2         0.0989382929525336
rep3         0.11080400659070494
rep1-pr1     0.13084341194776752
rep2-pr1     0.14858976143207733
rep3-pr1     0.08125269063710787
rep1-pr2     0.13810514754290706
rep2-pr2     0.14347873837585928
rep3-pr2     0.08075871423060153
pooled       0.09226278395263085
pooled-pr1   0.1007691586964825
pooled-pr2   0.10820800848355618

FRiP for overlap peaks

                           Fraction of Reads in Peaks
rep1_vs_rep2               0.025100766271747604
rep1_vs_rep3               0.02616669923207322
rep2_vs_rep3               0.034293794477841616
rep1-pr1_vs_rep1-pr2       0.01494597100636843
rep2-pr1_vs_rep2-pr2       0.04814818575320264
rep3-pr1_vs_rep3-pr2       0.023470319143262607
pooled-pr1_vs_pooled-pr2   0.04012158340840029

FRiP for IDR peaks

                           Fraction of Reads in Peaks
rep1_vs_rep2               0.00477057293008779
rep1_vs_rep3               0.005430784934367834
rep2_vs_rep3               0.016399056848732694
rep1-pr1_vs_rep1-pr2       0.002205298303086626
rep2-pr1_vs_rep2-pr2       0.0341033978261593
rep3-pr1_vs_rep3-pr2       0.010313431236653181
pooled-pr1_vs_pooled-pr2   0.020290729132570906

For macs2 raw peaks:


For overlap/IDR peaks:

Annotated genomic region enrichment

                                             rep1                    rep2                   rep3
Fraction of Reads in universal DHS regions   0.393790809542123       0.4241838688232059     0.41445188941577255
Fraction of Reads in blacklist regions       0.0026860267941532243   0.004012094092666092   0.002781200137078123
Fraction of Reads in promoter regions        0.0274493058420268      0.052011142334331906   0.03473118280981189
Fraction of Reads in enhancer regions        0.36283719087877636     0.36734325995728784    0.3760714806497902

Signal to noise can be assessed by considering whether reads are falling into known open regions (such as DHS regions) or not. A high fraction of reads should fall into the universal (across cell type) DHS set. A small fraction should fall into the blacklist regions. A high fraction (though not all) should fall into the promoter regions. A high fraction (though not all) should fall into the enhancer regions. The promoter regions should not take up all reads, as it is known that there is a bias for promoters in open chromatin assays.


Other quality metrics


Comparison to Roadmap DNase

[Figure: bar charts of correlation with Roadmap DNase samples for rep1, rep2, rep3]

This bar chart shows the correlation between the Roadmap DNase samples and your sample when the signals in the universal DNase peak region sets are compared. The closer a Roadmap sample's signal distribution in these regions is to that of your sample, the higher the correlation.