Package picard.sam
Class DuplicationMetrics
- java.lang.Object
-
- htsjdk.samtools.metrics.MetricBase
-
- picard.analysis.MergeableMetricBase
-
- picard.sam.DuplicationMetrics
-
public class DuplicationMetrics extends MergeableMetricBase
Metrics that are calculated during the process of marking duplicates within a stream of SAMRecords.
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class picard.analysis.MergeableMetricBase
MergeableMetricBase.MergeByAdding, MergeableMetricBase.MergeByAssertEquals, MergeableMetricBase.MergingIsManual, MergeableMetricBase.NoMergingIsDerived, MergeableMetricBase.NoMergingKeepsValue
-
-
Field Summary
Fields (modifier and type, field, description):
- Long ESTIMATED_LIBRARY_SIZE: The estimated number of unique molecules in the library based on PE duplication.
- String LIBRARY: The library on which the duplicate marking was performed.
- Double PERCENT_DUPLICATION: The fraction of mapped sequence that is marked as duplicate.
- long READ_PAIR_DUPLICATES: The number of read pairs that were marked as duplicates.
- long READ_PAIR_OPTICAL_DUPLICATES: The number of read pair duplicates that were caused by optical duplication.
- long READ_PAIRS_EXAMINED: The number of mapped read pairs examined.
- long SECONDARY_OR_SUPPLEMENTARY_RDS: The number of reads that were either secondary or supplementary.
- long UNMAPPED_READS: The total number of unmapped reads examined.
- long UNPAIRED_READ_DUPLICATES: The number of fragments that were marked as duplicates.
- long UNPAIRED_READS_EXAMINED: The number of mapped reads examined which did not have a mapped mate pair, either because the read is unpaired, or the read is paired to an unmapped mate.
-
Constructor Summary
Constructors:
- DuplicationMetrics()
-
Method Summary
Methods (modifier and type, method, description):
- void calculateDerivedFields(): Fills in the ESTIMATED_LIBRARY_SIZE based on the paired read data examined where possible and the PERCENT_DUPLICATION.
- void calculateDerivedMetrics(): Deprecated.
- htsjdk.samtools.util.Histogram<Double> calculateRoiHistogram(): Calculates a histogram using the estimateRoi method to estimate the effective yield doing x sequencing for x=1..10.
- static Long estimateLibrarySize(long readPairs, long uniqueReadPairs): Estimates the size of a library based on the number of paired end molecules observed and the number of unique pairs observed.
- static double estimateRoi(long estimatedLibrarySize, double x, long pairs, long uniquePairs): Estimates the ROI (return on investment) that one would see if a library was sequenced to x higher coverage than the observed coverage.
- static void main(String[] args)
Methods inherited from class picard.analysis.MergeableMetricBase
canMerge, merge, merge, mergeIfCan
-
-
-
-
Field Detail
-
LIBRARY
public String LIBRARY
The library on which the duplicate marking was performed.
-
UNPAIRED_READS_EXAMINED
public long UNPAIRED_READS_EXAMINED
The number of mapped reads examined which did not have a mapped mate pair, either because the read is unpaired, or the read is paired to an unmapped mate.
-
READ_PAIRS_EXAMINED
public long READ_PAIRS_EXAMINED
The number of mapped read pairs examined. (Primary, non-supplemental)
-
SECONDARY_OR_SUPPLEMENTARY_RDS
public long SECONDARY_OR_SUPPLEMENTARY_RDS
The number of reads that were either secondary or supplementary.
-
UNMAPPED_READS
public long UNMAPPED_READS
The total number of unmapped reads examined. (Primary, non-supplemental)
-
UNPAIRED_READ_DUPLICATES
public long UNPAIRED_READ_DUPLICATES
The number of fragments that were marked as duplicates.
-
READ_PAIR_DUPLICATES
public long READ_PAIR_DUPLICATES
The number of read pairs that were marked as duplicates.
-
READ_PAIR_OPTICAL_DUPLICATES
public long READ_PAIR_OPTICAL_DUPLICATES
The number of read pair duplicates that were caused by optical duplication. Value is always < READ_PAIR_DUPLICATES, which counts all duplicates regardless of source.
-
PERCENT_DUPLICATION
public Double PERCENT_DUPLICATION
The fraction of mapped sequence that is marked as duplicate.
-
ESTIMATED_LIBRARY_SIZE
public Long ESTIMATED_LIBRARY_SIZE
The estimated number of unique molecules in the library based on PE duplication.
-
-
Method Detail
-
calculateDerivedFields
public void calculateDerivedFields()
Fills in the ESTIMATED_LIBRARY_SIZE based on the paired read data examined where possible and the PERCENT_DUPLICATION.
- Overrides:
calculateDerivedFields in class MergeableMetricBase
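As an illustration of the PERCENT_DUPLICATION part, the sketch below assumes that "fraction of mapped sequence" means duplicate reads divided by examined reads, with each read pair counted as two reads. That formula, the class name DerivedFieldsSketch and the method name percentDuplication are assumptions made for this example; none of them is stated on this page.

```java
// Hypothetical helper for illustration; not part of picard.sam.DuplicationMetrics.
final class DerivedFieldsSketch {
    /**
     * Assumed formula: duplicate reads / examined reads, counting each pair as two reads.
     * Returns 0.0 when no mapped reads were examined.
     */
    static double percentDuplication(long unpairedReadsExamined, long readPairsExamined,
                                     long unpairedReadDuplicates, long readPairDuplicates) {
        final long examined = unpairedReadsExamined + 2 * readPairsExamined;
        final long duplicates = unpairedReadDuplicates + 2 * readPairDuplicates;
        return examined == 0 ? 0.0 : (double) duplicates / examined;
    }
}
```

Under this assumption, 100 unpaired reads with 10 duplicates plus 450 read pairs with 45 duplicate pairs give (10 + 90) / (100 + 900) = 0.1.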
-
calculateDerivedMetrics
@Deprecated public void calculateDerivedMetrics()
Deprecated. Use calculateDerivedFields() instead.
Fills in the ESTIMATED_LIBRARY_SIZE based on the paired read data examined where possible and the PERCENT_DUPLICATION.
-
estimateLibrarySize
public static Long estimateLibrarySize(long readPairs, long uniqueReadPairs)
Estimates the size of a library based on the number of paired end molecules observed and the number of unique pairs observed.
Based on the Lander-Waterman equation that states:
C/X = 1 - exp( -N/X )
where
X = number of distinct molecules in library
N = number of read pairs
C = number of distinct fragments observed in read pairs
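The equation C/X = 1 - exp( -N/X ) has no closed-form solution for X, so it has to be solved numerically. The following is a minimal sketch using bisection. The class name LibrarySizeSketch, the bracketing strategy and the stopping rule are assumptions made for illustration and may differ from what the library does.

```java
// Hypothetical stand-alone sketch; not the Picard source.
final class LibrarySizeSketch {
    /** f(X) = C/X - 1 + exp(-N/X). The library size is the X at which f(X) = 0. */
    static double f(double x, double c, double n) {
        return c / x - 1 + Math.exp(-n / x);
    }

    /** Returns the estimated number of distinct molecules, or null when no estimate is possible. */
    static Long estimateLibrarySize(long readPairs, long uniqueReadPairs) {
        // With no duplicates (or no data) the equation has no finite positive root.
        if (uniqueReadPairs <= 0 || uniqueReadPairs >= readPairs) {
            return null;
        }
        final double n = readPairs;
        final double c = uniqueReadPairs;

        // f(C) = exp(-N/C) > 0, and f turns negative for large X when C < N,
        // so double the upper bound until the root is bracketed.
        double lo = c;
        double hi = 2 * c;
        while (f(hi, c, n) > 0) {
            lo = hi;
            hi *= 2;
        }

        // Bisection: halve the bracket until it is narrower than one molecule,
        // with an iteration cap as a guard against floating-point stalls.
        for (int i = 0; i < 100 && hi - lo > 0.5; i++) {
            final double mid = (lo + hi) / 2;
            if (f(mid, c, n) > 0) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return Math.round((lo + hi) / 2);
    }
}
```

A result can be checked by substituting it back: X * (1 - exp(-N/X)) should come out close to C.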
-
estimateRoi
public static double estimateRoi(long estimatedLibrarySize, double x, long pairs, long uniquePairs)
Estimates the ROI (return on investment) that one would see if a library was sequenced to x higher coverage than the observed coverage.
- Parameters:
estimatedLibrarySize - the estimated number of molecules in the library
x - the multiple of sequencing to be simulated (i.e. how many X sequencing)
pairs - the number of pairs observed in the actual sequencing
uniquePairs - the number of unique pairs observed in the actual sequencing
- Returns:
a number z <= x such that sequencing pairs*x read pairs would be expected to yield uniquePairs*z unique pairs.
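One way to obtain z follows from the Lander-Waterman relation given for estimateLibrarySize: sequencing x*pairs read pairs from a library of estimatedLibrarySize molecules would be expected to show estimatedLibrarySize * (1 - exp(-x*pairs/estimatedLibrarySize)) unique pairs, and dividing that by uniquePairs gives z. The sketch below assumes this derivation. The class name RoiSketch is hypothetical, and the library's own implementation may differ.

```java
// Hypothetical stand-alone sketch; not the Picard source.
final class RoiSketch {
    /**
     * Expected unique pairs at x times the observed depth, expressed as a
     * multiple of the unique pairs actually observed.
     */
    static double estimateRoi(long estimatedLibrarySize, double x, long pairs, long uniquePairs) {
        final double expectedUnique =
                estimatedLibrarySize * (1 - Math.exp(-(x * pairs) / estimatedLibrarySize));
        return expectedUnique / uniquePairs;
    }
}
```

When the inputs are consistent with each other, x = 1 returns a value close to 1, and larger x returns less than x because additional sequencing increasingly resamples molecules that have already been seen.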
-
calculateRoiHistogram
public htsjdk.samtools.util.Histogram<Double> calculateRoiHistogram()
Calculates a histogram using the estimateRoi method to estimate the effective yield doing x sequencing for x=1..10.
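The idea can be sketched as a loop over x = 1..10 that records the estimated ROI for each multiple. To stay self-contained, this example uses a plain java.util.Map in place of htsjdk.samtools.util.Histogram and carries its own estimateRoi based on the Lander-Waterman relation. The class name RoiHistogramSketch, the method name roiByMultiple and that formula are assumptions made for illustration.

```java
// Hypothetical stand-alone sketch; not the Picard source.
final class RoiHistogramSketch {
    /** Assumed ROI formula: expected unique pairs at x times the depth, over observed unique pairs. */
    static double estimateRoi(long librarySize, double x, long pairs, long uniquePairs) {
        return librarySize * (1 - Math.exp(-(x * pairs) / librarySize)) / uniquePairs;
    }

    /** Maps each sequencing multiple x = 1..10 to its estimated ROI, in increasing order of x. */
    static java.util.Map<Double, Double> roiByMultiple(long librarySize, long pairs, long uniquePairs) {
        final java.util.Map<Double, Double> roi = new java.util.LinkedHashMap<>();
        for (int x = 1; x <= 10; x++) {
            roi.put((double) x, estimateRoi(librarySize, x, pairs, uniquePairs));
        }
        return roi;
    }
}
```

The resulting values rise with x but flatten out, which is the diminishing return that the histogram is meant to show.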
-
main
public static void main(String[] args)
-
-