Copy-fm: A tool for determination of the fraction of mosaicism in copy number variations

2022-10
Acun, Melisa
Çetinkaya, Arda
Copy number variations (CNVs) are structural chromosomal variants larger than 50 bp that represent a regional change in the normal diploid copy number (CN) of two for a genomic region. The CN in any single cell is necessarily an integer; however, a population of cells containing subpopulations with different CNs can show a non-integer CN for the population as a whole. This is called genetic mosaicism, and CNVs that result in such mosaicism are referred to as mosaic CNVs (mCNVs). mCNVs are encountered when a CNV is not germline but acquired later in life. Such acquired changes are frequently found in human cancer tissues and rarely in congenital genetic disorders. Determining the fraction of mosaicism (fm) is crucial for establishing disease severity and for evaluating disease progression and response to treatment in individuals with cancer. Although some specific clinical genetic tests are available for determining the fm of common cancer-associated structural variants, versatile methods for determining fm for rare or novel mCNVs have yet to be developed. Microarray analysis has emerged as a commonly employed and reliable genome-wide method for detecting human CNVs, which are usually undetectable by classical cytogenetic approaches. Here, we present a computational tool developed in R, which we name Copy-fm (Copy number variation – fraction of mosaicism), to address the need to determine fm for large CNVs using data obtained from SNP microarrays. Copy-fm uses B Allele Frequency (BAF) data (Figure 1), one of the two fundamental values obtained from SNP microarrays for each oligonucleotide probe, the other being the log2 R ratio (LRR). The Copy-fm algorithm fits the cumulative distribution function (CDF) of the heterozygous BAF values in a given genomic region of a sample suspected of harboring an mCNV to mCNV CDF models calculated from a set of control microarray data.
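As background on why heterozygous BAF values carry information about fm, the idealized expectation can be worked out directly. The sketch below is in Python rather than R and is not Copy-fm's code. It assumes a one-copy loss or gain present in a fraction fm of otherwise diploid cells and ignores probe noise, whereas Copy-fm derives its models from control microarray data. The function names `mean_copy_number` and `expected_het_baf` are hypothetical.

```python
def mean_copy_number(fm, kind="loss"):
    """Population-average copy number for a one-copy loss or gain in a fraction fm of cells."""
    if not 0.0 <= fm <= 1.0:
        raise ValueError("fm must lie between 0 and 1")
    if kind not in ("loss", "gain"):
        raise ValueError("kind must be 'loss' or 'gain'")
    return 2.0 - fm if kind == "loss" else 2.0 + fm


def expected_het_baf(fm, kind="loss"):
    """Idealized (lower, upper) BAF bands for SNPs that are heterozygous (AB) in normal cells."""
    total = mean_copy_number(fm, kind)
    if kind == "loss":
        # B allele lost in the affected cells -> lower band; A allele lost -> upper band.
        return (1.0 - fm) / total, 1.0 / total
    # A allele duplicated in the affected cells -> lower band; B duplicated -> upper band.
    return 1.0 / total, (1.0 + fm) / total


# Example: a loss present in half of the cells.
print(mean_copy_number(0.5, "loss"))
print(expected_het_baf(0.5, "loss"))
```

Under these assumptions the single heterozygous band at 0.5 splits into two bands that move apart as fm grows, which is what makes the BAF distribution of a region informative about fm.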
The algorithm then evaluates the goodness of fit with the Kolmogorov-Smirnov (KS) test to find the best fit. Copy-fm also tests several features of the control and test data that would lead to failure of the analysis or miscalculation of fm values (Figure 1). To determine the reliability of Copy-fm, we initially employed a series of experimentally generated CNV loss models for the X chromosome with varying fm values, using DNA samples from a mother-son pair. Because a son shares an X chromosome with his mother and naturally lacks a second copy of chromosome X, mixing the two samples allows preparation of any desired copy number value between 1 and 2 for the X chromosome. Using these models, we tested microarray data obtained from two commercially available platforms (Affymetrix CytoScan Optima Array and Illumina Infinium Human CytoSNP-12 v2.1 BeadChip) to compare the real fm values with those determined by our algorithm. Copy-fm called fm values for all 11 Illumina-array-generated experimental models within a 5% margin of uncertainty (maximum deviation 4.4%). Its success was lower for Affymetrix-array-generated data, for which it called 7 of 11 experimental models within the 5% margin (maximum deviation 9.3%) (Figure 1). The error margins were higher for lower fm values on both platforms. Furthermore, we tested Copy-fm using microarray data from real clinical peripheral blood samples from an individual with myelodysplastic syndrome, who had a subpopulation of blood cells carrying two distinct, co-occurring mCNV loss regions on different chromosomes (chr5: 142,310,899–154,530,330 and chr12: 91,865,761–95,215,021). Affymetrix Optima data obtained at three different time points revealed similar fm values for chromosome 5 (56.3%, 59.8% and 61.3%, respectively) and chromosome 12 (58.2%, 59.8% and 61.7%, respectively).
As expected, the two distinct mCNVs had similar fm values at each time point (Figure 1). These results demonstrate that Copy-fm can determine fm for loss mCNVs, both in experimentally prepared mCNV models and in clinical data, within acceptable margins of uncertainty. In addition, the fm values reported for example Affymetrix data sets (https://www.thermofisher.com/tr/en/home/life-science/microarray-analysis/microarray-data-analysis/microarrayanalysis-sample-data.html) for mCNV gain and loss are in agreement with those calculated by Copy-fm (Figure 1). The minimum sum of residuals has previously been used in a similar approach for calculating fm (PMID: 22277120), but the use of the KS test in Copy-fm additionally provides confidence intervals for better evaluation and comparison. Without any user intervention, Copy-fm accounts for the loss-of-heterozygosity (LOH) and germline CNV status of the genomic region under evaluation, either of which may otherwise lead to erroneous fm calculations. This is a valuable improvement for fm calculations, as LOH regions in both control and test data are more common in highly inbred populations such as that of Turkey. The approach put forward by Copy-fm to calculate fm for mCNVs is platform independent and can easily be adapted for next-generation sequencing data. With adjustments, it can be utilized for genome-wide screening of mCNVs and for calculating fm in mosaic uniparental disomy. Especially in cancer genetics, fm values of frequently encountered mCNVs calculated by specialized locus-specific methods are widely employed. With Copy-fm, we offer a method for harnessing the mosaicism information from less frequent mCNVs encountered in microarrays, which can be utilized for uncovering unknown mCNVs and monitoring disease progression. This work was supported by TÜBİTAK (319S062) within the RiboEurope consortium and by the Hacettepe University Scientific Research Projects Coordination Unit (THD-2021-19532).
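The fitting idea described in the abstract, comparing the CDF of heterozygous BAF values against candidate mCNV models and keeping the best KS fit, can be illustrated with a minimal sketch. It is written in Python rather than R and is not the Copy-fm implementation. The function names, the alternating assignment of control values to the two bands, the grid search and the toy data are all assumptions made for illustration. Only a one-copy loss is modeled, and confidence intervals are not computed.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:  # the largest gap is always reached at an observed value
        gap = abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        d = max(d, gap)
    return d


def loss_model_bafs(control_het_bafs, fm):
    """Hypothetical mCNV-loss model built from control heterozygous BAFs.

    Assumption: each control value keeps its offset from 0.5 and is moved,
    alternately, to the lower or upper band expected for a one-copy loss.
    """
    low, high = (1.0 - fm) / (2.0 - fm), 1.0 / (2.0 - fm)
    return [(low if i % 2 == 0 else high) + (baf - 0.5)
            for i, baf in enumerate(control_het_bafs)]


def estimate_fm(test_het_bafs, control_het_bafs, steps=100):
    """Grid-search fm in [0, 1]; return (best_fm, smallest KS statistic)."""
    best_fm, best_d = 0.0, float("inf")
    for i in range(steps + 1):
        fm = i / steps
        d = ks_statistic(test_het_bafs, loss_model_bafs(control_het_bafs, fm))
        if d < best_d:
            best_fm, best_d = fm, d
    return best_fm, best_d


# Toy data (not real array data): evenly spread control values and a
# test region simulated from the same model with a 60% mosaic loss.
control = [0.5 + 0.001 * k for k in range(-50, 51)]
test = loss_model_bafs(control, 0.6)
best_fm, best_d = estimate_fm(test, control)
print(best_fm, best_d)
```

Because the toy test region is generated from the same model, the grid search recovers the simulated fm exactly. Real array data would show probe noise and unequal band sizes, so the minimum KS statistic would be greater than zero.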

Citation Formats
M. Acun and A. Çetinkaya, “Copy-fm: A tool for determination of the fraction of mosaicism in copy number variations,” Erdemli, Mersin, TÜRKİYE, 2022, p. 2038, Accessed: 2023. [Online]. Available: https://hibit2022.ims.metu.edu.tr/.