Assessment of practical applicability and clinical relevance of a commonly used LDL-C polygenic score in patients with severe hypercholesterolemia

Background and aims: Low-density lipoprotein cholesterol (LDL-C) levels vary in patients with familial hypercholesterolemia (FH) and can be explained by a single deleterious genetic variant or by the aggregate effect of multiple, common small-effect variants that can be captured in a polygenic score (PS). We set out to investigate the contribution of a previously published PS to the inter-individual LDL-C variation and coronary artery disease (CAD) risk in patients with a clinical FH phenotype. Methods: First, in a cohort of 628 patients referred for genetic FH testing, we evaluated the distribution of a PS for LDL-C comprising 12 genetic variants. Next, we determined its association with CAD risk using UK Biobank data. Results: The mean PS was higher in 533 FH-variant-negative patients (FH/M-) compared with 95 FH-variant carriers (1.02 vs 0.94, p < 0.001). In total, 39% of all patients had a PS in the range of the top 20% of a population-based reference cohort and these patients were less likely to carry an FH variant (OR 0.22, 95% CI 0.10–0.48) compared with patients in the lowest 20%. In UK Biobank data, the PS explained 7.4% of variance in LDL-C levels and was associated with incident CAD. Addition of the PS to a prediction model using age, sex and LDL-C did not increase the c-statistic for predicting CAD risk. Conclusions: This 12-variant PS was higher in FH/M- patients and associated with incident CAD in UK Biobank data. However, the PS did not improve predictive accuracy when added to the readily available characteristics age, sex and LDL-C, suggesting limited discriminative value for CAD.


Introduction
High plasma levels of low-density lipoprotein cholesterol (LDL-C) have been shown to cause atherosclerotic cardiovascular disease (CVD) [1], a leading cause of morbidity and mortality [2]. Genetic susceptibility to severe hypercholesterolemia (generally defined as LDL-C >4.9 mmol/L) may be caused by rare pathogenic variants ("monogenic") or by the aggregate effect of multiple, common small-effect variants ("polygenic"). Patients with familial hypercholesterolemia (FH) are characterized by very high LDL-C levels from birth, which can be caused by a monogenic deleterious variant in LDLR, APOB or PCSK9, and these monogenic FH patients have been shown to have an at least 2.5-fold higher CVD risk compared with individuals with comparable LDL-C levels who do not carry an FH variant [3]. However, in many patients who present with an FH phenotype, no monogenic cause is identified [4,5]. In these patients, the aggregate effect of common LDL-C raising variants, captured in a polygenic score (PS), may underlie the observed FH phenotype. In 2013, Talmud and colleagues showed that a PS for LDL-C comprising 12 common genetic variants was associated with an FH phenotype [6]. The relatively low number of variants and the consistent association with LDL-C levels make this PS an inexpensive and easy-to-implement addition to regular FH sequencing methods to identify patients with 'polygenic hypercholesterolemia'. This original PS (or variations with fewer variants) is used in clinical research settings [7][8][9][10][11][12][13][14][15][16], but its exact relevance and translation to clinically actionable advice remain a matter of debate [17].
Therefore, we assessed the distribution of this LDL-C PS in a hypercholesterolaemic cohort from a national referral centre for genetic FH diagnostics and subsequently investigated its effect on CAD risk in individuals from the UK Biobank. We sought 1) to validate the difference in PS between monogenic FH patients and patients with phenotypic FH but without a monogenic variant who were referred for genetic analysis and 2) to investigate the association of this PS with CAD in the general population, in order to determine its added predictive value in a prognostic model over and above readily available, non-genetic, clinical data.

Patients referred for genetic FH testing
To investigate the distribution of the PS in patients with severe hypercholesterolemia, we included adult index patients who were referred for genetic FH testing in Amsterdam UMC (AUMC), location Academic Medical Center in Amsterdam, the Netherlands, between March and September 2019.
Patients <18 years of age, patients referred for genetic dyslipidaemia other than FH, and those with LDL-C below 5 mmol/L (threshold for a Dutch Lipid Clinic Network [DLCN] score of 'possible FH') were excluded. Patients with LDL-C levels obtained while on lipid-lowering therapy or those with severely elevated levels of triglycerides (>4.5 mmol/L) were excluded to prevent enrolling patients with potentially inaccurate LDL-C values. Clinical data were collected through a standardized questionnaire filled out by the referring physician and a modified DLCN score was calculated with the available data, as described previously [5]. Patients undergo evaluation of secondary causes for hyperlipidaemia by their local physician before referral for genetic testing. As such, we could not further evaluate potential secondary causes because centralized data were unavailable. All patients provided written informed consent and the institutional review board of the AUMC provided a waiver for the reuse of genetic data for research purposes. Molecular methods have been described previously [5]. Briefly, DNA was isolated from blood samples and processed using an in-house next-generation sequencing (NGS) capture covering 27 lipid genes and the 12 genetic variants used to calculate the PS (SeqCap easy choice, Roche NimbleGen Inc., Pleasanton, USA). Patients carrying class 4 or 5 variants or copy number variants in LDLR, APOB and PCSK9, as classified by the standard ACMG guidelines (Supplementary Table 1) [18], were diagnosed as having monogenic FH (FH/M+) and all other patients were considered not to have monogenic FH (FH/M-). Patients carrying bi-allelic FH variants, bi-allelic variants in LDLRAP1 (causing autosomal recessive FH) or bi-allelic variants in ABCG8 and/or ABCG5 (causing sitosterolemia) were excluded from the analysis.

UK Biobank cohort
Individual participant data from the UK Biobank were used to assess whether and to what extent the PS was associated with CAD risk. The UK Biobank is a population study which enrolled participants between 2006 and 2010 from 22 centers across the United Kingdom. The study characteristics have been published elsewhere [19]. We included participants who self-identified as being of European descent, and for whom LDL-C levels and all genotyping data were available that passed quality control as detailed by the UK Biobank. All participants provided written informed consent. We did not exclude patients with secondary causes of hypercholesterolemia.
CAD was defined as the presence of a major coronary event: a composite of coronary death, nonfatal myocardial infarction or coronary revascularization, based on hospital admission for any of these components as defined by their ICD codes, procedure codes and death registries or by self-report. A full list of these codes is provided in Supplementary Table 2. We defined both lifetime CAD (a composite of incident and prevalent CAD) and incident CAD (a CAD event after the baseline visit), to assess the lifetime effect of the PS and the effect of the PS on incident CAD after baseline. BMI was measured and a lipid panel was obtained at enrolment in the study at a UK Biobank assessment centre. To account for lipid-lowering therapy, we multiplied the LDL-C of participants being treated with lipid-lowering therapy by 1.43, as previously described [20][21][22]. Genotyping of the UK Biobank cohort was performed using the Affymetrix UK Biobank Axiom array and the Affymetrix UK BiLEVE Axiom array [19].
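The treatment adjustment described above can be illustrated with a minimal sketch. Only the factor 1.43 comes from the text; the function and parameter names are hypothetical and chosen for illustration.

```python
# Minimal sketch of the LDL-C adjustment for lipid-lowering therapy.
# Only the factor 1.43 is taken from the text; all names are hypothetical.

LLT_CORRECTION_FACTOR = 1.43


def adjust_ldl_c(measured_ldl_c_mmol_l: float, on_lipid_lowering_therapy: bool) -> float:
    """Return an estimate of untreated LDL-C in mmol/L."""
    if measured_ldl_c_mmol_l < 0:
        raise ValueError("LDL-C cannot be negative")
    if on_lipid_lowering_therapy:
        # Treated participants: scale the measured value upwards.
        return measured_ldl_c_mmol_l * LLT_CORRECTION_FACTOR
    # Untreated participants: the measured value is used unchanged.
    return measured_ldl_c_mmol_l


if __name__ == "__main__":
    print(round(adjust_ldl_c(3.0, True), 2))
    print(adjust_ldl_c(3.0, False))
```

Note that 1.43 is approximately 1/0.7, so the factor is arithmetically equivalent to assuming that treatment lowers LDL-C by about 30%; this reading is an interpretation and is not stated in the text.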

Polygenic score determination
For each study participant, we calculated the 12-single nucleotide polymorphism (SNP) PS weighted for the LDL-C effect of each allele as observed in the study by the Global Lipids Genetics Consortium [23]. We grouped participants from the AUMC cohort based on PS decile ranges derived from the Whitehall II (WHII) study, a general population cohort serving as the reference cohort in the analyses, in accordance with the study that originally developed this 12-SNP PS [6]. To assess the effect of the PS on CAD, we divided participants from the UK Biobank into eleven groups, based on the distribution of the PS in the UK Biobank itself (resulting in groups resembling <5%, 5-15%, etc.). This enabled comparison of CAD risk with participants in the 45-55% percentile group as reference, reflecting the average patient.
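As a minimal sketch of the two steps described above, the code below computes a weighted score as the sum of risk-allele counts multiplied by per-allele weights, and assigns a percentile to one of eleven groups (<5%, 5-15%, ..., 85-95%, >95%). The SNP identifiers and weights are hypothetical placeholders, not the published effect sizes, and the handling of values that fall exactly on a group boundary is an assumption.

```python
from bisect import bisect_right

# Group boundaries in percentiles: <5, 5-15, ..., 85-95, >95 (eleven groups).
GROUP_EDGES = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]


def polygenic_score(allele_counts: dict, weights: dict) -> float:
    """Weighted sum of risk-allele counts (0, 1 or 2 per variant)."""
    if allele_counts.keys() != weights.keys():
        raise ValueError("allele counts and weights must cover the same variants")
    for snp, count in allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"invalid allele count for {snp}: {count}")
    return sum(allele_counts[snp] * weights[snp] for snp in weights)


def percentile_group(percentile: float) -> str:
    """Map a percentile (0-100) to one of the eleven group labels."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must lie between 0 and 100")
    if percentile < GROUP_EDGES[0]:
        return "<5%"
    if percentile >= GROUP_EDGES[-1]:
        # Assumption: a value exactly at the 95th percentile joins the top group.
        return ">95%"
    idx = bisect_right(GROUP_EDGES, percentile)
    return f"{GROUP_EDGES[idx - 1]}-{GROUP_EDGES[idx]}%"


if __name__ == "__main__":
    # Hypothetical two-variant example; the real score uses 12 variants.
    weights = {"snp_a": 0.1, "snp_b": 0.2}
    counts = {"snp_a": 2, "snp_b": 1}
    print(polygenic_score(counts, weights))
    print(percentile_group(50.0))  # reference group
```

With these boundaries the 45-55% group is the reference category against which the other ten groups are compared.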

Statistical analyses
The PS was described by the mean (±SD) and all other continuous variables by the median (IQR). Categorical variables were reported as absolute count (%). In the AUMC cohort, patients with a pathogenic FH variant were compared to patients without a pathogenic variant using Wilcoxon tests for continuous variables and Fisher's exact test for categorical data.
To determine the effect of the LDL-C PS, we used linear regression to predict LDL-C levels and logistic regression to predict lifetime CAD. To assess model discrimination between subjects with and without incident CAD, Harrell's c-statistic was used (where 0.5 is random discrimination and 1.0 perfect discrimination) [24]. Specifically, the following models were employed to assess CAD risk in the UK Biobank: A) only including the PS (either continuous or as deciles), B) using age and sex, and C) combining the genetic score with age and sex. Finally, we constructed a fourth model, D), which also included measured LDL-C level, combined with age, sex and the PS. We performed two analyses, one where we investigated the effect of the PS on lifetime CAD, and one where we investigated the effect of the PS on incident CAD after the baseline visit. Measured LDL-C level was only used to predict incident CAD. To assess the predictive value of this score in hypercholesterolaemic patients of the UK Biobank, we also stratified participants based on LDL-C levels below and above 4.9 mmol/L and repeated the analyses assessing model discrimination.
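To make the discrimination measure concrete, the sketch below computes a c-statistic for a binary outcome: the proportion of case/non-case pairs in which the case received the higher predicted risk, with ties counted as one half. For a binary outcome without censoring this equals the area under the ROC curve. Whether the original analysis treated CAD as a binary or a time-to-event outcome is not stated here, so the binary setting is an assumption, and the function name is hypothetical.

```python
def c_statistic(predicted_risks: list, outcomes: list) -> float:
    """Concordance for a binary outcome (1 = event, 0 = no event)."""
    if len(predicted_risks) != len(outcomes):
        raise ValueError("inputs must have the same length")
    cases = [r for r, o in zip(predicted_risks, outcomes) if o == 1]
    controls = [r for r, o in zip(predicted_risks, outcomes) if o == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")

    concordant = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                concordant += 1.0   # case ranked above non-case
            elif case_risk == control_risk:
                concordant += 0.5   # tie counts as half
    return concordant / (len(cases) * len(controls))


if __name__ == "__main__":
    # Toy data: three of the four case/non-case pairs are ranked correctly.
    print(c_statistic([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

The pairwise loop is quadratic in cohort size and is meant only to show the definition; a cohort the size of the UK Biobank would call for a rank-based implementation.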

Baseline characteristics of AUMC cohort
Between March and September 2019, 1166 patients were referred for genetic testing of FH, of whom 747 had an LDL-C > 4.9 mmol/L upon referral. Excluding patients who were taking lipid-lowering therapy, who had triglyceride levels >4.5 mmol/L or who were found to carry bi-allelic FH-causing variants left 628 patients for the final analysis (Supplementary Fig. 1). A pathogenic FH variant in one of the three FH genes (LDLR, APOB or PCSK9) was found in 95 (15.1%) patients (FH/M+). A total of 74 patients were heterozygous carriers of a variant in LDLR, 17 carried a variant in APOB and 4 in PCSK9. This left 533 (84.9%) subjects in whom no FH-causing variant was identified (FH/M-, Table 1).
The baseline characteristics for the AUMC cohort are listed in Table 1. Overall, 64.0% of patients were female, the median age was 56 years (IQR 49-64) and the median BMI was 26.23 kg/m². The median LDL-C in the entire cohort was 6.19 mmol/L (IQR 5.55-6.90). Patients in the FH/M+ group were generally younger than those in the FH/M- group (median age 50 vs 57 years, p < 0.001) and more frequently female (72.6 vs 62.5%, p = 0.03). Total cholesterol and LDL-C were both higher in the FH/M+ group compared with the FH/M- group (median 9.20 (IQR 8.50-10.00) and 7.10 (IQR 6.45-8.00) vs 8.20 (IQR 7.60-9.00) and 6.00 (IQR 5.50-6.70) mmol/L, respectively), while triglyceride levels were lower (1.[…]).
[…] [6], the FH/M- group showed a skewed distribution and an enrichment of patients with a PS in the upper two PS deciles. This was not observed in the FH/M+ cohort (Fig. 1, left panel and Fig. 2). When dividing patients into quintiles based on the reference range from the WHII study [6], we observed that the odds for the presence of an FH variant were significantly lower only among patients with a PS > 80% (OR 0.22, 95% CI 0.10-0.48; Fig. 1, right panel). Within the FH/M- cohort, the PS showed a negligible association with LDL-C levels (R² = 0.574%, p = 0.044).

CAD risk and polygenic hypercholesterolemia
To assess the effect of the PS on lifetime CAD, we included 436,512 participants from the UK Biobank. The baseline characteristics for this cohort are listed in Table 1. Of these participants, 54.3% were women and the median age was 71 years (IQR 63-76). The median LDL-C at inclusion was 3.72 mmol/L (IQR 3.18-4.30) and 75,495 participants (17.3%) were on lipid-lowering therapy at the time of the sampling visit. The median LDL-C in the 45-55% group (the reference group for further analyses) was 3.76 mmol/L (IQR 3.25-4.31; for other values see Supplementary Table 3). A total of 35,865 (8.2%) participants had a CAD event in their lifetime. Of these, a first CAD event occurred in 21,170 patients after inclusion and blood sampling.
The mean weighted 12-SNP PS in the UK Biobank was 0.906, which is similar to the mean of 0.9 described in the original WHII cohort [6]. The PS was significantly associated with LDL-C levels, explaining 7.37% (95% CI 7.22-7.52%) of the observed variance. Compared with the median PS group (45-55%), mean LDL-C levels were 0.66 mmol/L lower in the lowest 5% of the PS distribution (95% CI -0.67 to -0.64 mmol/L) and 0.30 mmol/L higher in patients in the highest 5% (95% CI 0.29 to 0.32 mmol/L; Fig. 3, left panel).
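For a linear regression with a single predictor, the proportion of explained variance (R²) equals the squared Pearson correlation between predictor and outcome. The sketch below shows this calculation under the assumption that the PS is the only predictor of LDL-C in the model; the function name and toy data are hypothetical.

```python
def explained_variance(predictor: list, outcome: list) -> float:
    """R-squared of a simple linear regression of outcome on predictor."""
    n = len(predictor)
    if n != len(outcome) or n < 2:
        raise ValueError("need two equally long series with at least two values")
    mean_x = sum(predictor) / n
    mean_y = sum(outcome) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    syy = sum((y - mean_y) ** 2 for y in outcome)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, outcome))
    if sxx == 0 or syy == 0:
        raise ValueError("predictor and outcome must both vary")
    return sxy ** 2 / (sxx * syy)


if __name__ == "__main__":
    # Toy data, not study data.
    scores = [1.0, 2.0, 3.0, 4.0]
    ldl_c = [1.0, 3.0, 2.0, 4.0]
    print(explained_variance(scores, ldl_c))
```

Multiplying the result by 100 gives the percentage of variance explained, the scale on which the 7.37% above is reported.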
We observed that the PS was associated with a higher risk for lifetime CAD (OR 1.10, 95% CI 1.09-1.12 per standard deviation increase in PS; Fig. 3, right panel). When depicted per increasing quantile of PS, participants with a PS < 5% had an OR of 0.79 for CAD compared with participants in the 45-55% percentile group (95% CI 0.74-0.84), while participants with a PS >95% had an OR of 1.16 compared with participants in the 45-55% percentile group (95% CI 1.10-1.23; Fig. 3, right panel). The PS was also associated with a higher risk for incident CAD (OR 1.07, 95% CI 1.06-1.09).
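The group-wise comparison against the 45-55% reference group can be illustrated with an unadjusted odds ratio from a 2x2 table. With a single binary predictor, the logistic-regression odds ratio equals the cross-product ratio, and a Wald-type 95% confidence interval can be obtained on the log scale. This is a sketch under that assumption; the reported estimates may come from a differently specified model, and the counts below are invented for illustration.

```python
import math


def odds_ratio_ci(cases_exposed: int, noncases_exposed: int,
                  cases_reference: int, noncases_reference: int,
                  z: float = 1.96) -> tuple:
    """Return (odds ratio, lower bound, upper bound) for a 2x2 table."""
    cells = (cases_exposed, noncases_exposed, cases_reference, noncases_reference)
    if any(cell <= 0 for cell in cells):
        raise ValueError("all four cells must be positive")
    # Cross-product ratio: odds of CAD in the group divided by odds in the reference.
    odds_ratio = (cases_exposed * noncases_reference) / (noncases_exposed * cases_reference)
    # Standard error of the log odds ratio.
    se_log_or = math.sqrt(sum(1.0 / cell for cell in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: 20 cases and 80 non-cases in the group of interest,
    # 10 cases and 90 non-cases in the reference group.
    print(odds_ratio_ci(20, 80, 10, 90))
```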

Clinical relevance of the PS
To assess how well the 12-SNP PS for LDL-C predicted CAD, we calculated the c-statistic for various models (Fig. 4 and Supplementary Table 4). In predicting lifetime CAD, the c-statistic for a model with only the PS was 0.528 (95% CI 0.525-0.531), whereas the c-statistic for a model with only age and sex was 0.743 (95% CI 0.741-0.746). Dividing the PS into 11 quantiles did not affect the c-statistic (0.528, 95% CI 0.525-0.531). Adding the continuous PS to age and sex resulted in a c-statistic of 0.745 (95% CI 0.742-0.747), similar to adding the grouped PS to age and sex (0.745, 95% CI 0.742-0.747). In predicting incident CAD, the c-statistic for a model with only the PS was 0.520 (95% CI 0.516-0.524), which was lower than that of a model comprising only LDL-C levels (0.555, 95% CI 0.551-0.559). Combining the PS and LDL-C did not change the c-statistic (0.555, 95% CI 0.551-0.559). For incident CAD, the c-statistic for the model with age and sex was 0.718 (95% CI 0.714-0.721), which did not change materially when adding the PS (0.719, 95% CI 0.715-0.721), and marginally increased when adding LDL-C (0.722, 95% CI 0.718-0.724). When constructing a model including age, sex and LDL-C levels, the c-statistic was 0.722 (95% CI 0.718-0.725). Stratifying the cohort based on LDL-C below or above 4.9 mmol/L showed that the discriminative value of the PS and LDL-C was reduced in the group with LDL-C above 4.9 mmol/L (n = 42,641), while the PS performed similarly in the group with LDL-C below 4.9 mmol/L (n = 393,871), compared to the inclusion of all participants (Supplementary Fig. 3 and Supplementary Table 4).

Discussion
We investigated the clinical applicability of a small and easy-to-implement LDL-C polygenic score by assessing its distribution in a cohort of severe hypercholesterolemia patients referred for genetic FH testing, and by examining its value in explaining hypercholesterolemia and predicting CAD in the UK Biobank. In our cohort referred for genetic FH testing, the PS was higher in FH/M- than in FH/M+ patients and those in the top PS quintile were significantly less likely to carry an FH variant. In the UK Biobank, we show that this PS explains approximately 7% of variance in LDL-C levels and is associated with increased CAD risk.
However, the PS offered no additional value in predicting lifetime and incident CAD when modelled in combination with age, sex and measured LDL-C level. Taken together, our data suggest that the predictive value of a previously published LDL-C PS comprising a limited number of SNPs is small, and of little clinical relevance in patients referred for genetic testing for FH.
Our finding that patients with severe hypercholesterolemia who test negative for an FH variant had an increased PS is in line with previous literature and adds to a growing body of evidence that severe hypercholesterolemia can, at least partly, be attributed to a polygenic origin [4,6,9,28]. Approximately 40% of patients from our cohort referred for genetic FH testing had a PS equivalent to the top 20% PS from a general population reference cohort. This observation is in line with other studies evaluating polygenic scores for LDL-C in severe hypercholesterolaemic patients, which have shown that patients with an FH phenotype but without a pathogenic variant were 2-4 times more likely to have a 'high PS' (heterogeneously defined as between the top 5 and top 20 percent) compared to reference cohorts from the general population [6,15,28,29].
We show that this PS explains a modest 7.4% of variance in LDL-C levels. Explained variance in other polygenic scores for LDL-C ranged from 5 to 30% and this percentage does not increase linearly with the number of genetic variants added. For example, a 223-SNP PS explained 10% [20] while a 1.92 million SNP LDL-C PS explained 29.8% of variance in LDL-C levels [30], although it is uncertain how much of the latter PS interacts with environmental factors. Combined, this suggests that other unknown and/or unmeasured genetic or environmental factors still predominantly explain severe hypercholesterolemia in FH/M- patients with a 'high PS'. In other words, a high PS based on 12 SNPs in our study population might only explain a small proportion of the severely elevated LDL-C levels in these clinical FH patients.
In the UK Biobank cohort, we observed that the PS is associated with modestly increased odds for CAD. Participants in the top 5% PS have an OR of 1.16 for CAD (95% CI 1.10-1.23) compared with participants with a PS around the median (i.e., the 45-55% group). This significant association with CAD supports other recent studies that investigated different polygenic scores for LDL-C [20,29,31,32]. However, the increased risk does not nearly match the risk for CAD conferred by monogenic FH, which is reported to be up to four times higher compared to patients with comparable LDL-C levels without a monogenic FH variant [3,20,31,33]. It has previously been shown in the UK Biobank that individuals with polygenic hypercholesterolemia (top 5% of a 223-SNP PS) have a higher CVD risk compared to patients with similar LDL-C levels without a presumed genetic origin, but this increased risk was less pronounced compared with monogenic FH patients (polygenic hypercholesterolemia: OR 1.29 (95% CI 1.05-1.59); monogenic FH: OR 1.93 (95% CI 1.32-2.81), compared to matched controls) [20].
A problem with comparing and interpreting PS-associated CAD risk reported in other studies is the fact that 'polygenic hypercholesterolemia' is heterogeneously and arbitrarily defined as the top 5%, 10% or 20%, and that this group is often compared to other arbitrary groups (e.g., the lowest 5%-50%, or the remainder of the study population) [4,8,20,29,31,32]. This heterogeneity limits comparison between studies, may give an exaggerated representation of PS-associated CAD risk, and complicates translation into clinical practice. Recently, reporting standards have been formulated to facilitate more uniform reporting of polygenic scores [34]. In line with recent studies [35], we deliberately chose to present CAD risk relative to "the average patient" (45-55% percentile group) since a PS follows a normal distribution in the population based on the frequencies of the variants.
While the PS was associated with increased CAD risk, it added little discriminating value compared to age, sex, and LDL-C levels, both in models predicting lifetime CAD and in models predicting incident CAD, with or without inclusion of measured LDL-C as predictor. It is of note that restricting the model to incident CAD after LDL-C measurement resulted in lower c-statistics. Combined, our results that the PS is associated with LDL-C and CAD but does not add discriminating value in the entire population, nor in the hypercholesterolaemic population, suggest limited utility of incorporating the PS in risk prediction and therapeutic decision-making. These findings should not be extrapolated to CAD prediction using other polygenic scores. Previous studies showed that only a minority of genetic variants linked with CAD exert their effects via cholesterol-related pathways [35]. Therefore, with declining costs and improved methods of sequencing [36], more comprehensive genome-wide polygenic scores for CAD (including millions of variants, largely not related to LDL-C) might improve risk prediction and could be incorporated into future clinical practice [17,35,37].
With respect to clinical implications, guidelines advise early and aggressive lipid-lowering to meet low LDL-C targets in FH/M+ patients as well as cascade testing of relatives, but no recommendations have been formulated for patients with polygenic hypercholesterolemia [38,39]. Given that our results show limited PS-associated CAD risk, we cannot provide firm support for similar treatment recommendations for patients with presumed polygenic hypercholesterolemia (e.g., PS > 95%) based on this 12-SNP PS. With regard to cascade testing, a recent study using a genome-wide PS for CAD showed that high polygenic scores do tend to cluster within families [40], suggesting that cascade testing to identify family members with a high polygenic score may become relevant if the PS in question is indeed shown to predict a clinically relevant increased risk. When such validated polygenic scores become available, screening for a polygenic background could be done side-by-side with screening for a monogenic cause in order to identify families at increased CAD risk irrespective of the genetic origin. Another potential use of an LDL-C PS is that the search for a secondary or as yet unidentified monogenic cause for severe hypercholesterolemia may be pursued more actively in FH/M- patients with a low PS [9,33,41].
Our study has limitations. First, our analyses in the UK Biobank only included patients of European ancestry and our results may not reflect the impact in other ethnic populations. Ethnic background was not recorded in the AUMC cohort but was assumed to be reflective of the Dutch population, which is largely of European descent. Second, we only included patients from the AUMC cohort with an LDL-C > 4.9 mmol/L, classified as having at least "possible FH" according to the DLCN criteria. Data on other factors in the DLCN score were not uniformly available.
In conclusion, we have externally validated a 12-variant LDL-C PS and confirmed its association with both LDL-C levels and CAD risk in the general population, but the predictive value was less pronounced in the more clinically relevant hypercholesterolemic patients. We show that severe hypercholesterolemia patients without monogenic FH are enriched for high polygenic scores. Despite these findings, this LDL-C PS does not add predictive value to readily available patient characteristics such as sex and age in discriminating between patients who will and those who will not experience a future CAD event. This suggests limited clinical utility of this LDL-C polygenic score in guiding clinical practice.

Financial support
AJC is supported by grants from the AMC young talent fund and the Atheros fund. GKH reports research grants from the Netherlands Organization for Scientific Research (vidi 016.156.445), CardioVascular Research Initiative, and European Union. AFS is supported by BHF grant PG/18/5033837 and the UCL BHF Research Accelerator AA/18/6/34223.

Declaration of competing interest
AJC, TRT, JCD and LZ report no conflicts of interest. LFR is cofounder of LipidTools B.V. GKH reports institutional research support from Aegerion, Amgen, AstraZeneca, Eli Lilly, Genzyme, Ionis, Kowa, Pfizer, Regeneron, Roche, Sanofi, and The Medicines Company; speaker's bureau and consulting fees from Amgen, Aegerion, Sanofi, and Regeneron (fees paid to the academic institution); and part-time employment at Novo Nordisk. ESGS has received fees paid to his institution from Amgen, Akcea, Athera, Sanofi-Regeneron, Esperion, Novo Nordisk, Lilly, and Novartis. AFS has received Servier funding for unrelated work.