The diagnostic indicators of gestational diabetes mellitus from second trimester to birth: a systematic review

Background Gestational diabetes mellitus (GDM) is glucose intolerance first recognised during pregnancy. Both modalities and thresholds of the GDM diagnostic test, the Oral Glucose Tolerance Test (OGTT), have varied widely over time and among countries. Additionally, OGTT limitations include inconsistency, poor patient tolerability, and questionable diagnostic reliability. Many biological parameters have been reported to be modified by GDM and could potentially be used as diagnostic indicators. This study aimed to 1) systematically explore biomarkers reported in the literature as differentiating GDM from healthy pregnancies 2) screen those indicators assessed against OGTT to propose OGTT alternatives.
Main body A systematic review of GDM diagnostic indicators was performed according to PRISMA guidelines (PROSPERO registration CRD42020145499). Inclusion criteria were full-text, comprehensible English-language articles published January 2009-January 2021, where a biomarker (from blood, ultrasound, amniotic fluid, placenta) was compared between GDM and normal glucose tolerance (NGT) women from the second trimester onward to immediately postpartum. GDM diagnostic method had to be clearly specified, and the number of patients per study higher than 30 in total or 15 per group. Results were synthesised by biomarkers.
Results Of 13,133 studies identified in initial screening, 174 studies (135,801 participants) were included. One hundred and twenty-nine studies described blood analytes, one amniotic fluid analytes, 27 ultrasound features, 17 post-natal features. Among the biomarkers evaluated in exploratory studies, Adiponectin, AFABP, Betatrophin, CRP, Cystatin-C, Delta-Neutrophil Index, GGT, TNF-A were those demonstrating statistically and clinically significant differences in substantial cohorts of patients (> 500). Regarding biomarkers assessed versus OGTT (i.e. potential OGTT alternatives) most promising were Leptin > 48.5 ng/ml, Ficolin3/adiponectin ratio ≥ 1.06, Chemerin/FABP > 0.71, and Ultrasound Gestational Diabetes Score > 4. These all demonstrated sensitivity and specificity > 80% in adequate sample sizes (≥ 100).
Conclusions Numerous biomarkers may differentiate GDM from normoglycaemic pregnancy. Given the limitations of the OGTT and the lack of a gold standard for GDM diagnosis, advanced phase studies are needed to triangulate the most promising biomarkers. Further studies are also recommended to assess the sensitivity and specificity of promising biomarkers not yet assessed against OGTT.
Trial registration PROSPERO registration number CRD42020145499.
Supplementary Information The online version contains supplementary material available at 10.1186/s40842-021-00126-7.


Background
In Gestational Diabetes Mellitus (GDM) the pregnancy-related physiological impairment of glycaemic control and insulin resistance are such that the mother, and consequently the fetus, are exposed to glycaemic levels considered diagnostic of diabetes [1]. GDM is defined internationally as "Hyperglycaemia first recognized during pregnancy" [2], refined in 2015 by the American Diabetes Association (ADA) as "diabetes diagnosed in the second and third trimesters of pregnancy" [3]. Methods and thresholds to identify GDM in pregnancy have changed several times in the last 50 years; currently the most common is the oral glucose tolerance test (OGTT), where 75 g of glucose are ingested by women after an overnight fast and Blood Glucose Level (BGL) is checked at zero, one and two hours after ingestion [4]. Most commonly this is performed at 24-28 weeks gestation, or in case of high-risk patients at 12-16 weeks and again at 24-28 weeks if the initial test is normal.
Initially, GDM was only diagnosed at glycaemic levels that would be considered diagnostic of Type 2 diabetes mellitus in non-pregnant adults. Subsequent studies, including the Australian Carbohydrate Intolerance Study in Pregnant Women (ACHOIS) and the Hyperglycemia and Adverse Pregnancy Outcome (HAPO) study, have demonstrated that lower levels of glycaemia may still be associated with adverse maternal or fetal outcome [5,6], so GDM diagnostic thresholds have been progressively lowered. The current most used criteria worldwide are those released by IADPSG (The International Association of Diabetes and Pregnancy Study Groups): Fasting: 92 mg/dL (5.1 mmol/L), one hour: 180 mg/dL (10.0 mmol/L), two hours: 153 mg/dL (8.5 mmol/L) [2].
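As an illustration of how the IADPSG thresholds quoted above are applied, the following is a minimal Python sketch. It assumes the commonly cited IADPSG rule that a single value at or above its threshold is sufficient for diagnosis; that rule, the function names and the conversion factor (18.016 mg/dL per mmol/L, derived from the molar mass of glucose) are not stated in this review and are included here as assumptions.

```python
# IADPSG 75 g OGTT thresholds in mmol/L, as quoted in the text.
IADPSG_THRESHOLDS_MMOL_L = {"fasting": 5.1, "one_hour": 10.0, "two_hour": 8.5}

# Assumption: 1 mmol/L of glucose is about 18.016 mg/dL (molar mass 180.16 g/mol).
MG_DL_PER_MMOL_L = 18.016


def mg_dl_to_mmol_l(value_mg_dl):
    """Convert a glucose concentration from mg/dL to mmol/L."""
    return value_mg_dl / MG_DL_PER_MMOL_L


def meets_iadpsg_criteria(fasting, one_hour, two_hour):
    """Return True if any OGTT value (mmol/L) meets or exceeds its threshold.

    Assumption: one abnormal value is enough, as in the IADPSG recommendation.
    """
    readings = {"fasting": fasting, "one_hour": one_hour, "two_hour": two_hour}
    return any(
        readings[name] >= threshold
        for name, threshold in IADPSG_THRESHOLDS_MMOL_L.items()
    )


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    print(meets_iadpsg_criteria(fasting=4.8, one_hour=10.4, two_hour=7.9))
    print(meets_iadpsg_criteria(fasting=4.6, one_hour=8.9, two_hour=7.2))
```

The mg/dL and mmol/L figures in the text agree with each other to one decimal place under this conversion factor.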
The reliability of OGTT has been questioned, as it involves a supra-physiological load unrelated to body weight or normal dietary intake. As well as being unpleasant, expensive and time consuming, OGTT has poor reproducibility: up to 30% of patients with positive screening results screen negative when re-tested [7-11].
While the deficiencies of the OGTT are clear, any replacement is hampered by the lack of a true gold standard for GDM diagnosis. Simply comparing new methods against the poorly reliable OGTT will fail to uncover false-positive and false-negative screening misclassifications. We undertook a systematic review of all the biochemical, clinical and pathological parameters proposed to be altered by GDM in order to aid with identification of a biomarker that accurately differentiates between women who do and do not develop GDM.
The objectives of this study were therefore: 1) To systematically review the literature on biomarkers assessed for their ability to differentiate GDM from NGT pregnancies. 2) To describe characteristics, methodological quality, and findings of studies assessing biomarkers for their predictive accuracy versus OGTT, thereby identifying the biomarkers most promising for discriminating between GDM and NGT pregnancies, as compared to the current diagnostic method, the OGTT.

Methods
We conducted a systematic review of biomarkers of Gestational Diabetes in accordance with PRISMA guidelines (Additional file 2).

Eligibility criteria (PICOS details summarised in Table 1)
Full-text articles (conference abstracts only excluded), written in comprehensible English and published January 2009-January 2021. Only randomized controlled trial (RCT), case-control or cohort studies (retrospective or prospective) were accepted, excluding systematic reviews, case report studies, letters. They had to describe biomarkers (blood, ultrasound, placenta/umbilical cord) measured from the second trimester (14 weeks gestation) of human pregnancy to the delivery period (within one-hour post-partum), and values described in GDM women versus NGT, and/or within subgroups of the GDM patients (e.g. diet treated only vs medication). Authors must have specified the method and thresholds used to diagnose GDM and the number of patients included in the study had to be higher than 30 in total or 15 per subgroup. Articles reporting genetic tests (e.g. methylation or miRNA expression) were excluded, as our aim was to identify an inexpensive biomarker that could accurately diagnose GDM worldwide.
Additional inclusion criteria For the second aim of our study, the indicators must have been compared to OGTT, reporting at least sensitivity and specificity.
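For readers less familiar with the two measures required by the additional inclusion criteria, the following minimal Python sketch shows how sensitivity and specificity are derived from a 2x2 table in which the OGTT result is taken as the reference. The function name and the counts in the usage example are hypothetical and do not come from any included study.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Compute sensitivity and specificity from 2x2 table counts.

    tp: biomarker positive, OGTT positive
    fp: biomarker positive, OGTT negative
    fn: biomarker negative, OGTT positive
    tn: biomarker negative, OGTT negative
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("Need at least one OGTT-positive and one OGTT-negative case")
    sensitivity = tp / (tp + fn)  # share of OGTT-positive women the biomarker detects
    specificity = tn / (tn + fp)  # share of OGTT-negative women the biomarker clears
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    sens, spec = sensitivity_specificity(tp=45, fp=10, fn=5, tn=90)
    print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Because the OGTT is itself imperfect, values computed this way describe agreement with the OGTT rather than accuracy against true disease status.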

Data sources, search strategy and additional articles identification
The initial search was run on 15/05/2019, and a final updated search was performed on 07/02/2021. Six databases were screened (EBM, Medline, Embase, Cochrane, Web of Science and Scopus) using the keywords ("Gestational diabet*" or "pregnancy diabetes" or "GDM") AND ("marker" or "biomarker" or "diagnos*" or "indicator"). Publications were limited to those from 01/01/2009 onward to focus on recent evidence. The references of 20% of the included articles were reviewed to confirm that our search identified the majority of relevant articles.

Screening and data extraction
Initial screening of the titles and abstracts of articles returned by the database search was performed by DDF and TW. The screening for the update was performed by DDF, TW, DW, AM. Papers for potential inclusion were then read in full by DDF and a second co-author (TW, DW, JY, AM) to check eligibility. Disputes regarding article inclusion were resolved by joint senior author AH. Once considered eligible, articles were downloaded and read before re-checking eligibility with an inclusion criteria checklist: if included, a Data Extraction Form (Additional file 3) was completed together with a CASP (Critical Appraisal Skills Programme) form and CASP table for quality assessment. Each article was double reviewed. Quality assessment of included articles was conducted by DDF and a second reviewer using the CASP checklists available for each type of article: Diagnostic, RCT, cohort and case-control [12]. Articles insufficiently fulfilling CASP criteria (< 7/11 for case-control and RCT and < 8/12 for diagnostic studies and cohort studies) were excluded.
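The CASP exclusion rule described above can be written down as a small lookup, sketched here in Python. The dictionary keys, function name and error handling are illustrative assumptions; only the minimum scores (7 of 11 for case-control and RCT, 8 of 12 for diagnostic and cohort studies) come from the text.

```python
# Minimum number of CASP criteria fulfilled, and total criteria, per study type.
# Values are taken from the exclusion rule in the text; key names are assumed.
CASP_MINIMUM = {
    "case-control": (7, 11),
    "rct": (7, 11),
    "diagnostic": (8, 12),
    "cohort": (8, 12),
}


def passes_casp(study_type, criteria_met):
    """Return True if a study meets the CASP threshold for its design."""
    minimum, total = CASP_MINIMUM[study_type]
    if not 0 <= criteria_met <= total:
        raise ValueError(f"criteria_met must be between 0 and {total}")
    return criteria_met >= minimum


if __name__ == "__main__":
    print(passes_casp("case-control", 7))  # meets the 7/11 minimum
    print(passes_casp("cohort", 7))        # below the 8/12 minimum
```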
Data collection process and items Data were sought about authors' names, country of study, aim of the study, inclusion/exclusion criteria for participants, method used to diagnose GDM, gestational age at collection, number of cases/controls, settings (ambulatory/delivery room), markers studied, research design (case-control, cohort, diagnostic, randomised controlled trial), methods, summary of findings, conclusions.

Risk of bias in individual and across studies
We assessed the risk of bias in individual studies by using the CASP checklists appropriate to the type of studies included: case-control, cohort, RCT, diagnostic; the latter reporting the sensitivity and specificity of the indicator as assessed against OGTT.
Summary measure and additional analysis If available, a risk ratio or difference in means was reported. When described, the sensitivity and specificity of the biomarker assessed against OGTT and, if possible, the number of OGTT that could be avoided by using the indicators were noted. For articles reporting the assessment of a biomarker against OGTT, we used the CASP checklist for diagnostic studies. Given the heterogeneous nature of the studies included in terms of GDM diagnostic criteria, definitions of cases and controls, biomarker used, and time in pregnancy of sample collection, a formal meta-analysis was not appropriate, so results are presented as a narrative synthesis [13]. Studies assessing only a difference of values between GDM and NGT patients were considered as "Exploratory phase" (defined as substantial if cohort > 500 patients). Studies reporting the biomarker's sensitivity and/or specificity compared to OGTT were considered as "Challenge phase" (small when the sample was < 100 patients and adequate when ≥ 100 patients). The results of the challenge phase were considered good with sensitivity and specificity > 80% and very good with sensitivity and specificity > 90%. Based on the phase identified for each article, we then aimed to propose the next phase for further assessing each biomarker: "Challenge phase" to test them against OGTT or "Advanced phase" to confirm and further explore the results of the challenge phase studies.
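The phase and performance labels defined above can be expressed as a short classification routine. The Python sketch below is one possible reading of those rules, not the authors' own tooling: the function name, the returned dictionary and the label "below threshold" are assumptions, and a sample of exactly 100 patients is treated as adequate, following the "at least 100" wording used in the abstract.

```python
def classify_study(n_patients, sensitivity=None, specificity=None):
    """Label a study by phase, sample size and diagnostic performance.

    sensitivity and specificity are fractions (0 to 1) versus OGTT, or None
    when the study only compared biomarker values between GDM and NGT.
    """
    if sensitivity is None and specificity is None:
        # Exploratory phase: only a difference between groups was reported.
        size = "substantial" if n_patients > 500 else "not substantial"
        return {"phase": "exploratory", "size": size, "performance": None}

    # Challenge phase: sensitivity and/or specificity reported versus OGTT.
    size = "adequate" if n_patients >= 100 else "small"
    performance = None
    if sensitivity is not None and specificity is not None:
        lowest = min(sensitivity, specificity)  # both must exceed the cut-off
        if lowest > 0.90:
            performance = "very good"
        elif lowest > 0.80:
            performance = "good"
        else:
            performance = "below threshold"  # assumed label, not in the text
    return {"phase": "challenge", "size": size, "performance": performance}


if __name__ == "__main__":
    # Hypothetical studies, for illustration only.
    print(classify_study(620))
    print(classify_study(110, sensitivity=0.93, specificity=0.91))
    print(classify_study(60, sensitivity=0.85, specificity=0.95))
```

Taking the lower of the two measures reflects the requirement that both sensitivity and specificity exceed the cut-off.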

Characteristics of included studies
Of the 13,133 titles and abstracts examined (22 of which were identified through reference list searches), 634 full-text articles were assessed, and 174 articles including 135,801 participants were included (Fig. 1). The most common reason for exclusion was assessment of biomarkers in the first trimester or later than 1-h post-partum. The number of participants in each study ranged from 35 [14] to 4926 [15].
Regarding methodology, 40 included articles described cohort studies (31 prospective and nine retrospective) and 134 case-control studies. Thirty-five included articles were considered diagnostic as they assessed diagnostic potential of the biomarkers against OGTT, specifying a cut-off. Regarding study location, most studies were conducted in Turkey (45) and China (43) (Fig. 2).
Diagnostic criteria for GDM were heterogeneous; most studies used the 2010 IADPSG criteria [2] adopted by the ADA in 2011 and the World Health Organization (WHO) in 2013 (Table 2). These criteria were used in 100 studies (of which 3 used 50 g Glucose Challenge Test (GCT) as an initial screening test). The second most common criteria were the Carpenter and Coustan's criteria (C&C) [16], released in 1982 and adopted by ADA until 2011, used in articles written up to 2020. The remaining articles followed the WHO criteria [17], the National Diabetes Data Group (NDDG) criteria [18] or local guidelines.
Among the 174 included articles, 129 described maternal blood analytes, one reported amniotic fluid analytes, 27 described ultrasound features of the mothers, the fetuses, the placenta/umbilical cord or a combination of ultrasound features, and 17 assessed postnatal features of the babies and the placentas.

Synthesis of results
The results of individual studies separated into the different types of biomarkers are detailed in Additional file 1, reported as GDM vs NGT and with significance at a p value < 0.05, unless otherwise stated.
Two articles calculated a combination of factors/haematological ratios: the first [81] (n = 792) described HbA1c and hs-CRP being higher and SHBG lower in women who developed GDM; in the second [82] (n = 100) HbA1c and CRP were higher and SHBG and PAPP-A lower in GDM.
Amniotic fluid biomarkers (Additional file 1b) There were two groups of amniotic fluid biomarkers described by Melekoglu et al. [105] (n = 40) with increased levels of ADAMTS4 and ADAMTS5 in GDM (Table 3b). These are markers of alterations in the extracellular matrix and abnormal placentation in response to the increase of inflammatory mediators such as IL-6 and TNF-a.
Ultrasound biomarkers (Additional file 1c) There were 4 types of ultrasound biomarkers: maternal, fetal, annexes and combined. In the maternal section, the epicardial fat thickness of both mothers and babies was higher in GDM in the studies of D'ambrosi (n = 168) [106], Yavuz et al. [107] and Nar et al. [51] (n = 209). An increase in mean subcutaneous adipose thickness was found in GDM by both D'ambrosi [106] and Kansu-Celik [108]. Among the cardiovascular features, Tosun et al. [109] reported significant increases in both the superior mesenteric artery doppler systolic/diastolic ratio and the resistance index in GDM women. In the fetal section, asymmetrical macrosomia and increased fetal liver volume were reported to be more frequent by Ilhan (n = 97) [111]. Fetal abdominal wall thickness was increased in three studies (n = 490) [14,112,113], with one also finding increased maximum subcutaneous fat tissue thickness at the head circumference and thoracic spine levels [14]. A retrospective cohort study on 44,179 women found no differences in terms of Head Circumference, Femur Length and Estimated Fetal Weight in pregnancy with and without GDM [114]. In the annexes section, To et al. [116] reported the diameter and the mean flow volume of the UV to differ in GDM (n = 78, 8.23 vs 2.29 mm, p = 0.001 and 8.16 vs 7.54 cm/s, p = 0.03, respectively). Lastly, among the combined biomarkers, Perovic [117] (n = 110) proposed an Ultrasound Gestational Diabetes Score (UGDS), based on the combination of maternal, fetal and annexes features, that was increased in GDM.
Neonatal, umbilical cord and placental biomarkers (Additional file 1d) Cord blood Estradiol [118] (n = 408, 44.1 vs 49.9 nmol/L, p = 0.032), as well as Adropin [53] (n = 60, 1.5 vs 3.3 ng/mL, p < 0.001), were reported to be significantly lower. The levels of C-Peptide, Glucose and Neopterin were found to be higher in newborns of women with GDM by Ipekci [88]. Among the placental inflammatory markers, CD163 and Iron were reported to be higher, as was Cyclophilin-A [119] (n = 43). Placental weight was significantly higher in GDM in three studies [61,120,121], with Kukuc et al. [121] also reporting increased Placenta Weight/Birth Weight ratio, whereas Pooransari reported no significant differences [122]. Dairi et al. [123] and Kadivar et al. (n = 306) [124] demonstrated altered placental villous histological morphology in GDM, and meconium-laden macrophages were found in greater concentration by Barke et al. [125].

Additional analysis
Challenge phase studies A total of 35 studies (n = 61,949) assessed biomarkers for the ability to predict OGTT results (Table 3): 30 haematological, of which two used multiparametric prediction models, and five ultrasound features, of which one used ultrasound multiparametric modelling.
OGTT avoidability Four articles reported the number of OGTT potentially avoidable by using the biomarker described in their studies as the screening test (Table 4). Two articles assessed fasting capillary/plasma glucose [15,31] and two glycated haemoglobin [34,36]. The number of OGTT avoidable was calculated as a sum of the number of patients having values below the screening and above the diagnostic thresholds for each biomarker. These thresholds were identified with ROC analysis: the diagnostic threshold was set with specificity between 100% [15,31,36] and 97.2% [34], with the screening threshold sensitivity between 26.4% [36] and 96.9% [15]. The avoidable OGTT ranged from 38% [36]
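The avoidable-OGTT calculation described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the method of any cited study: the function name is hypothetical, both thresholds in the usage example are invented for demonstration, and values exactly equal to a threshold are assumed to still require an OGTT.

```python
def avoidable_ogtt(values, rule_out_below, rule_in_above):
    """Count OGTTs avoidable with a two-threshold biomarker strategy.

    Patients below the screening threshold are ruled out, patients above the
    diagnostic threshold are ruled in, and only those in between proceed to
    OGTT. Returns (number avoidable, fraction avoidable).
    """
    if not values:
        raise ValueError("values must not be empty")
    if rule_out_below > rule_in_above:
        raise ValueError("screening threshold must not exceed diagnostic threshold")
    ruled_out = sum(1 for v in values if v < rule_out_below)
    ruled_in = sum(1 for v in values if v > rule_in_above)
    avoidable = ruled_out + ruled_in
    return avoidable, avoidable / len(values)


if __name__ == "__main__":
    # Hypothetical fasting glucose values (mmol/L) and thresholds.
    fasting_glucose = [4.0, 4.2, 4.6, 4.8, 5.0, 5.3, 5.6, 4.3, 4.9, 5.2]
    count, fraction = avoidable_ogtt(fasting_glucose, rule_out_below=4.4, rule_in_above=5.1)
    print(count, fraction)
```

In practice the two thresholds would be chosen from ROC analysis, as described in the text, so that the rule-in threshold has very high specificity and the rule-out threshold has high sensitivity.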

Summary of evidence
We identified a diverse range of biomarkers differing between GDM and NGT pregnancies, in maternal/cord blood, amniotic fluid and placental samples, as well as at ultrasound examination. Many of the included studies, however, despite reporting statistical significance, found only very small absolute differences between GDM and control, reducing the potential clinical utility of these biomarkers as stand-alone diagnostic markers. There were few biomarkers that differed to a statistically significant and clinically meaningful extent between GDM and NGT. Further research to explore their potential utility as replacements for the OGTT, whether alone or in combination with other biomarkers, is warranted. The most common biomarkers evaluated were haematological. Among these, HbA1c and FBG were assessed in the largest sample sizes and for their ability to avoid OGTT, though neither has been shown to fully substitute for OGTT. A previous systematic review on the use of HbA1c for the diagnosis of GDM found overall high specificity but low sensitivity, concluding that "HbA1c should only be used in association with other standard diagnostic tests for GDM diagnosis" [134].
Among the almost 150 biomarkers evaluated in Challenge-phase studies, Leptin [20], Ficolin3/adiponectin ratio [52] and Chemerin/FABP [135] had promising results, yielding very good sensitivity and specificity (> 90%) in adequate sample sizes (≥ 100). Haematological biomarkers that demonstrated very good sensitivity and specificity, but need confirmation in larger cohorts of at least 100 patients to assess potential substitution for OGTT, are Sex hormone binding globulin (SHBG) and metabolomic profiling for phospholipids, though the latter may be too expensive to be used as a screening or diagnostic test for GDM. Finally, challenge studies are needed to test the sensitivity/specificity of all the haematological biomarkers reported to be significantly different in GDM. Among those, Adiponectin, AFABP, Betatrophin, CRP, Cystatin-C, Delta-Neutrophil Index, GGT, TNF-A were those demonstrating statistically and potentially clinically significant differences in substantial cohorts of patients (> 500).
Amniotic fluid biomarkers clearly have limited utility, as they are only justifiable in those otherwise requiring interventional sampling. Among the several ultrasound features described as differing in GDM, fetal subcutaneous fat thickness (FSFT) and the UGDS score have been assessed in challenge-phase studies demonstrating promising results. FSFT needs to be confirmed in a cohort of at least 100 women [13], whereas UGDS score can be evaluated in Advanced-phase studies [117]. Post-partum analysis of fetal blood analytes confirmed the higher adipogenic environment found in GDM women as well as the hormonal imbalance in terms of insulin resistance, though none of these biomarkers was investigated in a large cohort, and they clearly have no prospective utility. The same can be said for placental histomorphological alterations, though either of these markers could potentially be used to correlate between OGTT, alternative screening tests and eventual outcome with regard to 'true' GDM in advanced-phase studies.
Whilst several biomarkers show differentiation between GDM and NGT pregnancies, practicalities and translatability need to be taken into account along with sensitivity and specificity, as recommended by WHO ASSURED criteria: a biomarker should be affordable, sensitive, specific, user friendly, rapid and robust, equipment-free and deliverable to end-users [136]. Neither of the two biomarkers assessed as a potential screening test, namely FBG and HbA1c, could fully replace OGTT. Furthermore, certain biomarkers could be too expensive or time-consuming (especially metabolomics) or require invasive amniotic fluid sampling. Ultrasound markers could represent a good trade-off between cost and acceptability/feasibility, provided the assessment technique is easy and standardisable.
The lack of a gold standard to confidently identify GDM represents a limitation to any study assessing a new diagnostic tool, as most of the new tools are judged against the existing imperfect screening test (the OGTT). There is also a lack of consensus for GDM diagnostic criteria, with some articles authored in 2019-2020 still using NDDG criteria from 41 years previously [137,138]. Screening failures with the OGTT are well documented along with the potential for false positive and false negative results [139,140]. Data triangulation could represent a solution to this limitation, a process described as "the application of (at least) two different methods aimed at one particular problem" [141]. The results of OGTT could be combined with those of an alternative diagnostic method as well as with risk factors and outcomes of GDM, potentially including post-natal biomarkers such as placenta histomorphology.
Whilst many biomarkers presented in this review are not suitable as stand-alone markers, they could potentially be included in a multi-modal/triangulated evaluation of OGTT positive and negative patients. This advanced-phase analysis might allow a new and more complete understanding of detection and definition of GDM. As per Hackfort et al. [142], data triangulation is "relating different data or sources of data in such a way that will result in a new picture of the object, a different construction of the object, and a new idea of the object". Chikere and Wilson recently reviewed diagnostic test evaluation methodology in the absence of a gold standard, where multiple imperfect reference standards are used, identifying Discrepancy Analysis (DA) and Latent Class Analysis (LCA) as the most suitable methodologies [143]. DA "compares the index test with an imperfect reference standard: participants with discordant results undergo another imperfect test, called the resolver test, to ascertain their disease status". "To avoid biased estimates, some of the participants with concordant responses (true positives and true negatives) can be sampled to undertake the resolver test alongside participants with discordant responses (false negative-FN and false positive-FP)". In comparison, using LCA, the performance of all the tests employed in the study is evaluated simultaneously using probabilistic models with the basic assumption that the disease status is latent (frequentist LCAs) or unobserved (Bayesian LCAs). DA and LCA could represent the best way to triangulate the most promising biomarkers of GDM in advanced-phase studies.
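To make the discrepancy analysis procedure quoted above more concrete, the following Python sketch resolves discordant results with a third test and tallies the resulting 2x2 table for the index test. It is a deliberately simplified illustration: the function name, record layout and all counts are hypothetical, and the sketch omits the sampling of concordant participants for the resolver test that the quoted passage recommends to limit bias.

```python
def discrepancy_analysis(records):
    """Tally a 2x2 table for an index test after resolving discordant results.

    Each record is (index_positive, reference_positive, resolver_positive),
    where resolver_positive may be None when index and reference agree.
    Concordant results take the reference result as disease status;
    discordant results take the resolver result.
    """
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for index_pos, reference_pos, resolver_pos in records:
        if index_pos == reference_pos:
            disease = reference_pos
        else:
            if resolver_pos is None:
                raise ValueError("discordant record needs a resolver result")
            disease = resolver_pos
        if index_pos and disease:
            counts["tp"] += 1
        elif index_pos and not disease:
            counts["fp"] += 1
        elif not index_pos and disease:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical cohort of 100 women: biomarker (index) versus OGTT (reference).
    records = (
        [(True, True, None)] * 40       # concordant positives
        + [(False, False, None)] * 50   # concordant negatives
        + [(True, False, True)] * 4     # discordant, resolver sides with biomarker
        + [(True, False, False)] * 2    # discordant, resolver sides with OGTT
        + [(False, True, True)] * 3     # discordant, resolver sides with OGTT
        + [(False, True, False)] * 1    # discordant, resolver sides with biomarker
    )
    print(discrepancy_analysis(records))
```

In a GDM setting, the resolver could be a second biomarker or a post-natal feature, which is the kind of triangulation proposed in the text.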

Strengths and limitations
To the best of our knowledge, this is the first systematic review on GDM biomarkers, reporting the results of 174 articles for almost 136,000 participants. It was conducted according to the PRISMA checklist, and the quality of included publications was assessed using CASP criteria, with a CASP checklist threshold set to exclude studies not fulfilling multiple criteria, thereby reducing the risk of bias. Incomplete retrieval of identified research was minimised by screening the references of 20% of the included articles, which identified 22 additional articles.
The heterogeneity of methods used in the included articles to diagnose GDM and to assess the biomarkers is a limitation that precluded meta-analysis and allowed only narrative synthesis. The preponderance of studies from just two countries with specific ethnic backgrounds (China and Turkey) may limit the external validity of the findings.

Conclusions
Whilst multiple biomarkers may show differences between GDM and non-GDM pregnancies, few of these differences were of sufficient absolute size or of a nature to be clinically useful. The most promising biomarkers for detection of GDM were Leptin, Ficolin-3/Adiponectin and Chemerin/FABP among the haematological biomarkers, and UGDS score at ultrasound examination. No single feature currently performs sufficiently well to be an adequate screening test for GDM.