Fig. 1 | BMC Pregnancy and Childbirth

From: Early prediction of preeclampsia from clinical, multi-omics and laboratory data using random forest model

The schematic workflow of this study. A total of 198 pregnant women were enrolled between 2019 and 2021: 56 with EPE, 50 with LPE and 92 healthy controls. Clinical and laboratory data for each participant were retrieved by physicians from the health information system. Plasma samples were collected and subjected to proteomic and metabolomic assays according to the manufacturer's instructions. Protein and metabolite expression levels were analyzed using Spectronaut software with default parameters. Differentially expressed proteins and metabolites were then identified, followed by GO and KEGG pathway enrichment analysis. The full dataset was split into training and test sets at a ratio of 3:2, and feature importance analysis was performed on all proteomic and metabolic biomarkers in the training set. The ten most important metabolic biomarkers and, separately, the ten most important proteomic biomarkers were used to build 968 predictor combinations each (every combination of three or more of the ten biomarkers). The training set was randomly split into an internal training set (ITS) and an internal validation set (IVS) at a ratio of 2:1. For each combination of metabolic or proteomic predictors, a random forest model was built on the ITS and validated on the IVS. This process was repeated 10 times, yielding 10 prediction models and their corresponding area under the curve (AUC) values. The feature combination with the highest mean AUC in the IVS was taken as the optimal biomarker set for constructing the prediction models in the training set. Finally, models were established in the training set using different combinations of clinical factors, the optimal proteomic and metabolic biomarker combinations, and laboratory test variables, and their performance was independently evaluated in the test set.
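The combination search in this workflow can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' code, and several pieces are assumptions: all function names are hypothetical, the 3:2 and 2:1 splits are stratified by case/control status, and the scorer `mean_difference_score` is a deliberately trivial stand-in, not a random forest (in practice one might plug in, for example, scikit-learn's `RandomForestClassifier` and its `predict_proba` output through the same `fit_predict` callable). The feature-importance preselection of the top ten biomarkers is not reproduced. What the sketch does show is the count of candidate sets (with ten biomarkers, the subsets of size three or more number 2^10 − 1 − 10 − 45 = 968) and selection by mean IVS AUC over 10 repeated ITS/IVS splits.

```python
import itertools
import random
from statistics import mean


def auc(labels, scores):
    """ROC AUC via the Mann-Whitney statistic: P(case score > control score), ties count 0.5."""
    pos = [s for t, s in zip(labels, scores) if t == 1]
    neg = [s for t, s in zip(labels, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_split(y, frac_first, rng):
    """Split row indices into two parts, keeping the case/control ratio in each (stratification is assumed)."""
    first, second = [], []
    for label in (0, 1):
        group = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(group)
        cut = round(frac_first * len(group))
        first += group[:cut]
        second += group[cut:]
    return first, second


def candidate_combinations(features, min_size=3):
    """Every subset with at least min_size features ('more than two biomarkers')."""
    return [combo for k in range(min_size, len(features) + 1)
            for combo in itertools.combinations(features, k)]


def mean_difference_score(train_X, train_y, test_X, combo):
    """Hypothetical stand-in model, NOT a random forest: sum the selected features,
    each signed by whether cases or controls had the higher training mean."""
    signs = {}
    for f in combo:
        cases = mean(r[f] for r, t in zip(train_X, train_y) if t == 1)
        ctrls = mean(r[f] for r, t in zip(train_X, train_y) if t == 0)
        signs[f] = 1.0 if cases >= ctrls else -1.0
    return [sum(signs[f] * r[f] for f in combo) for r in test_X]


def select_combination(X, y, features, fit_predict, n_repeats=10, seed=0):
    """Pick the combination with the highest mean IVS AUC over repeated 2:1 ITS/IVS splits."""
    rng = random.Random(seed)
    # Draw the splits once so every combination is judged on the same partitions.
    splits = [stratified_split(y, 2 / 3, rng) for _ in range(n_repeats)]
    best_combo, best_auc = None, float("-inf")
    for combo in candidate_combinations(features):
        aucs = []
        for its, ivs in splits:
            scores = fit_predict([X[i] for i in its], [y[i] for i in its],
                                 [X[i] for i in ivs], combo)
            aucs.append(auc([y[i] for i in ivs], scores))
        if mean(aucs) > best_auc:
            best_combo, best_auc = combo, mean(aucs)
    return best_combo, best_auc


def make_demo_data(n_per_class=30, n_features=6, n_informative=3, seed=42):
    """Synthetic data: only the first n_informative features are shifted in cases."""
    rng = random.Random(seed)
    features = [f"f{j}" for j in range(n_features)]
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append({f: rng.gauss(2.0 if label == 1 and j < n_informative else 0.0, 1.0)
                      for j, f in enumerate(features)})
            y.append(label)
    return X, y, features


if __name__ == "__main__":
    X, y, features = make_demo_data()
    # Outer 3:2 split into a training set and a held-out test set.
    train_idx, test_idx = stratified_split(y, 3 / 5, random.Random(1))
    X_tr, y_tr = [X[i] for i in train_idx], [y[i] for i in train_idx]
    best, best_auc = select_combination(X_tr, y_tr, features, mean_difference_score)
    test_scores = mean_difference_score(X_tr, y_tr, [X[i] for i in test_idx], best)
    print(best, round(best_auc, 3), round(auc([y[i] for i in test_idx], test_scores), 3))
```

The demo uses six synthetic features (42 candidate sets) to stay fast; passing ten features to `candidate_combinations` yields the 968 sets mentioned in the caption. Drawing the ITS/IVS splits once and reusing them for every combination keeps the comparison between combinations paired, which is one reasonable reading of the repeated-validation step rather than a detail confirmed by the source.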