Exploring Computational Techniques in Preprocessing Neonatal Physiological Signals for Detecting Adverse Outcomes: Scoping Review

Background: Computational signal preprocessing is a prerequisite for developing data-driven predictive models for clinical decision support. Thus, identifying the best practices that adhere to clinical principles is critical to ensure the transparency and reproducibility needed to drive clinical adoption; it further fosters reproducible, ethical, and reliable conduct of studies. This procedure is also crucial for setting up a software quality management system to ensure regulatory compliance when developing software as a medical device aimed at early preclinical detection of clinical deterioration. Objective: This scoping review focuses on the neonatal intensive care unit setting and summarizes the state-of-the-art computational methods used for preprocessing neonatal clinical physiological signals; these signals are used for the development of machine learning models to predict the risk of adverse outcomes. Methods: Five databases (PubMed, Web of Science, Scopus, IEEE, and ACM Digital Library) were searched using a combination of keywords and MeSH (Medical Subject Headings) terms. A total of 3585 papers from 2013 to January 2023 were identified based on the defined search terms and inclusion criteria. After removing duplicates, 2994 (83.51%) papers were screened by title and abstract, and 81 (2.71%) were selected for full-text review. Of these, 52 (64%) were eligible for inclusion in the detailed analysis. Results: Of the 52 articles reviewed, 24 (46%) studies focused on diagnostic models, while the remainder (n=28, 54%)


Background
Premature infants are those born at <37 weeks' gestational age, ranging from extremely preterm (23 weeks' gestation) to late preterm (37 weeks' gestation); those with a birth weight of <1500 g are defined as having very low birth weight. These extremely premature infants have a higher risk of death, and surviving infants are highly prone to physical, cognitive, and emotional impairment [1]. The patients usually have a long length of stay, ranging from <10 to >120 days [2], in the neonatal intensive care unit (NICU), where high-fidelity physiological changes are monitored to observe their health status and signs of deterioration. During this long stay, a large amount of data from infants is generated but not typically electronically aggregated for permanent storage [3]. With the advent of electronic health records, relevant patient information is readily available for advanced data analytics that can be used to improve health outcomes. The records contain demographic, etiology, pathology, medication, and physiology information. Physiological changes are regularly monitored in preterm infants, notably the electrocardiogram (ECG), oxygen saturation (SpO2), heart rate (HR), respiratory rate, arterial blood pressure, electroencephalography (EEG), and temperature. Some advanced centers around the world have started linking the information derived from electronic health record data with the continuously monitored physiological information for permanent storage, more frequently at lower resolution, which facilitates various data analytics [4][5][6]. Compared with intermittent assessment and review, continuous capture and analysis of physiological data from standard bedside monitors allow for a better understanding of trends and have been shown to improve outcomes of infants in the NICU [5].
Clinical decision support systems (CDSSs) can integrate clinical and physiological information to provide automated support in patient care planning, facilitating the diagnostic process and therapy planning, generating critical alerts and reminders, and predicting the risk of patient deterioration. CDSSs have the potential to positively impact clinical and economic measures in the health care system [7][8][9]. Technological advances that allow the storage of big data, together with advances in artificial intelligence (AI), have given rise to machine learning (ML)- and AI-based CDSSs that aim to build data-driven models to predict adverse outcomes in premature infants ahead of the time of clinical diagnosis [10][11][12].
Building an ML pipeline to predict adverse outcomes involves several intermediate computational steps using the physiological data, of which data preprocessing is the first indispensable step. In the NICU, physiological signals are collected using a diverse range of devices, which introduce a number of artifacts: environmental artifacts (eg, device connection failure, equipment noise, electrosurgical noise, and power line interference); experimental or human error due to patient movement during data acquisition, incorrect or poor electrode contact, and other contact noise; and artifacts due to muscle contraction, cardiac signals, and blinking [13,14]. These noises distort the signals and may adversely affect a model's generalization capability and predictive power [10].
Although much progress has recently been made in building ML models using neonatal physiological data, the detailed reporting of the preprocessing techniques applied to these signals is limited [15], which in turn hinders the reproducibility of the methods and results. In AI-powered software as a medical device (SaMD), this is especially important because implementing a software quality management system (QMS) is only possible by following best practices and adhering to relevant regulatory standards and guidelines for medical devices, such as ISO 13485, IEC 62304, and IEC 82304-1. Beyond market access considerations, the ongoing international discourse on the regulation of medical software is specifically concentrated on AI and ML. This focus is a response to their growing applications, demanding increased attention from regulatory bodies such as the Australian Therapeutic Goods Administration and the US Food and Drug Administration [16]. Thus, it is crucial to adhere to a standardized protocol, following clinical principles guided by domain experts and regulatory requirements, while preprocessing the signals and to report these techniques in detail; this ensures the reproducibility of the methods, allowing transparency in their clinical adoption.

Objectives
As the first step in bridging the gap in their reproducibility for clinical adoption, this review aims to identify studies that used computational methods to analyze premature infants' physiological signals for detecting adverse outcomes. The review describes the different tools and techniques used to preprocess physiological signals and provides recommendations on which aspects need further detail for the clinical adoption of the techniques. The remainder of the paper is organized as follows: the Methods section explains the detailed search and screening process, while the Results section begins with an overview of the reviewed studies, followed by a detailed analysis. The Discussion section highlights the key reporting patterns identified in this review along with their shortcomings and provides recommendations for transparent reporting in future studies, as this allows for accurate reproduction of the results and makes them usable in the clinical setting [17]. A summary of the work concludes the paper.

Screening and Study Selection
The initial database search yielded 3585 papers. Of these, 590 (16.46%) were manually identified as duplicates and excluded from the analysis, and 1 further paper was identified as a duplicate by the automation tool and removed. The remaining 2994 (83.51%) papers were subjected to title and abstract screening using the Rayyan Intelligent Systematic Review application (Qatar Computing Research Institute) [20].
Several inclusion criteria were set to select papers for full-text review.The criteria are mentioned in Textbox 1.
After screening the titles and abstracts, 81 articles were selected for full-text review; 29 (36%) papers were excluded during this stage as they did not align with the inclusion criteria, leaving 52 (64%) papers eligible for detailed synthesis and analysis.
The title and abstract screening was done by 1 reviewer, while 2 reviewers independently checked each paper's eligibility against the inclusion criteria at the full-text review stage. When the 2 reviewers disagreed on any paper, a third reviewer assessed it to provide a final decision on inclusion or exclusion. Data charting was done using Microsoft Excel, and the following variables were recorded in line with related review papers [10,21]: title, year, journal, authors, digital object identifier, data set, participant number, participant demographics, signals used, data set size, sample rate, other data (if applicable), outcome metric, device software, programming language, preprocessing methods, algorithms, other techniques, features, models, model type, results (quantified), and key findings. Data synthesis was done using a narrative approach by summarizing findings based on similarities in the data sets and techniques used. The detailed search queries, bibliography files of all databases, all included papers, metadata of all papers, and metadata of all papers included for full-text review are provided in Multimedia Appendices 1-5.

• Article type: articles must be peer-reviewed publications in a journal, conference, or workshop

• Data: articles must conduct an analysis on premature human infant data; articles must use physiological responses in some form

• Outcome: articles discuss applications relating to adverse neonatal outcomes such as mortality, length of stay, sepsis, necrotizing enterocolitis, intraventricular hemorrhage, hypoxic-ischemic encephalopathy, apnea, bradycardia, and other poor health outcomes, also known as morbidity. The disease outcomes were chosen based on the commonly researched outcome metrics using preterm infant data and the search terms used in McAdams et al [10], who investigated artificial intelligence and machine learning techniques used to predict clinical outcomes in the neonatal intensive care unit

• Analysis: articles reported some form of computational techniques in their analysis

Overview of the Included Studies
Figure 1 shows the full process of database search and study selection using a PRISMA flow diagram.
As the studies were found to be heterogeneous in their study design and analysis techniques, a narrative approach was taken to summarize the studies and their key findings. The studies were grouped according to homogeneity in the data sets used and sorted by publication year. This approach was inspired by the review article by Mann et al [78].
One of the noticeable patterns identified through the results reported in Table 2 is that the groups publishing studies using the same data set followed similar preprocessing techniques, although not at every step. For instance, the studies using ECG data from Cork University Maternity Hospital all used the same algorithm for QRS complex detection. However, they were diverse in their selection of filtering techniques and segmentation duration. Furthermore, they systematically failed to report detailed parameter settings for QRS complex detection. While using similar preprocessing techniques helps maintain consistency to some extent, it does not confirm adherence to clinical practices identified from domain expert knowledge.
The QRS complex characteristics and RR intervals of neonates differ from those of adults and, as such, require an appropriate adjustment of QRS detection algorithms. This is a necessary first step for HR variability (HRV) analysis in neonates. However, a review of neonatal HRV by Latremouille et al [15] revealed that, given a lack of clear guidelines on neonatal vital signs and HRV analysis, several studies followed the HRV analysis guidelines for adults published by the Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology [79]. Our review found that 16 (44%) of the 36 studies analyzing ECG signals used the Pan-Tompkins algorithm for QRS complex detection. The original implementation of the algorithm was based on the ECG characteristics of the adult population, and its parameters were tuned accordingly. Only 4 (25%) of those 16 studies reported adjusting the original algorithm to adapt it to neonates, and of these, only 2 provided specific modification details. In the absence of detailed reporting on the parameter settings, it is difficult to determine whether the settings adhered to neonatal waveform morphology. Incomplete reporting and lack of transparency hinder the understanding of the strengths and weaknesses of a study and limit its reproducibility and usability. Moreover, transparent and detailed reporting is required to confirm adherence to regulatory compliance and is crucial for the clinical adoption of these methods.
Similar to the QRS complex in ECG signals, the acceptable ranges of physiological signals for neonates also differ from those of the adult population. This review found that no studies checked the acceptable ranges of the analyzed signals against any published guidelines, which could pose several limitations in the clinical adoption of the methods. This is consistent with another review of physiological vital sign ranges from 34 weeks' gestational age, which identified that several studies reported the means of vital signs instead of ranges, making interpretation into clinical practice difficult [80]. Here, we recommend clear reporting and the use of physiological signal ranges that are clinically validated through published studies and textbooks [81][82][83].

[Rows of Table 2 were garbled during extraction at this point: study summaries for the PICS database (Gee et al [25,26], Das et al [27], and Mahmud et al [28]; outcome: bradycardia) and the Máxima Medical Center NICU (including Cabrera-Quiros et al [47]; outcomes: sepsis and central apnea). See Table 2 for the full study summaries.]

Handling of Missing Data
During neonatal physiological monitoring, instances of missing data may arise due to sensor disconnection, improper placement, or signal dropouts. To tackle this issue, methodologies such as data imputation or interpolation are applied. For example, if gaps exist in a neonate's HR monitoring data, interpolation methods can estimate the missing values by considering neighboring data points. Widely used interpolation techniques include linear interpolation, spline interpolation, and time-based interpolation. In addition, common data imputation methods include forward fill, backward fill, and imputation using mean or median values. Methods such as forward fill [30], moving average [44], mean imputation [64,66], and interpolation [67] were used by some studies reviewed in this paper.
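To illustrate these gap-filling strategies, the sketch below implements linear interpolation, forward fill, and mean imputation in plain Python; the function names and heart rate values are illustrative, not taken from any reviewed study.

```python
from statistics import mean

def linear_interpolate(values):
    """Fill interior None gaps by linear interpolation between the
    nearest valid neighbors; leading/trailing gaps are left as-is."""
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is None:
            j = i
            while j < n and filled[j] is None:
                j += 1
            if i > 0 and j < n:  # interior gap: interpolate across it
                lo, hi = filled[i - 1], filled[j]
                span = j - i + 1
                for k in range(i, j):
                    filled[k] = lo + (hi - lo) * (k - i + 1) / span
            i = j
        else:
            i += 1
    return filled

def forward_fill(values):
    """Carry the last observed value forward across gaps."""
    filled, last = [], None
    for v in values:
        last = v if v is not None else last
        filled.append(last)
    return filled

def mean_impute(values):
    """Replace every gap with the mean of the observed values."""
    m = mean(v for v in values if v is not None)
    return [m if v is None else v for v in values]

# Example: a heart rate series (beats/min) with a 2-sample dropout
hr = [150, None, None, 156, 158]
print(linear_interpolate(hr))  # [150, 152.0, 154.0, 156, 158]
```

Note that forward fill cannot recover a gap at the very start of a recording, which is one reason studies should state which method they used.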

Artifact Removal
Neonatal signals can be affected by artifacts, such as those from muscle movements or electrical interference. Commonly used techniques, such as bandpass or notch filters, along with moving averages, are used to effectively eliminate these disturbances. For instance, in neonatal EEG signals, adaptive filters prove beneficial in eliminating artifacts caused by muscle movements, resulting in a clearer representation of the baby's brain activity. Some methods used by the reviewed papers were high-pass filters [27,46] and bandpass filters [29,33,44,45,56].
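As a didactic approximation of these ideas, the pure-Python sketch below smooths high-frequency noise with a centered moving average and removes slow baseline wander by subtracting a moving-average baseline; this is not the clinical-grade FIR/IIR filtering used in the cited studies, and the window sizes are arbitrary.

```python
def moving_average(signal, window):
    """Low-pass smoothing: centered moving average, shrinking the
    window at the edges so the output has the same length."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def remove_baseline_wander(signal, window):
    """Crude high-pass: subtract a slow moving-average baseline,
    loosely analogous to the 0.5-0.6 Hz high-pass filters reported
    in some reviewed studies (a first-order approximation only)."""
    baseline = moving_average(signal, window)
    return [s - b for s, b in zip(signal, baseline)]
```

In practice, a wider window tracks slower baseline drift; choosing (and reporting) the cutoff relative to the neonatal signal band is exactly the detail the review found missing.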

Overview
Resampling is a technique that standardizes data intervals, involving either upsampling (increasing data point frequency) or downsampling (decreasing frequency) to create a regular time series. This aligns signals from different devices or physiological sources. Normalization ensures uniformity and reliability across these standardized sampling rates. For instance, if neonatal HR signals from different devices have varied sampling rates, resampling achieves a common rate, while normalization, using techniques such as minimum-maximum, z score, or log scaling, ensures consistent amplitude scaling for accurate comparative analysis. In the reviewed studies, normalization techniques such as minimum-maximum [53] and zero-mean normalization [29,59] were used. In terms of resampling, both downsampling [33,34,41] and upsampling [39] techniques were used.
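The following plain-Python sketch shows one way these steps can look; `resample_linear` and the normalization helpers are illustrative implementations, not code from the reviewed studies (production pipelines would typically use anti-aliasing filters before downsampling).

```python
def downsample(signal, factor):
    """Decrease the sampling rate by keeping every factor-th sample."""
    return signal[::factor]

def resample_linear(signal, new_len):
    """Up- or downsample to new_len points via linear interpolation."""
    if new_len == 1:
        return [signal[0]]
    step = (len(signal) - 1) / (new_len - 1)
    out = []
    for i in range(new_len):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out

def min_max_normalize(signal):
    """Rescale amplitudes to the [0, 1] range."""
    lo, hi = min(signal), max(signal)
    return [(v - lo) / (hi - lo) for v in signal]

def z_score_normalize(signal):
    """Zero-mean, unit-variance scaling (population SD)."""
    m = sum(signal) / len(signal)
    sd = (sum((v - m) ** 2 for v in signal) / len(signal)) ** 0.5
    return [(v - m) / sd for v in signal]
```

For example, an HR trace recorded at 500 Hz and another at 250 Hz can be brought to a common rate with `resample_linear` before min-max scaling makes their amplitudes comparable.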

Waveform Feature Extraction
Extracting relevant features from a signal's waveform is a fundamental step in signal preprocessing. This involves identifying key characteristics such as peaks, troughs, or other significant points in the signal. In the context of neonatal ECG, feature extraction may involve identifying key points such as R-peaks to analyze HRV, providing valuable insights into the infant's autonomic nervous system development. The Pan-Tompkins algorithm is a popular method chosen for R-peak detection from the QRS complex by multiple papers reviewed in this study [22,24,27,33,35,39].
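To illustrate the idea behind such detectors, the sketch below implements a heavily simplified Pan-Tompkins-style pipeline (derivative, squaring, moving-window integration, fixed-fraction thresholding with a refractory period). The window lengths and threshold fraction are illustrative defaults, not the published adult parameters, and certainly not a neonatally tuned implementation, which is precisely the adjustment the review found underreported.

```python
def detect_r_peaks(ecg, fs, threshold_frac=0.5, refractory_s=0.2):
    """Grossly simplified Pan-Tompkins-style R-peak detector.
    All parameters are illustrative, not published settings."""
    # Differentiation emphasizes the steep QRS slope
    deriv = [ecg[i + 1] - ecg[i] for i in range(len(ecg) - 1)]
    # Squaring makes all deflections positive and accentuates large slopes
    squared = [d * d for d in deriv]
    # Moving-window integration (~150 ms trailing window)
    win = max(1, int(0.15 * fs))
    integrated = []
    for i in range(len(squared)):
        lo = max(0, i - win + 1)
        integrated.append(sum(squared[lo:i + 1]) / (i - lo + 1))
    # Fixed-fraction threshold plus a refractory period between beats
    thresh = threshold_frac * max(integrated)
    refractory = int(refractory_s * fs)
    peaks, last = [], -refractory
    for i, v in enumerate(integrated):
        if v >= thresh and i - last >= refractory:
            peaks.append(i)
            last = i
    return peaks

def rr_intervals(peaks, fs):
    """RR intervals in seconds from detected peak sample indices."""
    return [(b - a) / fs for a, b in zip(peaks, peaks[1:])]

# Example: a synthetic 100 Hz trace with impulses standing in for QRS
ecg = [0.0] * 300
for p in (50, 150, 250):
    ecg[p] = 1.0
print(detect_r_peaks(ecg, fs=100))  # indices near the three impulses
```

A neonatal adaptation would at minimum shorten the refractory period and retune the integration window, since neonatal HR and QRS duration differ markedly from adult values.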

Data Segmentation
Segmenting data is the process of breaking down a continuous signal into smaller, more manageable sections to enable targeted analysis. This practice is especially beneficial when dealing with lengthy signals, and data segmentation is a common preprocessing step in ML workflows. For instance, in the analysis of neonatal sleep patterns using EEG, data segmentation can involve dividing the continuous EEG signal into epochs, allowing for the identification and study of sleep stages in shorter, more manageable segments. Commonly used segmentation techniques include fixed-length, sliding window, and threshold- and feature-based segmentation. Some of the data segmentation sizes used in the reviewed studies were 30-second [22][23][24][45] and 1-minute [41] epochs and sliding windows of varied sizes [35,40,55,59,64].
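A minimal sketch of fixed-length and sliding-window segmentation in plain Python (the epoch and step durations are illustrative; incomplete trailing samples are simply dropped, which is itself a reporting-worthy choice):

```python
def fixed_epochs(signal, fs, epoch_s):
    """Split into non-overlapping fixed-length epochs; drop the remainder."""
    size = int(epoch_s * fs)
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]

def sliding_windows(signal, fs, window_s, step_s):
    """Overlapping windows of window_s seconds, advanced by step_s seconds."""
    size, step = int(window_s * fs), int(step_s * fs)
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]

# Example: a 10-sample signal at 1 Hz split into 3-second epochs
print(fixed_epochs(list(range(10)), fs=1, epoch_s=3))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```

Whether the trailing partial segment is dropped, padded, or merged is rarely stated in the reviewed studies, yet it changes the effective data set size.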

In neonatal physiological signal processing, these preprocessing techniques contribute to the accurate interpretation of signals, aiding health care professionals in monitoring and providing appropriate care in the NICU or other clinical settings.
It can be seen from Table 3 that only 7 (13%) of the 52 reviewed studies reported all the recommended preprocessing steps. This could have several impacts on the downstream analysis. For instance, several papers did not report how they segmented the data for feature extraction and classification, although this is essential for clinical validation in cases where the segment duration influences the adverse outcome prediction performance. In HRV analysis, it is important to indicate whether it is a short-term (~5 minutes) or a long-term (≥24 hours) analysis, as these reflect different underlying physiological processes and thus demonstrate different predictive power [107]. Along with the segment duration, additional information such as the sampling rate of the signals provides a clearer reflection of the data set size. Downsampling the data to a low sampling rate (eg, 50 Hz) has also been shown to have a significant impact on HRV analysis [108]. Although all the reviewed studies mentioned the participant number, and the majority (n=39) reported the sampling rate of the signals, very few provided details on the sample size or data set duration or whether the data set was resampled for subsequent analysis. These elements provide a clearer picture of the computational time and resources required for clinical validation and adoption. Although physiological recordings collected in the NICU environment suffer greatly from missing data, due to factors similar to those that introduce artifacts [109], reporting on how missing data are handled is scarce. Different methods for dealing with missing values can produce different results, and not all may be suitable for a particular problem. Therefore, it is important to report all the details related to the adopted approach.
The incomplete or partial reporting found in these studies has significant implications for the implementation of a QMS when using these techniques for clinical adoption. A good implementation of a QMS requires comprehensive reporting of each intermediary step involved in constructing an AI and ML pipeline. The International Medical Device Regulators Forum (IMDRF) offers guidance on the clinical evaluation required for any product intended for use as a medical device [110]. According to the IMDRF guidelines, during clinical evaluation, relevant research articles are reviewed to identify clinical evidence supporting the product [111]. The guideline encourages manufacturers to follow recognized standards and best practices in the development, validation, and manufacturing processes. Clinical evaluations are required by the European Union medical device regulation and are also mentioned in ISO 13485 (the quality management standard for medical devices). Thus, detailed reporting is crucial, as it can be used by regulatory bodies to clinically evaluate future SaMD products. Steps such as missing data handling procedures are also required by the TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis) checklist for model development and validation, which assesses the risk of bias and clinical usefulness of a prediction model [112]. Another example is a questionnaire prepared by the German Notified Body Interest Group, which was adopted to assess some AI-powered medical products in the European Union. This questionnaire includes inquiries about data management, including data collection, labeling, preprocessing procedures, and relevant documentation. Transparent and detailed reporting of these steps is essential to ensure the safety, efficacy, and reliability of SaMD.

Principal Findings
This review aimed to summarize the computational methods used for preprocessing preterm infants' physiological data as a first step in developing data-driven predictive models for adverse outcomes related to clinical decision support. This is an important step, especially from a clinician's perspective, because it increases the trustworthiness of the developed models by allowing for the verification and reproduction of the results. In addition, it aids in achieving regulatory compliance and ensures the safety, efficacy, and ethical use of AI-based health care devices. Furthermore, it allows us to recognize the shortcomings of the current state-of-the-art studies and to recommend guidelines for transparent reporting. The review found that the studies were heterogeneous in terms of their methods and applications; therefore, a narrative approach to reporting the results was taken instead of a quantitative approach. Through the analysis, we identified several key components that were incompletely or partially reported by the included studies, which are summarized in Table 3. To ensure transparent reporting in any future studies in this area, we recommend detailed reporting of all preprocessing steps listed in Table 3, which will reveal their strengths and weaknesses and ultimately make them usable and reproducible. Reproducible research allows clinicians to make more informed decisions about patient care and treatment based on evidence that has been thoroughly assessed.

Comparison With Prior Work
The reviews published in recent years have highlighted the potential of big data and AI in supporting clinical decision-making in the neonatal health care domain [10,15,21,113,114], particularly in using physiological data for detecting or predicting neonatal health outcomes. However, appropriate preprocessing of these data is a prerequisite for developing clinically deployable models. A systematic review by McAdams et al [10] reported different ML models used to predict different clinical outcomes in neonates. However, their primary focus was on 5 neonatal morbidities, and they did not focus on reporting the preprocessing methods applied before building the ML models. Furthermore, they did not include studies using real-time continuous physiological data; 28 of their 68 studies were based on physiological data (not continuous), and the rest were based on electronic medical records and imaging data. Latremouille et al [15] performed a review on HRV analysis for neonates. The primary limitation of the reviewed work was the lack of detailed reporting on the preprocessing steps applied to ECG signals before HRV analysis, such as ECG handling and segmentation, the R-wave (QRS complex) identification technique, software and parameters, and the ranges of all HRV features. They identified these components as incomplete or missing in the studies they reviewed and thus recommended clear reporting of these aspects in future studies in this area. These limitations served as a motivation for our review to focus on the preprocessing techniques of neonatal physiological signals in a broader sense, which serve as the preliminary step for any big data-based approach.

Limitations
There are several limitations to this review. Screening of all the included studies was conducted independently by 1 reviewer, which may have introduced bias. In addition, this review did not include a quantitative or comparative analysis of the reviewed studies, as the techniques used to analyze the physiological signals were diverse. Future work could include a quantitative evaluation of the studies that were homogeneous in design.

Conclusions
This review explores the computational methods used by the current state-of-the-art ML-driven clinical decision support approaches to preprocess physiological signals collected from infants treated in the neonatal setting. A summary of the studies identified heterogeneity in the techniques used for analysis and revealed a lack of consistent and detailed reporting, which is important for building robust, transparent, and clinically deployable prediction models. The availability of powerful hardware and software resources in the NICU environment and growing interest in big data and AI are driving strong demand for clinical decision support applications. We recommend clear reporting of the different steps in the preprocessing of neonatal physiological signals to ensure transparency in clinical validation and accelerate the adoption of developed models in the clinical setting. This will further enhance the delivery and adoption of reliable, regulatory-compliant, safe, and effective products in health care.

Figure 1. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram for the database search and study selection.

Table 2. Summary of the articles reviewed in this study, grouped according to the homogeneity in terms of the data sets used and sorted by the publication year.

Table 3. Required physiological signal preprocessing steps reported by each of the studies in this review.