Creating a Data Quality System for Digital Biomarker Development

Wearable technologies and their associated informatics platforms gather, store, and process vast amounts of health-related, real-world data. These datasets can be some of the most complex used in health research. If the end-goal is to provide evidence in regulatory decision making, implementing well-defined practices to demonstrate sufficient data quality and fidelity is a must.

Our ten years of experience using wearable and connected technologies in clinical trials, along with existing FDA guidance, has shown the following components to be essential for ensuring the quality and integrity of real-world data in digital biomarker development.

Engage Study Sponsor and FDA

Before real-world data is used in regulatory decision making, the FDA will assess its relevance and reliability. Like all aspects of clinical trials, however, this is not a one-size-fits-all approach. Engaging with the study sponsor and the FDA during the initial study design helps to develop and define rigorous data quality procedures for all source data.

Prove Relevance

Using novel real-world data (RWD) from wearable sensors to support regulatory decision-making requires that researchers adequately define how the data address the scientific question at hand.

  • To prove that the data is relevant, provide evidence that the methods used to generate the RWD sufficiently and accurately reflect the population’s experience. Include a detailed description of the analytical and clinical validation of the endpoint, as these terms are defined by the Digital Medicine Society (DiMe).
  • To demonstrate analytical validity, compare the derived health-related measures to an accepted reference standard in an experiment that utilizes the defined data collection, cleaning, and processing methods in the specific study population and setting.
  • To demonstrate clinical validity, establish an evidence-based relationship between the novel digital biomarker and traditionally accepted population endpoints and hard clinical outcomes (e.g., survival).
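
As an illustration of the analytical validity comparison described above, the sketch below computes the mean bias and 95% limits of agreement (a Bland-Altman style summary) between a device-derived measure and a reference standard. This is one common agreement analysis, chosen here as an assumption; the function name and the example values are hypothetical, and the appropriate statistical method should be agreed with the sponsor and FDA during study design.

```python
from statistics import mean, stdev


def bland_altman(device, reference):
    """Return (bias, lower_limit, upper_limit) for paired measurements.

    bias is the mean of (device - reference); the limits are
    bias +/- 1.96 standard deviations of the paired differences.
    """
    if len(device) != len(reference) or len(device) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [d - r for d, r in zip(device, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired step counts: wearable-derived vs. manually counted.
device_steps = [102, 98, 105, 95]
reference_steps = [100, 100, 100, 100]
bias, lower, upper = bland_altman(device_steps, reference_steps)
print(f"bias={bias:.2f}, limits of agreement=({lower:.2f}, {upper:.2f})")
```

Whether the resulting limits are acceptable depends on the clinical context and should be specified before the validation experiment is run.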

Here are four steps that help prove the relevance of the digital biomarker for regulatory decision-making:

1. Standardize data resolution and device wear location

There are many nuances to collecting RWD using wearable sensors. Two important factors that are often overlooked during study design are data resolution and device wear location. Valid health-related outcomes can only be derived from sensor signals when the data collection and processing procedures use an appropriate, standardized sampling frequency and wear site.

  • Data resolution: Wearable sensor data can range in resolution from unprocessed signals sampled at more than 1000 Hz to daily summary measures. The sampling frequency is sometimes user-defined and sometimes manufacturer-defined. Selecting the correct sampling frequency depends on the measure being derived and the algorithmic methods available to process the raw signal. For example, the most recent and accurate ways to obtain physical behavior variables from wearable accelerometers rely on high-frequency raw accelerations.
  • Device wear location: Some wearable sensors can be worn on multiple body locations. Data processing methods are most often dependent on where on the body the device is worn. For example, step counting algorithms applied to raw acceleration signals are optimized based on wear location and must be appropriately selected.
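
One way to confirm that a recording was actually collected at the standardized sampling frequency is to estimate the effective rate from the sample timestamps. The following is a minimal sketch, assuming timestamps are available in seconds; the function names and the 1% tolerance are hypothetical choices, not part of any particular device's software.

```python
from statistics import median


def effective_sampling_hz(timestamps_s):
    """Estimate sampling frequency (Hz) from the median sample interval."""
    if len(timestamps_s) < 2:
        raise ValueError("need at least two timestamps")
    intervals = [b - a for a, b in zip(timestamps_s, timestamps_s[1:])]
    if any(dt <= 0 for dt in intervals):
        raise ValueError("timestamps must be strictly increasing")
    return 1.0 / median(intervals)


def matches_protocol(timestamps_s, expected_hz, tolerance=0.01):
    """True if the observed rate is within a relative tolerance of the protocol rate."""
    observed = effective_sampling_hz(timestamps_s)
    return abs(observed - expected_hz) <= tolerance * expected_hz


# Hypothetical recording: 200 samples collected at 50 Hz.
timestamps = [i / 50 for i in range(200)]
print(matches_protocol(timestamps, expected_hz=50))
print(matches_protocol(timestamps, expected_hz=100))
```

The median interval is used rather than the mean so that a few gaps in the recording do not distort the estimate.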

2. Address signal artifact

Real-world data collected from wearable technologies can have a poor signal-to-noise ratio. If signal artifact is not properly addressed, the validity of derived measures is jeopardized. To ensure that only quality data are used in the analysis, define and standardize the data cleaning procedures during study design, before data collection begins.
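
A pre-specified cleaning rule can be as simple as a plausibility range. The sketch below flags samples outside a fixed range and reports how many were excluded, so the rule and its effect can both be documented. The function name and the heart-rate thresholds are hypothetical and would need to be justified for the specific sensor and population; real artifact handling is usually more involved than a range check.

```python
def remove_out_of_range(values, low, high):
    """Replace implausible samples with None and count how many were removed.

    Returns (cleaned_values, n_removed). The thresholds are fixed inputs so
    that the cleaning rule can be written into the protocol in advance.
    """
    flags = [not (low <= v <= high) for v in values]
    cleaned = [None if bad else v for v, bad in zip(values, flags)]
    return cleaned, sum(flags)


# Hypothetical heart-rate samples (beats per minute) with two artifacts.
heart_rate = [60, 62, 250, 63, 0, 64]
cleaned, n_removed = remove_out_of_range(heart_rate, low=30, high=220)
print(cleaned, n_removed)
```

Keeping the excluded positions as None, rather than deleting them, preserves the time alignment of the remaining samples.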

3. Reflect clinically meaningful aspects of patient health

When used as an endpoint, the digital biomarker must reflect a meaningful aspect of how the patient feels or functions. It should capture a characteristic of health that the patient cares about, and show whether that characteristic deteriorates, improves, or is prevented from worsening.

4. Outline treatment benefit rationale

There must be a clear rationale that the treatment benefit produced by the drug or intervention will be of sufficient size and will translate to a change in the selected digital biomarker.

Demonstrate Reliability

The reliability of real-world data is established by defining appropriate data collection procedures (data accrual) and minimizing errors by ensuring data quality is sufficient during data collection, transfer, storage, and analysis (data assurance).

To demonstrate the RWD’s reliability, provide documentation and evidence of the methods used to preserve data integrity from its generation through its reporting. Consider including standard operating procedures (SOPs), results of periodically performed system checks, and evidence of a robust data audit system.

Here are six steps to demonstrating the reliability of RWD:

1. Define standard operating procedures

SOPs should be defined for all aspects of handling study-related data, including data collection, transfer, storage, cleaning, processing, and reporting.

2. Verify adherence to the experimental protocol

Implement procedures to verify adherence to the experimental protocol defined in the SOP. Examples include taking screenshots of important device setup parameters (e.g., sampling frequency), and checklists certifying proper device placement on the patient (e.g., respiratory bands fit snugly in the appropriate anatomical location).
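
Where device setup parameters can be exported, the adherence check can also be automated. The sketch below compares recorded settings against the protocol and lists any deviations. The field names and values are hypothetical; the actual parameters would come from the study SOP.

```python
# Hypothetical protocol settings taken from the study SOP.
PROTOCOL = {"sampling_hz": 100, "wear_location": "wrist"}


def protocol_deviations(protocol, recorded):
    """Return {parameter: (expected, recorded)} for every mismatch.

    A parameter missing from the recorded settings is reported with None.
    """
    return {
        key: (expected, recorded.get(key))
        for key, expected in protocol.items()
        if recorded.get(key) != expected
    }


# Hypothetical settings exported from one participant's device.
device_settings = {"sampling_hz": 50, "wear_location": "wrist"}
print(protocol_deviations(PROTOCOL, device_settings))
```

An empty result means the recorded settings match the protocol; any other result can be filed alongside the screenshots and checklists as a documented deviation.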

3. Authenticate data transfer from the source to the analysis platform

In addition to robust data security and privacy, data format and integrity must be preserved during transfer. Implement methods to prevent or detect unwanted data alterations, either by building them into a robust data transfer platform or by comparing the data for consistency pre- and post-transfer.
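
A common way to compare data pre- and post-transfer is a cryptographic checksum. The sketch below computes a SHA-256 digest with Python's standard `hashlib` module; the helper names are hypothetical, and the file paths in the usage comment are placeholders.

```python
import hashlib


def sha256_digest(chunks):
    """Return the SHA-256 hex digest of an iterable of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def file_chunks(path, chunk_size=1 << 20):
    """Yield a file's contents in fixed-size chunks to limit memory use."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def transfer_is_intact(source_path, destination_path):
    """True if both files have identical SHA-256 digests."""
    return sha256_digest(file_chunks(source_path)) == sha256_digest(
        file_chunks(destination_path)
    )


# Usage with placeholder paths:
# transfer_is_intact("device_export.bin", "platform_copy.bin")
```

Recording the digest at the source also gives later steps a fixed reference for confirming that the raw file has not changed.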

4. Authenticate storage of source data and all associated metadata

Sensor data often require accompanying metadata, such as timestamps and signal units. Data storage systems must be tested not only for the consistency of the sensor data but also for the completeness and validity of the associated metadata.
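
A metadata test might look like the following sketch, which checks that required fields are present, that the start time is an ISO 8601 string with a time zone, and that the sampling rate is a positive number. The field names are hypothetical; a real study would take them from its own data specification.

```python
from datetime import datetime

# Hypothetical required metadata fields for one stored recording.
REQUIRED_FIELDS = ("start_time", "units", "sampling_hz")


def metadata_problems(metadata):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if name not in metadata
    ]
    start = metadata.get("start_time")
    if start is not None:
        try:
            parsed = datetime.fromisoformat(start)
        except (TypeError, ValueError):
            problems.append("start_time is not ISO 8601")
        else:
            if parsed.tzinfo is None:
                problems.append("start_time has no time zone")
    rate = metadata.get("sampling_hz")
    if rate is not None and not (isinstance(rate, (int, float)) and rate > 0):
        problems.append("sampling_hz must be a positive number")
    return problems


record = {"start_time": "2024-01-01T08:00:00", "sampling_hz": 100}
print(metadata_problems(record))
```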

5. Define methods for testing, reporting, and handling missing data

Datasets collected in the real world are prone to missing data due to patient non-compliance or device failures. Methods for testing, reporting, and handling missing data must be defined and documented during study design, before data collection commences.
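
As a minimal sketch of testing and reporting missing data, the functions below locate gaps between consecutive timestamps and compute the fraction of expected samples that were observed. The function names and the 1.5x gap tolerance are hypothetical; how gaps are then handled (for example, exclusion or imputation) is a separate decision for the protocol.

```python
def find_gaps(timestamps_s, expected_interval_s, tolerance=1.5):
    """Return (gap_start, gap_end) pairs where the interval exceeds the limit."""
    limit = expected_interval_s * tolerance
    return [
        (a, b) for a, b in zip(timestamps_s, timestamps_s[1:]) if b - a > limit
    ]


def completeness(timestamps_s, expected_interval_s):
    """Fraction of expected samples observed between first and last timestamp."""
    if not timestamps_s:
        raise ValueError("no samples recorded")
    span = timestamps_s[-1] - timestamps_s[0]
    expected = round(span / expected_interval_s) + 1
    return min(1.0, len(timestamps_s) / expected)


# Hypothetical 1 Hz recording with samples missing between t=3 s and t=10 s.
timestamps = [0, 1, 2, 3, 10, 11, 12]
print(find_gaps(timestamps, expected_interval_s=1))
print(round(completeness(timestamps, expected_interval_s=1), 3))
```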

6. Maintain an audit trail

Like all scientific experiments, evidence generated by RWD from wearable sensors and submitted to the FDA for regulatory decision making must be reproducible. It is critically important to maintain and document an audit trail that includes the original raw data source and records any alteration to the data, including cleaning, processing, annotating, and summarizing.
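
One possible shape for such an audit trail is an append-only log in which each entry includes a hash of the previous entry, so that a later edit to any recorded step is detectable. The sketch below is an illustration under that assumption, not a validated audit system; the function names, entry fields, and example details are hypothetical.

```python
import hashlib
import json


def _entry_hash(action, details, prev_hash):
    """Hash an entry's content together with the previous entry's hash."""
    payload = json.dumps(
        {"action": action, "details": details, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(trail, action, details):
    """Append a record of one data alteration to the trail and return it."""
    prev_hash = trail[-1]["hash"] if trail else ""
    entry = {
        "action": action,
        "details": details,
        "prev_hash": prev_hash,
        "hash": _entry_hash(action, details, prev_hash),
    }
    trail.append(entry)
    return entry


def verify_trail(trail):
    """True if no entry has been altered, removed, or reordered."""
    prev_hash = ""
    for entry in trail:
        if entry["prev_hash"] != prev_hash:
            return False
        expected = _entry_hash(entry["action"], entry["details"], entry["prev_hash"])
        if entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


trail = []
append_entry(trail, "ingest", {"source": "raw sensor file"})
append_entry(trail, "clean", {"rule": "range check", "low": 30, "high": 220})
print(verify_trail(trail))
```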


Data quality and fidelity are requirements for FDA validation of novel digital endpoints. Ensuring that data is managed with care and that all processes are well documented is essential for researchers to present sufficient detail about the validity of the digital biomarker used in a clinical trial.

In our eBook 4 Key Principles of Digital Biomarker Discovery, we expand on each of the four principles of digital biomarker discovery and provide specific examples from our own work and the literature where appropriate. We also explain how and why these principles are most impactful and relevant when biomarker development is conducted in early-stage clinical trials or even earlier pilot studies.

Kate Lyden, PhD

Kate Lyden, PhD, VivoSense Chief Science Officer, holds degrees in Kinesiology and Applied Physiology and has extensive research experience developing, validating, and using wearable sensors.