pkgdown/extra.css

Skip to contents

Introduction

Before you can start using HDAnalyzeR for biomarker discovery, it is essential to ensure that your data is prepared in the appropriate format. While HDAnalyzeR offers a variety of powerful tools, it does not include technology-specific quality control (QC) or preprocessing functions. This design choice allows the package to be flexible and usable with a wide range of proteomics technologies without being limited to specific workflows. Many labs already use their own preprocessing pipelines, tailored to their specific research needs and technologies. Integrating all these varied pipelines into the package would make it unnecessarily complex and restrictive.

Therefore, the data provided to HDAnalyzeR must be preprocessed in order to follow specific requirements (Table 1).

Table 1. General requirements

Requirement Description
Data Long format requires at least three columns: sample/ID, feature (e.g. proteins, genes, etc.), and measurement (e.g. TPM, NPX, RFU…).
Wide format requires the following structure: rows = samples; columns = sample/ID (first column) + features.
Metadata Table with one row per sample; includes sample/ID and any required columns (e.g., group labels, covariates).
Standardized feature IDs Recommended to use gene symbols, UniProt IDs, etc., depending on platform. Feature IDs may be platform-specific. Mapping tables can help translate to gene symbols when required (e.g., for pathway analyses).
Matching sample IDs Sample IDs must match exactly between metadata and expression matrix.

⚠️ Although HDAnalyzeR is primarily designed for proteomics data, some functions in the package can also be applied to other omics data types such as genomics, transcriptomics, or metabolomics. However, it is important to note that the specific choice of functions and how they should be used will depend on the study’s goals and the nature of the data. HDAnalyzeR does not provide explicit guarantees and the user must make informed decisions regarding its application to other types of omics data. Documentation and the source code of all functions is freely available.