.github/pkgdown/extra.css

Skip to contents

This vignette will guide you through the post analysis of the results obtained from the HDAnalyzeR pipeline. The post analysis consists of two possible steps: pathway enrichment analysis and automated literature search. The pathway enrichment analysis is performed using the Gene Ontology, KEGG and Reactome databases from clusterProfiler and ReactomePA packages respectively. The automated literature search is performed using the the PubMed database.

If you want to learn more about ORA and GSEA, please refer to the following publications:

📓 Remember that these data are a dummy-dataset with artificial data and the results in this guide should not be interpreted as real results. This is why we are using extremely large p-value cutoffs in this case that should not be used in real data.

Loading the Data

We will load HDAnalyzeR and dplyr, load the example data and metadata that come with the package and initialize the HDAnalyzeR object.

library(HDAnalyzeR)
library(dplyr)

hd_obj <- hd_initialize(dat = example_data, 
                        metadata = example_metadata, 
                        is_wide = FALSE, 
                        sample_id = "DAid",
                        var_name = "Assay",
                        value_name = "NPX")

For the Over Representation Analysis we are going to use a list of differentially expressed proteins. In this example we are going to use the up-regulated proteins. We could also use the features list from the classification models or even run both and get the intersect as it is done in the Get Started guide.

de_res <- hd_de_limma(hd_obj, case = "AML")

Over Representation Analysis

First, we will perform an Over Representation Analysis (ORA) using the Gene Ontology database and the BP ontology. We will use the hd_ora() and hd_plot_ora() functions to run the analysis and plot the results respectively.

proteins <- de_res$de_res |> 
  filter(logFC > 0 & adj.P.Val < 0.05) |> 
  pull(Feature)

enrichment <- hd_ora(proteins, database = "GO", ontology = "BP")

enrichment_plots <- hd_plot_ora(enrichment)

enrichment_plots$dotplot

enrichment_plots$treeplot

enrichment_plots$cnetplot

Let’s change the database and the p-value threshold.

enrichment  <- hd_ora(proteins, database = "Reactome", pval_lim = 0.2)

enrichment_plots <- hd_plot_ora(enrichment)

enrichment_plots$dotplot

enrichment_plots$treeplot

enrichment_plots$cnetplot

Gene Set Enrichment Analysis

We can also run a Gene Set Enrichment Analysis (GSEA) using the hd_gsea() and hd_plot_gsea functions. The hd_plot_gsea() function will plot the results.

⚠️ In this case, the function requires strictly differential expression results, so a ranked list of proteins is derived based on the ranked_by argument.

enrichment <- hd_gsea(de_res, database = "GO", ontology = "BP", pval_lim = 0.55)

enrichment_plots <- hd_plot_gsea(enrichment)

enrichment_plots$dotplot

enrichment_plots$gseaplot

enrichment_plots$cnetplot

enrichment_plots$ridgeplot

We can also change the ranking variable to the product of logFC and -log(adjusted p value) instead of the default logFC by changing the ranked_by argument to “both”. We could also use other variables such as p-value or any other variable in the DE results. However, you should use as ranking a variable that has some form of biological relevance of the variable.

enrichment <- hd_gsea(de_res, 
                      database = "GO", 
                      ontology = "BP", 
                      pval_lim = 0.9, 
                      ranked_by = "both")

enrichment_plots <- hd_plot_gsea(enrichment)
enrichment_plots$cnetplot

Searching PubMed for our Biomarkers

Finally, let’s perform an automated literature search using the hd_literature_search(). The function requires a list with disease names as names and genes/proteins as values. We will create the list, run the search and preview the results.

biomarkers <- list("acute myeloid leukemia" = c("FLT3", "EPO"),
                   "chronic lymphocytic leukemia" = c("PARP1", "FCER2"))

lit_res <- hd_literature_search(biomarkers, max_articles = 5)
#> Searching for articles onFLT3andacute myeloid leukemia
#> Warning: The get_pubmed_ids() function has become obsolete. You should use the EPM_query() function instead. Please, have a look at the manual or the vignette. The get_pubmed_ids() function will be retired in the second half of 2024.
#> This warning is displayed once per session.
#> Invalid response from PubMed API
#> Searching for articles onEPOandacute myeloid leukemia
#> Invalid response from PubMed API
#> Searching for articles onPARP1andchronic lymphocytic leukemia
#> Invalid response from PubMed API
#> Searching for articles onFCER2andchronic lymphocytic leukemia
#> Invalid response from PubMed API

lit_res$`acute myeloid leukemia`$FLT3$title
#> NULL

📓 Remember once again that these data are a dummy-dataset with artificial data and the results in this guide should not be interpreted as real results. The purpose of this vignette is to show you how to use the package and its functions.

sessionInfo()
#> R version 4.5.1 (2025-06-13)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#>  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#>  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> time zone: UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#> [1] org.Hs.eg.db_3.21.0  AnnotationDbi_1.70.0 IRanges_2.42.0      
#> [4] S4Vectors_0.46.0     Biobase_2.68.0       BiocGenerics_0.54.0 
#> [7] generics_0.1.4       dplyr_1.1.4          HDAnalyzeR_0.99.0   
#> 
#> loaded via a namespace (and not attached):
#>   [1] DBI_1.2.3               gson_0.1.0              gridExtra_2.3          
#>   [4] rlang_1.1.6             magrittr_2.0.4          DOSE_4.2.0             
#>   [7] ggridges_0.5.7          compiler_4.5.1          RSQLite_2.4.3          
#>  [10] reactome.db_1.92.0      png_0.1-8               systemfonts_1.3.0      
#>  [13] vctrs_0.6.5             reshape2_1.4.4          stringr_1.5.2          
#>  [16] pkgconfig_2.0.3         crayon_1.5.3            fastmap_1.2.0          
#>  [19] XVector_0.48.0          ggraph_2.2.2            labeling_0.4.3         
#>  [22] rmarkdown_2.30          enrichplot_1.28.4       graph_1.86.0           
#>  [25] UCSC.utils_1.4.0        ragg_1.5.0              purrr_1.1.0            
#>  [28] bit_4.6.0               xfun_0.53               cachem_1.1.0           
#>  [31] easyPubMed_3.1.6        graphite_1.54.0         aplot_0.2.9            
#>  [34] GenomeInfoDb_1.44.3     jsonlite_2.0.0          blob_1.2.4             
#>  [37] tweenr_2.0.3            BiocParallel_1.42.2     parallel_4.5.1         
#>  [40] R6_2.6.1                bslib_0.9.0             stringi_1.8.7          
#>  [43] RColorBrewer_1.1-3      limma_3.64.3            jquerylib_0.1.4        
#>  [46] GOSemSim_2.34.0         Rcpp_1.1.0              knitr_1.50             
#>  [49] ggtangle_0.0.7          R.utils_2.13.0          Matrix_1.7-3           
#>  [52] splines_4.5.1           igraph_2.1.4            tidyselect_1.2.1       
#>  [55] viridis_0.6.5           qvalue_2.40.0           yaml_2.3.10            
#>  [58] codetools_0.2-20        lattice_0.22-7          tibble_3.3.0           
#>  [61] plyr_1.8.9              treeio_1.32.0           withr_3.0.2            
#>  [64] KEGGREST_1.48.1         S7_0.2.0                evaluate_1.0.5         
#>  [67] gridGraphics_0.5-1      desc_1.4.3              polyclip_1.10-7        
#>  [70] Biostrings_2.76.0       pillar_1.11.1           ggtree_3.16.3          
#>  [73] clusterProfiler_4.16.0  ggfun_0.2.0             ggplot2_4.0.0          
#>  [76] scales_1.4.0            tidytree_0.4.6          glue_1.8.0             
#>  [79] lazyeval_0.2.2          tools_4.5.1             ggnewscale_0.5.2       
#>  [82] data.table_1.17.8       fgsea_1.34.2            graphlayouts_1.2.2     
#>  [85] fs_1.6.6                tidygraph_1.3.1         fastmatch_1.1-6        
#>  [88] cowplot_1.2.0           grid_4.5.1              ape_5.8-1              
#>  [91] tidyr_1.3.1             nlme_3.1-168            GenomeInfoDbData_1.2.14
#>  [94] patchwork_1.3.2         ggforce_0.5.0           cli_3.6.5              
#>  [97] rappdirs_0.3.3          textshaping_1.0.3       viridisLite_0.4.2      
#> [100] ReactomePA_1.52.0       gtable_0.3.6            R.methodsS3_1.8.2      
#> [103] yulab.utils_0.2.1       sass_0.4.10             digest_0.6.37          
#> [106] ggrepel_0.9.6           ggplotify_0.1.3         htmlwidgets_1.6.4      
#> [109] farver_2.1.2            memoise_2.0.1           htmltools_0.5.8.1      
#> [112] pkgdown_2.1.3           R.oo_1.27.1             lifecycle_1.0.4        
#> [115] httr_1.4.7              GO.db_3.21.0            statmod_1.5.0          
#> [118] bit64_4.6.0-1           MASS_7.3-65