Academy of Diagnostics & Laboratory Medicine - Scientific Short

How can a comprehensive computational workflow assist in non-small cell lung cancer biomarker discovery?

Manan Vij & Alex Rai

Lung cancer is among the most commonly diagnosed cancers in the world. Non-small cell lung cancer (NSCLC) is the most common type of lung cancer in the U.S.1 Tissue biopsy is the gold standard for detecting lung cancer but is highly invasive, as it requires the extraction of a tissue sample for histologic analysis.2 It also carries risks of bleeding and infection and is inconvenient for patients. A minimally invasive test that uses a blood or urine sample and provides accurate results for lung cancer detection and/or subtyping would significantly improve the clinical landscape and streamline patient care.

A promising approach to develop a minimally invasive test lies in the field of computational proteomics, which uses advanced computational tools and algorithms to analyze large-scale proteomic data. Computational proteomics enables the integration of high-throughput data from various platforms, such as mass spectrometry and protein microarrays, allowing for the identification and quantification of proteins that may serve as potential biomarkers for early detection, prognosis, or therapeutic response.3 NSCLC biomarker discovery can greatly benefit from this approach by identifying novel protein signatures and molecular pathways that are characteristic of the disease.

To explore the role of computational proteomics in NSCLC, extracellular vesicles were extracted and cell lysate was prepared from H1299 and A549 cells, two human lung cancer cell lines that each correspond to a specific NSCLC subtype.4 The cell lines were kept in serum-free media, as previously described in Ambrosini et al.5 Protein expression levels were quantified using LC-MS/MS, and the resulting data were analyzed with a custom proteomics workflow that we established for normalization and differential expression (DE) analysis. DE analysis identified several proteins that differ between the two NSCLC subtypes, and subsequent pathway analysis revealed distinct differences in hallmark cancer pathways. To translate these proteomic findings into a diagnostic test, we narrowed the list of protein biomarkers to those previously identified in the urine of NSCLC patients, resulting in 19 candidate proteins. Further analysis using an external dataset revealed that many of these candidates were also prognostic for NSCLC survival outcomes when detected at the mRNA level.
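The narrowing step described above can be pictured as a simple intersection between the differentially expressed proteins and a reference list of proteins reported in urine. The sketch below is only an illustration under stated assumptions: the function name `shortlist_candidates`, the case-insensitive matching of identifiers, and the placeholder protein names are hypothetical and are not taken from the study or its toolkit.

```python
def shortlist_candidates(de_proteins, urine_proteins):
    """Keep DE proteins that also appear in a urine reference list.

    Matching is case-insensitive (an assumption), duplicates are dropped,
    and the original order of the DE list is preserved.
    """
    urine = {name.upper() for name in urine_proteins}
    seen = set()
    shortlist = []
    for name in de_proteins:
        key = name.upper()
        if key in urine and key not in seen:
            seen.add(key)
            shortlist.append(name)
    return shortlist


# Placeholder identifiers, not proteins from the study.
de_hits = ["PROT_A", "PROT_B", "PROT_C", "PROT_D"]
urine_reference = ["prot_b", "PROT_D", "PROT_E"]
print(shortlist_candidates(de_hits, urine_reference))  # ['PROT_B', 'PROT_D']
```

In practice the identifiers would likely need to be mapped to a common naming scheme before such a comparison is meaningful.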

The computational toolkit that enabled the preceding analysis was built as a GUI-based R application to allow for streamlined analysis of proteomic data. The application architecture supports modular and scalable data processing pipelines, incorporating efficient data handling for rapid in-memory manipulation of large protein expression matrices and associated metadata. The GUI design prioritizes user experience by abstracting complex statistical and computational methods behind intuitive button and text-input controls, requiring only the upload of a protein expression dataset and relevant sample metadata. Furthermore, the GUI-based design eliminates the need for any coding experience, allowing users to conduct the analysis entirely through an easy-to-use interface. Once the data are uploaded, the user can conduct a variety of relevant proteomics analyses, including data normalization and visualization, exploratory data analysis, differential expression testing, and functional pathway analysis. For example, in our experiment, the protein expression matrix was scaled using quantile normalization to mitigate external, non-biological differences between the samples. Differential expression testing was performed using a standard Student's t-test to identify significantly differentially expressed proteins across several sample comparisons, including the comparison between the two lung cancer cell lines. Principal component analysis of exosomal and lysate protein samples was used to visualize the overall protein profile in three-dimensional space.
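To make the normalization and testing steps more concrete, the following is a minimal sketch of quantile normalization and a two-sample Student's t statistic. It is not the authors' R implementation: it is written in Python with only the standard library, the function names `quantile_normalize` and `student_t` are hypothetical, ties are broken naively by row order, and the toy numbers are invented purely for illustration.

```python
from math import sqrt
from statistics import mean, variance


def quantile_normalize(matrix):
    """Quantile-normalize a proteins x samples matrix given as a list of rows.

    Each sample (column) is forced onto the same distribution: the k-th
    smallest value in every column is replaced by the mean of the k-th
    smallest values across all columns. Ties are broken by row order,
    which is a simplification.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [mean(sc[k] for sc in sorted_cols) for k in range(n_rows)]
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[i][j] = rank_means[rank]
    return normalized


def student_t(group_a, group_b):
    """Two-sample Student's t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Assumes at least two values per
    group and a non-zero pooled variance.
    """
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Toy matrix: 4 proteins (rows) x 3 samples (columns), invented values.
toy = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 6.0],
    [4.0, 2.0, 8.0],
]
norm = quantile_normalize(toy)

# Per-protein comparison of two hypothetical groups of three replicates.
t_stat, dof = student_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

After normalization every column of `norm` contains the same set of values, which is the defining property of the method. Turning the t statistic into a p-value requires the t distribution, which the standard library does not provide; a statistics package (for example, `scipy.stats.ttest_ind` in Python) would typically be used for that step, along with a multiple-testing correction when many proteins are tested.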

In conclusion, computational proteomics, as exemplified by our customized workflow, enables efficient identification of NSCLC biomarkers, provides insights into disease pathways, and suggests prognostic significance for a subset of candidate biomarkers. Computational tools such as the one we have developed show promise for streamlining complex analyses and providing turnkey solutions for rapid, comprehensive proteomics data analysis.

References

  1. Myers DJ, Wallen JM. Lung Adenocarcinoma. [Updated 2023 Jun 12]. In: StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2025 Jan–. Available from: https://www.ncbi.nlm.nih.gov/books/NBK519578/
  2. Bertoli E, De Carlo E, Basile D, Zara D, Stanzione B, Schiappacassi M, Del Conte A, Spina M, Bearz A. Liquid biopsy in NSCLC: An investigation with multiple clinical implications. Int J Mol Sci. 2023 Jun 28;24(13):10803. doi: 10.3390/ijms241310803. PMID: 37445976; PMCID: PMC10341684.
  3. Messner CB, Demichev V, Wang Z, Hartl J, Kustatscher G, Mülleder M, Ralser M. Mass spectrometry-based high-throughput proteomics and its role in biomedical studies and systems biology. Proteomics. 2023;23(7-8):2200013. doi: 10.1002/pmic.202200013.
  4. Bairoch A. The Cellosaurus, a cell line knowledge resource. J Biomol Tech. 2018;29:25–38. doi: 10.7171/jbt.18-2902-002. PMID: 29805321; PMCID: PMC5945021. RRID: CVCL_0060. RRID: CVCL_0023.
  5. Ambrosini G, Rai AJ, Carvajal RD, Schwartz GK. Uveal melanoma exosomes induce a prometastatic microenvironment through macrophage migration inhibitory factor. Mol Cancer Res. 2022 Apr 1;20(4):661–669. doi: 10.1158/1541-7786.MCR-21-0526. PMID: 34992145.
     

