For clinical validation, we examined 248 tissue specimens from two independent cohorts of patients with early-stage (T1 and T2) GC, referred to as clinical cohort-1 (testing cohort) and clinical cohort-2 (validation cohort). These cohorts included 50 and 198 specimens without pretreatment from LN-positive (LNP) and LN-negative (LNN) patients, respectively. Using clinical cohort-1, we established a gene signature based on the candidate genes identified in the high-throughput dataset-based discovery phase. Subsequently, these genes were validated in the clinical cohort-2 patients. The detailed patient demographics and clinicopathological characteristics are shown in Table 1 and the Supplemental materials and methods.
Since GC diagnosis and treatment decisions are primarily made following endoscopic resection, we also included T2 lesions, considering that these lesions can be underestimated during endoscopy. Lymphovascular invasion was diagnosed after pathological review of surgical tissues, and data on serum levels of carcinoembryonic antigen (CEA) and cancer antigen (CA) 19-9 were collected from each participating institution.
2.2. Ethics statement
Written informed consent was obtained from all patients, and the study was approved by the institutional review boards of all the participating institutions.
2.3. Study design and participants
Our study design included two major phases: a biomarker discovery phase and a clinical validation phase. Based on RNA-Seq data for T1 patient specimens in the TCGA dataset, we first prioritized 15 genes differentially expressed between 5 LNP and 13 LNN patients with GC. Using the same set of specimens for training, we built a multivariate logistic regression model using the 15 genes as covariates, and
Demographic and clinical characteristics and tumor markers for clinical cohorts 1 and 2 (footnote a).
LN Positive LN Negative LN Positive
positive 3 4
positive 5 7
Clinical N stage (CT)
positive NA NA 6
negative NA NA 20
Pathological T stage
a Plus-minus values are means ± SE. NA denotes not available.
Carlsbad, CA, USA). Quantitative real-time reverse transcription PCR (qRT-PCR) analysis was performed using the SensiFAST™ SYBR® Lo-ROX Kit (Bioline, London, UK) on the QuantStudio 7 Real Time PCR System (Life Technologies, Carlsbad, CA, USA). The average expression levels of target genes were normalized against beta-actin using the comparative CT (threshold cycle) method. To ensure consistent measurements across all assays, three independent cDNA samples were loaded in each PCR amplification run as internal controls to account for plate-to-plate variation, and the results from each plate were normalized against these internal normalization controls.
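The normalization against beta-actin described above can be illustrated with a minimal sketch of the comparative CT calculation. This is not the authors' analysis code: the function names, the optional calibrator argument, and the replicate CT values are hypothetical, and the sketch assumes roughly 100% amplification efficiency for both the target and the reference gene.

```python
from statistics import mean


def delta_ct(ct_target, ct_reference):
    """Mean CT of the target gene minus mean CT of the reference gene."""
    return mean(ct_target) - mean(ct_reference)


def relative_expression(ct_target, ct_reference, calibrator_delta_ct=None):
    """Relative expression by the comparative CT method.

    Without a calibrator this returns 2^-dCT (expression relative to the
    reference gene). With a calibrator dCT, for example from an inter-plate
    control sample, it returns the fold change 2^-ddCT.
    Assumption: about 100% amplification efficiency for both genes.
    """
    d_ct = delta_ct(ct_target, ct_reference)
    if calibrator_delta_ct is not None:
        d_ct -= calibrator_delta_ct
    return 2.0 ** (-d_ct)


# Hypothetical replicate CT values, for illustration only.
target_ct = [24.0, 24.0, 24.0]
actin_ct = [20.0, 20.0, 20.0]

print(relative_expression(target_ct, actin_ct))       # relative to beta-actin
print(relative_expression(target_ct, actin_ct, 6.0))  # relative to a calibrator
```

Passing the dCT of a control sample as the calibrator is one way to express each plate relative to a shared internal control, in the spirit of the plate-to-plate normalization described above.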
2.5. Statistical analysis
Wilcoxon signed-rank tests, Mann-Whitney U tests and Kruskal-Wallis tests were used to analyze gene expression data, as appropriate. The Benjamini-Hochberg method was used to correct for multiple hypothesis testing, wherever applicable. Risk scores derived from the 15-gene multivariate logistic regression model were used to plot receiver operating characteristic (ROC) curves and calculate the areas under the curve (AUCs). Confidence intervals for the ROC curves, as well as the statistical significance of comparisons between two ROC curves, were calculated using the method of DeLong. Univariate and multivariate logistic regression models were employed to evaluate the statistical significance of clinicopathological variables and the 15-gene model in diagnosing LN metastasis status. All statistical analyses were performed using MedCalc V.12.3.0 (Broekstraat 52, 9030; Mariakerke, Belgium), GraphPad Prism V5.0 (GraphPad Software, San Diego, California, USA) and R (version 3.3.3; R Development Core Team, https://cran.r-project.org/).
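As an illustration of three of the steps named above, the following sketch shows a Benjamini-Hochberg adjustment, a logistic risk score, and an AUC computed from risk scores. It is a simplified stand-in, not the MedCalc, GraphPad Prism, or R routines used in the study: all function names, coefficients, and input values are hypothetical, and the DeLong confidence intervals and ROC comparisons are not implemented here.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def risk_score(expression, coefficients, intercept):
    """Predicted probability from a logistic model (hypothetical coefficients)."""
    z = intercept + sum(b * x for b, x in zip(coefficients, expression))
    return 1.0 / (1.0 + math.exp(-z))


def roc_auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative one.

    Ties count as 0.5. Labels are 1 (LN-positive) or 0 (LN-negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute an AUC")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical inputs, for illustration only.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
print(roc_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))
```

The pairwise formulation of the AUC is equivalent to the Mann-Whitney U statistic scaled by the number of positive-negative pairs, which is why it suits small cohorts and needs no explicit ROC curve.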