Non-linear techniques for designing risk profiles of suicide ideation among students in Beijing.

Suicide is a leading cause of death worldwide, killing more than 800,000 people each year, with China accounting for almost one-third of those deaths. In most Western countries, accidents are the leading cause of adolescent deaths; in China, it is suicide, according to China’s Center for Disease Control and Prevention. The burden of disease is great – there are approximately 250,000 suicides in China every year, with an estimated two million suicide attempts.

Despite the sharp increase in suicide-related research and the development of numerous risk assessment tools and interventions, suicide rates have not fallen – in fact, they are on the rise in many countries. We believe that this is due in part to (1) the complex etiology of suicidal behaviors and outcomes, and (2) the type of data conventionally collected on suicidal behaviors and outcomes – largely narrative, retrospective data that takes only non-fatal outcomes into consideration.

Although more comprehensive longitudinal studies on the subject would likely yield greater insights while limiting biases and errors, such research is costly, time-consuming, and difficult to implement. Thus, the vast majority of data regarding suicidal behaviors comes from cross-sectional surveys of multivariate categorical data. We wondered if there was a way to make this highly available and “cheap” data more useful.

Suicide ideation has a complex etiology. Even so, most studies in the literature rely on descriptive statistics and regression analysis (often logistic regression, since the surveys involve a high number of categorical variables). This type of analysis is easy to understand and explain, but regressing directly on all variables at once can be problematic: applied blindly, high-dimensional regression is liable to produce spurious inferences for the same reason that uncorrected multiple testing does (some results reach significance by chance alone).
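As a rough illustration of the multiple-testing concern, the sketch below computes the chance of at least one false positive when many predictors are each tested at the same significance level. It assumes the tests are independent, which survey items generally are not, so the numbers are indicative only. The function names and the choice of 50 predictors are hypothetical and are not taken from our analysis.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across n_tests independent
    tests, each run at significance level alpha (independence is an
    assumption made for illustration)."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_alpha(alpha, n_tests):
    """Per-test level that keeps the family-wise rate at or below alpha."""
    return alpha / n_tests


# Hypothetical survey with 50 candidate predictors, each tested at 0.05.
uncorrected = familywise_error_rate(0.05, 50)
corrected = familywise_error_rate(bonferroni_alpha(0.05, 50), 50)
print(f"uncorrected: {uncorrected:.3f}, Bonferroni-corrected: {corrected:.3f}")
```

Under these assumptions the uncorrected rate is roughly 0.92, so at least one "significant" predictor would be expected by chance alone, while the corrected rate stays below 0.05.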

We therefore looked for approaches that could (1) reliably handle heterogeneous and non-continuous data, and (2) better reflect the complex reality of suicide ideation. Given these requirements, we chose a non-linear method known as a Mixed Membership Model (MMM) to analyze our data.

We also hypothesize that group-level membership is predictive of suicide ideation, and thus employ an MMM to reveal group-level prediction patterns after clustering. Since our dataset consists mostly of multivariate categorical data, we perform dimensionality reduction using latent class analysis (LCA). This method finds subgroups of related individuals (latent classes) in the data, along with patterns of typical survey responses. The mixed memberships can then be input to a model to predict ideation, which we evaluate on a held-out year using precision and recall diagnostics.
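The pipeline described above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not our actual implementation: it fits a two-class latent class model to binary items by EM, uses each respondent's posterior class probabilities as a stand-in for mixed memberships (a full MMM lets membership vary item by item), and scores a simple membership-weighted rate predictor with precision and recall. All data are simulated, and every function name, parameter, and threshold here is hypothetical.

```python
import math
import random


def memberships(X, weights, probs):
    """Posterior class probabilities for each respondent (rows sum to 1)."""
    out = []
    for x in X:
        logp = [math.log(w) + sum(math.log(p if v else 1.0 - p)
                                  for v, p in zip(x, pk))
                for w, pk in zip(weights, probs)]
        top = max(logp)  # subtract the max for numerical stability
        unnorm = [math.exp(lp - top) for lp in logp]
        z = sum(unnorm)
        out.append([u / z for u in unnorm])
    return out


def fit_lca(X, n_classes, n_iter=50, seed=0):
    """Fit a latent class model to binary responses X by EM."""
    rng = random.Random(seed)
    n_items = len(X[0])
    weights = [1.0 / n_classes] * n_classes
    # Random start for P(item = 1 | class), kept away from 0 and 1.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
             for _ in range(n_classes)]
    for _ in range(n_iter):
        resp = memberships(X, weights, probs)  # E-step
        for k in range(n_classes):  # M-step with add-one smoothing
            total = sum(r[k] for r in resp)
            weights[k] = (total + 1.0) / (len(X) + n_classes)
            for j in range(n_items):
                hits = sum(r[k] * x[j] for r, x in zip(resp, X))
                probs[k][j] = (hits + 1.0) / (total + 2.0)
    return weights, probs


def class_rates(resp, y):
    """Membership-weighted outcome rate within each latent class."""
    return [sum(r[k] * t for r, t in zip(resp, y)) / sum(r[k] for r in resp)
            for k in range(len(resp[0]))]


def predict(resp, rates, threshold):
    """Flag respondents whose membership-weighted risk meets the threshold."""
    return [int(sum(rk * q for rk, q in zip(r, rates)) >= threshold)
            for r in resp]


def precision_recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def simulate(n, seed):
    """Synthetic survey: 6 binary items and a binary outcome (not real data)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        high = rng.random() < 0.3  # hidden higher-risk subgroup
        p_item = 0.8 if high else 0.2
        X.append([int(rng.random() < p_item) for _ in range(6)])
        y.append(int(rng.random() < (0.5 if high else 0.05)))
    return X, y


X_train, y_train = simulate(400, seed=1)  # stands in for earlier survey years
X_test, y_test = simulate(200, seed=2)    # stands in for the held-out year
weights, probs = fit_lca(X_train, n_classes=2)
rates = class_rates(memberships(X_train, weights, probs), y_train)
threshold = sum(y_train) / len(y_train)   # base rate as an arbitrary cut-off
test_resp = memberships(X_test, weights, probs)
y_pred = predict(test_resp, rates, threshold)
precision, recall = precision_recall(y_test, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The membership-weighted rate predictor is chosen only to keep the sketch dependency-free; any classifier that accepts the membership vectors as features could take its place, and the threshold would normally be tuned rather than fixed at the base rate.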

It is our hope that revealing such patterns will yield a deeper understanding of how ideation varies at the group level. Ultimately, this may lead to the discovery of indicators useful for designing risk profiles for adolescents with suicide ideation, while informing interventions and policy aimed at reducing the burden of disease from suicidal behaviors.

This is an ongoing, collaborative research project supervised by Professor Mike Baiocchi (Stanford) and Professor Yi Song (Peking University). For more information about this project, please email Shea Shelton: