R packages by xiaoruizhu

SurrogateRsq - Goodness-of-Fit Analysis for Categorical Data using the Surrogate R-Squared

R-squared is one of the most popular measures for assessing and comparing models' goodness of fit. For categorical data analysis, however, no universally adopted R-squared measure resembles the ordinary least squares (OLS) R-squared for linear models with continuous data. This package implements the surrogate R-squared measure for categorical data analysis, proposed by Dungang Liu, Xiaorui Zhu, Brandon Greenwell, and Zewei Lin (2022) <doi:10.1111/bmsp.12289>. It can generate a point or interval measure of the surrogate R-squared. It can also provide a ranking measure of the percentage contribution of each variable to the overall surrogate R-squared. This ranking assessment allows one to check the importance of each variable in terms of its explained variance. This package can be used jointly with other existing R packages for variable selection and model diagnostics in the model-building process.

Last updated 1 year ago

categorical-data-analysis goodness-of-fit r-squared-statistic statistics

4.48 score 5 stars 12 scripts 170 downloads

PAsso - Assessing the Partial Association Between Ordinal Variables

An implementation of the unified framework for assessing partial association between ordinal variables after adjusting for a set of covariates (Dungang Liu, Shaobo Li, Yan Yu, and Irini Moustaki (2020), Journal of the American Statistical Association, <doi:10.1080/01621459.2020.1796394>). This package provides a set of tools to quantify, visualize, and test partial associations between multiple ordinal variables. It can produce a number of $\phi$ measures, partial regression plots, 3-D plots, and p-values for testing $H_0: \phi = 0$ or $H_0: \phi \le \delta$.

Last updated 1 year ago

association-analysis ordinal-variables partial-association statistics cpp

4.14 score 7 stars 1 dependents 13 scripts 213 downloads

SPSP - Selection by Partitioning the Solution Paths

An implementation of the feature Selection procedure by Partitioning the entire Solution Paths (SPSP), which identifies the relevant features from the whole solution path rather than from a single tuning parameter. By utilizing the entire solution paths, this procedure can obtain better selection accuracy than the commonly used approach of selecting only one tuning parameter based on existing criteria such as cross-validation (CV), generalized CV, AIC, BIC, and extended BIC (Liu, Y., & Wang, P. (2018) <doi:10.1214/18-EJS1434>). It is more stable and accurate (low false positive and false negative rates) than other variable selection approaches. In addition, it can be flexibly coupled with the solution paths of Lasso, adaptive Lasso, ridge regression, and other penalized estimators.

Last updated 1 year ago

feature-selection statistics variable-selection cpp

3.60 score 1 stars 2 scripts 229 downloads

DataClean - Data Cleaning

Includes functions that researchers or practitioners may use to clean raw data, converting HTML, xlsx, and txt data files into other formats. It can also be used to manipulate text variables, extract numeric variables from text variables, and carry out other variable cleaning tasks. It originated from an author's project that focuses on creative performance in an online education environment. The paper resulting from that study will be published soon.

Last updated 7 years ago

openjdk

2.70 score 4 scripts 174 downloads