Authors: Reihaneh Zarrabi, Riley McDermott, and Sagy Cohen – University of Alabama; Seyed Mohammad Hassan Erfani – Columbia University
Title: Bankfull and Mean Flow River Channel Geometry Estimation for the CONtiguous United States (CONUS) using Machine Learning for Hydrological Applications
Abstract: Widely adopted models for estimating channel geometry attributes rely on simplistic power-law (hydraulic geometry) equations. This study aims to improve the accuracy of these models and extend their applicability to all river streams in the CONtiguous United States (CONUS). To achieve this, a new preprocessing method was applied to refine an extensive observational dataset, the HYDRoacoustic dataset supporting Surface Water Oceanographic Topography (HYDRoSWOT). This process involved improving data quality and identifying, among all records for each gauged river, the observations that represent bankfull and mean flow conditions. The preprocessed observational dataset was then combined with datasets containing river flow and catchment attributes, such as the National Hydrography Dataset Plus (NHDPlusV2.1), and the compiled dataset was used to train models. These models used tree-based machine learning algorithms (Random Forest Regression (RFR) and eXtreme Gradient Boosting Regression (XGBR)) and a traditional statistical method (Multi-Linear Regression (MLR)) to predict channel geometry parameters, including width and depth under bankfull and mean flow conditions. Two tiers of models were developed for each attribute, using discharges derived from HYDRoSWOT for the first tier and from NHDPlusV2.1 for the second. The second tier, selected as the final model, achieved R² values ranging from 0.63 to 0.86 with XGBR and extended applicability to the approximately 2.2 million river streams represented in NHDPlusV2.1. Additionally, comprehensive independent evaluations were conducted to assess how well the developed models provide stream-averaged (rather than at-a-station) predictions for streams that are part of neither the training nor the testing dataset.
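For context, the power-law (hydraulic geometry) baseline that the abstract contrasts against is conventionally written as w = a·Q^b, where w is channel width, Q is discharge, and a and b are fitted coefficients. The sketch below fits that form by least squares in log-log space. It is a minimal illustration only and not the authors' procedure: the function names `fit_power_law` and `predict_width` are hypothetical, and the input values are synthetic rather than drawn from HYDRoSWOT or NHDPlusV2.1.

```python
import math


def fit_power_law(discharge, width):
    """Fit width = a * discharge**b by ordinary least squares on log-transformed data."""
    xs = [math.log(q) for q in discharge]
    ys = [math.log(w) for w in width]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope of the log-log regression line is the exponent b.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    # Intercept of the log-log line is log(a).
    a = math.exp(y_mean - b * x_mean)
    return a, b


def predict_width(discharge, a, b):
    """Evaluate the fitted power law for a single discharge value."""
    return a * discharge ** b


# Synthetic values generated from assumed coefficients (a=3.0, b=0.5);
# these are illustrative and do not come from any observational dataset.
synthetic_q = [1.0, 10.0, 100.0, 1000.0]
synthetic_w = [predict_width(q, 3.0, 0.5) for q in synthetic_q]

a, b = fit_power_law(synthetic_q, synthetic_w)
```

Because the synthetic widths follow an exact power law, the fit recovers the assumed coefficients up to floating-point error. Real observations scatter around such a curve, which is the limitation that motivates adding catchment attributes and tree-based regressors.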