Hypergraph Cuts with Edge-Dependent Vertex Weights

We develop a framework for incorporating edge-dependent vertex weights (EDVWs) into the hypergraph minimum s-t cut problem. These weights are able to reflect different importance of vertices within a hyperedge, thus leading to better characterized cut properties. More precisely, we introduce a new class of hyperedge splitting functions that we call EDVWs-based, where the penalty of splitting a hyperedge depends only on the sum of EDVWs associated with the vertices on each side of the split. Moreover, we provide a way to construct submodular EDVWs-based splitting functions and prove that a hypergraph equipped with such splitting functions can be reduced to a graph sharing the same cut properties. In this case, the hypergraph minimum s-t cut problem can be solved using well-developed solutions to the graph minimum s-t cut problem. In addition, we show that an existing sparsification technique can be easily extended to our case and makes the reduced graph smaller and sparser, thus further accelerating the algorithms applied to the reduced graph. Numerical experiments using real-world data demonstrate the effectiveness of our proposed EDVWs-based splitting functions in comparison with the all-or-nothing splitting function and cardinality-based splitting functions commonly adopted in existing work.


Introduction
The graph minimum s-t cut problem, or equivalently the maximum s-t flow problem, is a fundamental problem in network science. A cut is a bipartition of the graph vertices and its weight is computed by summing the weights of the edges crossing the cut. The problem aims to find the minimum weight cut that disconnects the source vertex s from the sink vertex t. It has various applications such as bipartite matching (Cherkassky et al, 1998;Lovász and Plummer, 2009), network reliability (Colbourn, 1991;Ramanathan and Colbourn, 1987), distributed computing (Bokhari, 1987), image segmentation (Boykov and Funka-Lea, 2006;Boykov and Kolmogorov, 2004) and very large scale integration (VLSI) circuit design (Leighton and Rao, 1999;Li et al, 1995). Classical solutions to this problem include "augmenting paths"-based algorithms (Ford and Fulkerson, 1956) and "push-relabel" style algorithms (Goldberg and Tarjan, 1988), to name a few.
A natural extension is the hypergraph minimum s-t cut problem. Graphs are limited to modeling pairwise relations, while hypergraphs generalize the notion of an edge to a hyperedge, which can represent higher-order interactions connecting more than two vertices. Many practical problems can be better modeled by hypergraphs (Papa and Markov, 2007;Schaub et al, 2021). For instance, in the columnwise decomposition of a sparse matrix for parallel sparse-matrix vector multiplication, hypergraphs provide a more accurate representation for the communication volume requirement than graphs, where vertices and hyperedges are respectively used to model columns of the sparse matrix and the non-zero pattern of each row (Catalyurek and Aykanat, 1999). In image segmentation, hypergraphs are leveraged to describe higher-order relations among superpixels (Ding and Yilmaz, 2008;Kim et al, 2011). In VLSI circuit design, vertices and hyperedges respectively represent gates and signal nets (Karypis et al, 1999).
The weight of a hypergraph cut is defined as the sum of splitting penalties associated with every hyperedge. Unlike in the graph case, there may exist multiple ways to split a hyperedge. Consequently, for each hyperedge e, we consider a splitting function $w_e : 2^e \to \mathbb{R}_{\ge 0}$ that assigns a penalty to every possible cut of e, where $2^e$ denotes the power set of e. For any $S \subseteq e$, $w_e(S)$ indicates the penalty of partitioning e into S and $e \setminus S$ (Li and Milenkovic, 2017; Veldt et al, 2020a). Existing works mainly adopt two kinds of splitting functions. One is the so-called all-or-nothing splitting function, in which an identical penalty is charged if the hyperedge is cut, no matter how it is cut (Hein et al, 2013). It is a straightforward extension of the graph case since an edge in a graph is associated with only one non-zero splitting penalty. Another, slightly more general type is the class of cardinality-based splitting functions, where the splitting penalty $w_e(S)$ depends only on the number of vertices placed into S (Veldt et al, 2020a; Zhou et al, 2006).
There are two major approaches for solving the minimum s-t cut problem in hypergraphs. One is to adopt submodular splitting functions for all hyperedges (cf. (5) for the definition of submodular functions), so that the hypergraph minimum s-t cut problem can be solved using submodular function minimizers (Li and Milenkovic, 2018b; Veldt et al, 2020a). Another, more efficient approach is to reduce the hypergraph to a graph that has the same cut properties and then leverage existing solutions to the graph minimum s-t cut problem (Ihler et al, 1993; Lawler, 1973; Li and Milenkovic, 2017; Veldt et al, 2020a). The reduction is generally implemented by expanding every hyperedge into a small graph, possibly with additional auxiliary vertices, and then concatenating these small graphs to form the final graph. It has been proved in (Veldt et al, 2020a) that, for cardinality-based splitting functions, the hypergraph cut problem is reducible to a graph cut problem if and only if the splitting functions are submodular. Moreover, Veldt et al (2020a) propose a graph reduction method for an arbitrary submodular cardinality-based splitting function where a hyperedge e is expanded into a graph that has up to $O(|e|)$ auxiliary vertices and $O(|e|^2)$ edges in the worst case. This may result in large and dense graphs, thus reducing the efficiency of algorithms applied to the reduced graph. To tackle this problem, sparsification techniques have been developed which try to approximate the hypergraph cut using a sparse graph with fewer auxiliary vertices (Bansal et al, 2019; Benczúr and Karger, 1996; Benson et al, 2020; Chekuri and Xu, 2018; Kogan and Krauthgamer, 2015). A follow-up paper (Benson et al, 2020) proposes a sparsification method for approximating hypergraph cuts defined by submodular cardinality-based splitting functions.
The proposed method reduces the number of auxiliary vertices and the number of edges needed to expand a hyperedge e to $O(\epsilon^{-1} \log |e|)$ and $O(\epsilon^{-1} |e| \log |e|)$ respectively, where $\epsilon$ is the approximation tolerance parameter.
A disadvantage of the all-or-nothing splitting function as well as cardinality-based ones is that they treat all the vertices in a hyperedge equally while in practice these vertices might contribute differently to the hyperedge. Such information can be captured by edge-dependent vertex weights (EDVWs): Every vertex v is associated with a weight γ e (v) for each incident hyperedge e that reflects the contribution of v to e (Chitra and Raphael, 2019). The hypergraph model with EDVWs is very relevant in practice. For example, an e-commerce system can be modeled as a hypergraph with EDVWs where vertices and hyperedges respectively correspond to users and products, and EDVWs represent the quantity of a product bought by a user (Li et al, 2018). EDVWs can also be used to model the author positions in a co-authored manuscript (Chitra and Raphael, 2019), the probability of a pixel belonging to a segment in image segmentation (Ding and Yilmaz, 2010), and the relevance of a word to a document in text mining (Hayashi et al, 2020;Zhu et al, 2021), to name a few.
Contributions In this paper, we propose a new class of splitting functions that we call EDVWs-based. In an EDVWs-based splitting function, the splitting penalty $w_e(S)$ depends only on the sum of EDVWs in S, namely $\sum_{v \in S} \gamma_e(v)$. Hence, we can write $w_e(S) = g_e(\sum_{v \in S} \gamma_e(v))$ for some continuous function $g_e$. We prove that $w_e$ is submodular if $g_e$ is concave. Submodularity is necessary for graph reducibility. We study the EDVWs-based counterparts of four cardinality-based splitting functions in existing work and show that they are graph reducible. Moreover, we prove that any EDVWs-based splitting function with a concave $g_e$ is graph reducible and provide a way to carry out such a reduction. We also show that the sparsification technique proposed in (Benson et al, 2020) can be easily adapted to the EDVWs-based case. The size and the density of the reduced graph depend on both the shape of $g_e$ and the values of the EDVWs. In a nutshell, our paper provides a framework to study hypergraph cut problems incorporating EDVWs and generalizes the results presented in (Benson et al, 2020; Veldt et al, 2020a) from cardinality-based splitting functions to EDVWs-based ones.
Paper outline The rest of this paper is structured as follows. Preliminary concepts and related work about graph and hypergraph cut problems are reviewed in Section 2. The main theoretical results are presented in Section 3, where the hypergraph model with EDVWs is introduced in Section 3.1, the proposed EDVWs-based splitting functions are studied in Section 3.2, the graph reducibility results are stated in Section 3.3, and the sparsification technique is discussed in Section 3.4. The numerical results shown in Section 4 validate the effectiveness of introducing EDVWs into hypergraph cuts. Closing remarks are included in Section 5.
Preliminaries and related work

Graph cuts
Let $G = (V, E, W)$ denote a weighted and possibly directed graph where V is the vertex set, E is the edge set, and W is the weighted adjacency matrix whose entry $W_{uv}$ denotes the weight of the edge from u to v. A cut is a partition of the vertex set V into two disjoint, non-empty subsets denoted by S and its complement $V \setminus S$. The weight of the cut is defined as
$$\mathrm{cut}_G(S) = \sum_{u \in S,\, v \in V \setminus S} W_{uv}. \quad (1)$$
Given two vertices s, t in the graph, the minimum s-t cut problem aims to find the minimum weight cut that separates s and t. Formally, the problem can be written as
$$\min_{S \subset V} \ \mathrm{cut}_G(S) \quad \text{s.t.} \quad s \in S, \ t \in V \setminus S. \quad (2)$$
The minimum s-t cut problem is the dual of the maximum s-t flow problem. There exist a number of algorithms for the min-cut/max-flow problem and a summary can be found in (Goldberg, 1998). Moreover, it is established in (Orlin, 2013) that the min-cut/max-flow problem is solvable in $O(|V||E|)$ time. We also note that a recent paper (Chen et al, 2022) provides an algorithm that solves this problem in almost-linear time.
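As a small illustration of (1) and (2), the following sketch evaluates the cut weight on a toy directed graph and solves the minimum s-t cut with NetworkX. The graph, its vertex names and the helper `cut_weight` are assumptions made for this example only; edge weights $W_{uv}$ are stored in the `capacity` attribute.

```python
import networkx as nx

# Toy directed graph (hypothetical data); "capacity" plays the role of W_uv.
G = nx.DiGraph()
G.add_weighted_edges_from(
    [("s", "a", 3.0), ("s", "b", 2.0), ("a", "b", 1.0), ("a", "t", 2.0), ("b", "t", 3.0)],
    weight="capacity",
)

def cut_weight(graph, S):
    """Evaluate cut_G(S) as in (1): total weight of edges going from S to V \\ S."""
    return sum(w for u, v, w in graph.edges(data="capacity") if u in S and v not in S)

# nx.minimum_cut returns the cut value and the partition (S, V \ S).
cut_value, (S, T) = nx.minimum_cut(G, "s", "t")
print(cut_value, sorted(S), sorted(T))
```

The returned partition can be checked against (1) by calling `cut_weight(G, S)`, which should match `cut_value`.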

Hypergraph cuts
Let $H = (V, E)$ be a hypergraph where V and E respectively denote the vertex set and the hyperedge set. Unlike the graph case, a hyperedge can connect more than two vertices, thus there may exist multiple ways to split a hyperedge. For each hyperedge $e \in E$, we introduce a splitting function $w_e : 2^e \to \mathbb{R}_{\ge 0}$ that assigns a non-negative penalty to every possible cut of e (Li and Milenkovic, 2017; Veldt et al, 2020a). The splitting function satisfies $w_e(\emptyset) = w_e(e) = 0$; in other words, a penalty of zero is assigned when the hyperedge is not cut. Moreover, the splitting function is symmetric if it satisfies $w_e(S) = w_e(e \setminus S)$ for any $S \subseteq e$. The weight of the hypergraph cut induced by $S \subseteq V$ is defined as the sum of splitting penalties associated with every hyperedge, i.e.,
$$\mathrm{cut}_H(S) = \sum_{e \in E} w_e(S \cap e). \quad (3)$$
There are mainly two types of splitting functions in existing work: (i) An all-or-nothing splitting function assigns the same penalty to every possible cut of the hyperedge regardless of how its vertices are separated (Hein et al, 2013). More precisely, $w_e(S)$ is equal to some positive constant, e.g., the hyperedge weight, for all non-empty $S \subset e$, and $w_e(S) = 0$ if $S \in \{\emptyset, e\}$. (ii) A splitting function is cardinality-based if $w_e(S_1) = w_e(S_2)$ for all $S_1, S_2 \subseteq e$ whenever $|S_1| = |S_2|$ (Veldt et al, 2020a). In other words, the value of $w_e(S)$ depends only on the cardinality of S. Several examples are given in Table 1.
Similar to the graph case, the hypergraph minimum s-t cut problem is formulated as
$$\min_{S \subset V} \ \mathrm{cut}_H(S) \quad \text{s.t.} \quad s \in S, \ t \in V \setminus S. \quad (4)$$
For a finite set $\mathcal{S}$, a set function $F : 2^{\mathcal{S}} \to \mathbb{R}$ is called submodular if
$$F(S_1 \cup \{v\}) - F(S_1) \ge F(S_2 \cup \{v\}) - F(S_2) \quad (5)$$
for every $S_1 \subseteq S_2 \subset \mathcal{S}$ and every $v \in \mathcal{S} \setminus S_2$ (Bach, 2013). If the splitting function $w_e$ associated with every hyperedge $e \in E$ is submodular, the resulting hypergraph cut, being a sum of submodular functions, is also submodular. In this case, the hypergraph minimum s-t cut problem can be solved using general submodular function minimizers (Grötschel et al, 1981; Iwata, 2003; Iwata and Orlin, 2009; Iwata et al, 2001; Orlin, 2009; Schrijver, 2000) or minimizers for decomposable submodular functions (Ene et al, 2017; Kolmogorov, 2012; Li and Milenkovic, 2018a). The all-or-nothing splitting function and the cardinality-based ones listed in Table 1 are all submodular.
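To make the hypergraph cut weight and the minimum s-t cut formulation concrete, the sketch below evaluates the cut of a tiny hypergraph under an all-or-nothing and a cardinality-based splitting function, and solves the s-t problem by exhaustive enumeration. The hypergraph and all function names are hypothetical, and the enumeration is exponential in $|V|$, so it is only meant for checking small instances.

```python
from itertools import combinations

def all_or_nothing(S, e, weight=1.0):
    """All-or-nothing penalty: `weight` if e is split, 0 otherwise."""
    return weight if 0 < len(S) < len(e) else 0.0

def cardinality_min(S, e):
    """Cardinality-based penalty min{|S|, |e \\ S|}."""
    return float(min(len(S), len(e) - len(S)))

def hypergraph_cut(S, hyperedges, split):
    """cut_H(S): sum over hyperedges e of w_e(S ∩ e)."""
    return sum(split(S & e, e) for e in hyperedges)

def brute_force_min_st_cut(vertices, hyperedges, split, s, t):
    """Enumerate every S with s in S and t not in S (tiny inputs only)."""
    free = sorted(vertices - {s, t})
    best_value, best_S = float("inf"), None
    for k in range(len(free) + 1):
        for extra in combinations(free, k):
            S = frozenset(extra) | {s}
            value = hypergraph_cut(S, hyperedges, split)
            if value < best_value:
                best_value, best_S = value, S
    return best_value, best_S

# Hypothetical hypergraph with three overlapping hyperedges.
V = {1, 2, 3, 4, 5}
E = [frozenset({1, 2, 3}), frozenset({2, 3, 4}), frozenset({3, 4, 5})]

print(brute_force_min_st_cut(V, E, all_or_nothing, s=1, t=5))
print(brute_force_min_st_cut(V, E, cardinality_min, s=1, t=5))
```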

Graph reducibility
Since algorithms for the graph minimum s-t cut problem are more efficient than algorithms for general submodular function minimization, another way of solving hypergraph cut problems is to reduce the hypergraph to a graph that shares the same or similar cut properties (Ihler et al, 1993;Lawler, 1973;Li and Milenkovic, 2017). The reduction is generally accomplished via hyperedge expansions. Recently, a generalized hyperedge expansion has been formulated in (Veldt et al, 2020a), which projects a hyperedge onto a graph allowing directed edges and additional vertices. The formal definition is given below.
Definition 1 (Gadget splitting function (Veldt et al, 2020a)) A gadget associated with a hyperedge e is a weighted and possibly directed graph $G_e = (V', E')$ with vertex set $V' = e \cup \hat{V}$, where $\hat{V}$ is a set of auxiliary vertices. The corresponding gadget splitting function $\hat{w}_e : 2^e \to \mathbb{R}_{\ge 0}$ is defined as
$$\hat{w}_e(S) = \min_{T \subseteq V',\, T \cap e = S} \mathrm{cut}_{G_e}(T). \quad (6)$$
A hyperedge splitting function is graph reducible if it is identical to some gadget splitting function. A hypergraph cut function defined as (3), or the hypergraph minimum s-t cut problem, is graph reducible if all its hyperedge splitting functions are graph reducible. It has been proved in (Veldt et al, 2020a) that every gadget splitting function $\hat{w}_e$ defined as (6) is submodular. Hence, if a hyperedge splitting function is graph reducible, it must be submodular.
In the following, we give several examples of splitting functions that have been shown to be graph reducible in existing works. The all-or-nothing splitting function is graph reducible and can be constructed from the Lawler gadget described as follows.
Lawler gadget (Lawler, 1973). The Lawler gadget replaces a hyperedge e with a digraph defined on the vertex set $V' = e \cup \{e', e''\}$ where $e'$, $e''$ are two auxiliary vertices. For each $v \in e$, add a directed edge of weight infinity from v to $e'$ and a directed edge of weight infinity from $e''$ to v. Finally, add a directed edge of weight equal to the hyperedge weight from $e'$ to $e''$.
The cardinality-based splitting functions listed in Table 1 are all graph reducible and correspond to the following gadgets, respectively.
Clique gadget (Agarwal et al, 2006). This gadget is an (undirected) clique graph with vertex set $V' = e$. For every $u, v \in e$, add an edge of weight 1 between u and v.

Table 1: Examples of cardinality-based and EDVWs-based splitting functions and their corresponding gadgets, where S is a subset of the hyperedge e and a, b are positive constants.
Symmetric cardinality-based gadget (Veldt et al, 2020a). Similar to the Lawler gadget, this gadget is a digraph with vertex set $V' = e \cup \{e', e''\}$. For each $v \in e$, add a directed edge of weight 1 from v to $e'$ and a directed edge of weight 1 from $e''$ to v. Moreover, add a directed edge of weight $b \in \mathbb{N}$ from $e'$ to $e''$.
Asymmetric cardinality-based gadget (Veldt et al, 2020a). This gadget is a digraph defined on V = e ∪ {v e }. For each v ∈ e, add a directed edge of weight a from v to v e and a directed edge of weight b from v e to v.
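The gadget splitting function of Definition 1 can be evaluated directly on small gadgets by enumerating all placements of the auxiliary vertices. The sketch below does so for the Lawler gadget; the function names, the vertex labels and the hyperedge weight 2.5 are assumptions made for illustration, and the auxiliary vertices are labelled by the strings "e'" and "e''".

```python
from itertools import combinations

INF = float("inf")

def subsets(items):
    """Yield every subset of `items` as a set."""
    items = list(items)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield set(combo)

def gadget_splitting(edges, aux, S):
    """Minimum of cut_{G_e}(T) over all T with T ∩ e = S.

    `edges` is a list of directed (u, v, weight) triples and `aux` is the
    set of auxiliary vertices; T ranges over S plus any subset of `aux`.
    """
    best = INF
    for A in subsets(aux):
        T = set(S) | A
        best = min(best, sum(w for u, v, w in edges if u in T and v not in T))
    return best

def lawler_gadget(e, weight):
    """Directed edges and auxiliary vertices of the Lawler gadget for hyperedge e."""
    edges = [("e'", "e''", weight)]
    for v in e:
        edges.append((v, "e'", INF))   # infinite-weight edge from v to e'
        edges.append(("e''", v, INF))  # infinite-weight edge from e'' to v
    return edges, {"e'", "e''"}

e = ["v1", "v2", "v3"]
edges, aux = lawler_gadget(e, weight=2.5)
for S in subsets(e):
    print(sorted(S), gadget_splitting(edges, aux, S))
```

Every non-empty proper subset S yields the hyperedge weight and the two trivial subsets yield zero, which is the all-or-nothing splitting function.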

The hypergraph model with EDVWs
In this paper, we consider the hypergraph model with EDVWs as defined next.
Definition 2 (Hypergraph with EDVWs (Chitra and Raphael, 2019)) Let $H = (V, E, \kappa, \{\gamma_e\})$ be a hypergraph with EDVWs where V and E respectively denote the vertex set and the hyperedge set. The function $\kappa : E \to \mathbb{R}_+$ assigns positive weights to hyperedges and those weights reflect the strength of connection. Each hyperedge $e \in E$ is associated with a function $\gamma_e : e \to \mathbb{R}_+$ to assign EDVWs. For convenience, we define $\gamma_e(S) = \sum_{v \in S} \gamma_e(v)$ for any $S \subseteq e$.
The introduction of EDVWs enables the hypergraph to model the cases when the vertices in the same hyperedge contribute differently to this hyperedge. For example, in a coauthorship network, every author (vertex) in general has a different degree of contribution to a paper (hyperedge), usually reflected by the order of the authors. This information is lost in traditional hypergraph models but it can be easily encoded through EDVWs. In the following, we study how to incorporate EDVWs into hypergraph cut problems.

EDVWs-based splitting functions
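A natural extension from cardinality-based splitting functions to EDVWs-based ones is to make the splitting penalty $w_e(S)$ dependent only on the sum of EDVWs in S. Formally, a splitting function is EDVWs-based if it can be written as
$$w_e(S) = g_e(\gamma_e(S)), \quad \forall S \subseteq e, \quad (7)$$
for some continuous function $g_e$.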
For trivial EDVWs, namely γ e (v) = 1 for all v ∈ e, we have γ e (S) = |S| and the EDVWs-based splitting function reduces to a cardinality-based one. Actually, g e can be viewed as a continuous extension of the splitting function w e . In practice, we can also incorporate the hyperedge weight κ(e) into the splitting function such as setting w e (S) = κ(e) · g e (γ e (S)). This does not influence the results presented in this paper.
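The following sketch shows how an EDVWs-based splitting function can be assembled from a function $g_e$ and a table of EDVWs, including the optional scaling by $\kappa(e)$. All names and numerical values are hypothetical; each $g$ takes the argument $x = \gamma_e(S)$ together with the total $\gamma_e(e)$, which is a convenience of this example rather than notation from the paper.

```python
def edvw_splitting(g, gamma, kappa=1.0):
    """Return w_e with w_e(S) = kappa * g(gamma_e(S), gamma_e(e))."""
    total = sum(gamma.values())

    def w(S):
        return kappa * g(sum(gamma[v] for v in S), total)

    return w

# Candidate functions g; x stands for gamma_e(S), total for gamma_e(e).
def g_product(x, total):
    return x * (total - x)

def g_min(x, total):
    return min(x, total - x)

def g_truncated(b):
    return lambda x, total: min(x, total - x, b)

def g_asymmetric(a, b):
    return lambda x, total: min(a * x, b * (total - x))

# Hypothetical EDVWs for one hyperedge e = {u, v, w}.
gamma = {"u": 1.0, "v": 2.0, "w": 3.0}
w_min = edvw_splitting(g_min, gamma)
w_prod = edvw_splitting(g_product, gamma)
print(w_min({"u", "v"}), w_prod({"u"}))

# With trivial EDVWs the penalty depends only on |S| (cardinality-based).
trivial = {v: 1.0 for v in "abcd"}
w_card = edvw_splitting(g_min, trivial)
print(w_card({"a"}), w_card({"a", "b", "c"}))
```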
We are interested in submodular splitting functions which make it possible to leverage existing solvers for submodular function minimization. Moreover, as mentioned in Section 2.3, submodularity is a necessary condition for a splitting function to be graph reducible (Veldt et al, 2020a). In the following Theorem 3.2, we show that the EDVWs-based splitting function defined as (7) is submodular if g e is concave. We first present several properties of concave functions which will be used later.
Lemma 3.1 (Properties of concave functions) For a concave function g, we have holds.
Proof According to the definition of concave functions, the following inequality holds for any x and y in the domain of a concave function g and any $t \in [0, 1]$:
$$g(tx + (1 - t)y) \ge t\, g(x) + (1 - t)\, g(y).$$
To prove Property (i), we set $x = b_1$ and $y = b_2 + a$. For $t = \frac{b_2 - b_1}{b_2 - b_1 + a}$ and $t = \frac{a}{b_2 - b_1 + a}$, the above inequality respectively becomes
$$g(b_1 + a) \ge \frac{b_2 - b_1}{b_2 - b_1 + a}\, g(b_1) + \frac{a}{b_2 - b_1 + a}\, g(b_2 + a),$$
$$g(b_2) \ge \frac{a}{b_2 - b_1 + a}\, g(b_1) + \frac{b_2 - b_1}{b_2 - b_1 + a}\, g(b_2 + a).$$
Property (i) can be obtained by adding these two inequalities together. Property (ii) can be proved by setting

Theorem 3.2 For EDVWs-based splitting functions defined as (7), if $g_e$ is a concave function, then $w_e$ is submodular. If $g_e$ is concave as well as symmetric with respect to $\gamma_e(e)/2$, then $w_e$ is submodular and symmetric.
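Proof For any $S_1 \subseteq S_2 \subset e$ and any $v \in e \setminus S_2$, set $b_1 = \gamma_e(S_1)$, $b_2 = \gamma_e(S_2)$ and $a = \gamma_e(v)$, so that $b_1 \le b_2$. It immediately follows from (7) and Property (i) in Lemma 3.1 that
$$w_e(S_1 \cup \{v\}) - w_e(S_1) = g_e(b_1 + a) - g_e(b_1) \ge g_e(b_2 + a) - g_e(b_2) = w_e(S_2 \cup \{v\}) - w_e(S_2).$$
Hence, $w_e$ is submodular according to the definition of submodular functions. Moreover, it is straightforward to show that $w_e$ satisfies $w_e(S) = w_e(e \setminus S)$ for all $S \subseteq e$ if $g_e$ is symmetric with respect to $\gamma_e(e)/2$.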
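Theorem 3.2 can be sanity-checked numerically on a small hyperedge by testing the diminishing-returns inequality over all nested pairs of subsets. The sketch below is a brute-force check under assumed EDVWs; the semicircle-shaped function $g(x) = \sqrt{x(\gamma_e(e) - x)}$ is one hypothetical concave choice, and the squared product is included as a non-concave contrast.

```python
from itertools import combinations
import math

def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)

def is_submodular(w, e, tol=1e-9):
    """Check w(S1 ∪ {v}) - w(S1) >= w(S2 ∪ {v}) - w(S2) for all S1 ⊆ S2 ⊂ e, v ∈ e \\ S2."""
    e = frozenset(e)
    for S2 in subsets(e):
        for S1 in subsets(S2):
            for v in e - S2:
                gain_small = w(S1 | {v}) - w(S1)
                gain_large = w(S2 | {v}) - w(S2)
                if gain_small < gain_large - tol:
                    return False
    return True

# Hypothetical EDVWs; total = gamma_e(e) = 7.
gamma = {"a": 0.5, "b": 1.5, "c": 2.0, "d": 3.0}
total = sum(gamma.values())

def gamma_of(S):
    return sum(gamma[v] for v in S)

def w_concave(S):
    # g(x) = sqrt(x (total - x)) is concave on [0, total]; max() guards rounding.
    x = gamma_of(S)
    return math.sqrt(max(x * (total - x), 0.0))

def w_not_concave(S):
    # g(x) = (x (total - x))^2 is convex near 0, so it is not concave.
    x = gamma_of(S)
    return (x * (total - x)) ** 2

print(is_submodular(w_concave, gamma), is_submodular(w_not_concave, gamma))
```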

Graph reducibility of EDVWs-based splitting functions
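We first consider the following concave functions for $g_e$, where $x \in [0, \gamma_e(e)]$ and a, b are positive constants:
$$g_e(x) = x \cdot (\gamma_e(e) - x),$$
$$g_e(x) = \min\{x, \gamma_e(e) - x\},$$
$$g_e(x) = \min\{x, \gamma_e(e) - x, b\},$$
$$g_e(x) = \min\{a x, b(\gamma_e(e) - x)\}.$$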
By substituting these concave functions into (7), we obtain the EDVWs-based splitting functions listed in Table 1, which are submodular according to Theorem 3.2. The first three are also symmetric while the last one is generally asymmetric unless a = b. For the third one, if the parameter b is set to some value no greater than min v∈e γ e (v), it reduces to the all-or-nothing splitting function; while if b ≥ max S⊆e min{γ e (S), γ e (e) − γ e (S)}, it reduces to the second one. We are going to show that the EDVWs-based splitting functions listed in Table 1 are also graph reducible by constructing gadgets that generalize the gadgets introduced in Section 2.3. An illustration of these generalized gadgets is given in Figure 1.
The EDVWs-based splitting functions listed in Table 1 are all graph reducible and respectively correspond to the gadgets described in Table 2.
Proof In the following, we prove these four cases one by one.

[Figure 1: Illustration of the generalized gadgets for a hyperedge with vertices v1, v2, v3 and EDVWs $\gamma_e(v_1)$, $\gamma_e(v_2)$, $\gamma_e(v_3)$.]

(iii) … result in $\mathrm{cut}_{G_e}(T)$ equal to $\gamma_e(S)$, $\gamma_e(e \setminus S)$, b and $\gamma_e(e)$. Notice that the last one is always no less than the first two. The result follows by taking the minimum one.

(iv) To split S from e in the asymmetric EDVWs-based gadget, we can set $T = S$ or $T = S \cup \{v_e\}$, which respectively lead to $\mathrm{cut}_{G_e}(T)$ equal to $a\gamma_e(S)$ and $b\gamma_e(e \setminus S)$. The proof is completed by taking the minimum one.

It has been proved in (Veldt et al, 2020a) that all submodular cardinality-based splitting functions are graph reducible. More precisely, any submodular cardinality-based splitting function can be constructed from a combination of up to $|e| - 1$ different asymmetric cardinality-based gadgets. A symmetric submodular cardinality-based splitting function can also be constructed from a combination of up to $\lfloor |e|/2 \rfloor$ symmetric cardinality-based gadgets. In Theorem 3.4 below, we show graph reducibility for the proposed submodular EDVWs-based splitting functions. To this end, we first define two sets as follows.

Denote by $S_i$ the subset of e corresponding to the ith smallest element in $Q_a$. The set $Q_s$ is a subset of $Q_a$ and contains the smallest $|Q_s|$ elements in $Q_a$.
Theorem 3.4 All EDVWs-based splitting functions defined as (7) with concave $g_e$ are graph reducible. A hyperedge paired with such a splitting function can be reduced to a graph which is a combination of at most $|Q_a|$ asymmetric EDVWs-based gadgets. If, in addition, $g_e$ is symmetric, the hyperedge can also be reduced to a graph combining at most $|Q_s|$ symmetric EDVWs-based gadgets.
Proof We first consider the case when g e is concave and symmetric, then study the more general case when g e is concave and possibly asymmetric.
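(i) Consider a hyperedge e and an EDVWs-based splitting function $w_e$ with concave, symmetric $g_e$. We are going to show that $w_e$ corresponds to some gadget splitting function $\hat{w}_e$ and that one way of constructing such a gadget is to combine $r = |Q_s|$ symmetric EDVWs-based gadgets. For the ith symmetric EDVWs-based gadget, denote the two auxiliary vertices by $e'_i$ and $e''_i$, set the weight of the edge from $e'_i$ to $e''_i$ to $b_i = \gamma_e(S_i)$, then scale all edge weights by a factor $a_i \ge 0$. The combined gadget contains $|e| + 2|Q_s|$ vertices and $(2|e| + 1)|Q_s|$ edges. Its corresponding splitting function can be written as
$$\hat{w}_e(S) = \sum_{i=1}^{r} a_i \min\{\gamma_e(S), \gamma_e(e \setminus S), b_i\}. \quad (14)$$
By substituting $S_i$ into (14) we get
$$\hat{w}_e(S_i) = \sum_{j=1}^{i} a_j b_j + b_i \sum_{j=i+1}^{r} a_j, \quad i \in [r],$$
which can be condensed in the following matrix form
$$[\hat{w}_e(S_1), \cdots, \hat{w}_e(S_r)]^\top = B_1 [a_1, \cdots, a_r]^\top, \quad [B_1]_{ij} = \min\{b_i, b_j\}. \quad (15)$$
The question left is whether there exist non-negative $a_1, \cdots, a_r$ such that $\hat{w}_e(S_i) = w_e(S_i)$ for all $i \in [r]$. To identify such $a_i$, we replace $\hat{w}_e(S_i)$ with $w_e(S_i)$ and invert the system (15) as follows
$$[a_1, \cdots, a_r]^\top = B_1^{-1} [w_e(S_1), \cdots, w_e(S_r)]^\top,$$
where $B_1^{-1}$ is the symmetric tridiagonal matrix with off-diagonal entries $[B_1^{-1}]_{i,i+1} = -\frac{1}{b_{i+1} - b_i}$ and diagonal entries $[B_1^{-1}]_{11} = \frac{1}{b_1} + \frac{1}{b_2 - b_1}$, $[B_1^{-1}]_{ii} = \frac{1}{b_i - b_{i-1}} + \frac{1}{b_{i+1} - b_i}$ for $1 < i < r$, and $[B_1^{-1}]_{rr} = \frac{1}{b_r - b_{r-1}}$. Since $a_1, \cdots, a_r$ need to be non-negative, we are left to prove the following inequalities:
$$w_e(S_1) \ge \frac{b_1}{b_2}\, w_e(S_2), \quad (16)$$
$$w_e(S_i) \ge \frac{b_{i+1} - b_i}{b_{i+1} - b_{i-1}}\, w_e(S_{i-1}) + \frac{b_i - b_{i-1}}{b_{i+1} - b_{i-1}}\, w_e(S_{i+1}), \quad i = 2, \cdots, r - 1, \quad (17)$$
$$w_e(S_r) \ge w_e(S_{r-1}). \quad (18)$$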
Moreover, it can be observed from (15) that $\hat{w}_e(S_1) \le \hat{w}_e(S_2) \le \cdots \le \hat{w}_e(S_r)$ due to the structure of $B_1$ and the non-negativity of the coefficients $a_i$. Hence, we also need to show that
$$w_e(S_1) \le w_e(S_2) \le \cdots \le w_e(S_r). \quad (19)$$
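By introducing $b_0 = 0$ and using $w_e(S_i) = g_e(b_i)$, the inequalities (16), (17) can be rewritten as
$$g_e(b_i) \ge \frac{b_{i+1} - b_i}{b_{i+1} - b_{i-1}}\, g_e(b_{i-1}) + \frac{b_i - b_{i-1}}{b_{i+1} - b_{i-1}}\, g_e(b_{i+1}), \quad i = 1, \cdots, r - 1, \quad (20)$$
which follows immediately from (ii) in Lemma 3.1. The inequalities (19) (including (18)) can be rewritten as
$$g_e(b_{i-1}) \le g_e(b_i), \quad i = 2, \cdots, r, \quad (21)$$
which can be proved according to (iii) in Lemma 3.1. Notice that, when any of the inequalities (16)-(18) holds with equality, the corresponding $a_i$ equals 0, which implies that the number of symmetric EDVWs-based gadgets needed to construct the combined gadget can be further reduced. Equality in (17) (or see (20)) means that the points at $b_{i-1}$, $b_i$ and $b_{i+1}$ are collinear.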
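(ii) Next we consider the case when $g_e$ is concave and possibly asymmetric. In this case, $w_e$ can be shown to be identical to some gadget splitting function $\hat{w}_e$, and such a gadget can be constructed by combining $r = |Q_a|$ asymmetric EDVWs-based gadgets. The ith one is paired with parameters $a_i(\gamma_e(e) - b_i)$ and $a_i b_i$, where $a_i \ge 0$ is a scaling parameter. The combined gadget consists of $|e| + |Q_a|$ vertices and $2|e| \cdot |Q_a|$ edges. Its corresponding splitting function can be written as
$$\hat{w}_e(S) = \sum_{i=1}^{r} a_i \min\{(\gamma_e(e) - b_i)\, \gamma_e(S),\ b_i\, \gamma_e(e \setminus S)\}. \quad (22)$$
We set $b_i = \gamma_e(S_i)$ for all $i \in [r]$. Then it holds that $b_i + b_{r+1-i} = \gamma_e(e)$. By substituting $S_i$ into (22) we get
$$\hat{w}_e(S_i) = (\gamma_e(e) - b_i) \sum_{j=1}^{i} a_j b_j + b_i \sum_{j=i+1}^{r} a_j (\gamma_e(e) - b_j), \quad i \in [r],$$
which can also be written in the following matrix form
$$[\hat{w}_e(S_1), \cdots, \hat{w}_e(S_r)]^\top = B_2 [a_1, \cdots, a_r]^\top, \quad [B_2]_{ij} = \min\{b_i, b_j\} \cdot (\gamma_e(e) - \max\{b_i, b_j\}). \quad (23)$$
Replace $\hat{w}_e(S_i)$ with $w_e(S_i)$ and invert the system (23) to find the valid coefficients $a_i$. For convenience, we introduce $b_{r+1} = \gamma_e(e)$. The inverse of $B_2$ can be written as the symmetric tridiagonal matrix with off-diagonal entries $[B_2^{-1}]_{i,i+1} = -\frac{1}{b_{r+1}} \cdot \frac{1}{b_{i+1} - b_i}$ and diagonal entries $[B_2^{-1}]_{11} = \frac{1}{b_{r+1}} \left( \frac{1}{b_1} + \frac{1}{b_2 - b_1} \right)$ and $[B_2^{-1}]_{ii} = \frac{1}{b_{r+1}} \left( \frac{1}{b_i - b_{i-1}} + \frac{1}{b_{i+1} - b_i} \right)$ for $1 < i \le r$. Notice that $B_2^{-1}$ has the same structure as $B_1^{-1}$ except for the last element as well as the scaling coefficient $1/b_{r+1}$. Since $a_1, \cdots, a_r$ need to be non-negative, we are left to prove the following inequalities:
$$w_e(S_1) \ge \frac{b_1}{b_2}\, w_e(S_2),$$
$$w_e(S_i) \ge \frac{b_{i+1} - b_i}{b_{i+1} - b_{i-1}}\, w_e(S_{i-1}) + \frac{b_i - b_{i-1}}{b_{i+1} - b_{i-1}}\, w_e(S_{i+1}), \quad i = 2, \cdots, r - 1,$$
$$w_e(S_r) \ge \frac{b_{r+1} - b_r}{b_{r+1} - b_{r-1}}\, w_e(S_{r-1}).$$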
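By introducing $b_0 = 0$ and noting that $g_e(b_0) = g_e(b_{r+1}) = 0$, the above inequalities can be rewritten and summarized as
$$g_e(b_i) \ge \frac{b_{i+1} - b_i}{b_{i+1} - b_{i-1}}\, g_e(b_{i-1}) + \frac{b_i - b_{i-1}}{b_{i+1} - b_{i-1}}\, g_e(b_{i+1}), \quad i = 1, \cdots, r,$$
which immediately follows from (ii) in Lemma 3.1.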
For the two cases discussed in Theorem 3.4, the required numbers of building gadgets $|Q_s|$ and $|Q_a|$ are respectively upper bounded by $2^{|e|-1} - 1$ and $2^{|e|} - 2$. For trivial EDVWs, the theorem coincides with the results for cardinality-based splitting functions in (Veldt et al, 2020a), with $|Q_s|$ and $|Q_a|$ respectively reducing to $\lfloor |e|/2 \rfloor$ and $|e| - 1$. Hence, EDVWs-based splitting functions generally lead to a denser graph than cardinality-based splitting functions in the worst case. In addition, when $g_e$ is concave as well as symmetric, Theorem 3.4 provides two ways for the reduction, of which the one leveraging symmetric EDVWs-based gadgets requires fewer edges.
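For the symmetric case of Theorem 3.4, the combination coefficients can be computed from successive slope differences of $g_e$ at the sorted breakpoints, which is what inverting a system of the form $\hat{w}_e(S_i) = \sum_j a_j \min\{b_i, b_j\}$ amounts to. The sketch below is an illustration under stated assumptions: the breakpoints are taken to be the distinct values of $\min\{\gamma_e(S), \gamma_e(e \setminus S)\}$ over non-empty proper subsets S (our reading of $Q_s$), the function names are hypothetical, and integer-valued EDVWs are used so that equal sums coincide exactly in floating point.

```python
from itertools import combinations

def symmetric_gadget_coefficients(gamma, g):
    """Return (a, b): scaling factors a_i and breakpoints b_i for the combined gadget."""
    total = sum(gamma.values())
    verts = list(gamma)
    sums = set()
    for k in range(1, len(verts)):
        for combo in combinations(verts, k):
            x = sum(gamma[v] for v in combo)
            sums.add(min(x, total - x))
    b = sorted(sums)
    w = [g(x, total) for x in b]
    # Slopes of the piecewise-linear interpolant of g on (b_{i-1}, b_i], with b_0 = 0, g(0) = 0.
    slopes, prev_b, prev_w = [], 0.0, 0.0
    for bi, wi in zip(b, w):
        slopes.append((wi - prev_w) / (bi - prev_b))
        prev_b, prev_w = bi, wi
    slopes.append(0.0)  # flat beyond the last breakpoint
    a = [slopes[i] - slopes[i + 1] for i in range(len(b))]
    return a, b

def combined_splitting(a, b, x, total):
    """Splitting penalty of the combined gadget: sum_i a_i * min{x, total - x, b_i}."""
    return sum(ai * min(x, total - x, bi) for ai, bi in zip(a, b))

gamma = {"u": 1.0, "v": 2.0, "w": 4.0}           # hypothetical EDVWs
g = lambda x, total: x * (total - x)             # concave and symmetric
a, b = symmetric_gadget_coefficients(gamma, g)
print(a, b)
```

Concavity of $g_e$ makes the slopes non-increasing, so every coefficient is non-negative; a zero coefficient indicates a gadget that can be dropped.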

Sparsifying hypergraph-to-graph reductions
Graph min-cut/max-flow algorithms have a complexity that depends on the number of vertices and the number of edges in the graph (Goldberg, 1998). The reduction procedures discussed above may result in large and dense graphs, thus affecting the efficiency of algorithms applied to the reduced graph. A workaround is to find a smaller and sparser graph whose cut approximates, rather than exactly recovers, the hypergraph cut (Bansal et al, 2019; Benczúr and Karger, 1996; Benson et al, 2020; Chekuri and Xu, 2018; Kogan and Krauthgamer, 2015). In (Benson et al, 2020), a sparsification technique is proposed for hypergraphs with submodular cardinality-based splitting functions. It is based on approximating concave functions using piecewise linear curves and can be generalized to our EDVWs-based case due to the formulation (7). In the following, we discuss this in detail.
As shown in (3), the hypergraph cut function is defined as the sum of splitting functions associated with every hyperedge. Hence, the problem can be decomposed into approximately modeling each hyperedge using a sparse gadget with a smaller set of auxiliary vertices. More formally, we would like to find such a gadget whose corresponding splitting function $\hat{w}_e$ as defined in (6) approximates the splitting penalties associated with a hyperedge, i.e.,
$$w_e(S) \le \hat{w}_e(S) \le (1 + \epsilon)\, w_e(S), \quad \forall S \subseteq e, \quad (28)$$
where $\epsilon \ge 0$ is an approximation tolerance parameter. This is equivalent to finding some gadget splitting function $\tilde{w}_e$ satisfying $\frac{1}{\delta} w_e(S) \le \tilde{w}_e(S) \le \delta\, w_e(S)$, with the correspondence $\hat{w}_e(S) = \delta\, \tilde{w}_e(S)$ and $\epsilon = \delta^2 - 1$.
We first consider EDVWs-based splitting functions with concave and symmetric $g_e$. We have shown in Theorem 3.4 that these splitting functions can be exactly constructed from a combination of $|Q_s|$ symmetric EDVWs-based gadgets. Following the idea in (Benson et al, 2020), we next show how to approximate these splitting functions using a smaller set of symmetric EDVWs-based gadgets. Recall from the proof of Theorem 3.4 that the combination of r symmetric EDVWs-based gadgets with positive parameters $b_1 < \cdots < b_r$ and combination coefficients $a_1, \cdots, a_r$ has a splitting function in the form of (14). Since its continuous extension is symmetric, we can just consider the first half of the function, which can be written as
$$\hat{g}_e(x) = \sum_{i=1}^{r} a_i \min\{x, b_i\}, \quad x \in [0, \gamma_e(e)/2]. \quad (29)$$
When $x \le b_1$, (29) can be rewritten as $\hat{g}_e(x) = (\sum_{i=1}^{r} a_i) \cdot x$; when $x > b_r$, we have $\hat{g}_e(x) = \sum_{i=1}^{r} a_i b_i$; when $b_{i-1} < x \le b_i$ for any $i = 2, \cdots, r$, we have $\hat{g}_e(x) = \sum_{j=1}^{i-1} a_j b_j + (\sum_{j=i}^{r} a_j) \cdot x$. Hence, $\hat{g}_e(x)$ can also be characterized as the lower envelope of a set of r + 1 linear functions having non-negative decreasing slopes and non-negative increasing intercepts (Benson et al, 2020), i.e.,
$$\hat{g}_e(x) = \min_{i \in [r+1]} f_i(x), \quad f_i(x) = m_i x + d_i, \quad m_1 > \cdots > m_{r+1} = 0, \quad 0 = d_1 < \cdots < d_{r+1}, \quad (30)$$
where $m_i = \sum_{j=i}^{r} a_j$ and $d_i = \sum_{j=1}^{i-1} a_j b_j$. Equivalently, the relations between the coefficients $a_i$, $b_i$ and $m_i$, $d_i$ can also be described as
$$a_i = m_i - m_{i+1}, \quad b_i = \frac{d_{i+1} - d_i}{m_i - m_{i+1}}, \quad i \in [r].$$
The sparsification problem can be described as: Find the piecewise linear function $\hat{g}_e$ with the minimum number of pieces that approximates $g_e$ at the points in $Q_s$. It can be formulated as follows.
$$\min\ r \quad \text{s.t.} \quad g_e(x) \le \hat{g}_e(x) \le (1 + \epsilon)\, g_e(x), \ \forall x \in Q_s; \quad \hat{g}_e \text{ is defined as (30)}; \quad \text{for each } i \in [r+1],\ f_i(x) = g_e(x) \text{ for some } x \in \{0\} \cup Q_s. \quad (31)$$
The first constraint is from (28). The third constraint is added without loss of generality since an improved approximation could be found if there is some $f_i$ strictly greater than $g_e$ at all points in $\{0\} \cup Q_s$. The main difference with the corresponding formulation in (Benson et al, 2020) is that, for the cardinality-based case considered there, the set $Q_s$ consists of only integers from 1 to $\lfloor |e|/2 \rfloor$; for the EDVWs-based case, the elements in $Q_s$ depend on the values of EDVWs, thus they are not necessarily integers or evenly spaced. We can modify the algorithm proposed in (Benson et al, 2020) to find an optimal solution to (31). Here we briefly introduce the procedure and refer the reader to (Benson et al, 2020) for a detailed explanation. For convenience, we denote the elements in $Q_s$ by $q_1 < q_2 < \cdots < q_n$ where $n = |Q_s|$ and define a series of functions with the ith one joining $(q_i, g_e(q_i))$ and $(q_{i+1}, g_e(q_{i+1}))$, i.e., $h_i(x) = \frac{g_e(q_{i+1}) - g_e(q_i)}{q_{i+1} - q_i}(x - q_i) + g_e(q_i)$. The last linear piece $f_{r+1}$ is determined first. According to (30) and (31), it has a zero slope and passes through $(q_n, g_e(q_n))$, thus we have $f_{r+1}(x) = g_e(q_n)$. The first linear piece $f_1$ goes through the origin according to (30), and its slope $m_1$ is set to $\frac{g_e(q_1)}{q_1}$ so that it can provide a qualified approximation for as many points in $Q_s$ as possible. Then we identify the first point $q_\ell$ in $Q_s$ that does not have a $(1 + \epsilon)$-approximation. In other words, $f_1(q_i) \le (1 + \epsilon) g_e(q_i)$ holds for $i = 1, \cdots, \ell - 1$ and becomes invalid at $i = \ell$. We stop searching for more linear pieces if (i) $\ell \ge n$, or (ii) $g_e(q_n) \le (1 + \epsilon) g_e(q_\ell)$, where (ii) implies that $q_\ell, \cdots, q_n$ have a $(1 + \epsilon)$-approximation provided by the last linear piece. Otherwise, we continue to find the next linear piece $f_2$ in order to cover the rest of the points. We identify $i^*$ such that $h_{i^*-1}(q_\ell) \le (1 + \epsilon) g_e(q_\ell)$ and $h_{i^*}(q_\ell) > (1 + \epsilon) g_e(q_\ell)$.
In other words, the linear function $h_{i^*-1}$ provides a qualified approximation at $q_\ell$ (i.e., the first point that has not been covered yet), but the following functions $h_i$ for $i \ge i^*$ do not. Hence, $f_2$ should pass through the point $(q_{i^*}, g_e(q_{i^*}))$ and lie between $h_{i^*-1}$ and $h_{i^*}$ (its slope should be greater than the slope of $h_{i^*}$ and no greater than the slope of $h_{i^*-1}$). In the meantime, we expect $f_2$ to provide a qualified approximation at $q_\ell$ and have a slope as small as possible so that more points after $q_{i^*}$ can be covered. Therefore, we set $f_2$ to the line joining $(q_\ell, (1 + \epsilon) g_e(q_\ell))$ and $(q_{i^*}, g_e(q_{i^*}))$. We refer the reader to Figure 4 in (Benson et al, 2020) for an illustration. Then we update $q_\ell$ to the new first point in $Q_s$ that does not have a $(1 + \epsilon)$-approximation yet, and check the stopping criterion. We repeat the process of picking $f_2$ to add more linear pieces until we meet the stopping criterion.

We can also ignore the particular positions of elements in $Q_s$ and try to provide a $(1 + \epsilon)$-approximation for $g_e(x)$ everywhere in the range $[0, \gamma_e(e)/2]$. This approach has two benefits: (i) It avoids building the set $Q_s$. (ii) If multiple hyperedges share the same continuous extension $g_e$, we can find their sparsified reductions all at once. The procedure of finding the set of linear pieces can be modified as follows. The last linear piece should be $f_{r+1}(x) = g_e(\gamma_e(e)/2)$. The first linear piece $f_1$ is tangent to $g_e$ at the origin. We identify the value $z \in [0, \gamma_e(e)/2]$ that satisfies $f_1(z) = (1 + \epsilon) g_e(z)$. If no such z exists or $g_e(\gamma_e(e)/2) \le (1 + \epsilon) g_e(z)$, we stop. Otherwise, we add the next linear piece $f_2$, which is selected to pass through $(z, (1 + \epsilon) g_e(z))$ and be tangent to $g_e$ at some point greater than z. We repeat the process of choosing $f_2$ to add more linear pieces until the whole range $[0, \gamma_e(e)/2]$ has been covered. An illustrative example is given in Figure 2.
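The discrete procedure described above can be sketched as follows. This is our own illustrative reading of the greedy steps, not the reference implementation of (Benson et al, 2020): `q` holds the sorted points of $Q_s$, `g_vals` the values of a concave, non-decreasing $g_e$ with $g_e(0) = 0$ at those points, and each linear piece is stored as a hypothetical `(slope, intercept)` pair.

```python
def sparsify(q, g_vals, eps):
    """Greedy piecewise-linear (1 + eps)-approximation of concave samples (q[i], g_vals[i])."""
    n = len(q)
    pieces = [(g_vals[0] / q[0], 0.0)]  # f_1: through the origin and (q_1, g(q_1))

    def first_uncovered(piece, start):
        """Index of the first point from `start` on that `piece` fails to approximate."""
        slope, intercept = piece
        for i in range(start, n):
            if slope * q[i] + intercept > (1 + eps) * g_vals[i]:
                return i
        return None

    ell = first_uncovered(pieces[-1], 1)
    # Stop when every point is covered or the flat last piece covers the rest.
    while ell is not None and g_vals[-1] > (1 + eps) * g_vals[ell]:
        target = (1 + eps) * g_vals[ell]
        # Find i*: first index whose outgoing chord, extended back to q[ell], exceeds target.
        i_star = ell + 1
        while i_star < n - 1:
            chord = (g_vals[i_star + 1] - g_vals[i_star]) / (q[i_star + 1] - q[i_star])
            if g_vals[i_star] + chord * (q[ell] - q[i_star]) > target:
                break
            i_star += 1
        # New piece joins (q[ell], target) and (q[i*], g(q[i*])).
        slope = (g_vals[i_star] - target) / (q[i_star] - q[ell])
        pieces.append((slope, target - slope * q[ell]))
        ell = first_uncovered(pieces[-1], i_star + 1)
    pieces.append((0.0, g_vals[-1]))  # f_{r+1}: horizontal line through (q_n, g(q_n))
    return pieces

def envelope(pieces, x):
    """Lower envelope of the linear pieces at x."""
    return min(m * x + d for m, d in pieces)

# Hypothetical example: g(x) = x (100 - x) sampled at x = 1, ..., 50.
q = [float(k) for k in range(1, 51)]
g_vals = [x * (100.0 - x) for x in q]
pieces = sparsify(q, g_vals, eps=0.1)
print(len(pieces), "linear pieces for", len(q), "points")
```

Each piece other than the last corresponds to one symmetric EDVWs-based gadget, so a shorter list of pieces translates into fewer auxiliary vertices in the reduced graph.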
EDVWs-based splitting functions with concave and possibly asymmetric $g_e$ can be approximated using a smaller set of asymmetric EDVWs-based gadgets. It has been proved in (Benson et al, 2020) that the continuous extension of the splitting function in the form of (22) is also piecewise linear, thus we can adopt a reduction procedure similar to the one described above. Moreover, a direct extension of Theorem 4.1 in (Benson et al, 2020) is that the number of symmetric/asymmetric EDVWs-based gadgets needed to approximate an EDVWs-based splitting function with concave, symmetric/asymmetric $g_e$ is upper bounded by $O(\log_{1+\epsilon} \gamma_e(e))$, which behaves as $\epsilon^{-1} \log \gamma_e(e)$ as $\epsilon$ approaches zero. This bound could be further improved if a specific concave function $g_e$ is chosen.
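The limiting behaviour of the bound follows from a change of base and the Taylor expansion of the logarithm. The short derivation below is a sketch of this step, with $\epsilon$ denoting the approximation tolerance:

```latex
\log_{1+\epsilon} \gamma_e(e)
  = \frac{\ln \gamma_e(e)}{\ln(1+\epsilon)},
\qquad
\ln(1+\epsilon) = \epsilon - \frac{\epsilon^2}{2} + O(\epsilon^3)
\;\Longrightarrow\;
\log_{1+\epsilon} \gamma_e(e)
  = \frac{\ln \gamma_e(e)}{\epsilon}\,\bigl(1 + O(\epsilon)\bigr)
\quad \text{as } \epsilon \to 0.
```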

Experiments
We show the effects of introducing EDVWs into hypergraph cuts via numerical experiments. We consider the binary classification of hypergraph vertices: given the labels of a subset of vertices $V_L = V_{L_1} \cup V_{L_2}$, where $V_{L_1}$ and $V_{L_2}$ consist of all labeled vertices in the first and second class, respectively, the task is to estimate the labels of the remaining vertices. The problem can be formulated as a generalized hypergraph minimum s-t cut problem with multiple source and sink vertices, i.e., finding a set $S$ of minimum hypergraph cut weight subject to $V_{L_1} \subseteq S$ and $V_{L_2} \subseteq V \setminus S$ (32). The vertices in $S$ and $V \setminus S$ are then assigned to the first and second class, respectively. We consider the first three EDVWs-based splitting functions listed in Table 1, namely $w_e(S) = \gamma_e(S) \cdot \gamma_e(e \setminus S)$, $w_e(S) = \min\{\gamma_e(S), \gamma_e(e \setminus S)\}$ and $w_e(S) = \min\{\gamma_e(S), \gamma_e(e \setminus S), b\}$, which are symmetric and graph reducible. For any of them, we can reduce the hypergraph to a graph sharing the same cut properties. Following the idea in (Blum and Chawla, 2001), we introduce two additional vertices, a super-source $s$ and a super-sink $t$, into the reduced graph. For every $v \in V_{L_1}$, we add an edge of infinite weight from $s$ to $v$; for every $v \in V_{L_2}$, we add an edge of infinite weight from $v$ to $t$. Then problem (32) can be converted into a standard minimum s-t cut problem defined on the new graph in the form of (2). We solve the graph minimum s-t cut problem using the highest-label preflow-push algorithm implemented in the NetworkX package (Hagberg et al, 2008).

[Figure 4 caption] The red curves (cardinality-based) and the green curve (all-or-nothing) respectively correspond to the cases when $\alpha = 0$ and when $\beta$ is small enough ($\beta = 10^{-3.5}$ here). For the blue curves (EDVWs-based), a 5-fold cross-validation is adopted in (a-c) to search for the optimal $\alpha$ and in (d) to search for the optimal $\beta$.
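To make the multi-terminal problem (32) and the three splitting functions concrete, the sketch below evaluates them on a toy hypergraph and solves the constrained cut by exhaustive search. It is an illustration only: the hypergraph, the function names, and the default `beta` are hypothetical, and brute-force enumeration replaces the graph reduction and preflow-push solver used in the experiments, so it is usable only for a handful of unlabeled vertices.

```python
from itertools import combinations

# Hypothetical toy hypergraph: hyperedge name -> {vertex: EDVW gamma_e(v)}.
HYPEREDGES = {
    "e1": {"a": 2.0, "b": 1.0, "c": 1.0},
    "e2": {"b": 1.0, "c": 3.0, "d": 1.0},
    "e3": {"c": 1.0, "d": 2.0},
}

def gamma(weights, S):
    """gamma_e(S): sum of the EDVWs of the hyperedge's vertices lying in S."""
    return sum(w for v, w in weights.items() if v in S)

def split_product(weights, S):
    """w_e(S) = gamma_e(S) * gamma_e(e \\ S)."""
    inside = gamma(weights, S)
    return inside * (sum(weights.values()) - inside)

def split_min(weights, S):
    """w_e(S) = min{gamma_e(S), gamma_e(e \\ S)}."""
    inside = gamma(weights, S)
    return min(inside, sum(weights.values()) - inside)

def split_capped(weights, S, beta=0.15):
    """w_e(S) = min{gamma_e(S), gamma_e(e \\ S), b} with b = beta * gamma_e(e)."""
    total = sum(weights.values())
    inside = gamma(weights, S)
    return min(inside, total - inside, beta * total)

def cut_weight(S, split, hyperedges=HYPEREDGES):
    """Hypergraph cut weight: sum of splitting penalties over all hyperedges."""
    return sum(split(weights, S) for weights in hyperedges.values())

def brute_force_cut(sources, sinks, split, hyperedges=HYPEREDGES):
    """Minimize the cut weight over all S containing `sources` and no `sinks`."""
    vertices = set().union(*hyperedges.values())
    free = sorted(vertices - sources - sinks)
    best_value, best_S = None, None
    for k in range(len(free) + 1):
        for extra in combinations(free, k):
            S = sources | set(extra)
            value = cut_weight(S, split, hyperedges)
            if best_value is None or value < best_value:
                best_value, best_S = value, S
    return best_value, best_S

# Labeled vertices: "a" in the first class, "d" in the second class.
value, S = brute_force_cut({"a"}, {"d"}, split_min)
```

Vertices in the returned set `S` would be assigned to the first class and the remaining vertices to the second.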
We adopt the 20 Newsgroups dataset and consider the task of document classification. The dataset contains documents in different categories, and we consider the documents in the categories "rec.motorcycles" and "sci.space". We extract the 200 most frequent words in the corpus after removing stop words and words appearing in more than 3% or fewer than 0.2% of the documents. We then remove a small fraction of documents that do not contain any of the selected words and finally obtain 1852 documents, with 932 and 920 documents in the two classes, respectively. To model the text dataset using hypergraphs with EDVWs, we consider documents as vertices and words as hyperedges. A document (vertex) belongs to a word (hyperedge) if the word appears in the document. The EDVWs are taken as the corresponding tf-idf (term frequency-inverse document frequency) values (Leskovec et al, 2020) to the power of $\alpha$, where $\alpha$ is a tunable parameter. More precisely, we set $\gamma_e(v) = \text{tf-idf}(e, v)^{\alpha}$, where $\text{tf-idf}(e, v) = \text{tf}(e, v) \cdot \text{idf}(e)$.
The term frequency $\text{tf}(e, v)$ is the relative frequency of word $e$ in document $v$. The inverse document frequency $\text{idf}(e)$ measures the informativeness of word $e$, i.e., whether it is common or rare across all documents. Hence, the tf-idf values reflect the importance of a word to a document in a corpus and are thus a natural choice for EDVWs. We adopt the TfidfTransformer class in the scikit-learn package with default parameters to compute the tf-idf values. The parameter $\alpha$ is introduced for extra flexibility. When $\alpha = 0$, we get the trivial EDVWs and the splitting functions reduce to cardinality-based ones. For the splitting function $w_e(S) = \min\{\gamma_e(S), \gamma_e(e \setminus S), b\}$, we set $b = \beta \gamma_e(e)$, where $\beta$ is also adjustable. If a small enough $\beta$ is selected, the splitting function reduces to the all-or-nothing case.

Figure 3 shows the effects of the parameters $\alpha$ and $\beta$ on the classification performance. We plot the average classification accuracy and the standard deviation over 10 realizations that adopt different sets of labeled vertices. The fraction of labeled vertices is set to 0.3 in (a-d) and 0.5 in (e-h). The three considered EDVWs-based splitting functions correspond to (a, e), (b, f), and (c-d, g-h), respectively. For the third splitting function, we fix $\beta = 0.15$ to observe the influence of $\alpha$ in (c, g) and fix $\alpha = 1$ to test $\beta$ in (d, h). For all three splitting functions, the best performance is achieved at intermediate values of $\alpha$ or $\beta$ rather than in the extreme cases in which the EDVWs-based splitting function reduces to a cardinality-based splitting function or the all-or-nothing splitting function.

Figure 4 provides a more direct comparison between the proposed EDVWs-based splitting functions and existing ones. For EDVWs-based splitting functions, we adopt a 5-fold cross-validation to find the optimal $\alpha$ and $\beta$. In (a), we search for the optimal $\alpha$ over the set {0 : 0.2 : 3}. In (b-c), we search for $\alpha$ over {0 : 0.2 : 5}. In (d), we search for $\beta$ over 20 values equally spaced in the log scale between $10^{-3.5}$ and $10^{-1/3}$. We can see that adopting EDVWs-based splitting functions improves the classification performance over a wide range of train-test split ratios.
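The construction of the EDVWs from tf-idf values described above can be sketched as follows. This is a simplified, assumption-laden illustration: the corpus and the names `DOCS` and `tfidf_edvws` are hypothetical, the term frequency is the relative frequency as in the text, and the idf is taken as ln(N / df) + 1. The experiments instead use scikit-learn's TfidfTransformer, whose defaults additionally smooth the idf and L2-normalize each document vector, so the numbers produced here will differ from those.

```python
import math
from collections import Counter

# Hypothetical tokenized corpus: document id -> list of selected words.
DOCS = {
    "d1": ["engine", "bike", "engine", "road"],
    "d2": ["orbit", "rocket", "engine"],
    "d3": ["orbit", "moon"],
}

def tfidf_edvws(docs, alpha):
    """Return {word (hyperedge): {document (vertex): gamma_e(v)}}.

    gamma_e(v) = (tf(e, v) * idf(e)) ** alpha, defined only for documents
    that contain the word, i.e. for vertices belonging to the hyperedge.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each word.
    df = Counter(word for tokens in docs.values() for word in set(tokens))
    gamma = {}
    for doc, tokens in docs.items():
        counts = Counter(tokens)
        for word, count in counts.items():
            tf = count / len(tokens)                    # relative frequency
            idf = math.log(n_docs / df[word]) + 1.0     # unsmoothed variant
            gamma.setdefault(word, {})[doc] = (tf * idf) ** alpha
    return gamma

weights = tfidf_edvws(DOCS, alpha=1.0)   # tf-idf values used directly
trivial = tfidf_edvws(DOCS, alpha=0.0)   # every EDVW equals 1
```

With `alpha=0.0` every weight equals 1, which matches the observation that the splitting functions then reduce to cardinality-based ones.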

Conclusion
We developed a framework for incorporating EDVWs into hypergraph cut problems and generalized the reduction and sparsification techniques recently proposed for cardinality-based splitting functions. Through a real-world text mining application, we showcased the value of introducing EDVWs. There are numerous directions for future work: (i) As mentioned in the introduction, hypergraph minimum cuts can be used in various real-world applications (Catalyurek and Aykanat, 1999;Ding and Yilmaz, 2008;Karypis et al, 1999;Kim et al, 2011) or as subroutines in many machine learning algorithms (Liu et al, 2021;Veldt et al, 2020b). Hence, it would be desirable to apply the proposed framework to these applications and algorithms and evaluate its performance. (ii) Another direction is to extend the proposed framework to multiway cuts (Chekuri and Ene, 2011;Okumoto et al, 2012;Veldt et al, 2020a;Zhao et al, 2005) or to other types of cuts such as normalized cuts (Fountoulakis et al, 2021;Milenkovic, 2017, 2018b). (iii) An open problem is whether all submodular splitting functions are graph reducible (Veldt et al, 2020a).