Measurement and refactoring for package structure based on complex network

Software structure is the backbone of a software system. Over the long course of software evolution, it is gradually weakened by continuous code modification and expansion driven by new requirements. Measuring software and refactoring code are therefore necessary to keep the software structure stable and clean. In this paper, we propose two metrics, one for cohesion and one for coupling, to characterize package structure. We consider not only intra-package and inter-package dependencies, but also the backward dependencies between packages. We prove theoretically that the two metrics satisfy Briand's four properties. Based on these metrics, a refactoring algorithm is presented to improve the quality of package structure. Tests on ten open-source software systems show that our metrics measure software structure correctly and that the refactoring moves code closer to the rule of high cohesion and low coupling.


Introduction
It is well known that the software lifecycle has two phases: a development phase and a maintenance phase. In the development phase, programmers write code carefully under the guidelines of software architecture design, such as the rule of high cohesion and low coupling. Compared with the development phase, the maintenance phase is much longer and can last for several years. During this long period, the software is not stationary but evolves gradually. Driven by new requirements, software functionalities are continuously updated and refactored. As a result, the amount of code increases and the code becomes more and more complicated. This may cause the software to deviate from the original design, degrade software quality and comprehensibility, and finally generate "technical debt". It is therefore necessary to keep the software architecture stable and the code clean during evolution in order to prolong the software's service life (Tom et al. 2013).
Faced with increasing software functionality and complexity, the software architecture needs careful protection without degrading functionality. Refactoring is one of the powerful tools for improving software design and increasing the maintainability and usability of software systems (Fowler 1997). Through code modification and structure adjustment, refactoring keeps software clean, so that it can pass code review and score well on software measurements. However, purely manual refactoring is time consuming and has little effect. Thus, some researchers are looking for guidelines for automatic refactoring based on software metrics. On the other hand, some researchers in software engineering focus on the dynamic characteristics of software structure using methods from complex network theory. By combining complex networks and software engineering, a software system can be represented as a network and then treated as a unified object for evolution analysis.
In this paper, based on complex network theory, we present two metrics about package cohesion and coupling, to measure software quality better and guide refactoring automatically. Compared with previous work (Mi et al. 2019), the new metrics take into account overall dependencies between classes, and also consider the backwards dependencies of classes. To check the validity of our metrics, we first prove they strictly meet the four properties proposed by Briand (Briand et al. 1996). Based on these two metrics, we also provide a refactoring algorithm to adapt package-class relations for better balance of cohesion and coupling. Finally, through several experiments on multiple open source software systems, we verify the validity of our metrics, and efficiency of the refactoring algorithm.
We presented a preliminary version of this software measurement and refactoring approach in (Mi et al. 2019). Besides a general revision and improvement, this paper extends our previous work in the following directions:
• Besides the cohesion metric, we also present a coupling metric. The combination of cohesion and coupling measures software package structure more objectively and clearly. We also prove that the new coupling metric strictly satisfies the properties proposed by Briand (Briand et al. 1996).
• Cohesion and coupling are correlated but do not overlap. Bias toward either metric prevents the software from being measured correctly. Based on the relation between the cohesion and coupling metrics, we present an evaluation model of package structure and update the refactoring algorithm accordingly.
• We compare our new refactoring algorithm with other algorithms. In the disturb-and-recover experiment, under different disturb ratios, our algorithm finds almost all disturbed classes and places them back in the correct packages.
The remainder of this paper is structured as follows. In the "Related work" section, we describe current work on software measurement and on complex networks in software engineering. In the "Fundamentals of software codes" section, we introduce some concepts of software code and code dependencies. In the "Software network and its attributes" section, we present the construction of the software network and its related attributes. In the "Our metrics" section, we describe the new metrics of cohesion and coupling, prove their validity in theory, and then give a new algorithm for package refactoring. In the "Experiments and analysis" section, we design several experiments to verify the validity of our metrics and the efficiency of the refactoring algorithm. Finally, we conclude our work in the "Conclusion" section.

Related work
A software system can be rated "GOOD" only if it both satisfies its functional requirements and meets design and programming guidelines, the typical one being high cohesion and low coupling. Under these guidelines, many metrics have been developed to measure software quality. These metrics have different levels of granularity, from the basic program variable to the whole architecture. According to their implementation, these metrics can be divided into two categories: statistics-type and network-type.
We first review the statistics-type metrics, which are calculated on different levels of programming objects, such as class, package and system. We list them as follows.
(a). For the class level, Chidamber and Kemerer proposed the CK metric set for OO design evaluation (Chidamber and Kemerer 1994). CK is a group of six metrics on classes and methods, which are pure static statistics. Lee et al. proposed a dynamic metric of information-based coupling (ICP), which defines the coupling degree of every class based on the information flow through method invocations (Lee 1995). Harrison et al. proposed the MOOD metric set, including CF (coupling factor) (Harrison et al. 1998), which can measure cohesion indirectly by measuring coupling. It is calculated as the number of existing dependencies between classes divided by the number of all possible dependencies between classes. Bieman et al. proposed two cohesion metrics, TCC (Tight Class Cohesion) and LCC (Loose Class Cohesion), based on the instance variables shared by different methods (Bieman and Kang 1995).
(b). For the package level, Martin proposed seven statistical metrics (Martin 2002), which are easy to implement and widely used in development tools to assist programmers. Misic studied package cohesion and concluded that relying solely on the internal relationships of packages is not sufficient to determine cohesion (Misic 2001). Sarkar et al. proposed a new metric suite to characterize the modular quality of software packages (Sarkar et al. 2008). Abdeen et al. proposed a cohesion metric based on cyclic dependencies (Abdeen et al. 2009) to evaluate package modularity, encapsulation, variability, and reusability.
(c). For the system level, Gui et al. used an approach like (Lee 1995) to define system coupling as the average coupling of all classes in a software system (Gui and Scott 2006). This is a system-level coupling measurement that can be used to evaluate component reusability.
Next, we summarize the network-type metrics and their implementation. Compared with common statistical methods, complex network techniques have become more and more popular, since they provide a macro perspective for analyzing software. Software systems can be represented as complex networks, which are also called software networks. In a software network, nodes are software entities, such as methods, fields, classes, or packages, and edges represent dependencies between entities (Pan et al. 2011). Many recent studies have therefore focused on software networks and obtained many useful results (Potanin et al. 2005; Concas et al. 2007; Pan et al. 2010; Pan et al. 2011).
One of the most interesting properties of complex networks is community structure (Girvan and Newman 2002; Newman 2006; Fortunato 2010). Communities are obtained by dividing a network into several parts such that internal connections are tight and external connections are sparse. Communities play an important role in understanding network characteristics (Li et al. 2013; Pan et al. 2011). Software networks share properties with other networks, such as being small-world and scale-free (Myers 2003; Pan et al. 2011). Shen et al. studied several Java software systems in depth and found that the corresponding networks have the same features (Ping-ting and Liangyu 2017). Pan et al. represented Java software systems as bipartite networks and proposed an algorithm to reconstruct the organizational structure of software packages (Pan et al. 2014). With the widespread application of complex network techniques to software networks, many related tools have been developed to analyze existing software systems. Zheng et al. used an improved degree model to analyze the Linux system (Zheng et al. 2008). Besides community structure analysis, complex network techniques have also been applied to software evolution. Valverde et al. proposed a model based on node replication and edge rearrangement (Valverde and Solé 2005). He et al. proposed a model based on the growth of software design patterns. Li et al. proposed a software evolution model combining complex network theory and evolutionary algorithms.

Software codes
Software systems are built from code written by professional programmers according to practical functional requirements. This code must obey the syntax rules and structural requirements of the programming language. Taking Java as a typical language, the common program structure of object-oriented (OO) software systems is two-tier: class and package. All code must be enclosed in multiple class files, and these class files are then collected in different packages according to their functionalities and roles. The class is the basic unit. The package, as an intermediate layer, aggregates classes and regulates class access, and it can also reduce system complexity and increase maintainability and understandability. Note that the rationality of the package organization affects software quality to a certain degree.
For better demonstration, we show a Java system in Fig. 1. In this system, there are three packages. In detail, package1 has three classes: A, B, C; package2 also has three classes: D, E, F; and the last package, package3, has three classes: H, I, J.

Code dependencies
When a call occurs between two classes, a dependency is created. There are several types of dependencies in object-oriented programming languages. Kang et al. summarized code dependencies between classes into ten relations (Kang et al. 2004). These relations have different weights in the classical theory of software engineering; however, to the best of our knowledge, there are no authoritative quantitative values for the weighting factors.
Additionally, the influence of packages on dependencies cannot be ignored, since packages also contribute to system dependencies. As a middle tier, a package aggregates classes with the same role or functionality and limits illegal access from outside. Thus, dependencies can be divided into two types: intra-package dependencies and inter-package dependencies. In an intra-package dependency, the caller and the callee are in the same package, while in an inter-package dependency the two participants are located in different packages.

High cohesion and low coupling
Programming languages keep growing more powerful, with features such as function encapsulation, inheritance and polymorphism. These features help code implement requirements correctly and efficiently, but they inevitably cause the dependency problem (Tom et al. 2013). As shown in Fig. 1, code is split into several class files, and many dependencies arise among them. For example, class I inherits from class E and calls a function of class F. That means class I depends on classes E and F. If classes E or F change, class I will be affected. Unfortunately, dependencies are inevitable, since it is impossible to put all code into one file.
From Fig. 1, we can see that a class has a higher risk of unstable modification if it has more dependencies on other classes. Therefore, programmers try to get rid of class dependencies and make code self-contained. This is the well-known principle of high cohesion and low coupling. Cohesion indicates the degree to which a program module, such as a class or package, can complete its functionality with the support of its own internal code. Conversely, coupling describes how much a module depends on other, outside modules. To avoid cascading modifications and latent bugs, one promising approach is to make code highly cohesive and loosely coupled. The ideal system is one where all modules remain independent without any dependencies. Unfortunately, that is very difficult because of massive and complex software requirements. Therefore, programmers need to design and implement changes carefully to increase code cohesion and decrease module coupling.

Class dependency graph network
Definition 1 In this paper, the software systems we study are written in object-oriented programming languages. Based on the package-class structure and the dependencies between classes, a software system can be represented as a Class Dependency Graph (CDG) network (Ping-ting and Liang-yu 2017), which is a directed graph

$$CDG = (V_c, E_c, C),$$

where $V_c$ is the set of vertices (classes), $E_c$ is the set of edges (dependencies), and $C$ is the set of communities (packages). Every package is mapped to a community of the network. In this directed network, there is an edge $v_i \to v_j$ if and only if at least one of the following dependencies holds between $v_i$ and $v_j$:
• $R_1$ - Inheritance and implementation: $v_i$ extends or implements $v_j$;
• $R_2$ - Aggregation: $v_j$ is the data type of a member variable in $v_i$;
• $R_3$ - Parameter: $v_j$ is the data type of a parameter, return value, or declared exception of a method in $v_i$.
In other words, for a node, its outgoing edges denote the classes it depends on, that is, forwards dependencies. Similarly, the incoming edges denote the classes it supports, namely backwards dependencies. Additionally, for generality, we assume that the weights of the above five dependencies are the same, so the dependency between two classes is weighted by the sum of all the dependencies between them.
We use a software system developed in Java as an example to show how to construct a CDG network. From the source code shown in Fig. 1, we can generate the CDG shown in Fig. 2. Unlike existing coarse-grained software networks, our CDG describes the software structure deeply and clearly, since it is based on five fine-grained dependencies $\{R_1, R_2, \ldots, R_5\}$.
In Fig. 1, there are three packages. In detail, package1 has three classes: A, B, C; package2 also has three classes: D, E, F; and the last package, package3, has two classes: H and I. It is easily observed that there are four dependencies between classes: D depends on A, F depends on D, I depends on E, and I depends on F. Based on the definitions of the five dependencies $\{R_1, R_2, \ldots, R_5\}$, the CDG is generated and shown in Fig. 2.
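As a minimal sketch of how the CDG for this example could be represented, the Python snippet below stores the four dependencies listed above as a weighted edge map. The names `PACKAGES`, `DEPENDENCIES` and `build_cdg` are illustrative and not from the paper, and counting each listed dependency once is an assumption, since the text does not state how many dependency types lie on each edge.

```python
from collections import defaultdict

# Package-class structure of the example (package3 as listed in the text above).
PACKAGES = {
    "package1": ["A", "B", "C"],
    "package2": ["D", "E", "F"],
    "package3": ["H", "I"],
}

# Observed dependencies as (source class, target class).
# Assumption: each listed dependency is recorded exactly once.
DEPENDENCIES = [("D", "A"), ("F", "D"), ("I", "E"), ("I", "F")]


def build_cdg(packages, dependencies):
    """Return (vertices, weighted edges, class-to-package map) for a CDG."""
    class_to_package = {c: p for p, classes in packages.items() for c in classes}
    edges = defaultdict(int)
    for src, dst in dependencies:
        if src not in class_to_package or dst not in class_to_package:
            raise ValueError(f"unknown class in dependency {src} -> {dst}")
        # All dependency types are weighted equally, so the edge weight is
        # the number of dependencies recorded between the two classes.
        edges[(src, dst)] += 1
    return sorted(class_to_package), dict(edges), class_to_package


vertices, edges, class_to_package = build_cdg(PACKAGES, DEPENDENCIES)
print(len(vertices), "classes,", len(edges), "directed edges")
```

Keeping the class-to-package map next to the edge map makes it straightforward to classify every edge as intra-package or inter-package later on.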

Attributes of software network
Definition 2 Let $G = (V, E, C)$ denote a directed network, where $V$, $E$ and $C$ are the sets of vertices, edges and communities respectively. Note that for networks generated from software systems, the communities are formed naturally from the package-class structure. Each vertex belongs to exactly one community, and communities share no vertices, that is, $\forall i \neq j,\ C_i \cap C_j = \emptyset$. $m_{ij}$ denotes the entry for $v_i$ and $v_j$ in the adjacency matrix $M$. For two vertices $v_i$, $v_j$ in a software network, we distinguish the case in which $v_i$ and $v_j$ belong to the same community from the case in which they do not.

Definition 3 An internal edge is an edge whose two vertices are located in the same community. The number of internal edges in a community $C_k$ is calculated with

$$WPR(C_k) = \sum_{v_i \in C_k} \sum_{v_j \in C_k} m_{ij} \qquad (1)$$

Fig. 2 An Example of CDG
In a software network, WPR indicates the cohesion maturity of a package: internal edges lie entirely within the package, so they require no outside dependencies. Obviously, the larger WPR is, the greater the cohesion is.
Definition 4 An external edge is an edge whose two vertices are in two different packages. There are two types of external edges: outgoing edges and incoming edges. For a community $C_k$, the number of outgoing edges is calculated with

$$WPER(C_k) = \sum_{v_i \in C_k} \sum_{v_j \notin C_k} m_{ij} \qquad (2)$$

while the number of incoming edges is

$$WPAR(C_k) = \sum_{v_i \notin C_k} \sum_{v_j \in C_k} m_{ij} \qquad (3)$$

In a software network, WPAR reflects how "powerful" a package is, that is, how much it supports other packages, and WPER reflects how "dependent" it is, that is, how much support it needs from other packages. Thus, for a package, the larger WPAR is, the more important the package is; the larger WPER is, the more dependent it is.
Definition 5 We can quantify the importance of a package. Let PRE indicate the number of other packages that a package depends upon, and PRA denote the number of other packages that depend on a package. They are calculated as follows:

$$PRE(C_k) = \sum_{l \neq k} \gamma(C_k, C_l) \qquad (4)$$

$$PRA(C_k) = \sum_{l \neq k} \gamma(C_l, C_k) \qquad (5)$$

In formulas (4) and (5), $\gamma(C_l, C_k) = 1$ when classes in $C_l$ depend upon classes in $C_k$; otherwise, $\gamma(C_l, C_k) = 0$.
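The quantities in Definitions 3 to 5 can be computed in one pass over a weighted edge map. The sketch below assumes that WPER counts outgoing and WPAR counts incoming edge weights, which matches the values reported for package2 in the worked example later in the paper. The function name `package_counts` is illustrative, and the weight 2 on the edge I → E is an assumption chosen only so that the incoming total equals the reported WPAR = 3.

```python
def package_counts(edges, class_to_package, package):
    """Compute WPR, WPER, WPAR, PRE and PRA for one package.

    edges maps (source class, target class) to a dependency weight.
    """
    wpr = wper = wpar = 0
    depends_on = set()    # other packages this package depends on (PRE)
    depended_by = set()   # other packages that depend on this package (PRA)
    for (src, dst), weight in edges.items():
        src_pkg, dst_pkg = class_to_package[src], class_to_package[dst]
        if src_pkg == package and dst_pkg == package:
            wpr += weight                 # internal edge
        elif src_pkg == package:
            wper += weight                # outgoing external edge
            depends_on.add(dst_pkg)
        elif dst_pkg == package:
            wpar += weight                # incoming external edge
            depended_by.add(src_pkg)
    return {"WPR": wpr, "WPER": wper, "WPAR": wpar,
            "PRE": len(depends_on), "PRA": len(depended_by)}


class_to_package = {
    "A": "package1", "B": "package1", "C": "package1",
    "D": "package2", "E": "package2", "F": "package2",
    "H": "package3", "I": "package3",
}
# Edge weights are assumptions (see the paragraph above).
edges = {("D", "A"): 1, ("F", "D"): 1, ("I", "E"): 2, ("I", "F"): 1}

counts = package_counts(edges, class_to_package, "package2")
print(counts)
```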

Our metrics
It is well known that the rule of high cohesion and low coupling is very important in software architecture design. The degree of cohesion and coupling between packages has a great impact on software maintainability and reusability. However, manual evaluation of cohesion and coupling is time consuming and labor intensive. It is therefore necessary to construct evaluation metrics and algorithms that need no manual operations, for better code evaluation and refactoring.

Cohesion and coupling metrics
In (Abdeen et al. 2009), Abdeen proposed a cohesion metric for packages. For a package, this metric considers not only the intra-package dependencies, but also the inter-package dependencies. However, it omits the backwards dependencies, namely the case in which a class is depended on by others. From the perspective of software quality, the inter-package calls brought by backwards dependencies have a high probability of affecting overall package reusability. Considering the effect of backwards dependencies, we define a new cohesion metric, COHM, for measuring software package cohesion.
For a package, COHM is calculated as follows:

$$COHM = \frac{WPR}{WPR + WPER + \delta \cdot WPAR} \qquad (6)$$

When $WPR + WPER + \delta \cdot WPAR = 0$, COHM is set to 0. Although WPER and WPAR both denote inter-package dependencies, the influence of backwards dependencies on a class is smaller than that of forwards dependencies: backwards dependencies are passive and not controlled by the callee class, whereas a class can control its forwards dependencies. So we multiply WPAR by a coefficient $\delta$ less than 1 to emphasize that it is less important than WPER. Here, we tentatively fix $\delta = 0.5$. By the meaning of cohesion, a larger value of COHM indicates higher cohesion.

Inspired by Martin's efferent and afferent couplings (Martin 2002), we also propose a new package coupling metric, COUM. This metric considers the relations between packages caused by all relations between classes, and can therefore truly reflect the hierarchical relations between packages. The COUM of a package is calculated as follows:

$$COUM = \frac{PRE + PRA}{PRE + PRA + WPR} \qquad (7)$$

where WPR represents the number of times the package depends on itself. Note that when the denominator is 0, COUM is set to 0. In this formula, the numerator is the number of associations between the community and other communities, and the denominator is the number of all associations of the community. By the meaning of coupling, the smaller the COUM value is, the lower the degree of package coupling is.

We present Algorithm 1 to calculate the cohesion and coupling metrics for a package. In this algorithm, $N$ is the number of all classes and $N_{p_1}$ denotes the number of classes in the package $p_1$. We iterate over all classes in the package $p_1$ to calculate cohesion and coupling, and only consider pairs of classes that have dependencies. If the two classes are in the same package, we add the dependency values from $M$ to WPR; otherwise, we add them to WPER or WPAR according to the direction of the dependency. Finally, after visiting all classes, we calculate the cohesion and coupling of package $p_1$.
Let us analyze the complexity of Algorithm 1. There is a two-level nested loop in Algorithm 1. Assume the whole software has $N$ classes and a package $P_1$ has $N_{P_1}$ classes. Then the outer loop runs $N_{P_1}$ times, while the inner loop runs $N$ times. Therefore, the complexity of Algorithm 1 is $O(N_{P_1} N)$. When Algorithm 1 is performed on all packages, the total complexity is $O((N_{P_1} + N_{P_2} + \cdots + N_{P_x}) \cdot N)$. Since $N_{P_1} + N_{P_2} + \cdots + N_{P_x} = N$, the total complexity is $O(N^2)$. Note that the COHM and COUM values of a whole software system are defined as the averages of the corresponding values over all packages.
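The procedure described for Algorithm 1 can be sketched in Python as follows. This is not the authors' listing: it assumes COHM = WPR / (WPR + WPER + δ · WPAR) and COUM = (PRE + PRA) / (PRE + PRA + WPR), as the descriptions of formulas (6) and (7) suggest, and the function name `cohesion_coupling`, the matrix layout and the edge weights in the usage example are illustrative assumptions.

```python
def cohesion_coupling(M, package_of, p1, delta=0.5):
    """Return (COHM, COUM) for package p1.

    M[i][j] is the dependency weight from class i to class j, and
    package_of[i] is the package identifier of class i.
    """
    n = len(M)
    wpr = wper = wpar = 0
    efferent, afferent = set(), set()
    members = [i for i in range(n) if package_of[i] == p1]
    for i in members:                    # outer loop: N_p1 iterations
        for j in range(n):               # inner loop: N iterations
            if package_of[j] == p1:
                wpr += M[i][j]           # internal dependency
                continue
            if M[i][j] > 0:              # class i depends on an outside class
                wper += M[i][j]
                efferent.add(package_of[j])
            if M[j][i] > 0:              # an outside class depends on class i
                wpar += M[j][i]
                afferent.add(package_of[j])
    pre, pra = len(efferent), len(afferent)
    cohm_den = wpr + wper + delta * wpar
    coum_den = pre + pra + wpr
    cohm = wpr / cohm_den if cohm_den else 0.0          # zero denominator -> 0
    coum = (pre + pra) / coum_den if coum_den else 0.0  # zero denominator -> 0
    return cohm, coum


# Classes A, B, C, D, E, F, H, I mapped to indices 0..7.
package_of = [1, 1, 1, 2, 2, 2, 3, 3]
M = [[0] * 8 for _ in range(8)]
M[3][0] = 1   # D -> A
M[5][3] = 1   # F -> D
M[7][4] = 2   # I -> E (weight 2 is an assumption)
M[7][5] = 1   # I -> F

cohm, coum = cohesion_coupling(M, package_of, p1=2)
print(round(cohm, 2), round(coum, 2))
```

The two nested loops mirror the $O(N_{P_1} N)$ cost discussed above; calling the function once per package gives the $O(N^2)$ total.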

Theoretical verification
The concepts of cohesion and coupling have been used to represent the dependencies between modules. Briand et al. defined some mathematical properties to characterize cohesion and coupling (Briand et al. 1996). Such a mathematical framework can build consensus in the software engineering community, and provide better guidelines for communication among researchers as well as better evaluation methods for commercial analyzers and practitioners. Although not completely precise, these properties are necessary and helpful for proving the usefulness of a cohesion or coupling measurement.

Algorithm 1 (final steps): calculate COHM with formula (6); calculate COUM with formula (7); return the result.

Here, we use the CDG in Fig. 2 as an example to illustrate the calculation of COHM and COUM. For package2, class F depends on class D in the same package, class D depends on class A in package1, and classes E and F are both depended on by class I in package3. So WPR = 1, WPER = 1, and WPAR = 3. Since package2 depends on package1 and package3 depends on package2, we have PRE = 1 and PRA = 1. According to formulas (6) and (7), COHM = 0.29 and COUM = 0.67.
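Written out, and assuming formulas (6) and (7) have the ratio forms implied by their descriptions, the arithmetic for package2 with $\delta = 0.5$ is:

```latex
\begin{align*}
COHM &= \frac{WPR}{WPR + WPER + \delta \cdot WPAR}
      = \frac{1}{1 + 1 + 0.5 \times 3} = \frac{1}{3.5} \approx 0.29,\\
COUM &= \frac{PRE + PRA}{PRE + PRA + WPR}
      = \frac{1 + 1}{1 + 1 + 1} = \frac{2}{3} \approx 0.67.
\end{align*}
```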
Here, we theoretically verify the validity of our inter-package cohesion and coupling metrics by analyzing their mathematical properties. Briand presented five properties for cohesion and coupling. For generality and refinement, we combine properties 4 and 5 in (Briand et al. 1996) into one property, namely PROPERTY 4, which has two parts: cohesion and coupling. The definitions are listed as follows.

PROPERTY 1: Non-negativity
The cohesion and the coupling of a [module | modular system] are non-negative.

PROPERTY 2: Null Value
If there is no intramodule relationship among the elements of a (all) module(s), then the module (system) cohesion is null. If there is no intermodule relationship among the elements of a (all) module(s), then the module (system) coupling is null.

PROPERTY 3: Monotonicity
Adding intramodule relationships does not decrease [module | modular system] cohesion. Adding intermodule relationships does not decrease [module | modular system] coupling.

PROPERTY COHESION 4: Merging of Modules
The cohesion of a [module | modular system] obtained by putting together two unrelated modules is not greater than the [maximum cohesion of the two original modules | the cohesion of the original modular system].

PROPERTY COUPLING 4: Merging of Modules
The coupling of a [module | modular system] obtained by merging two modules is not greater than the [sum of the couplings of the two original modules | coupling of the original modular system], since the two modules may have common intermodule relationships.

The metrics defined in formulas (6) and (7) satisfy the four verification properties proposed by Briand.

Proof

(1) Non-negativity
In formula (6), WPR, WPER, $\delta$ and WPAR are all non-negative, therefore COHM is also non-negative. In formula (7), PRE, PRA and WPR are all non-negative, thus COUM is also non-negative.

(2) Null Value
If the number of classes in a package is zero, or the package has no internal relations and no relations with any other package, that is, WPR, WPER, WPAR, PRE and PRA are all zero, then the denominators of both COHM and COUM are zero, and by definition both metrics are set to 0.

(3) Monotonicity
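There are two cases of adding new edges to a package. One is adding internal edges within a package, the other is adding external edges between different packages.

For the first case, when adding internal edges to a package $C$, we use $C'$ to denote the new package. For COHM monotonicity, we have

$$COHM_{C'} - COHM_{C} = \frac{WPR_{C'}}{WPR_{C'} + WPER_{C} + \delta \cdot WPAR_{C}} - \frac{WPR_{C}}{WPR_{C} + WPER_{C} + \delta \cdot WPAR_{C}} = \frac{(WPR_{C'} - WPR_{C})(WPER_{C} + \delta \cdot WPAR_{C})}{(WPR_{C'} + WPER_{C} + \delta \cdot WPAR_{C})(WPR_{C} + WPER_{C} + \delta \cdot WPAR_{C})},$$

since $WPER_C$ and $WPAR_C$ are not changed by adding internal edges. Obviously, $WPR_{C'} > WPR_{C}$, so both the denominator and the numerator are non-negative, and COHM does not decrease. For COUM monotonicity, we have

$$COUM_{C'} - COUM_{C} = \frac{PRE_{C} + PRA_{C}}{PRE_{C} + PRA_{C} + WPR_{C'}} - \frac{PRE_{C} + PRA_{C}}{PRE_{C} + PRA_{C} + WPR_{C}} = \frac{(PRE_{C} + PRA_{C})(WPR_{C} - WPR_{C'})}{(PRE_{C} + PRA_{C} + WPR_{C'})(PRE_{C} + PRA_{C} + WPR_{C})},$$

since $PRE_C$ and $PRA_C$ are not changed by adding internal edges. Obviously, $WPR_{C'} > WPR_{C}$, so the denominator is non-negative and the numerator is non-positive, and COUM does not increase.

For the second case, adding external edges to a package $C$, let $C'$ again denote the new package. For COHM monotonicity, we have

$$COHM_{C'} - COHM_{C} = \frac{WPR_{C}}{WPR_{C} + WPER_{C'} + \delta \cdot WPAR_{C'}} - \frac{WPR_{C}}{WPR_{C} + WPER_{C} + \delta \cdot WPAR_{C}} = \frac{WPR_{C}\,[(WPER_{C} - WPER_{C'}) + \delta (WPAR_{C} - WPAR_{C'})]}{(WPR_{C} + WPER_{C'} + \delta \cdot WPAR_{C'})(WPR_{C} + WPER_{C} + \delta \cdot WPAR_{C})},$$

since WPR is not changed by adding external edges. Obviously, $WPER_{C'} \ge WPER_{C}$ and $WPAR_{C'} \ge WPAR_{C}$, thus the denominator is non-negative and the numerator is non-positive, so COHM does not increase. For COUM monotonicity, we have

$$COUM_{C'} - COUM_{C} = \frac{PRE_{C'} + PRA_{C'}}{PRE_{C'} + PRA_{C'} + WPR_{C}} - \frac{PRE_{C} + PRA_{C}}{PRE_{C} + PRA_{C} + WPR_{C}} = \frac{WPR_{C}\,[(PRE_{C'} + PRA_{C'}) - (PRE_{C} + PRA_{C})]}{(PRE_{C'} + PRA_{C'} + WPR_{C})(PRE_{C} + PRA_{C} + WPR_{C})},$$

since WPR does not change when external edges are added. Obviously, $PRE_{C'} \ge PRE_{C}$ and $PRA_{C'} \ge PRA_{C}$, thus both the denominator and the numerator are non-negative, so COUM does not decrease.

To sum up the two cases, adding internal edges to a package increases COHM and decreases COUM, while adding external edges between different packages decreases COHM and increases COUM. These changes agree with the rule of high cohesion and low coupling. Therefore, both COHM and COUM satisfy the monotonicity property.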
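The two cases can also be checked numerically. The sketch below is only a sanity check under stated assumptions: the helper functions `cohm` and `coum` are illustrative, they assume the ratio forms of formulas (6) and (7) with $\delta = 0.5$, and the starting counts are those reported for package2 in the worked example.

```python
def cohm(wpr, wper, wpar, delta=0.5):
    """COHM as a ratio; defined as 0 when the denominator is 0."""
    den = wpr + wper + delta * wpar
    return wpr / den if den else 0.0


def coum(pre, pra, wpr):
    """COUM as a ratio; defined as 0 when the denominator is 0."""
    den = pre + pra + wpr
    return (pre + pra) / den if den else 0.0


# Starting point: WPR = 1, WPER = 1, WPAR = 3, PRE = 1, PRA = 1.
base_cohm = cohm(wpr=1, wper=1, wpar=3)
base_coum = coum(pre=1, pra=1, wpr=1)

# Case 1: one more internal edge (only WPR grows).
internal_cohm = cohm(wpr=2, wper=1, wpar=3)
internal_coum = coum(pre=1, pra=1, wpr=2)

# Case 2: one more outgoing edge to a package not yet depended upon
# (WPER and PRE grow, WPR is unchanged).
external_cohm = cohm(wpr=1, wper=2, wpar=3)
external_coum = coum(pre=2, pra=1, wpr=1)

print(internal_cohm > base_cohm, internal_coum < base_coum)
print(external_cohm < base_cohm, external_coum > base_coum)
```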

(4) Merging of Modules
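Without loss of generality, consider two packages (modules) $C_a$ and $C_b$, where no class in package $C_a$ has dependencies or backwards dependencies on the classes in package $C_b$. The cohesions of packages $C_a$ and $C_b$ are calculated as follows:

$$COHM_{C_a} = \frac{WPR_{C_a}}{WPR_{C_a} + WPER_{C_a} + \delta \cdot WPAR_{C_a}} = \frac{N_a}{D_a}, \qquad COHM_{C_b} = \frac{WPR_{C_b}}{WPR_{C_b} + WPER_{C_b} + \delta \cdot WPAR_{C_b}} = \frac{N_b}{D_b}.$$

Then we combine $C_a$ and $C_b$ to generate a new package $C_c$. Because the two packages are unrelated, merging them neither adds internal edges nor removes external edges, so the cohesion of $C_c$ is

$$COHM_{C_c} = \frac{N_a + N_b}{D_a + D_b}.$$

This fraction is the mediant of $N_a / D_a$ and $N_b / D_b$ and therefore lies between them, so $COHM_{C_c} \le \max(COHM_{C_a}, COHM_{C_b})$. In other words, the cohesion of the merged package is not greater than the larger of the two original cohesions.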
For the coupling metric, the fourth property proposed by Briand requires $COUM_{C_a} + COUM_{C_b} \ge COUM_{C_c}$. For the new merged package $C_c$, the difference $COUM_{C_a} + COUM_{C_b} - COUM_{C_c}$ can be written as a fraction $N_c / D_c$. It is easy to see that both $D_c$ and $N_c$ are non-negative. So $COUM_{C_a} + COUM_{C_b} \ge COUM_{C_c}$, which proves the fourth property.
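One way to see why the coupling inequality holds is sketched below. It is not the paper's derivation: it assumes the ratio form of formula (7) and non-zero denominators, and it uses the shorthand $E_x = PRE_{C_x} + PRA_{C_x}$ and $W_x = WPR_{C_x}$, introduced here only for illustration. Merging cannot create new neighbouring packages and cannot remove internal edges, so $E_c \le E_a + E_b$ and $W_c \ge W_a + W_b$.

```latex
\begin{align*}
COUM_{C_c} = \frac{E_c}{E_c + W_c}
  &\le \frac{E_a + E_b}{E_a + E_b + W_a + W_b}
  && \text{($\tfrac{E}{E + W}$ grows with $E$ and shrinks with $W$)}\\
  &\le \max\!\left(\frac{E_a}{E_a + W_a},\ \frac{E_b}{E_b + W_b}\right)
  && \text{(a mediant lies between the two fractions)}\\
  &\le COUM_{C_a} + COUM_{C_b}
  && \text{(both terms are non-negative).}
\end{align*}
```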
To sum up, we have proved that our metrics of cohesion and coupling are satisfied with all properties proposed by Briand.

Refactoring algorithm
As mentioned above, programmers pursue the goal of high cohesion and low coupling for a software system. Note that these two aspects are not interchangeable. In software engineering, cohesion and coupling are usually regarded as equally important. When only one of them is considered, the software system cannot be understood correctly and clearly. Combining cohesion with coupling therefore better reflects package modularity and measures software structure more fully. Thus, we propose a refactoring algorithm based on the COUM and COHM metrics to optimize software structure. Our algorithm follows the greedy principle in pursuing high cohesion and low coupling. The details of the refactoring algorithm are summarized in Algorithm 2.
First, we try to move a candidate class to another package that has dependencies with it, in order to obtain a higher COHM and a lower COUM. Obviously, a candidate class can only be a class that has inter-package dependencies. Moreover, moving a class with few inter-package dependencies and many intra-package dependencies can disrupt the original software organization, so such classes are also excluded from the set of candidate classes. In Algorithm 2, $T_1$ is the threshold on the difference between the forwards dependencies on the target package and those on the source package, and $T_2$ is the analogous threshold for backwards dependencies. In this paper, $T_1 = 2$ and $T_2 = 3$ are chosen based on experience.
Algorithm 2 (excerpt)
Let $P_t$ be the currently visited package.
7 Set $D_n$ = (the number of dependencies of $A$ on $P_t$).
8 Set $D_o$ = (the number of dependencies of $A$ on $P_s$).
9 Set $D_{nb}$ = (the corresponding backwards dependencies on $P_t$).
10 Set $D_{ob}$ = (the corresponding backwards dependencies on $P_s$).
Calculate $COHM_{ogn}$, $COUM_{ogn}$ for both $P_s$ and $P_t$.
13 Move $A$ to $P_t$ and update $C$.

Next, when moving a candidate class causes the value of COHM to increase and the value of COUM to decrease, a refactoring is performed. Unfortunately, cohesion and coupling do not always change cooperatively in opposite directions. Therefore, there is a trade-off when COHM and COUM both increase or both decrease. Since software is carefully designed and implemented by professional programmers, refactoring must be treated with care; that is, each refactoring should be of great value to the entire software system. Therefore, a refactoring should occur only when the improvement it achieves is not too small.
For the above reasons, if the values of COHM and COUM change in the same direction, we apply an evaluation model: when the "good" changes (those healthy for the software structure) exceed the "bad" changes by a threshold, the class is moved; otherwise the class stays where it is without any refactoring. Here, this threshold is set at 50%. Empirically, performing a refactoring at this level of "good" change is not wasteful. Finally, we repeat the above process and stop moving classes when the entire software reaches the optimal configuration.
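The acceptance rule described above can be sketched as a small decision function. The text does not spell out how "good" and "bad" changes are measured, so the following is one possible reading and not the authors' rule: it measures changes as absolute differences of COHM and COUM, and requires the good change to exceed the bad change by the 50% threshold. The name `should_move` and its parameters are hypothetical.

```python
def should_move(cohm_before, coum_before, cohm_after, coum_after, threshold=0.5):
    """Decide whether a candidate move is accepted (illustrative reading).

    A rise in COHM or a drop in COUM counts as a "good" change;
    a drop in COHM or a rise in COUM counts as a "bad" change.
    """
    d_cohm = cohm_after - cohm_before
    d_coum = coum_after - coum_before
    good = max(d_cohm, 0.0) + max(-d_coum, 0.0)
    bad = max(-d_cohm, 0.0) + max(d_coum, 0.0)
    if bad == 0.0:
        # Nothing gets worse: move only if something actually improves.
        return good > 0.0
    # Trade-off case: the good change must exceed the bad one by the threshold.
    return good >= (1.0 + threshold) * bad


# COHM rises and COUM falls: accepted.
print(should_move(0.30, 0.60, 0.40, 0.50))
# Both rise, but the gain in COHM clearly outweighs the loss in COUM: accepted.
print(should_move(0.30, 0.60, 0.40, 0.65))
# Both rise, and the loss in COUM dominates: rejected.
print(should_move(0.30, 0.60, 0.32, 0.65))
```

In a greedy loop, such a function would be called once per candidate class and target package, using the COHM and COUM values of the source and target packages before and after the tentative move.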
Let us analyse the complexity of Algorithm 2. Let $N$ be the number of classes and $N_p$ the number of packages. There is a two-level nested loop in Algorithm 2: the outer loop is the while-loop at line 2, and the inner loop is the for-loop at line 5. In the inner for-loop, only the COHM and COUM values of the source package and the target package change during refactoring, so the other packages need not be considered. According to Algorithm 1, the complexity of computing COHM and COUM for the source package is $O(N^2)$. Thus the complexity of the for-loop at line 5 is $O(N^2 N_p)$, and the total complexity of Algorithm 2 is $O(N^3 N_p)$. Since our algorithm follows the greedy strategy, it may become trapped in a local optimum. However, during refactoring, the average values of cohesion and coupling for the whole software improve monotonically, which confirms the correctness of Algorithm 2. Furthermore, the number of classes is finite, so the algorithm terminates after all classes are visited.

Refactoring experiments and analysis
Our experiments are executed on a computer with an i5-3230M CPU, 8 GB of DDR3 memory, and Windows 10. We select ten open-source software systems for experimental verification. These software systems have different functionalities and good maturity, and have been widely applied in industry. Their basic statistics are listed in Table 1.
Table 2 shows the result of refactoring the ten software systems. It can easily be observed that, for all software systems, the cohesion values are improved and the coupling values are decreased after refactoring. Figure 3 shows the COHM comparison before and after refactoring: the value of COHM of each software system is improved. Similarly, Fig. 4 shows that after refactoring the value of COUM is significantly decreased, by up to 72%. Therefore, through refactoring, the software structure is improved significantly and gets closer to the goal of high cohesion and low coupling.
In our past work, Mi et al. proposed an effective package-level cohesion metric that can improve software structure (Mi et al. 2019). Pan et al. also proposed a community cohesion model for refactoring (Pan et al. 2014). We compare our refactoring algorithm with each of theirs. Note that Mi and Pan consider only the cohesion metric. However, the coupling metric plays a similarly important role in software structure, so we also compare the COUM values obtained when each refactoring algorithm stops. Table 3 records the time consumption of each refactoring algorithm and the resulting value of COUM. Note that the complexity of our refactoring algorithm is equal to Mi's and lower than Pan's, which is $O(N^3 N_p^3)$. For easy observation, Fig. 5 shows the comparison of COUM after refactoring with the three algorithms. With Mi's algorithm, the value of COUM is slightly higher than ours in most cases. With Pan's algorithm, the COUM is much higher than ours, which means Pan's algorithm may cause high coupling between packages. Worse, their refactoring algorithm consumes more time, up to several hours for some software systems. Therefore, our refactoring algorithm can guide the software structure correctly and execute efficiently.

Disturbing-recovering experiment and analysis
Several studies scored refactorings manually, which is somewhat subjective. To obtain a more objective comparison, we also design a disturbing-and-recovering experiment to verify the correctness and efficiency of our metrics in guiding the software structure. Randomly disturbing a package means that some classes in the package are randomly selected and placed into other packages. Recovering means that the disturbed classes are moved back to their original packages by the refactoring algorithm.
Since software is an artifact developed carefully by programmers with professional skills, we regard the original structures of the software systems as "PERFECT" structures. When we randomly disturb a package, the "PERFECT" structure is broken into chaos. We can then use the refactoring algorithms to optimize the disturbed software systems. After refactoring, the more classes are correctly recovered, the better the refactoring algorithm is. The precision rate P of recovering is calculated as

$$P = \frac{N_{Recovered}}{N_{disturbed}}$$
where $N_{Recovered}$ represents the number of disturbed classes recovered by the algorithm, and $N_{disturbed}$ denotes the total number of disturbed classes.
We run the disturbing-recovering experiment on ten Java open-source software systems. For each system, we repeat the test 100 times and take the average. Then, we compare our refactoring algorithm with Mi's under different disturbing ratios. The details of the disturbing-recovering experiment using our refactoring algorithm under a 10% disturbing ratio are shown in Table 4. The result shows that our refactoring algorithm recovers the disturbed classes very well and that most classes are placed back. This indicates that our metrics can optimize the software structure effectively.
Fig. 5 The value of COUM
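The disturbing step and the precision rate can be illustrated with a minimal Python sketch. The names `disturb` and `recovery_precision`, the dictionary that maps class names to package names, and the choice to apply the disturbing ratio to the whole system rather than to a single package are assumptions made for illustration. The refactoring algorithm itself is not reproduced here, and at least two packages are assumed to exist.

```python
import random


def disturb(assignment, ratio, rng):
    """Move a random fraction of classes into other packages.

    assignment maps class name -> package name. Returns the disturbed
    assignment and the list of classes that were moved.
    """
    packages = sorted(set(assignment.values()))
    count = max(1, round(ratio * len(assignment)))
    moved = rng.sample(sorted(assignment), count)
    disturbed = dict(assignment)
    for cls in moved:
        # Pick any package other than the original one.
        others = [p for p in packages if p != assignment[cls]]
        disturbed[cls] = rng.choice(others)
    return disturbed, moved


def recovery_precision(original, refactored, moved):
    """P = number of recovered classes / number of disturbed classes."""
    recovered = sum(1 for cls in moved if refactored[cls] == original[cls])
    return recovered / len(moved)
```

In an experiment of this kind, `disturb` would be applied to the original structure, the refactoring algorithm would be run on the result, and `recovery_precision` would compare its output with the original structure; averaging over repeated runs with different random seeds mirrors the 100 repetitions described above.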
We also compare the performance of our algorithm and Mi's under disturbing ratios of 6%, 10% and 14%. The comparison result is shown in Table 5. Under the different disturbing ratios, our average recovery percentages are higher than Mi's in most cases; the only exception is Ant. A more intuitive visualization is given in Fig. 6. Under the different disturbing ratios, our algorithm delivers a steadier performance. Therefore, the disturbing-recovering experiment shows that our metrics are suitable for software measurement, and that the refactoring algorithm can be used to improve software quality and avoid the risk of structural deviation.

Conclusion
Software is a well-designed artifact implemented by programmers with professional skills. In the long maintenance phase, software faces the risk of code quality degradation and architecture deviation caused by continuous code revision. It is urgently necessary to create metrics, methods and tools that assist programmers from a macroscopic view. In this paper, we utilize community methods from complex networks and propose two metrics, package cohesion and coupling, for software measurement. These two metrics are proved to satisfy the properties proposed by Briand (Briand et al. 1996). Then, based on the new metrics, we construct an evaluation model for package maturity and propose a refactoring algorithm to improve software structure. Through several experiments on multiple open-source software systems, it is shown that our metrics are capable not only of improving package structure to fit the rule of high cohesion and low coupling, but also of recovering the disturbed classes back to their correct places.
Fig. 6 The fluctuation of recovering precision