Compression methods for graph algorithms

Graph Compression Algorithms

Document information

Instructor: J. P. Penny
School: University of Canterbury
Major: Computer Science
Document type: Thesis
Language: English
Format: PDF
Size: 7.30 MB

Summary

I. Efficient Algorithms for Boolean Matrix Multiplication

This research investigates the optimization of Boolean matrix multiplication, a fundamental operation in many graph algorithms. The study reviews several existing methods, including the Four Russians algorithm, and introduces novel approaches to improve computational efficiency. A key focus is reducing the time complexity of these algorithms, which falls between O(n²) and O(n³). New techniques, such as a novel summation heuristic, are proposed to accelerate the computation, achieving significant speed improvements over existing methods. The algorithms were implemented in Pascal and tested on matrices of varying sizes and densities, providing empirical evidence of the efficiency gains.
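As a baseline for the methods summarized below, the Boolean matrix product replaces arithmetic with AND/OR: C[i][j] is true iff A[i][k] and B[k][j] are both true for some k. A minimal Python sketch (illustrative only; the thesis's implementations were in Pascal):

```python
def bool_matmul(A, B):
    """Naive O(n^3) Boolean matrix product: C[i][j] = OR_k (A[i][k] AND B[k][j])."""
    n = len(A)
    C = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if A[i][k] and B[k][j]:
                    C[i][j] = True
                    break  # one witness k suffices for a Boolean product
                    # (this early exit is itself a simple average-case saving)
    return C

A = [[True, False], [False, True]]   # identity
B = [[False, True], [True, False]]   # swap
print(bool_matmul(A, B))  # [[False, True], [True, False]]
```

The inner `break` already hints at why average-case behaviour, not just worst-case bounds, matters for these algorithms.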

1. Boolean Matrix Representation and Compression

The research begins by establishing the equivalence of Boolean matrices and the adjacency matrices commonly used to represent graphs. A novel method is introduced for compressing Boolean matrices by representing rows or columns as single values organized in a tree structure. This compression technique forms the foundation for developing a family of algorithms designed for Boolean matrix multiplication. The core idea is to leverage tree search methods to efficiently navigate this compressed representation, offering potential computational advantages over traditional methods. This approach directly addresses the challenge of improving the efficiency of Boolean matrix multiplication, a fundamental operation in various graph algorithms.
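The tree organization itself is not detailed in this summary. The sketch below shows only the row-as-a-single-value idea, using Python integers as bitsets (an assumption for illustration; the thesis's actual representation and tree structure may differ):

```python
def compress_rows(M):
    """Pack each Boolean row into one integer: bit k is set iff M[i][k] is True."""
    return [sum(1 << k for k, v in enumerate(row) if v) for row in M]

def bool_matmul_packed(A_rows, B_rows, n):
    """Row i of the product is the OR of B's row k for every bit k set in A's row i.
    Each row combination is a single machine-word (here, big-int) operation."""
    C = []
    for a in A_rows:
        acc, k = 0, 0
        while a:
            if a & 1:
                acc |= B_rows[k]   # whole row of B absorbed in one OR
            a >>= 1
            k += 1
        C.append(acc)
    return C

I = compress_rows([[True, False], [False, True]])   # [1, 2]
S = compress_rows([[False, True], [True, False]])   # [2, 1]
print(bool_matmul_packed(I, S, 2))  # [2, 1]
```

Treating a whole row as one value is what makes a tree of row values, searched rather than scanned, worthwhile.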

2. Tree Search Algorithms for Boolean Matrix Multiplication

The research focuses on a family of algorithms for Boolean matrix multiplication based on tree search methods. These algorithms use the previously described compression technique, in which rows or columns are represented by single values within a tree structure, so their efficiency is directly linked to the efficiency of tree traversal. Both depth-first and breadth-first search strategies were implemented and compared. The number of calls to the core Tree_scan procedure, and hence the runtime, was tracked and analyzed for matrices of varying sizes and densities; the results are used to evaluate the comparative performance of depth-first versus breadth-first search within this tree-based approach to Boolean matrix multiplication.

3. Algorithm Performance Analysis and Comparisons

The performance of the developed tree-based algorithms is rigorously evaluated and compared against other established Boolean matrix multiplication methods. The study analyzes the runtime complexity, noting that it falls within the range of O(n²) to O(n³). The impact of matrix density on runtime is investigated, revealing that algorithm performance varies with density. Depth-first search consistently outperformed breadth-first search. Furthermore, algorithms using smaller maximum tree sizes required fewer calls to the Tree_scan procedure, translating to faster execution. These findings provide valuable insights into the trade-offs between different tree search strategies and compression techniques in the context of efficient Boolean matrix multiplication.

4. Heuristics for Improving Boolean Matrix Multiplication

The research explores the application of heuristics to further enhance the average-case efficiency of Boolean matrix multiplication algorithms. An 'out-of-loop' heuristic is introduced and its impact on runtime is analyzed. This heuristic significantly reduces average runtime, though the worst-case scenario might show a slight increase due to the added test. Furthermore, a novel 'summation' heuristic inspired by the tree summation algorithm is proposed. This heuristic provides an average improvement in running time and offers an O(n) improvement in the best-case scenario, demonstrating significant potential for enhancing the efficiency of various Boolean matrix multiplication algorithms. The 'summation' heuristic presents a simple yet effective method for improving performance that is applicable even to algorithms where the 'out-of-loop' heuristic cannot be used.
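The summary does not spell out either heuristic in detail. As an assumed illustration of the flavour of an 'out-of-loop' test, the sketch below hoists a check out of the inner loop and also exits early once a result row is saturated; the thesis's actual heuristics may differ:

```python
def bool_matmul_hoisted(A_rows, B_rows, n):
    """Packed Boolean product (rows as integer bitsets) with two heuristic tests:
    - out-of-loop: an all-zero row of A produces an all-zero result row with no
      inner-loop work at all;
    - early exit: accumulation stops once the result row is already all-ones.
    Both improve the average case; the extra tests cost a little in the worst case."""
    full = (1 << n) - 1
    C = []
    for a in A_rows:
        if a == 0:                 # out-of-loop test, checked once per row
            C.append(0)
            continue
        acc = 0
        for k in range(n):
            if (a >> k) & 1:
                acc |= B_rows[k]
                if acc == full:    # result row saturated: no more ORs can help
                    break
        C.append(acc)
    return C
```

This mirrors the summary's observation that such tests cut average runtime while slightly increasing the worst case.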

5. Comparison with Existing Algorithms and Implementation Considerations

The performance of the newly developed algorithms is compared against existing techniques, such as the Four Russians algorithm and its variations, including Takaoka's modification and data compression methods. The results highlight the advantages of particular algorithms at different matrix densities. The study also considers the influence of implementation details on runtime, acknowledging variations arising from the programming language (Pascal in this research) and computer architecture. A key aspect is the contrast between these findings and the results previously reported by Loukakis and Tsouris (1981); the discrepancies are attributed to implementation choices, such as recursive versus iterative coding and the use of different languages (Loukakis and Tsouris used Fortran V).

II. Graph Decomposition and Shortest Path Problems

The research explores the application of graph decomposition techniques to solve shortest path problems, particularly within the context of transportation networks. A new decomposition algorithm is presented with a time complexity of O(m), where 'm' represents the number of edges in the network. This algorithm efficiently compresses the network, creating a smaller representation on which shortest path computations are performed more quickly. The compressed network is then used to find the shortest paths between any two vertices in the original network using an efficient recomposition algorithm which works in linear time. The method is applied to the bus network of the Christchurch Transport Board in New Zealand, reducing a network of 951 vertices and 1966 edges to a compressed network of 101 vertices and 266 edges. A sample of 120 'centroid' bus stops was used to test shortest path computations in this application.

1. Network Decomposition for Efficient Pathfinding

This section introduces a novel approach to solving shortest path problems in networks through graph decomposition. The core idea is to decompose a complex network into smaller, more manageable subnetworks, thus simplifying the shortest path calculations. A key contribution is the development of an algorithm that achieves this decomposition in O(m) time, where m is the number of edges. The algorithm efficiently identifies and extracts essential parts of the network, resulting in a compressed representation that retains the necessary information for shortest path computations. This method is particularly advantageous for large networks, where traditional all-pairs shortest path algorithms can be computationally expensive.
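The decomposition algorithm itself is not reproduced in this summary. As a hedged illustration of the general idea, the following Python sketch contracts maximal chains of degree-2 vertices into single weighted edges, a standard O(m) compression for road and bus networks; the thesis's actual algorithm may well differ:

```python
from collections import defaultdict

def contract_chains(edges):
    """Contract maximal chains of degree-2 vertices into single weighted edges.
    edges: (u, v, w) triples of an undirected network. Returns compressed edges
    between kept vertices (degree != 2) with summed chain weights. Every edge is
    traversed a constant number of times, so the sketch runs in O(m).
    Cycles consisting entirely of degree-2 vertices are ignored here."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    kept = {v for v in adj if len(adj[v]) != 2}
    done = set()          # interior degree-2 vertices already absorbed into a chain
    out = []
    for s in sorted(kept):
        for first, w0 in adj[s]:
            if first in kept:
                if s < first:              # direct edge between kept vertices: emit once
                    out.append((s, first, w0))
                continue
            if first in done:              # chain already walked from its far end
                continue
            prev, cur, total = s, first, w0
            while cur not in kept:         # follow the unique onward edge
                done.add(cur)
                (a, wa), (b, wb) = adj[cur]
                if a != prev:
                    prev, cur, total = cur, a, total + wa
                else:
                    prev, cur, total = cur, b, total + wb
            out.append((min(s, cur), max(s, cur), total))
    return sorted(out)

# A path 1-2-3-4 collapses to one edge of weight 1+2+3 = 6.
print(contract_chains([(1, 2, 1), (2, 3, 2), (3, 4, 3)]))  # [(1, 4, 6)]
```

On a bus network, long route segments without junctions collapse this way, which matches the scale of reduction reported below (951 vertices to 101).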

2. Recomposition and Shortest Path Computation

Once the network has been decomposed and compressed, the next step involves recomposing the solution to find the shortest paths in the original network. The document describes a recomposition algorithm that effectively utilizes the results obtained from the compressed network to determine shortest paths between any pair of vertices in the original, larger network. This recomposition process is remarkably efficient, achieving linear time complexity. This two-step approach (decomposition and recomposition) offers a significant improvement in efficiency compared to directly applying shortest path algorithms to the uncompressed network, especially beneficial when dealing with large and complex networks.

3. Application to a Real-World Transportation Network

The effectiveness of the proposed decomposition and recomposition technique is demonstrated through its application to a real-world transportation network: the bus network of the Christchurch Transport Board (Gabites, Porter and Partners, 1982). The original network consisted of 951 vertices (bus stops and intersections) and 1966 edges (bus routes). The decomposition algorithm successfully reduced this large network to a compressed network of 101 vertices and 266 edges. This compression significantly reduces the computational burden associated with calculating shortest paths. To evaluate the method, a representative sample of 120 'centroid' bus stops was selected, and shortest path calculations were performed on this subset of the compressed network, showcasing the practical applicability and scalability of the proposed approach.

4. Comparison with Existing Methods and Considerations for Sparse Networks

The proposed decomposition method is compared to existing techniques for network analysis; it generalizes earlier approaches, offering a more flexible and efficient solution. The study highlights the advantages of the decomposition for sparse networks, such as planar graphs, where the compressed representation significantly reduces computation time. Dijkstra's algorithm, known for its efficiency on sparse graphs, is used in conjunction with the compressed network, further emphasizing the practical benefits of the method. The discussion also considers the limits on the number of edges in planar graphs (a simple planar graph on n vertices has at most 3n − 6 edges) and how these limits carry over to the compressed network, underpinning the theoretical analysis of the efficiency gains.
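Dijkstra's algorithm on an adjacency-list representation runs in O((n + m) log n) with a binary heap, which is why it pairs well with sparse, compressed networks. A standard Python sketch (illustrative; not the thesis's Pascal code):

```python
import heapq

def dijkstra(adj, src):
    """Single-source shortest paths with a binary heap.
    adj[u] is a list of (v, weight) pairs with non-negative weights."""
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):
            continue  # stale heap entry for an already-settled vertex
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

adj = {0: [(1, 1), (2, 4)], 1: [(2, 1)], 2: []}
print(dijkstra(adj, 0))  # {0: 0, 1: 1, 2: 2}
```

Running this on the 101-vertex compressed network instead of the full 951-vertex network is where the decomposition pays off.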

III. Finding Maximal Independent Sets and Cliques in Graphs

The document delves into the problem of finding all maximal independent sets (or equivalently, cliques in the complement graph) within a graph. The research analyzes the algorithm of Tsukiyama, Ide, Ariyoshi, and Shirakawa, which has a proven time bound of O(nmc) (where 'n' is the number of vertices, 'm' the number of edges, and 'c' the number of maximal independent sets). A modification to this algorithm is proposed to improve average-case performance. The modified algorithm and the original algorithm were compared with the Bron-Kerbosch algorithm and the Loukakis-Tsouris algorithm. The Moon-Moser graphs were used for testing the worst-case performance of the algorithms. Results show that the modified algorithm generally outperforms the original, although the extent of improvement varies depending on the graph structure.

1. The Tsukiyama et al. Algorithm and its Modification

This section centers on the algorithm presented by Tsukiyama, Ide, Ariyoshi, and Shirakawa (1977) for finding all maximal independent sets in a graph. This algorithm has a proven time complexity of O(nmc), where n is the number of vertices, m is the number of edges, and c is the number of maximal independent sets. The research then introduces a modification to this algorithm aiming to improve its average-case runtime performance. The core of the modification involves a subtle change in the order of vertex processing, potentially reducing the number of recursive calls needed. The effectiveness of this modification is subsequently assessed through rigorous testing and comparisons with other algorithms.

2. Algorithm Testing and Performance Evaluation

The performance of both the original and modified Tsukiyama et al. algorithms is evaluated on a set of test graphs, including the Moon-Moser graphs, which are designed to exercise worst-case behavior. For n a multiple of 3, the Moon-Moser graphs used here consist of n/3 disjoint triangles; if n is not a multiple of 3, the remaining one or two vertices are connected to each other but disjoint from the triangles. The results show that while the modification does not significantly alter worst-case performance (the Moon-Moser graphs are regular), it does yield improvements in average-case runtime. The analysis tracks the number of recursive calls of each type, providing a granular understanding of the algorithm's behavior.
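For the triangle-union graphs described above, every maximal independent set must contain exactly one vertex from each triangle, so there are 3^(n/3) of them, which is why these graphs are worst-case inputs. A brute-force check for n = 9 (an assumed illustration, not the thesis's test harness):

```python
from itertools import combinations

def maximal_independent_sets(n, edges):
    """Enumerate all maximal independent sets by brute force.
    Exponential in n, so only suitable for tiny graphs."""
    eset = {frozenset(e) for e in edges}
    def independent(S):
        return all(frozenset((u, v)) not in eset for u, v in combinations(S, 2))
    found = []
    for r in range(n + 1):
        for S in combinations(range(n), r):
            # maximal = independent and not extendable by any outside vertex
            if independent(S) and all(
                not independent(S + (v,)) for v in range(n) if v not in S
            ):
                found.append(S)
    return found

n = 9
# n/3 disjoint triangles on vertices {3i, 3i+1, 3i+2}
triangles = [(3*i + a, 3*i + b) for i in range(n // 3)
             for a, b in ((0, 1), (0, 2), (1, 2))]
print(len(maximal_independent_sets(n, triangles)))  # 3**(9//3) = 27
```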

3. Comparison with Other Maximal Independent Set Algorithms

The performance of the original and modified Tsukiyama et al. algorithms is compared to other existing algorithms for finding maximal independent sets. The study particularly focuses on the Bron-Kerbosch algorithm, considered a practical and efficient approach, along with variations proposed by Johnston and Gerhards and Lindenberg. It also investigates the Loukakis-Tsouris algorithm, analyzing its previously reported performance claims. The comparison is based on runtime results and a more detailed order-of-magnitude analysis, aiming to resolve any inconsistencies between the results and those reported by Loukakis and Tsouris (1981). The findings provide a comprehensive evaluation of the relative strengths and weaknesses of different algorithms across various graph characteristics.
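The Bron-Kerbosch algorithm named above enumerates all maximal cliques (equivalently, maximal independent sets of the complement graph). The basic recursion, in illustrative Python rather than the Pascal of the thesis:

```python
def bron_kerbosch(R, P, X, adj, out):
    """Basic Bron-Kerbosch recursion (no pivoting).
    R: current clique, P: candidate vertices that extend R,
    X: vertices already covered, used to detect non-maximality."""
    if not P and not X:
        out.append(frozenset(R))   # R cannot be extended: a maximal clique
        return
    for v in list(P):
        bron_kerbosch(R | {v}, P & adj[v], X & adj[v], adj, out)
        P.discard(v)
        X.add(v)

# Triangle {0,1,2} plus a pendant vertex 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
cliques = []
bron_kerbosch(set(), set(adj), set(), adj, cliques)
print(sorted(sorted(c) for c in cliques))  # [[0, 1, 2], [2, 3]]
```

The variants mentioned in the summary (pivoting, the Johnston and Gerhards-Lindenberg refinements) change how candidates in P are chosen, not this overall structure.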

IV. Data Structures and Graph Representations

The choice of data structures significantly impacts the efficiency of graph algorithms. The study highlights the importance of efficient representations such as adjacency matrices and their relationship to Boolean algebras. It examines the trade-offs between using matrices and lists and considers how suitable data structures can be tailored to specific graph problems to create more efficient algorithms. The research emphasizes the potential benefits of using compressed representations to reduce data structure size and improve overall algorithm performance.

1. Traditional Graph Representations and Their Limitations

The discussion begins by acknowledging the common use of matrices, specifically adjacency matrices, for representing graphs, and establishes the inherent relationship between adjacency matrices and Boolean matrices. While these representations have been used successfully in computer algorithms, the text points out their limitations when dealing with large and complex graphs, and highlights the need for more succinct representations. This sets the stage for exploring more efficient data structures and techniques for graph representation.

2. Matrix and List Representations: Adaptability and Algebraic Properties

The text compares and contrasts two common methods for representing graphs: matrices and lists. Both methods are easily adapted to data structures within conventional programming languages, often using arrays. The discussion highlights the algebraic properties of adjacency matrices, which can be manipulated according to well-established algebraic laws, offering advantages over more directly graph-based manipulation techniques. This understanding of the algebraic foundation of graph representation is crucial for developing and analyzing efficient algorithms. The use of matrices and lists as fundamental data structures underpins the development and implementation of the graph algorithms discussed in the study.
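The two representations are easily interconverted, which makes the trade-off concrete: a matrix costs O(n²) storage regardless of edge count, while lists cost O(n + m). A small Python sketch of the conversion (illustrative; the thesis worked with Pascal arrays):

```python
def matrix_to_lists(A):
    """Adjacency matrix -> adjacency lists.
    O(n^2) scan; the result needs only O(n + m) storage."""
    return [[j for j, bit in enumerate(row) if bit] for row in A]

def lists_to_matrix(adj):
    """Adjacency lists -> adjacency matrix.
    O(n^2) storage regardless of how sparse the graph is."""
    n = len(adj)
    A = [[False] * n for _ in range(n)]
    for i, nbrs in enumerate(adj):
        for j in nbrs:
            A[i][j] = True
    return A

A = [[False, True], [True, False]]
print(matrix_to_lists(A))  # [[1], [0]]
```

The matrix form is the one with the algebraic laws (e.g. Boolean matrix powers give reachability); the list form is the one that scales to sparse graphs.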

3. Tree Data Structures and Depth First Search

The close relationship between tree data structures and depth-first search algorithms is emphasized. Depth-first search, which derives a tree substructure from the graph, is discussed as a powerful tool for systematically searching a graph. The concept of a 'palm tree with fronds' is introduced to represent the underlying tree structure and connectivity information of the graph. The document notes that depth-first search has been successfully applied to various graph problems, including finding biconnected and triply connected components, strongly connected components, and testing graph planarity, demonstrating its versatility and efficiency in several graph-related tasks. This section demonstrates the importance of selecting appropriate data structures to optimize the performance of graph algorithms.
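The 'palm tree with fronds' picture can be made concrete: depth-first search partitions an undirected graph's edges into tree edges (the palm tree) and back edges to ancestors (the fronds). A minimal Python sketch, assuming a simple connected graph given as adjacency lists:

```python
def dfs_palm_tree(adj, root=0):
    """Depth-first search that splits a simple, connected, undirected graph's
    edges into tree edges (the 'palm tree') and fronds (back edges to ancestors).
    adj is a list of neighbour lists indexed by vertex."""
    num = {}                 # preorder number assigned when a vertex is first reached
    tree, fronds = [], []
    def visit(u, parent):
        num[u] = len(num)
        for v in adj[u]:
            if v not in num:
                tree.append((u, v))          # first reached via u: a tree edge
                visit(v, u)
            elif v != parent and num[v] < num[u]:
                fronds.append((u, v))        # points back to an ancestor: a frond
    visit(root, None)
    return tree, fronds

# A triangle: two tree edges plus one frond closing the cycle.
print(dfs_palm_tree([[1, 2], [0, 2], [0, 1]]))  # ([(0, 1), (1, 2)], [(2, 0)])
```

The numbering `num` is exactly the auxiliary information that the biconnectivity, strong-connectivity, and planarity algorithms mentioned above build on.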

4. Graph Representation Using Cliques

The document explores alternative graph representations using cliques, the maximal complete subgraphs of a graph, whose properties suggest potential for graph representation. Clique-based representation had been suggested previously by Knodel (1971) in the context of graph isomorphism. The inherent challenge is highlighted: a graph may contain exponentially many maximal cliques, and the associated decision problem is NP-complete, so finding all maximal cliques (or equivalently, all maximal independent sets in the complement graph) can be computationally prohibitive. The section therefore acknowledges the limitations of clique-based representation for many practical applications, despite its theoretical elegance.