GraphEval2000 Dataset

By Paper Authors

Introduction

Large language models (LLMs) have achieved remarkable success in natural language processing (NLP), demonstrating significant capabilities in processing and understanding text data. However, recent studies have identified limitations in LLMs' ability to reason about graph-structured data. To address this gap, we introduce GraphEval2000, the first comprehensive graph dataset, comprising 40 graph data-structure problems and 2,000 test cases. Additionally, we introduce an evaluation framework based on GraphEval2000, designed to assess the graph reasoning abilities of LLMs through coding challenges. Our dataset categorizes test cases into four primary categories and four sub-categories, ensuring a comprehensive evaluation. We evaluate eight popular LLMs on GraphEval2000 and find that they exhibit a better understanding of directed graphs than undirected ones. While private LLMs consistently outperform open-source models, the performance gap is narrowing. Furthermore, to improve the usability of our evaluation framework, we propose Structured Symbolic Decomposition (SSD), an instruction-based method designed to enhance LLM performance on GraphEval2000. Results show that SSD improves the performance of GPT-3.5, GPT-4, and GPT-4o on complex graph problems, with increases of 11.11%, 33.37%, and 33.37%, respectively.
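
As a concrete illustration of the coding-challenge evaluation described above, the following sketch runs a model-generated solution against a single problem's test cases and reports the fraction that pass. The query_llm helper and the test-case fields (prompt, input, expected) are hypothetical placeholders rather than the framework's actual interface.

    def evaluate_problem(problem, query_llm):
        # Prompt the model with the problem statement; the model is expected to
        # return Python code that defines a solve(graph) function.
        generated_code = query_llm(problem["prompt"])

        namespace = {}
        try:
            exec(generated_code, namespace)      # compile the model's answer
            solve = namespace["solve"]
        except Exception:
            return 0.0                           # unrunnable code scores zero

        passed = 0
        for case in problem["test_cases"]:       # the graph test cases for this problem
            try:
                if solve(case["input"]) == case["expected"]:
                    passed += 1
            except Exception:
                pass                             # runtime errors count as failures
        return passed / len(problem["test_cases"])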

Dataset Update

To further demonstrate the scalability of our dataset and strengthen the robustness of our results, we have expanded the dataset by increasing the number of samples in each sub-category to 100, bringing the total to 33,100 graph samples. We have also introduced a new category, 'Dense Graph,' characterized by a graph density of 0.7.
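
For reference, graph density here follows the standard definition: the ratio of edges present to the maximum number of possible edges, so a density of 0.7 means roughly 70% of all possible edges exist. The short sketch below uses networkx, which is an assumed tool rather than part of the dataset release, to generate and check such a sample.

    import networkx as nx

    # Density of an undirected simple graph: 2 * |E| / (|V| * (|V| - 1)).
    # A 'Dense Graph' sample has density around 0.7, i.e. about 70% of all
    # possible edges are present.
    n = 20
    g = nx.gnp_random_graph(n, p=0.7, seed=0)   # keep each possible edge with prob. 0.7
    print(nx.density(g))                        # close to 0.7 for this sample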

Dataset Download

You can download the dataset and the Python code used in this research from the following link: Download Dataset
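
Once downloaded, the samples can be inspected with standard tooling. The snippet below is only a hypothetical loading sketch: the file name and record fields are placeholders, and the actual layout follows the released archive.

    import json

    # Hypothetical file name and fields; consult the released archive for the
    # actual layout.
    with open("grapheval2000.json") as f:
        problems = json.load(f)

    print(len(problems))          # number of graph problems
    print(problems[0].keys())     # fields of one problem record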

You can view our paper here.
You can download the supplementary materials of this dataset here.
Note that our paper is currently under review at NeurIPS 2024.

License

This dataset is licensed under a CC BY 4.0 license; see the official license terms here.