Publications

TableMind: An Autonomous Programmatic Agent for Tool-Augmented Table Reasoning

Table reasoning requires models to jointly perform comprehensive semantic understanding and precise numerical operations. Although recent large language model (LLM)-based methods have achieved promising results, most of them still rely on a single-turn reasoning paradigm that processes flattened tables in a single forward pass. This paradigm suffers from inherent limitations, including context overflow on large tables, weak sensitivity to continuous numerical values, and the absence of explicit tool-use and reflection. In this paper, we propose TableMind, a tuning-based autonomous programmatic table agent that simulates the human-like cognitive schema of multi-turn interaction within a lightweight LLM. Instead of adopting a training-free workflow design, TableMind learns to internalize planning, action, and reflection through a principled two-stage training strategy. To bootstrap structured table reasoning capabilities, we construct and filter high-quality reasoning data for the supervised fine-tuning (SFT) stage. To enable precise code generation, we introduce a multi-perspective reward scheme and a novel optimization objective in the reinforcement learning (RL) stage. Extensive experiments on diverse benchmarks demonstrate that TableMind consistently outperforms previous baselines, validating the effectiveness of training autonomous agents to improve overall performance.
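
The loop below is a minimal sketch of this multi-turn plan-act-reflect paradigm, not the released TableMind implementation: `call_llm`, `run_python`, the tag protocol, and `MAX_TURNS` are all illustrative assumptions.

```python
# Hypothetical sketch of a plan-act-reflect loop for programmatic table reasoning.
MAX_TURNS = 5

def table_agent(question: str, table_text: str, call_llm, run_python) -> str:
    """call_llm(messages) -> str; run_python(code) -> captured stdout."""
    messages = [
        {"role": "system", "content": "Plan, then emit Python between <code> tags; "
                                      "reflect on each observation before continuing."},
        {"role": "user", "content": f"Table:\n{table_text}\n\nQuestion: {question}"},
    ]
    for _ in range(MAX_TURNS):
        reply = call_llm(messages)                     # plan + action in one turn
        messages.append({"role": "assistant", "content": reply})
        if "<answer>" in reply:                        # agent chose to terminate
            return reply.split("<answer>")[1].split("</answer>")[0].strip()
        if "<code>" in reply:                          # execute the proposed program
            code = reply.split("<code>")[1].split("</code>")[0]
            observation = run_python(code)             # tool feedback for reflection
            messages.append({"role": "user", "content": f"Observation:\n{observation}"})
    return "no answer within budget"
```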

Can Slow-thinking LLMs Reason Over Time? Empirical Studies in Time Series Forecasting

Time series forecasting (TSF) traditionally relies on fast-thinking paradigms that map historical observations directly to future sequences of continuous values. While effective, such approaches often frame forecasting as a pattern-matching problem and tend to overlook explicit reasoning over temporal dynamics and contextual factors, which are critical for modeling long-range dependencies and non-stationary behaviors in real-world scenarios. Recent slow-thinking large language models (LLMs), such as OpenAI o1 and DeepSeek-R1, demonstrate strong inference-time multi-step reasoning abilities. This raises a fundamental question: can slow-thinking LLMs reason over temporal dynamics to support accurate TSF, even without task-specific training? To investigate this question, we present TimeReasoner, a systematic empirical study that reformulates TSF as a conditional reasoning process performed entirely at inference time. TimeReasoner integrates hybrid instructions consisting of task directives, timestamps, sequential values, and optional contextual features, and induces multi-step temporal reasoning in pretrained slow-thinking LLMs through chain-of-thought prompting and rollout-based reasoning strategies. Extensive experiments across diverse TSF benchmarks show that slow-thinking LLMs consistently outperform prior baselines or achieve competitive training-free forecasting performance. Beyond accuracy, we analyze how different inference-time reasoning strategies influence forecasting behaviors, highlighting both the potential and limitations of slow-thinking paradigms for TSF.
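
As a rough illustration of the hybrid instructions described above, the snippet below assembles a task directive, timestamps, sequential values, and optional context into a single chain-of-thought prompt; the field layout and wording are our assumptions, not the exact TimeReasoner template.

```python
# Illustrative prompt assembly for reasoning-based TSF (assumed layout).
def build_tsf_prompt(timestamps, values, horizon, context=None):
    history = "\n".join(f"{t}: {v:.2f}" for t, v in zip(timestamps, values))
    return (
        "You are a careful forecaster. Reason step by step about trend, "
        "seasonality, and recent changes before answering.\n\n"
        f"Historical observations:\n{history}\n\n"
        + (f"Context: {context}\n\n" if context else "")
        + f"Task: forecast the next {horizon} values. Think first, then output "
          "the forecast as a comma-separated list."
    )

print(build_tsf_prompt(["2024-01-01", "2024-01-02"], [1.5, 2.0],
                       horizon=3, context="retail sales, holiday season"))
```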

Hierarchical multimodal LLMs with semantic space alignment for enhanced time series classification

Time series classification plays a fundamental role in a wide range of real-world applications. Recently, large language models (LLMs) have demonstrated strong generalization and reasoning capacities, but directly applying them to time series classification remains non-trivial due to the representation gap between numerical sequences and linguistic semantics. In this paper, we propose HiTime, a hierarchical LLM-based framework for multimodal time series classification that bridges structured temporal representations with semantic reasoning in a generative paradigm. Specifically, we design a hierarchical sequence feature encoding module composed of a data-specific encoder and a task-specific encoder to extract complementary temporal features. To mitigate the embedding gap between time series representations and textual semantics, we further introduce a semantic space alignment module that jointly performs coarse-grained global modeling and fine-grained cross-modal correspondence. Building upon the above representations, we employ a parameter-efficient supervised fine-tuning strategy to activate the generative classification capability of the aligned LLMs, thereby transforming conventional discriminative time series classification into a generative task. Extensive experiments on multiple benchmarks demonstrate that the proposed framework consistently outperforms state-of-the-art baselines.

Global Structure-aware and Feature-augmented Graph Neural Network for Heterophilic Graphs

Graph Neural Networks (GNNs) have been widely used across various fields under the homophily assumption that connected nodes are similar. However, in heterophilic graphs, where connected nodes tend to have dissimilar features, existing GNNs still face some limitations. From the perspective of structure, shallow GNNs cannot capture high-order node information, whereas deep GNNs may suffer from the over-smoothing problem. From the perspective of feature, the useful information of high-order similar nodes is often weakened by low-order dissimilar nodes in the feature update phase. To address the above problems, we propose a Global Structure-aware and Feature-augmented Graph Neural Network (GSF-GNN) to alleviate the limitations from the perspectives of structure and feature. Specifically, from the structure perspective, we design a Structure-based Global Propagation (SGP) module to establish global connections among nodes and adaptively adjust edge weights for message propagation. From the feature perspective, we introduce a Feature-augmented Compensatory Update (FCU) module, which employs a multi-view feature updating mechanism to enhance node features from different perspectives. Our theoretical analysis formally demonstrates the effectiveness of GSF-GNN in heterophilic graphs. Experiments on heterophilic and homophilic benchmark datasets validate the effectiveness of GSF-GNN across various graph structures. Moreover, GSF-GNN achieves stable performance across multiple layers and effectively alleviates the over-smoothing problem. Our code is available at https://github.com/huijieliu2023/GSF-GNN.

From Hypothesis to Premises: LLM-based Backward Logical Reasoning with Selective Symbolic Translation

Logical reasoning is a core challenge in natural language understanding and a fundamental capability of artificial intelligence, underpinning scientific discovery, mathematical theorem proving, and complex decision-making. Despite the remarkable progress of large language models (LLMs), most current approaches still rely on forward reasoning paradigms, generating step-by-step rationales from premises to conclusions. However, such methods often suffer from redundant inference paths, hallucinated steps, and semantic drift, resulting in inefficient and unreliable reasoning. In this paper, we propose a novel framework, Hypothesis-driven Backward Logical Reasoning (HBLR). The core idea is to integrate confidence-aware symbolic translation with hypothesis-driven backward reasoning. In the translation phase, only high-confidence spans are converted into logical form, such as First-Order Logic (FOL), while uncertain content remains in natural language. A translation reflection module further ensures semantic fidelity by evaluating symbolic outputs and reverting lossy ones back to text when necessary. In the reasoning phase, HBLR simulates human deductive thinking by assuming the conclusion is true and recursively verifying its premises. A reasoning reflection module further identifies and corrects flawed inference steps, enhancing logical coherence. Extensive experiments on five reasoning benchmarks demonstrate that HBLR consistently outperforms strong baselines in both accuracy and efficiency.

STaR: Towards Cognitive Table Reasoning via Slow-Thinking Large Language Models

Table reasoning with large language models (LLMs) plays a critical role in building intelligent systems capable of understanding and analyzing tabular data. Despite recent progress, existing methods still face key limitations: their reasoning processes lack depth and explicit multi-step structure, often relying solely on implicit language model understanding. In addition, they suffer from instability, primarily caused by model uncertainty. In this work, we propose STaR, a novel slow-thinking model that can achieve effective and stable table reasoning. To enable effective multi-step reasoning, we design a two-stage training framework consisting of supervised fine-tuning (SFT) warm-up followed by reinforced fine-tuning (RFT). Specifically, in the SFT stage, we construct a high-quality dataset through automatic self-verification. In the RFT stage, we introduce a difficulty-aware reinforcement learning mechanism to further enhance reasoning capabilities. Furthermore, to improve reasoning stability, we introduce trajectory-level uncertainty quantification, which fuses token-level confidence with answer-level consistency, enabling the selection of better reasoning trajectories. Extensive experiments demonstrate that STaR-8B achieves state-of-the-art performance on in-domain benchmarks and exhibits strong generalization to out-of-domain datasets, highlighting its potential for enhancing both effectiveness and stability in table reasoning.
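
The trajectory selection idea can be illustrated with a short sketch: score each sampled trajectory by fusing its mean token-level confidence with the frequency of its final answer across rollouts, and keep the best one. The linear fusion with weight `alpha` is an assumption for illustration, not the paper's exact formula.

```python
import math

# Illustrative fusion of token-level confidence and answer-level consistency.
def select_trajectory(trajectories, alpha=0.5):
    """trajectories: list of dicts with 'answer' and 'token_logprobs'."""
    counts = {}
    for t in trajectories:                             # answer-level consistency
        counts[t["answer"]] = counts.get(t["answer"], 0) + 1
    n = len(trajectories)

    def score(t):
        confidence = math.exp(sum(t["token_logprobs"]) / len(t["token_logprobs"]))
        consistency = counts[t["answer"]] / n
        return alpha * confidence + (1 - alpha) * consistency

    return max(trajectories, key=score)

best = select_trajectory([
    {"answer": "42", "token_logprobs": [-0.1, -0.2]},
    {"answer": "42", "token_logprobs": [-0.3, -0.4]},
    {"answer": "7",  "token_logprobs": [-0.05, -0.1]},
])
print(best["answer"])  # "42": the consistent answer wins despite lower confidence
```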

Towards Automatic Sampling of User Behaviors for Sequential Recommender Systems

Sequential recommender systems (SRS) have gained increasing popularity due to their remarkable proficiency in capturing dynamic user preferences. In the current setup of SRS, a common configuration is to uniformly consider each historical behavior as a positive interaction. However, this setting has the potential to yield sub-optimal performance, as each individual item often has a different impact on shaping the user’s interests. Hence, in this paper, we propose a novel automatic sampling framework for sequential recommendation, named AutoSAM, to non-uniformly treat historical behaviors. Specifically, AutoSAM extends the conventional SRS framework by integrating an extra sampler to intelligently discern the skew distribution of the raw input, and then sample informative sub-sets to build more generalizable SRS. To tackle the challenges posed by non-differentiable sampling actions and to introduce multiple decision factors for sampling, we further design a novel reinforcement learning based method to guide the training of the sampler. Furthermore, we theoretically devise multi-objective sampling rewards including Future Prediction and Sequence Perplexity, and then optimize the whole framework in an end-to-end manner by combining the policy gradient. We conduct extensive experiments on benchmark recommendation models and four real-world datasets. The experimental results demonstrate the effectiveness of the proposed AutoSAM.
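
A hedged sketch of the sampler training signal: a keep/drop mask over historical behaviors is sampled from the sampler's logits, and the non-differentiable action is optimized with a REINFORCE-style policy gradient against a scalar reward. The reward terms (Future Prediction, Sequence Perplexity) are abstracted into `reward_fn`; all names and the toy reward are illustrative, not the AutoSAM code.

```python
import torch

def sampler_step(keep_logits, reward_fn, optimizer):
    """keep_logits: (seq_len,) sampler outputs; reward_fn(mask) -> scalar tensor."""
    dist = torch.distributions.Bernoulli(logits=keep_logits)
    mask = dist.sample()                       # non-differentiable keep/drop action
    reward = reward_fn(mask)                   # e.g., future prediction + perplexity
    loss = -(dist.log_prob(mask).sum() * reward.detach())  # REINFORCE estimator
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return mask, reward

logits = torch.zeros(10, requires_grad=True)             # toy sampler parameters
opt = torch.optim.SGD([logits], lr=0.1)
mask, r = sampler_step(logits, lambda m: m.mean(), opt)  # toy reward: keep items
print(mask.shape, float(r))
```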

Conditional Denoising Meets Polynomial Modeling: A Flexible Decoupled Framework for Time Series Forecasting

Time series forecasting models are becoming increasingly prevalent due to their critical role in decision-making across various domains. However, most existing approaches model temporal patterns in a coupled form, often neglecting the distinction between their specific components. In particular, fluctuating patterns and smooth trends within time series exhibit distinct characteristics. In this work, to model complicated temporal patterns, we propose a Conditional Denoising Polynomial Modeling (CDPM) framework, where probabilistic diffusion models and deterministic linear models are trained end-to-end. Instead of modeling the coupled time series, CDPM decomposes it into trend and seasonal components and models them separately. To capture the fluctuating seasonal component, we employ a probabilistic diffusion model based on statistical properties from the historical window. For the smooth trend component, a module is proposed to enhance linear models by incorporating historical dependencies, thereby preserving underlying trends and mitigating noise distortion. Extensive experiments conducted on six benchmarks demonstrate the effectiveness of our framework, highlighting the potential of combining probabilistic and deterministic models. Our code is available at https://github.com/zjt-gpu/CDPM.
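
The decoupling step can be pictured with a standard moving-average decomposition, which we assume here for illustration: the moving average yields the smooth trend, and the residual serves as the fluctuating seasonal component consumed by the diffusion branch. The kernel size is an arbitrary choice.

```python
import torch

def decompose(x, kernel_size=25):
    """x: (batch, length) series -> (trend, seasonal), both (batch, length)."""
    pad = (kernel_size - 1) // 2
    padded = torch.nn.functional.pad(x.unsqueeze(1), (pad, kernel_size - 1 - pad),
                                     mode="replicate")
    trend = torch.nn.functional.avg_pool1d(padded, kernel_size, stride=1).squeeze(1)
    seasonal = x - trend                       # residual fluctuations
    return trend, seasonal

trend, seasonal = decompose(torch.randn(4, 96))
print(trend.shape, seasonal.shape)             # torch.Size([4, 96]) twice
```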

TableTime: Reformulating Time Series Classification as Training-Free Table Understanding with Large Language Models

Large language models (LLMs) have shown promise in multivariate time series classification (MTSC). To effectively adapt LLMs for MTSC, it is crucial to generate comprehensive and informative data representations. Most methods utilizing LLMs encode numerical time series into the model’s latent space, aiming to align with the semantic space of LLMs for more effective learning. Despite their effectiveness, we highlight three limitations that these methods overlook: (1) they struggle to incorporate temporal and channel-specific information, both of which are essential components of multivariate time series; (2) aligning the learned representation space with the semantic space of the LLMs proves to be a significant challenge; (3) they often require task-specific retraining, preventing training-free inference despite the generalization capabilities of LLMs. To bridge these gaps, we propose TableTime, which reformulates MTSC as a table understanding task. Specifically, TableTime introduces the following strategies: (1) utilizing tabular form to unify the format of time series, facilitating the transition from the model-centric approach to the data-centric approach; (2) representing time series in text format to facilitate seamless alignment with the semantic space of LLMs; (3) designing a knowledge-task dual-driven reasoning framework that integrates contextual information and expert-level reasoning guidance to enhance LLMs’ reasoning capabilities and enable training-free classification. Extensive experiments conducted on 10 publicly available benchmark datasets from the UEA archive validate the substantial potential of TableTime to be a new paradigm for MTSC. The code is publicly available at https://github.com/realwangjiahao/TableTime.
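
Strategy (1) amounts to rendering the series as text that a table-literate LLM can read directly; the sketch below shows one plausible formatting, with the layout choices (markdown table, three decimals) being our assumptions rather than the paper's exact format.

```python
# Illustrative tabular reformulation of a multivariate series for an LLM.
def series_to_table(timestamps, channels, values):
    """values: list of rows, one per timestamp, aligned with `channels`."""
    header = "| time | " + " | ".join(channels) + " |"
    sep = "|" + "---|" * (len(channels) + 1)
    rows = ["| " + str(t) + " | " + " | ".join(f"{v:.3f}" for v in row) + " |"
            for t, row in zip(timestamps, values)]
    return "\n".join([header, sep] + rows)

print(series_to_table(["t0", "t1"], ["acc_x", "acc_y"],
                      [[0.12, -0.98], [0.15, -1.02]]))
```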

Multi-Source Knowledge Pruning for Retrieval-Augmented Generation: A Benchmark and Empirical Study

Retrieval-augmented generation (RAG) is increasingly recognized as an effective approach to mitigating the hallucination of large language models (LLMs) through the integration of external knowledge. Despite numerous efforts, most studies focus on a single type of external knowledge source, whereas real-world applications often involve diverse knowledge from various sources; this setting remains underexplored. The main dilemma is the lack of a suitable dataset containing multiple knowledge sources and of preliminary exploration of the associated issues. To address these challenges, we standardize a benchmark dataset that combines structured and unstructured knowledge across diverse and complementary domains. Based on this dataset, we further develop a plug-and-play RAG framework, PruningRAG, whose main characteristic is the use of multi-granularity pruning strategies to optimize the integration of relevant information while minimizing misleading context. It consistently improves performance across various existing RAG variants, demonstrating its robustness and broad applicability. Building upon the standardized dataset and PruningRAG, we also report a series of experimental results and insightful findings. Our dataset and code are publicly available at https://github.com/USTCAGI/PruningRAG, with the aim of advancing future research in the RAG community.

Towards Context-aware Reasoning-enhanced Generative Searching in E-commerce

Search-based recommendation is one of the most critical application scenarios in e-commerce platforms. Users’ complex search contexts, such as spatiotemporal factors, historical interactions, and the current query’s information, constitute an essential part of their decision-making, reflecting implicit preferences that complement explicit query terms. Modeling such rich contextual signals and their intricate associations with candidate items remains a key challenge. Although numerous efforts have been devoted to building more effective search methods, existing approaches still show limitations in integrating contextual information, which hinders their ability to fully capture user intent. To address these challenges, we propose a context-aware reasoning-enhanced generative search framework for better understanding the complicated context. Specifically, the framework first unifies heterogeneous user and item contexts into textual representations or text-based semantic identifiers and aligns them. To overcome the lack of explicit reasoning trajectories, we introduce a self-evolving post-training paradigm that iteratively combines supervised fine-tuning and reinforcement learning to progressively enhance the model’s reasoning capability. In addition, we identify potential biases in existing RL algorithms when applied to search scenarios and present a debiased variant of GRPO to improve ranking performance. Extensive experiments on search log data collected from a real-world e-commerce platform demonstrate that our approach achieves superior performance compared with strong baselines, validating its effectiveness for search-based recommendation.

MemWeaver: A Hierarchical Memory from Textual Interactive Behaviors for Personalized Generation

The primary form of user-internet engagement is shifting from leveraging implicit feedback signals, such as browsing and clicks, to harnessing the rich explicit feedback provided by textual interactive behaviors. This shift unlocks a rich source of user textual history, presenting a profound opportunity for a deeper form of personalization. However, prevailing approaches offer only a shallow form of personalization, as they treat user history as a flat list of texts for retrieval and fail to model the rich temporal and semantic structures reflecting the dynamic nature of user interests. In this work, we propose MemWeaver, a framework that weaves the user’s entire textual history into a hierarchical memory to power deeply personalized generation. The core innovation of our memory lies in its ability to capture both the temporal evolution of interests and the semantic relationships between different activities. To achieve this, MemWeaver builds two complementary memory components that both integrate temporal and semantic information, but at different levels of abstraction: behavioral memory, which captures specific user actions, and cognitive memory, which represents long-term preferences. This dual-component memory serves as a comprehensive representation of the user, allowing large language models (LLMs) to reason over both concrete behaviors and abstracted cognitive traits. This leads to content generation that is deeply aligned with the user’s latent preferences. Experiments on the six datasets of the Language Model Personalization (LaMP) benchmark validate the efficacy of MemWeaver. Our code is available.

Tag-augmented Dual-target Cross-domain Recommendation

Cross-domain recommendation (CDR) has been proposed to alleviate the data sparsity issue in recommendation systems and has garnered substantial research interest. In recent years, dual-target CDR has been an increasingly prevalent research topic that emphasizes simultaneous enhancement in both the source and target domains. Many existing approaches rely on overlapping users as bridges between domains, yet in real-world scenarios, the number of such users is often severely limited, restricting their practical applicability. To overcome this limitation, alternative methods for cross-domain connections are needed, and item tags serve as a promising solution. However, real-world tags suffer from severe deficiencies in terms of both quantity and diversity, and existing studies have not fully exploited their potential. In this paper, we introduce Tag-Augmented Dual-Target Cross-Domain Recommendation (TA-DTCDR), which is the first to apply LLM-distilled tag information to CDR. TA-DTCDR utilizes item tags distilled by large language models (LLMs) as an additional channel to facilitate information transfer, thereby mitigating performance decline caused by the lack of overlapping users. Furthermore, to fully leverage the natural language information carried by the distilled tags, we design a series of training tasks to align tag semantics across domains while preserving their semantic independence. The proposed method is validated on multiple tasks using public datasets, showing significant improvements over existing state-of-the-art approaches.

Preference Trajectory Modeling via Flow Matching for Sequential Recommendation

Sequential recommendation predicts each user’s next item based on their historical interaction sequence. Recently, diffusion models have attracted significant attention in this area due to their strong ability to model user interest distributions. They typically generate target items by denoising Gaussian noise conditioned on historical interactions. However, these models face two critical limitations. First, they exhibit high sensitivity to the condition, making it difficult to recover target items from pure Gaussian noise. Second, the inference process is computationally expensive, limiting practical deployment. To address these issues, we propose FlowRec, a simple yet effective sequential recommendation framework which leverages flow matching to explicitly model user preference trajectories from current states to future interests. Flow matching is an emerging generative paradigm, which offers greater flexibility in initial distributions and enables more efficient sampling. Based on this, we construct a personalized behavior-based prior distribution to replace Gaussian noise and learn a vector field to model user preference trajectories. To better align flow matching with the recommendation objective, we further design a single-step alignment loss incorporating both positive and negative samples, improving sampling efficiency and generation quality. Extensive experiments on four benchmark datasets verify the superiority of FlowRec over the state-of-the-art baselines.
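
The flow-matching objective sketched below captures the mechanism the abstract describes: interpolate between a prior sample x0 (standing in for the personalized behavior-based prior) and the target item embedding x1 along a straight line, and regress a conditional vector field onto the constant velocity x1 - x0. The network and placeholder usage are illustrative assumptions, not the released FlowRec.

```python
import torch

def flow_matching_loss(v_net, x0, x1, cond):
    """x0: prior sample (behavior-based), x1: target item embedding."""
    t = torch.rand(x0.size(0), 1)                 # random time in [0, 1]
    x_t = (1 - t) * x0 + t * x1                   # straight-line trajectory
    target_velocity = x1 - x0                     # constant along the line
    return torch.nn.functional.mse_loss(v_net(x_t, t, cond), target_velocity)

v_net = lambda x, t, c: torch.zeros_like(x)       # placeholder vector field
print(flow_matching_loss(v_net, torch.randn(8, 64), torch.randn(8, 64), None))
```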

Benchmarking Multimodal LLMs on Recognition and Understanding over Chemical Tables

With the widespread application of multimodal large language models in scientific intelligence, there is an urgent need for more challenging evaluation benchmarks to assess their ability to understand complex scientific data. Scientific tables, as core carriers of knowledge representation, combine text, symbols, and graphics, forming a typical multimodal reasoning scenario. However, existing benchmarks are mostly focused on general domains, failing to reflect the unique structural complexity and domain-specific semantics inherent in scientific research. Chemical tables are particularly representative: they intertwine structured variables such as reagents, conditions, and yields with visual symbols like molecular structures and chemical formulas, posing significant challenges to models in cross-modal alignment and semantic parsing. To address this, we propose ChemTable, a large-scale benchmark of chemical tables constructed from real-world literature, containing expert-annotated cell layouts, logical structures, and domain-specific labels. It supports two core tasks: (1) table recognition (structure and content extraction); and (2) table understanding (descriptive and reasoning-based question answering). Evaluation on ChemTable shows that while mainstream multimodal models perform reasonably well in layout parsing, they still face significant limitations when handling critical elements such as molecular structures and symbolic conventions. Closed-source models lead overall but still fall short of human-level performance. This work provides a realistic testing platform for evaluating scientific multimodal understanding, revealing the current bottlenecks in domain-specific reasoning and advancing the development of intelligent systems for scientific research.

Are LLMs Stable Formal Logic Translators in Logical Reasoning Across Linguistically Diversified Texts?

Logical reasoning with large language models (LLMs) has received growing attention. One mainstream approach translates natural language into formal logic and then applies symbolic solvers for deduction. While effective in many tasks, these LLM-based translators often fail to generate consistent symbolic representations when the same concept appears in different linguistic forms. Such inconsistencies break logical coherence and lead to solver errors. However, most existing benchmarks lack this type of linguistic variation, which frequently occurs in real-world text, leaving the problem underexplored. To address this gap, we present SoLT, a benchmark that systematically rewrites reasoning datasets into diverse yet logically equivalent forms across multiple levels. Beyond evaluation, SoLT also provides a general method to enrich any dataset with linguistic diversity while preserving both meaning and logic. To further enhance the stability of LLM-based reasoning, we propose MenTaL, which explicitly guides models to build a concept-symbol mapping table during translation. By linking equivalent expressions to shared symbols, MenTaL maintains consistency and mitigates symbol drift. Experiments on SoLT demonstrate that LLMs indeed suffer from inconsistent symbol mapping under linguistic variation, leading to significant drops in reasoning accuracy. Meanwhile, applying MenTaL brings clear and stable performance improvements across diverse inputs. Overall, our findings reveal that overlooking linguistic diversity hides key weaknesses in LLM-based translators, and our work offers a step toward more reliable logical reasoning in varied real-world scenarios. Our code is available at https://github.com/qingchuanli/LinguDiver.
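
The mapping-table idea behind MenTaL can be sketched in a few lines: every concept mention is normalized and looked up in a running table, so that linguistically different but equivalent expressions reuse one predicate symbol. The lowercasing normalizer below is a toy stand-in; MenTaL itself guides the LLM to build and consult this table during translation.

```python
# Toy sketch of a concept-symbol mapping table for consistent FOL translation.
class SymbolTable:
    def __init__(self):
        self.table = {}

    def symbol_for(self, concept: str) -> str:
        key = concept.lower().strip()               # toy normalization step
        if key not in self.table:                   # first mention: mint a symbol
            self.table[key] = f"P{len(self.table)}"
        return self.table[key]                      # later mentions: reuse it

st = SymbolTable()
print(st.symbol_for("is a dog"), st.symbol_for("IS A DOG"))  # P0 P0: shared symbol
```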

ConvTimeNet: A Deep Hierarchical Fully Convolutional Model for Multivariate Time Series Analysis

Designing effective models for learning time series representations is foundational for time series analysis. Many previous works explore time series representation modeling approaches and make progress in this area. Despite their effectiveness, they lack adaptive perception of local patterns in temporally dependent basic units and fail to capture the multi-scale dependency among these units. Instead of relying on prevalent methods centered around self-attention mechanisms, we propose ConvTimeNet, a hierarchical pure convolutional model designed for time series analysis. ConvTimeNet introduces a deformable patch layer that adaptively perceives local patterns of temporally dependent basic units in a data-driven manner. Based on the extracted local patterns, hierarchical pure convolutional blocks are designed to capture dependency relationships among the representations of basic units at different scales. Moreover, a large kernel mechanism is employed to ensure that convolutional blocks can be deeply stacked, thereby achieving a larger receptive field. In this way, local patterns and their multi-scale dependencies can be effectively modeled within a single model. Extensive experiments comparing a wide range of different types of models demonstrate that pure convolutional models still exhibit strong viability, effectively addressing the aforementioned two challenges and showing superior performance across multiple tasks. The code is available for reproducibility at https://github.com/Mingyue-Cheng/ConvTimeNet.
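
A hedged sketch of one such block: a large-kernel depthwise convolution for a wide receptive field, pointwise convolutions for channel mixing, and a residual connection so that blocks can be stacked deeply. The kernel size and width are illustrative, not the released configuration.

```python
import torch
import torch.nn as nn

class LargeKernelBlock(nn.Module):
    def __init__(self, channels, kernel_size=51):
        super().__init__()
        # Depthwise conv: one large kernel per channel for a wide receptive field.
        self.dw = nn.Conv1d(channels, channels, kernel_size,
                            padding=kernel_size // 2, groups=channels)
        # Pointwise convs mix channels, as in inverted-bottleneck designs.
        self.pw = nn.Sequential(nn.Conv1d(channels, 2 * channels, 1), nn.GELU(),
                                nn.Conv1d(2 * channels, channels, 1))
        self.norm = nn.BatchNorm1d(channels)

    def forward(self, x):                          # x: (batch, channels, length)
        return x + self.pw(self.norm(self.dw(x)))  # residual enables deep stacking

x = torch.randn(8, 64, 336)
print(LargeKernelBlock(64)(x).shape)               # torch.Size([8, 64, 336])
```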

A Comprehensive Survey of Time Series Forecasting: Concepts, Challenges, and Future Directions

Time series forecasting (TSF) has become an increasingly vital tool in various decision-making applications, including business intelligence and scientific discovery, in today’s rapidly evolving digital landscape. Over the years, a wide range of methods for TSF has been proposed, spanning from traditional statistics-based models to more recent machine learning-driven, data-intensive approaches. Despite the extensive body of research, there is still no universally accepted, unified problem statement or systematic elaboration of the core challenges and characteristics of TSF. The extent to which deep TSF models can address fundamental issues, such as data sparsity and non-stationarity, remains unclear, and the broader TSF research landscape continues to evolve, shaped by diverse methodological trends. This comprehensive survey aims to address these gaps by examining the key entities in TSF (e.g., covariates) and their characteristics (e.g., frequency, length, missing values). We introduce a general problem formulation and challenge analysis for TSF, propose a taxonomy that classifies representative methodologies from the preprocessing and forecasting perspectives, and highlight emerging topics like transfer learning and trustworthy forecasting. Finally, we discuss promising research directions that are poised to drive innovation in this dynamic and rapidly advancing field. The related paper list is available at https://github.com/USTCAGI/Awesome-Papers-Time-Series-Forecasting.

A Survey on Table Mining with Large Language Models: Challenges, Advancements and Prospects

Table mining is a popular research field that involves complicated technologies, including information retrieval, data mining, visual and textual understanding, and logical reasoning. With the emergence of Large Language Models (LLMs), the field has witnessed considerable advancements, presenting new paradigms for table understanding, extraction, and reasoning. In this survey, we conduct a comprehensive review of the literature on table mining with LLMs. We begin by introducing a fundamental overview of tabular data and possible challenges in LLM-based table mining. Specifically, we explore the challenges unique to this domain, such as heterogeneous table structures, contextual ambiguity, and domain-specific knowledge requirements. Then, we summarize representative tabular tasks in table preparation and mining, categorizing existing methods along dimensions including task scope, model architecture, and application scenarios. Next, we describe advanced LLM-based learning strategies in table mining, including foundation models and training-free methods. We also review studies of trustworthy LLM-based table mining and domain-specific applications. Finally, we discuss prospects and future directions in the field of LLM-based table reasoning, including issues of generalization, interpretability, efficiency, etc. We hope this survey can present valuable resources for researchers and practitioners, paving the way for further exploration in this field. The repository is at: https://github.com/USTCAGI/Awesome-LLM-Table-Mining.

A Survey on Knowledge-Oriented Retrieval-Augmented Generation

Retrieval-Augmented Generation (RAG) has gained significant attention in recent years for its potential to enhance natural language understanding and generation by combining large-scale retrieval systems with generative models. RAG leverages external knowledge sources, such as documents, databases, or structured data, to improve model performance and generate more accurate and contextually relevant outputs. This survey aims to provide a comprehensive overview of RAG by examining its fundamental components, including retrieval mechanisms, generation processes, and the integration between the two. We discuss the key characteristics of RAG, such as its ability to augment generative models with dynamic external knowledge, and the challenges associated with aligning retrieved information with generative objectives. We also present a taxonomy that categorizes RAG methods, ranging from basic retrieval-augmented approaches to more advanced models incorporating multi-modal data and reasoning capabilities. Additionally, we review the evaluation benchmarks and datasets commonly used to assess RAG systems, along with a detailed exploration of its applications in fields such as question answering, summarization, and information retrieval. Finally, we highlight emerging research directions and opportunities for improving RAG systems, such as enhanced retrieval efficiency, model interpretability, and domain-specific adaptations. This paper concludes by outlining the prospects for RAG in addressing real-world challenges and its potential to drive further advancements in natural language processing.

InstructTime: Advancing Time Series Classification with Multimodal Language Modeling

Reviewing the advancement of time series classification, we can summarize that most existing methods adopt a common learning-to-classify paradigm: a classifier model tries to learn the relation between sequence inputs and a target label encoded by a one-hot distribution. Although effective, this paradigm conceals two inherent limitations: (1) the one-hot distribution fails to reflect the comparability and similarity between labels, and (2) it is difficult to learn transferable representations across domains. In this work, we propose InstructTime, a novel attempt to reshape time series classification as a learning-to-generate paradigm. Relying on the generative capacity of the pre-trained language model, the core idea is to formulate the classification of time series as a multimodal understanding task. Specifically, firstly, a time series discretization module is designed to convert continuous inputs into a sequence of discrete tokens to solve the inconsistency issue across modality data. Secondly, we introduce an alignment projection layer before feeding the transformed tokens of time series into language models. Thirdly, prior to fine-tuning the language model for the target domain, we emphasize the necessity of auto-regressive pre-training across various modality inputs. Finally, extensive experiments are conducted on several prevalent public benchmark datasets, indicating the superior performance of InstructTime. Our code is at https://github.com/Mingyue-Cheng/InstructTime.

Cross-Domain Pre-training with Language Models for Transferable Time Series Representations

Pre-training universal models across multiple domains to enhance downstream tasks is a prevalent learning paradigm. However, there has been minimal progress in pre-training transferable models across domains for time series representation. This dilemma is incurred by two key factors: the limited availability of training data within each domain and the substantial differences in data characteristics between domains. To address these challenges, we present a novel framework, namely CrossTimeNet, designed to perform cross-domain self-supervised pre-training to benefit target tasks. Specifically, to address the issue of data scarcity, we utilize a pre-trained language model as the backbone network to effectively capture the sequence dependencies of the input time series. Meanwhile, we adopt the recovery of corrupted region inputs as a self-supervised optimization objective, taking into account the locality of the time series. To address discrepancies in data characteristics, we introduce a novel tokenization module that converts continuous time series inputs into discrete token sequences using vector quantization techniques. This approach facilitates the learning of transferable time series models across different domains. Extensive experimental results on diverse time series tasks, including classification and forecasting, demonstrate the effectiveness of our approach. Our codes are publicly available at https://github.com/Mingyue-Cheng/CrossTimeNet.
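
The core of such a tokenization module is a vector-quantization lookup, sketched below: each patch embedding is mapped to the id of its nearest codebook entry, turning continuous series into discrete token sequences. Codebook learning via the reconstruction objective is omitted; shapes and sizes are illustrative assumptions.

```python
import torch

def quantize(patches, codebook):
    """patches: (n, d) patch embeddings; codebook: (K, d). Returns (n,) token ids."""
    dists = torch.cdist(patches, codebook)     # (n, K) pairwise distances
    return dists.argmin(dim=1)                 # index of nearest code = token id

codebook = torch.randn(256, 16)                # K=256 codes of dimension 16
tokens = quantize(torch.randn(10, 16), codebook)
print(tokens.shape, int(tokens.max()) < 256)   # torch.Size([10]) True
```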

Recognizing unseen objects via multimodal intensive knowledge graph propagation

Zero-Shot Learning (ZSL), which aims at automatically recognizing unseen objects, is a promising learning paradigm that allows machines to continuously acquire new real-world knowledge. Recently, the Knowledge Graph (KG) has been proven an effective scheme for handling the zero-shot task with large-scale and non-attribute data. Prior studies always embed relationships of seen and unseen objects into visual information from existing knowledge graphs to promote the cognitive ability on the unseen data. Actually, real-world knowledge is naturally formed by multimodal facts. Compared with ordinary structural knowledge from a graph perspective, a multimodal KG can provide cognitive systems with fine-grained knowledge. For example, a text description and visual content can depict more critical details of a fact than knowledge triplets alone. Unfortunately, this multimodal fine-grained knowledge is largely unexploited due to the bottleneck of feature alignment between different modalities. To that end, we propose a multimodal intensive ZSL framework that matches regions of images with corresponding semantic embeddings via a designed dense attention module and self-calibration loss. It enables the semantic transfer process of our ZSL framework to learn more differentiated knowledge between entities, and frees the model from the performance limitation of relying only on rough global features. We conduct extensive experiments and evaluate our model on large-scale real-world data. The experimental results clearly demonstrate the effectiveness of the proposed model in standard zero-shot classification tasks.

Towards Personalized Evaluation of Large Language Models with An Anonymous Crowd-Sourcing Platform

Large language model evaluation plays a pivotal role in enhancing model capabilities. Previously, numerous methods for evaluating large language models have been proposed in this area. Despite their effectiveness, these existing works mainly focus on assessing objective questions, overlooking the evaluation of subjective questions, which are extremely common for large language models. Additionally, these methods predominantly utilize centralized datasets for evaluation, with question banks concentrated within the evaluation platforms themselves. Moreover, the evaluation processes employed by these platforms often overlook personalized factors, neglecting to consider the individual characteristics of both the evaluators and the models being evaluated. To address these limitations, we propose BingJian, a novel anonymous crowd-sourcing evaluation platform for large language models that employs a competitive scoring mechanism where users participate in ranking models based on their performance. This platform stands out not only for its support of centralized evaluations to assess the general capabilities of models but also for offering an open evaluation gateway. Through this gateway, users have the opportunity to submit their own questions, testing the models on a personalized and potentially broader range of capabilities. Furthermore, our platform introduces personalized evaluation scenarios, leveraging various forms of human-computer interaction to assess large language models in a manner that accounts for individual user preferences and contexts. The demonstration of BingJian can be accessed at https://github.com/Mingyue-Cheng/Bingjian.

Exploring large language model for graph data understanding in online job recommendations

Large Language Models (LLMs) have revolutionized natural language processing tasks, demonstrating their exceptional capabilities in various domains. However, their potential for graph semantic mining in job recommendations remains largely unexplored. This paper focuses on unveiling the capability of large language models in understanding behavior graphs and leveraging this understanding to enhance recommendations in online recruitment, including promoting out-of-distribution (OOD) applications. We present a novel framework that harnesses the rich contextual information and semantic representations provided by large language models to analyze behavior graphs and uncover underlying patterns and relationships. Specifically, we propose a meta-path prompt constructor that, for the first time, aids the LLM recommender in grasping the semantics of behavior graphs, and design a corresponding path augmentation module to alleviate the prompt bias introduced by path-based sequence input. By facilitating this capability, our framework enables personalized and accurate job recommendations for individual users. We evaluate the effectiveness of our approach on comprehensive real-world datasets and demonstrate its ability to improve the relevance and quality of recommended results. This research not only sheds light on the untapped potential of large language models but also provides valuable insights for developing advanced recommendation systems in the recruitment market. The findings contribute to the growing field of natural language processing and offer practical implications for enhancing job search experiences.

Learning Transferable Time Series Classifier with Cross-Domain Pre-training from Language Model

Advancements in self-supervised pre-training (SSL) have significantly advanced the field of learning transferable time series representations, which can be very useful in enhancing downstream tasks. Despite being effective, most existing works struggle to achieve cross-domain SSL pre-training, missing valuable opportunities to integrate patterns and features from different domains. The main challenge lies in the significant differences in the characteristics of time-series data across different domains, such as variations in the number of channels and temporal resolution scales. To address this challenge, we propose CrossTimeNet, a novel cross-domain SSL learning framework to learn transferable knowledge from various domains to largely benefit the target downstream task. One of the key characteristics of CrossTimeNet is the newly designed time series tokenization module, which could effectively convert the raw time series into a sequence of discrete tokens based on a reconstruction optimization process. Besides, we highlight that predicting a high proportion of corrupted tokens can be very helpful for extracting informative patterns across different domains during SSL pre-training, which has been largely overlooked in past years. Furthermore, unlike previous works, our work treats a pre-trained language model (PLM) as the initialization of the encoder network, investigating the feasibility of transferring the knowledge learned by the PLM to the time series area. Through these efforts, the path to cross-domain pre-training of a generic time series model can be effectively paved. We conduct extensive experiments in a real-world scenario across various time series classification domains. The experimental results clearly confirm CrossTimeNet’s superior performance.

Advancing Time Series Classification with Multimodal Language Modeling

Scrutinizing previous studies on time series classification, we find that most existing methods adopt a common learning-to-classify paradigm: a time series classifier model tries to learn the relation between sequence inputs and a target label encoded by a one-hot distribution. Although effective, this paradigm conceals two inherent limitations: (1) encoding target categories with a one-hot distribution fails to reflect the comparability and similarity between labels, and (2) it is very difficult to learn a transferable model across domains, which greatly hinders the development of a universal serving paradigm. In this work, we propose InstructTime, a novel attempt to reshape time series classification as a learning-to-generate paradigm. Relying on the powerful generative capacity of the pre-trained language model, the core idea is to formulate the classification of time series as a multimodal understanding task, in which both task-specific instructions and raw time series are treated as multimodal inputs while the label information is represented by texts. To accomplish this goal, three distinct designs are developed in InstructTime. Firstly, a time series discretization module is designed to convert continuous time series into a sequence of discrete tokens to solve the inconsistency issue across modal inputs. To solve the modality representation gap issue, for one thing, we introduce an alignment projection layer before feeding the transformed tokens of time series into language models. For another, we highlight the necessity of auto-regressive pre-training across domains, which can facilitate the transferability of the language model and boost the generalization performance. Extensive experiments are conducted over benchmark datasets, whose results uncover the superior performance of InstructTime and the potential for a universal foundation model in time series classification.

Generative Pretrained Hierarchical Transformer for Time Series Forecasting

Recent efforts have been dedicated to enhancing time series forecasting accuracy by introducing advanced network architectures and self-supervised pretraining strategies. Nevertheless, existing approaches still exhibit two critical drawbacks. Firstly, these methods often rely on a single dataset for training, limiting the model’s generalizability due to the restricted scale of the training data. Secondly, the one-step generation schema is widely followed, which necessitates a customized forecasting head, overlooks the temporal dependencies in the output series, and leads to increased training costs under different horizon length settings. To address these issues, we propose a novel generative pretrained hierarchical transformer architecture for forecasting, named GPHT. There are two aspects of key designs in GPHT. On the one hand, we advocate for constructing a mixed dataset for pretraining our model, comprising various datasets from diverse data scenarios. This approach significantly expands the scale of training data, allowing our model to uncover commonalities in time series data and facilitating improved transfer to specific datasets. On the other hand, GPHT employs an auto-regressive forecasting approach under the channel-independent assumption, effectively modeling temporal dependencies in the output series. Importantly, no customized forecasting head is required, enabling a single model to forecast at arbitrary horizon settings. We conduct sufficient experiments on eight datasets against mainstream self-supervised pretraining models and supervised models. The results demonstrate that GPHT surpasses the baseline models across various fine-tuning and zero/few-shot learning settings in the traditional long-term forecasting task, supporting the feasibility of large pretrained time series models.
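
The horizon-free property comes from plain autoregressive rollout, sketched here under the assumption that `model` maps a context window to the next patch of `patch_len` values: predictions are fed back as inputs until the requested horizon is covered, with no forecasting head tied to a fixed length.

```python
import torch

@torch.no_grad()
def rollout(model, context, horizon, patch_len=24):
    preds, x = [], context                      # context: (batch, length)
    for _ in range(-(-horizon // patch_len)):   # ceil(horizon / patch_len) steps
        next_patch = model(x)                   # (batch, patch_len) prediction
        preds.append(next_patch)
        x = torch.cat([x, next_patch], dim=1)   # feed the prediction back
    return torch.cat(preds, dim=1)[:, :horizon]

naive = lambda x: x[:, -24:]                    # toy "model": repeat last patch
print(rollout(naive, torch.randn(2, 96), horizon=50).shape)  # torch.Size([2, 50])
```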

Supporting Your Idea Reasonably: A Knowledge-Aware Topic Reasoning Strategy for Citation Recommendation

With the explosive growth of scholarly information, researchers spend much time and effort copiously quoting authoritative works to support their ideas or motivations. We aim to alleviate this situation by proposing a citation recommendation strategy that recalls related papers for a rough idea (a piece of text, i.e., an abstract or manuscript). However, existing citation recommendation perspectives cannot be directly applied to our task due to two defects. First, these methods neglect the reasoning of research topics, which makes the recommendation mechanism insufficiently meticulous and lacking in explainability. For instance, they are unable to mine the hidden citing logic for the candidate paper while recommending. We fill this research gap by constructing structural topics consisting of knowledge concepts from the textual content, where reasoning paths between topics are extracted from an external knowledge graph. Second, the citation network is viewed as a crucial structural context to enhance recommendation performance, but a new target idea does not have links to the citation network as published papers do. To simulate the prospective topological structure, our model incorporates a contrastive-learning-based alignment paradigm to encourage the consistency of content embeddings and structure-oriented embeddings. We evaluate our proposed model on three real-world datasets and demonstrate that it significantly improves recommendation accuracy while providing high-quality knowledge-aware reasoning. A visual example further illustrates the reasoning process as our model judges samples, supporting the feasibility of our topic-view learning paradigm.

Untargeted Attack against Federated Recommendation Systems via Poisonous Item Embeddings and the Defense

Federated recommendation (FedRec) can train personalized recommenders without collecting user data, but the decentralized nature makes it susceptible to poisoning attacks. Most previous studies focus on the targeted attack to promote certain items, while the untargeted attack that aims to degrade the overall performance of the FedRec system remains less explored. In fact, untargeted attacks can disrupt the user experience and bring severe financial loss to the service provider. However, existing untargeted attack methods are either inapplicable or ineffective against FedRec systems. In this paper, we delve into the untargeted attack and its defense for FedRec systems. (i) We propose ClusterAttack, a novel untargeted attack method. It uploads poisonous gradients that converge the item embeddings into several dense clusters, which make the recommender generate similar scores for these items in the same cluster and perturb the ranking order. (ii) We propose a uniformity-based defense mechanism (UNION) to protect FedRec systems from such attacks. We design a contrastive learning task that regularizes the item embeddings toward a uniform distribution. Then the server filters out these malicious gradients by estimating the uniformity of updated item embeddings. Experiments on two public datasets show that ClusterAttack can effectively degrade the performance of FedRec systems while circumventing many defense methods, and UNION can improve the resistance of the system against various untargeted attacks, including our ClusterAttack.
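
The uniformity estimate at the heart of the defense can be illustrated with the hypersphere uniformity measure of Wang and Isola, which we assume here as a plausible instantiation: well-spread embeddings score low, while near-duplicate (clustered) embeddings score high, so a server could flag updates that push items into dense clusters. The threshold and usage are illustrative, not the paper's exact procedure.

```python
import torch

def uniformity(embeddings, t=2.0):
    """embeddings: (n, d), assumed L2-normalized. Lower = more uniform."""
    sq_dists = torch.pdist(embeddings, p=2).pow(2)     # all pairwise distances
    return sq_dists.mul(-t).exp().mean().log()         # log E[exp(-t ||u - v||^2)]

normal = torch.nn.functional.normalize(torch.randn(100, 32), dim=1)
clustered = torch.nn.functional.normalize(torch.randn(1, 32) +
                                          0.01 * torch.randn(100, 32), dim=1)
print(uniformity(normal) < uniformity(clustered))      # tensor(True)
```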

Adaptive Normalization for Non-stationary Time Series Forecasting: A Temporal Slice Perspective

Deep learning models have progressively advanced time series forecasting due to their powerful capacity for capturing sequence dependence. Nevertheless, it is still challenging to make accurate predictions due to the existence of non-stationarity in real-world data, meaning the data distribution changes rapidly over time. To mitigate such a dilemma, several efforts have been made to reduce non-stationarity with normalization operations. However, these methods typically overlook the distribution discrepancy between the input series and the horizon series, and assume that all time points within the same instance share the same statistical properties, which is overly idealized and may lead to suboptimal improvements. To this end, we propose a novel slice-level adaptive normalization, referred to as SAN, a scheme that empowers time series forecasting with more flexible normalization and denormalization. SAN includes two crucial designs. First, SAN eliminates the non-stationarity of time series in units of a local temporal slice (i.e., sub-series) rather than a global instance. Second, SAN employs a lightweight network module to independently model the evolving trends of the statistical properties of the raw time series. Consequently, SAN can serve as a general model-agnostic plugin and better alleviate the impact of the non-stationary nature of time series data. We instantiate the proposed SAN on four widely used forecasting models and test their prediction results on benchmark datasets to evaluate its effectiveness. We also report insightful findings to analyze and understand SAN in depth. We make our code publicly available.
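
A minimal sketch of the slice-level view, assuming simple per-slice statistics: the input window is normalized slice by slice rather than per instance, and the returned statistics would be fed to SAN's statistics-modeling module (omitted here, a stated assumption) to denormalize the forecast.

```python
import torch

def slice_normalize(x, slice_len=12, eps=1e-5):
    """x: (batch, length), length divisible by slice_len."""
    b, l = x.shape
    slices = x.view(b, l // slice_len, slice_len)
    mean = slices.mean(dim=2, keepdim=True)        # per-slice statistics,
    std = slices.std(dim=2, keepdim=True) + eps    # not per-instance ones
    normed = (slices - mean) / std
    return normed.view(b, l), mean.squeeze(2), std.squeeze(2)

x = torch.randn(4, 96)
normed, mean, std = slice_normalize(x)
print(normed.shape, mean.shape)    # torch.Size([4, 96]) torch.Size([4, 8])
```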

A general tail item representation enhancement framework for sequential recommendation

Recent advancements in deep learning models have significantly facilitated the development of sequential recommender systems (SRS). However, current deep model structures are limited in their ability to learn high-quality embeddings with insufficient data. Meanwhile, highly skewed long-tail distributions are very common in recommender systems. Therefore, in this paper, we focus on enhancing the representation of tail items to improve sequential recommendation performance. Through empirical studies on benchmarks, we surprisingly observe that both the ranking performance and the training procedure are greatly hindered by poorly optimized tail item embeddings. To address this issue, we propose a sequential recommendation framework named TailRec that leverages the contextual information of tail items to greatly improve their representations. Given the characteristics of the sequential recommendation task, the surrounding interaction records of each tail item are regarded as contextual information, without leveraging any additional side information. This approach allows for the mining of contextual information from cross-sequence behaviors to boost the performance of sequential recommendations. Such a light contextual filtering component is plug-and-play for a series of SRS models. To verify the effectiveness of the proposed TailRec, we conduct extensive experiments over several popular benchmark recommenders. The experimental results demonstrate that TailRec can greatly improve recommendation results and speed up the training process. The code for our method is available.

Unlocking the Potential of Large Language Models for Explainable Recommendations

Generating user-friendly explanations regarding why an item is recommended has become increasingly common, largely due to advances in language generation technology, which can enhance user trust and facilitate more informed decision-making when using online services. However, existing explainable recommendation systems focus on using small language models. It remains uncertain what impact replacing the explanation generator with the recently emerging large language models (LLMs) would have. Can we expect unprecedented results? In this study, we propose LLMXRec, a simple yet effective two-stage explainable recommendation framework aimed at further boosting explanation quality by employing LLMs. Unlike most existing LLM-based recommendation works, a key characteristic of LLMXRec is its emphasis on close collaboration between previous recommender models and LLM-based explanation generators. Specifically, by adopting several key fine-tuning techniques, including parameter-efficient instruction tuning and personalized prompt techniques, controllable and fluent explanations can be generated to achieve the goal of explainable recommendation. Most notably, we provide three different perspectives to evaluate the effectiveness of the explanations. Finally, we conduct extensive experiments over several benchmark recommender models and publicly available datasets. The experimental results not only yield positive results in terms of effectiveness and efficiency but also uncover some previously unknown outcomes. To facilitate further exploration in this area, the full code and detailed original results are open-sourced at https://github.com/GodFire66666/LLM_rec_explanation/.

Towards Automatic Sampling of User Behaviors for Sequential Recommender Systems

Sequential recommender systems (SRS) have gained widespread popularity in recommendation due to their ability to effectively capture dynamic user preferences. One default setting in the current SRS is to uniformly consider each historical behavior as a positive interaction. Actually, this setting has the potential to yield sub-optimal performance, as each item makes a distinct contribution to the user’s interest. For example, purchased items should be given more importance than clicked ones. Hence, we propose a general automatic sampling framework, named AutoSAM, to non-uniformly treat historical behaviors. Specifically, AutoSAM augments the standard sequential recommendation architecture with an additional sampler layer to adaptively learn the skew distribution of the raw input, and then sample informative sub-sets to build more generalizable SRS. To overcome the challenges of non-differentiable sampling actions and also introduce multiple decision factors for sampling, we further introduce a novel reinforcement learning based method to guide the training of the sampler. We theoretically design multi-objective sampling rewards including Future Prediction and Sequence Perplexity, and then optimize the whole framework in an end-to-end manner by combining the policy gradient. We conduct extensive experiments on benchmark recommender models and four real-world datasets. The experimental results demonstrate the effectiveness of the proposed approach. We will make our code publicly available upon acceptance.

Federated News Recommendation with Fine-grained Interpolation and Dynamic Clustering

Researchers have successfully adapted privacy-preserving Federated Learning (FL) to news recommendation tasks to better protect users’ privacy, although typically at the cost of performance degradation due to the data heterogeneity issue. To address this issue, Personalized Federated Learning (PFL) has emerged, among which model interpolation is a promising approach that interpolates the local personalized models with the global model. However, the existing model interpolation method may not work well for news recommendation tasks for two reasons. First, it neglects the fine-grained personalization needs at both the temporal and spatial levels in news recommendation tasks. Second, due to the cold-user problem in real-world news recommendation tasks, the local personalized models may perform poorly, thus limiting the performance gain from model interpolation. To this end, we propose FINDING (Federated News Recommendation with Fine-grained Interpolation and Dynamic Clustering), a novel personalized federated learning framework based on model interpolation. Specifically, we first propose a fine-grained model interpolation strategy which interpolates the local personalized models with the global model in a time-aware and layer-aware way. Then, to address the cold-user problem in news recommendation tasks, we adopt a group-level personalization approach where users are dynamically clustered into groups and the group-level personalized models are used for interpolation. Extensive experiments on two real-world datasets show that our method can effectively handle the above limitations of the current model interpolation method and alleviate the heterogeneity issue faced by traditional FL.
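
The layer-aware part of the interpolation admits a compact sketch, assuming one interpolation coefficient per parameter tensor (the time-aware scheduling of these coefficients is omitted, and all names are hypothetical):

```python
def interpolate_models(local_state: dict, global_state: dict,
                       alphas: dict) -> dict:
    """Layer-aware model interpolation (sketch). local_state and
    global_state are state dicts with matching keys; alphas maps each
    parameter name to a coefficient in [0, 1], so different layers can
    lean more on the local or the global model."""
    return {name: alphas[name] * local_state[name]
                  + (1.0 - alphas[name]) * global_state[name]
            for name in local_state}
```

A time-aware variant would additionally make each alpha a function of the training round, which is the dimension this sketch leaves out.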

Communication-efficient federated learning with stagewise training strategy

The efficiency of communication across workers is a significant factor affecting the performance of federated learning. Although a periodic communication strategy can reduce the number of communication rounds during training, the communication cost remains high when the training data distributions are not independent and identically distributed (non-IID), which is common in federated learning. Recently, some works have introduced variance reduction to eliminate the effect of non-IID data across workers. Nevertheless, the provably optimal communication complexity O(log(ST)) and convergence rate O(1/(ST)) cannot be achieved simultaneously, where S denotes the number of sampled workers in each round and T is the number of iterations. To resolve this dilemma, we propose an optimization algorithm, SQUARFA, which adopts a stagewise training framework coupled with variance reduction and uses a quick-start phase in each loop. Theoretical results show that SQUARFA achieves both the optimal convergence rate and the optimal communication complexity for strongly convex objectives and for non-convex objectives under the PL condition, thus filling the gap mentioned above. A variant of SQUARFA then yields optimal theoretical results for general non-convex objectives. We further extend the technique in SQUARFA to the large-batch setting and achieve optimal communication complexity. Experimental results demonstrate the superiority of the proposed algorithms.
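
For intuition, the variance-reduction building block that such algorithms rely on looks roughly like an SVRG-style correction (this shows only the generic primitive, not SQUARFA's stagewise schedule or quick-start phase; the interface is an assumption):

```python
import torch

def variance_reduced_step(w: torch.Tensor,
                          w_snapshot: torch.Tensor,
                          full_grad_snapshot: torch.Tensor,
                          grad_fn, lr: float, batch) -> torch.Tensor:
    """One variance-reduced update: the stochastic gradient at w is
    corrected by the same mini-batch gradient at a snapshot point plus
    the full gradient at that snapshot, i.e.
        g = grad_f_i(w) - grad_f_i(w_snapshot) + full_grad(w_snapshot),
    which is unbiased and has vanishing variance as w -> w_snapshot."""
    g = grad_fn(w, batch) - grad_fn(w_snapshot, batch) + full_grad_snapshot
    return w - lr * g
```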

KMF: knowledge-aware multi-faceted representation learning for zero-shot node classification

Recently, Zero-Shot Node Classification (ZNC) has become an emerging and crucial task in graph data analysis. This task aims to predict nodes from unseen classes that are unobserved during training. Existing work mainly utilizes Graph Neural Networks (GNNs) to associate features’ prototypes with labels’ semantics, thus enabling knowledge transfer from seen to unseen classes. However, the multi-faceted semantic orientation in the feature-semantic alignment has been neglected by previous work, i.e., the content of a node usually covers diverse topics that are relevant to the semantics of multiple labels. It is necessary to disentangle and assess the semantic factors that strongly affect this alignment in order to improve the generality of models. To this end, we propose a Knowledge-Aware Multi-Faceted framework (KMF) that enhances the richness of label semantics via extracted Knowledge Graph (KG)-based topics. The content of each node is then reconstructed into a topic-level representation that offers multi-faceted and fine-grained semantic relevancy to different labels. Due to the particularity of the graph’s instance (i.e., node) representation, a novel geometric constraint is developed to alleviate the problem of prototype drift caused by node information aggregation. Finally, we conduct extensive experiments on several public graph datasets and design an application of zero-shot cross-domain recommendation. The quantitative results demonstrate both the effectiveness and generalization of KMF in comparison with state-of-the-art baselines.

Using Entropy for Group Sampling in Pairwise Ranking from Implicit Feedback

In recent years, pairwise methods such as Bayesian Personalized Ranking (BPR) have gained significant attention in the field of collaborative filtering for recommender systems. Group BPR is an extension of BPR that incorporates user groups to relax the strict assumption of independence between two users. However, the reliability of its user groups may be compromised, as they are built from only a few behavioral similarities. To address this problem, this paper proposes a new entropy-weighted similarity measure for implicit feedback to quantify the relation between two users and to sample like-minded user groups. We first introduce the group preference into several pairwise ranking algorithms and then utilize the entropy-weighted similarity to sample groups that further improve these algorithms. Unlike other approaches that rely solely on common item ratings, our method incorporates global information into the similarity measure, resulting in a more reliable approach to group sampling. We conducted experiments on two real-world datasets and evaluated our method using different metrics. The results show that our method can construct better user groups from sparse data and produce more accurate recommendations. Our approach can be applied to a wide range of recommender systems and significantly improves the performance of pairwise ranking algorithms.
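
One plausible reading of an entropy-weighted similarity is to weight co-consumed items by their self-information, so that globally rare items count more toward user similarity (a hedged sketch; the paper's exact measure may differ):

```python
import numpy as np

def entropy_weighted_similarity(interactions: dict, u, v) -> float:
    """interactions: user -> set of item ids (implicit feedback).
    Items shared by u and v are weighted by their self-information
    -log p(item); popular items contribute little, rare ones a lot."""
    n_users = len(interactions)
    item_count = {}
    for items in interactions.values():
        for i in items:
            item_count[i] = item_count.get(i, 0) + 1
    common = interactions[u] & interactions[v]
    weight = sum(-np.log(item_count[i] / n_users) for i in common)
    # Cosine-style normalization by profile sizes.
    norm = np.sqrt(len(interactions[u]) * len(interactions[v]))
    return weight / norm if norm else 0.0
```

Because the item counts are computed over all users, the measure injects exactly the kind of global information the abstract contrasts with purely local common-rating similarities.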

Untargeted Attack against Federated Recommendation Systems via Poisonous Item Embeddings and the Defense

Federated recommendation (FedRec) can train personalized recommenders without collecting user data, but its decentralized nature makes it susceptible to poisoning attacks. Most previous studies focus on targeted attacks that promote certain items, while untargeted attacks that aim to degrade the overall performance of the FedRec system remain less explored. In fact, untargeted attacks can disrupt the user experience and bring severe financial loss to the service provider. However, existing untargeted attack methods are either inapplicable or ineffective against FedRec systems. In this paper, we delve into the untargeted attack and its defense for FedRec systems. (i) We propose ClusterAttack, a novel untargeted attack method. It uploads poisonous gradients that converge the item embeddings into several dense clusters, which make the recommender generate similar scores for items in the same cluster and perturb the ranking order. (ii) We propose a uniformity-based defense mechanism (UNION) to protect FedRec systems from such attacks. We design a contrastive learning task that regularizes the item embeddings toward a uniform distribution, and the server then filters out malicious gradients by estimating the uniformity of the updated item embeddings. Experiments on two public datasets show that ClusterAttack can effectively degrade the performance of FedRec systems while circumventing many defense methods, and that UNION can improve the resistance of the system against various untargeted attacks, including our ClusterAttack.
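
A uniformity regularizer of the kind UNION relies on can be sketched with the standard log-mean Gaussian-potential form on the unit hypersphere (the loss form is the widely used uniformity metric; its exact role inside UNION is assumed here):

```python
import torch

def uniformity_loss(item_emb: torch.Tensor, t: float = 2.0) -> torch.Tensor:
    """Log of the mean pairwise Gaussian potential of L2-normalized
    embeddings. Lower values mean the embeddings spread more uniformly
    over the sphere, which directly counteracts the dense clusters that
    a ClusterAttack-style gradient tries to induce."""
    z = torch.nn.functional.normalize(item_emb, dim=-1)
    return torch.pdist(z, p=2).pow(2).mul(-t).exp().mean().log()
```

The same quantity, evaluated on the updated item embeddings, also gives the server a scalar signal for flagging suspicious client updates.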

A survey on large language models for recommendation

Large Language Models (LLMs) have emerged as powerful tools in the field of Natural Language Processing (NLP) and have recently gained significant attention in the domain of Recommendation Systems (RS). These models, trained on massive amounts of data with self-supervised learning, have demonstrated remarkable success in learning universal representations and have the potential to enhance various aspects of recommendation systems via effective transfer techniques such as fine-tuning and prompt tuning. The crucial aspect of harnessing the power of language models for recommendation is utilizing their high-quality representations of textual features and their extensive coverage of external knowledge to establish correlations between items and users. To provide a comprehensive understanding of existing LLM-based recommendation systems, this survey presents a taxonomy that categorizes these models into two major paradigms, namely Discriminative LLMs for Recommendation (DLLM4Rec) and Generative LLMs for Recommendation (GLLM4Rec), with the latter being systematically sorted out for the first time. Furthermore, we systematically review and analyze existing LLM-based recommendation systems within each paradigm, providing insights into their methodologies, techniques, and performance. Additionally, we identify key challenges and several valuable findings to provide researchers and practitioners with inspiration. We have also created a GitHub repository to index relevant papers on LLMs for recommendation.

Learning the explainable semantic relations via unified graph topic-disentangled neural networks

Graph Neural Networks (GNNs) such as Graph Convolutional Networks (GCNs) can effectively learn node representations by aggregating neighbors based on the relation graph. However, with a few exceptions, most previous work in this line does not consider the topical semantics underlying the edges, making the node representations less effective and the learned relations between nodes hard to explain. For instance, with current GNNs we usually cannot tell why two nodes are connected, such as the specific research topics behind a citation or the shared concerns between friends on social platforms. Some recent work has begun to explore the extraction of relation semantics, but existing studies generally face two bottlenecks: they are either unable to explain the mined latent relations in a way that ensures their reasonableness and independence, or they demand the textual content of edges, which is unavailable in most real-world datasets. Both issues are crucial in practical use. In our work, we propose a novel Topic-Disentangled Graph Neural Network (TDG) to address these two issues simultaneously by exploring relation topics from the perspective of node contents. We design an optimized graph topic module that processes node features to construct independent and explainable semantic subspaces; the relation topics corresponding to these subspaces are then assigned to each graph relation via a neighborhood routing mechanism. Our proposed model can be easily combined with related graph tasks to form an end-to-end model, avoiding the risk of deviation between the node representation space and the task space. To evaluate our model, extensive node-related tasks are conducted on three public datasets in the experimental section. The results show the clear superiority of TDG compared with state-of-the-art models.

FormerTime: Hierarchical Multi-Scale Representations for Multivariate Time Series Classification

Deep learning-based algorithms, e.g., convolutional networks, have significantly facilitated the multivariate time series classification (MTSC) task. Nevertheless, they are limited in modeling long-range dependence due to the nature of convolution operations. Recent advancements have shown the potential of transformers to capture long-range dependence. However, directly applying transformers to the MTSC task incurs severe issues, such as fixed-scale representations, temporal invariance, and quadratic time complexity, because of the distinct properties of time series data. To tackle these issues, we propose FormerTime, a hierarchical representation model for improving the classification capacity of the MTSC task. In the proposed FormerTime, we employ a hierarchical network architecture to produce multi-scale feature maps. Besides, a novel transformer encoder is designed, in which an efficient temporal reduction attention layer and a well-informed contextual positional encoding strategy are developed. To sum up, FormerTime exhibits three merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strengths of both transformers and convolutional networks, and (3) tackling the efficiency challenges incurred by the self-attention mechanism. Extensive experiments on 10 publicly available datasets from the UEA archive verify the superiority of FormerTime compared to previous competitive baselines.

TimeMAE: Self-Supervised Representations of Time Series with Decoupled Masked Autoencoders

Enhancing the expressive capacity of deep learning-based time series models with self-supervised pre-training has become increasingly prevalent in time series classification. Even though numerous efforts have been devoted to developing self-supervised models for time series data, we argue that current methods are not sufficient to learn optimal time series representations due to their solely unidirectional encoding over sparse point-wise input units. In this work, we propose TimeMAE, a novel self-supervised paradigm for learning transferable time series representations based on transformer networks. The distinct characteristic of TimeMAE lies in processing each time series into a sequence of non-overlapping sub-series via window-slicing partitioning, followed by random masking strategies over the semantic units of localized sub-series. Such a simple yet effective setting can help us achieve the goal of killing three birds with one stone, i.e., (1) learning enriched contextual representations of time series with a bidirectional encoding scheme; (2) increasing the information density of basic semantic units; and (3) efficiently encoding representations of time series using transformer networks. Nevertheless, performing the reconstruction task over such a newly formulated modeling paradigm is non-trivial. To solve the discrepancy issue incurred by newly injected masked embeddings, we design a decoupled autoencoder architecture, which learns the representations of visible (unmasked) positions and masked positions with two different encoder modules, respectively. Furthermore, we construct two types of informative targets to accomplish the corresponding pretext tasks. One is to create a tokenizer module that assigns a codeword to each masked region, allowing the masked codeword classification (MCC) task to be completed effectively. The other is to adopt a siamese network structure to generate target representations for each masked input unit, aiming at performing the masked representation regression (MRR) optimization. Once comprehensively pre-trained, our model can efficiently learn transferable time series representations, thus benefiting the classification of time series. We extensively perform experiments on five benchmark datasets to verify the effectiveness of TimeMAE. Experimental results show that TimeMAE can significantly surpass previous competitive baselines. Furthermore, we also demonstrate the universality of the learned representations through transfer learning experiments. For reproducibility, we make our experiment code publicly available at https://github.com/Mingyue-Cheng/TimeMAE.
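
The window-slicing and masking front end can be sketched as follows (the window length and mask ratio are assumed defaults for illustration, not the paper's settings):

```python
import torch

def slice_and_mask(x: torch.Tensor, window: int = 8,
                   mask_ratio: float = 0.6):
    """x: (B, T, C) multivariate series. Splits the series into
    non-overlapping sub-series of length `window` (the local semantic
    units), then randomly marks a fraction of them as masked. The
    decoupled encoders would consume the visible and masked units
    separately."""
    B, T, C = x.shape
    n = T // window                                   # number of sub-series
    units = x[:, : n * window].reshape(B, n, window * C)
    mask = torch.rand(B, n) < mask_ratio              # True = masked unit
    return units, mask
```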

Visual Autoregressive Modeling for Instruction-Guided Image Editing

Recent advances in diffusion models have brought remarkable visual fidelity to instruction-guided image editing. However, their global denoising process inherently entangles the edited region with the entire image context, leading to unintended spurious modifications and compromised adherence to editing instructions. In contrast, autoregressive models offer a distinct paradigm by formulating image synthesis as a sequential process over discrete visual tokens. Their causal and compositional mechanism naturally circumvents the adherence challenges of diffusion-based methods. In this paper, we present VAREdit, a visual autoregressive (VAR) framework that reframes image editing as a next-scale prediction problem. Conditioned on source image features and text instructions, VAREdit generates multi-scale target features to achieve precise edits. A core challenge in this paradigm is how to effectively condition on the source image tokens. We observe that finest-scale source features cannot effectively guide the prediction of coarser target features. To bridge this gap, we introduce a Scale-Aligned Reference (SAR) module, which injects scale-matched conditioning information into the first self-attention layer. VAREdit demonstrates significant advancements in both editing adherence and efficiency. On the EMU-Edit and PIE-Bench benchmarks, VAREdit outperforms leading diffusion-based methods by a substantial margin in terms of both CLIP and GPT scores. Moreover, VAREdit completes a 512×512 edit in 1.2 seconds, 2.2× faster than the similarly sized UltraEdit. Code is available at: https://github.com/HiDream-ai/VAREdit.

HoH: A Dynamic Benchmark for Evaluating the Impact of Outdated Information on Retrieval-Augmented Generation

While Retrieval-Augmented Generation (RAG) has emerged as an effective approach for addressing the knowledge-outdating problem in Large Language Models (LLMs), it still faces a critical challenge: the prevalence of outdated information in knowledge bases. Current research primarily focuses on incorporating up-to-date information, yet the impact of outdated information coexisting in retrieval sources remains inadequately addressed. To bridge this gap, we introduce HoH, the first benchmark specifically designed to evaluate the impact of outdated information on RAG. Our benchmark leverages token-level diff algorithms combined with LLM pipelines to efficiently create a large-scale QA dataset that accurately captures the evolution of temporal knowledge in real-world facts. Through comprehensive experiments, we reveal that outdated information significantly degrades RAG performance in two critical ways: (1) it substantially reduces response accuracy by distracting models from correct information, and (2) it can mislead models into generating potentially harmful outputs, even when current information is available. Current RAG approaches struggle with both the retrieval and generation aspects when handling outdated information. These findings highlight the urgent need for innovative solutions to address the temporal challenges in RAG.
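
A token-level diff of two fact snapshots, the kind of signal such a pipeline could build on, can be obtained with a standard sequence matcher (an illustrative sketch, not the HoH implementation):

```python
import difflib

def changed_spans(old_fact: str, new_fact: str):
    """Return (outdated, updated) token spans where a fact changed
    between two snapshots, e.g. a stale answer and its correction."""
    old, new = old_fact.split(), new_fact.split()
    sm = difflib.SequenceMatcher(a=old, b=new)
    return [(" ".join(old[i1:i2]), " ".join(new[j1:j2]))
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag == "replace"]

# Example: changed_spans("the CEO is Alice", "the CEO is Bob")
# -> [("Alice", "Bob")], a candidate outdated/current answer pair.
```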

Fewer Battles, More Gain: An Information-Efficient Framework for Arena-based LLM Evaluation

Arena-based evaluation has become a key method for assessing large language models (LLMs) through head-to-head model comparisons, closely reflecting human preferences. However, current arena rating systems (e.g., the Elo rating system) often suffer from inefficiencies due to exhaustive or random model-pair annotations, leading to redundant evaluations, longer evaluation times, and lower overall efficiency. To address these challenges, we propose a novel adaptive model-pair selection algorithm. By leveraging the asymptotic normality of LLM ability estimation under sparse conditions, our approach strategically selects high-value model pairs, focusing on confrontations with the lowest variance. Specifically, we introduce Fisher information as a metric to guide model-pair selection, optimizing the evaluation process through A-optimality and D-optimality: A-optimality minimizes estimation variance, ensuring balanced reliability across models, while D-optimality reduces uncertainty by maximizing the determinant of the Fisher Information Matrix. Extensive experiments on both simulated and real-world datasets demonstrate that our method outperforms existing approaches in terms of information efficiency and result reliability. Notably, our method offers a flexible, general toolkit that can be easily integrated into existing arena-based platforms, greatly improving the scalability and efficiency of large-scale LLM evaluations.
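
As a toy illustration of information-guided pair selection, one can score each candidate pair by its Bradley-Terry Fisher information p(1-p) and pick the most informative matchup (a heuristic sketch in the spirit of A-optimality, not the paper's full algorithm):

```python
import numpy as np

def select_next_pair(thetas: np.ndarray, counts: np.ndarray):
    """thetas: estimated abilities, one per model. counts: symmetric
    matrix of how many times each pair has already been compared.
    Under a Bradley-Terry model, one comparison of (i, j) with win
    probability p = sigmoid(theta_i - theta_j) contributes p * (1 - p)
    Fisher information; near-even matchups are the most informative.
    Pairs already sampled often are downweighted to spread annotations."""
    best, best_score = None, -np.inf
    m = len(thetas)
    for i in range(m):
        for j in range(i + 1, m):
            p = 1.0 / (1.0 + np.exp(-(thetas[i] - thetas[j])))
            score = p * (1.0 - p) / (1.0 + counts[i, j])
            if score > best_score:
                best, best_score = (i, j), score
    return best
```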