TOC
- TOC
- Model-free, Model-based and General Intelligence
- Reconciling Deep Learning with Symbolic Artificial Intelligence: Representing Objects and Relations
- Learning First-Order Symbolic Representations for Planning from the Structure of the State Space
- From Statistical Relational to Neuro-Symbolic Artificial Intelligence
- Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding
- CLEVRER: COLLISION EVENTS FOR VIDEO REPRESENTATION AND REASONING
Model-free, Model-based and General Intelligence
Background
- Learners such as ML or DL infer behaviors or functions from data and experience
- Solvers such as SAT, classical planning, Bayesian networks, and POMDPs
Open Challenges
- Learners -> big success, but black boxes
- Solvers -> require models and scalable algorithms
Main Contributions
- Review the gaps between model-free and model-based approaches
- Learners and solvers: contrast:
  - Learners: need training, which is slow, but are then fast; can only learn from experience (the intuitive mind, System 1 in psychology)
  - Solvers: no training, slow at inference; can solve a problem from scratch given a representation of the problem (the analytical mind, System 2 in psychology)
- Learners and solvers: challenges:
  - Model learning: learning models from streams of actions and partial observations remains challenging
  - Learning the relevant variables; learning features that are reusable
  - Learning finite-size abstract representations -> representation learning for planning
Reconciling Deep Learning with Symbolic Artificial Intelligence: Representing Objects and Relations
Background
Symbolic artificial intelligence was dominant in the 20th century, while the connectionist paradigm has become popular now. Both have advantages and drawbacks, and a significant challenge in this field is to effect a reconciliation.
To achieve this goal, the objective for deep learning is to develop structures that discover objects and relations in the data and learn to represent them in ways that are useful for downstream tasks. This review highlights the recent progress.
Focus on the question: how can deep learning acquire compositional representations whose elements are objects and relations?
Focus on the following topics:
- compositionality
- learning representations composed of objects and relations
Drawback | Of which paradigm | Remedy from the other paradigm |
---|---|---|
Data inefficiency | Deep learning | Symbolic representations can be reused |
Lack of generalization | Deep learning | Symbolic representations are high-level and abstract |
Lack of interpretability | Deep learning | Language-like representations are easy to understand |
Handcrafted representations | Symbolic AI | Deep learning discovers features from high-dimensional data without human intervention |
Objects and Compositionality
In symbolic AI, representations conform to the principle of compositionality: the denotation of a representation is a function of the denotations of its parts and the way those parts are combined.
- The representations most deep learning methods produce are distributed: each element has little or no meaning in isolation, and the underlying factors are entangled.
- A vanilla autoencoder learns entangled features, while a variational autoencoder can learn disentangled features
- The learned disentangled features are beneficial to downstream tasks such as reinforcement learning and learning hierarchical concepts (to be researched)
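A toy NumPy sketch of the VAE machinery behind such disentanglement (the mean/log-variance values are made up, and the beta weight follows the beta-VAE idea; this is not any paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    # Sample z = mu + sigma * eps, so gradients could flow through mu/log_var.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    # KL(q(z|x) || N(0, I)); pushing each latent toward the factorized
    # prior is what encourages (partial) disentanglement in a beta-VAE.
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

mu = np.array([0.1, -0.3])        # encoder-predicted mean (made-up values)
log_var = np.array([-1.0, -2.0])  # encoder-predicted log-variance
z = reparameterize(mu, log_var)

beta = 4.0  # beta > 1 strengthens the disentanglement pressure (beta-VAE)
kl_term = beta * kl_to_standard_normal(mu, log_var)
print(z.shape, kl_term > 0)
```

With beta = 1 this reduces to the ordinary VAE objective's KL term; increasing beta trades reconstruction quality for more factorized latents.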
Relational Representations
How to extract relational representation:
- Relation network: concatenate each object pair and pass it through a shared MLP
- Self-attention: compute additional attention weights for each pair
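A minimal NumPy sketch of the relation-network idea: the same MLP scores every ordered object pair, and the per-pair outputs are summed into one relational embedding. The weights here are random stand-ins for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, b1, w2, b2):
    # Two-layer MLP with ReLU; stands in for the shared pairwise network.
    h = np.maximum(x @ w1 + b1, 0.0)
    return h @ w2 + b2

n_objects, d_obj, d_hidden, d_out = 4, 8, 16, 5
objects = rng.standard_normal((n_objects, d_obj))

# Random weights standing in for learned parameters.
w1 = rng.standard_normal((2 * d_obj, d_hidden))
b1 = np.zeros(d_hidden)
w2 = rng.standard_normal((d_hidden, d_out))
b2 = np.zeros(d_out)

# Relation network: apply the same MLP to every ordered object pair,
# then sum the per-pair outputs into a single relational embedding.
pair_outputs = []
for i in range(n_objects):
    for j in range(n_objects):
        pair = np.concatenate([objects[i], objects[j]])
        pair_outputs.append(mlp(pair, w1, b1, w2, b2))
relational_embedding = np.sum(pair_outputs, axis=0)
print(relational_embedding.shape)  # (5,)
```

Because the pairwise MLP is shared and the outputs are summed, the result is invariant to object ordering, which is the key inductive bias of relation networks.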
Learning First-Order Symbolic Representations for Planning from the Structure of the State Space
Background & Challenges
One obstacle to developing flexible AI is the split between data-based learners and model-based solvers. This work tries to obtain a complete first-order representation (i.e., general action schemas, relation symbols, and objects) from non-symbolic input: graphs that encode the structure of the state space
Method
The inference problem is cast as a two-level combinatorial search where the outer level looks for the right values of the hyperparameters and the inner level, formulated and solved via SAT, looks for a first-order representation that fits the hyperparameters and explains the input graphs.
- Input: labelled directed graphs
- Output: a PDDL-like representation
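The two-level search can be sketched as follows. This is a toy stand-in: `inner_sat_solve` is a hypothetical stub replacing the paper's actual SAT encoding, and the hyperparameters (number of predicate symbols, maximum arity) are illustrative.

```python
from itertools import product

def inner_sat_solve(hyperparams, graph):
    # Hypothetical stub standing in for the SAT encoding: return a
    # first-order model that fits the hyperparameters and explains the
    # input graph, or None. Here we simply pretend the target domain
    # needs at least 2 predicate symbols of arity at least 1.
    max_predicates, max_arity = hyperparams
    if max_predicates >= 2 and max_arity >= 1:
        return {"predicates": max_predicates, "arity": max_arity}
    return None

def learn_representation(graph):
    # Outer level: enumerate hyperparameter values from small to large,
    # so the first model found uses the simplest values that work.
    for hyperparams in product(range(1, 4), repeat=2):
        model = inner_sat_solve(hyperparams, graph)
        if model is not None:
            return model
    return None

print(learn_representation({"edges": []}))  # {'predicates': 2, 'arity': 1}
```

Enumerating hyperparameters from small to large gives an Occam's-razor bias: the simplest representation that explains the input graph wins.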
Experiment & Result
Correct and general first-order models for domains like Gripper, Blocksworld, and Hanoi are shown to be learned from graphs that encode the flat state-space structure of a single small instance.
From Statistical Relational to Neuro-Symbolic Artificial Intelligence
Background
The integration of learning and reasoning is one of the key challenges in AI and ML. Neuro-symbolic and statistical relational artificial intelligence both integrate frameworks for learning with logical reasoning. This work identifies seven parallels across seven dimensions between the two fields, and the survey limits itself to the logical and probabilistic perspectives.
- Neuro-symbolic: incorporates symbolic reasoning into neural networks
- Statistical relational AI: incorporates logic into probabilistic graphical models
Comparison (TO DO)
Dimension | StarAI | NeSy |
---|---|---|
Directed vs undirected | ||
Grounding vs proofs | ||
Logic vs probability vs neural | ||
Semantics | ||
Learning parameters vs structure | ||
Symbols vs sub-symbols | ||
Type of logic | ||
Open challenges for NeSy
- Probabilistic reasoning: applying probabilistic reasoning in neuro-symbolic approaches
- Structure learning
- Scaling inference
- Data efficiency
- Symbolic representation learning
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding
TODO
- symbolic program executor
- Details of REINFORCE
- how the output of the question parser is specified
- unsupervised or weakly supervised representation learning
Problem
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
Previous & Current Methods
Previous
Two separate approaches:
- Deep learning
- Symbolic approach
Current
Current Approach, three components:
- a scene parser (de-renderer)
- a question parser (program generator)
- and a program executor.
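A minimal sketch of how such a program executor might run a functional program on a parsed scene. The scene contents, op names, and program format here are illustrative, not the paper's exact ones:

```python
# Hypothetical CLEVR-style parsed scene: a list of attribute dicts
# (in the paper this comes from the scene parser / de-renderer).
scene = [
    {"color": "red", "shape": "cube"},
    {"color": "blue", "shape": "sphere"},
    {"color": "red", "shape": "sphere"},
]

def filter_attr(objects, attr, value):
    # Keep only the objects whose attribute matches the queried value.
    return [o for o in objects if o[attr] == value]

def count(objects):
    return len(objects)

def execute(program, objects):
    # Run the program (a list of ops) step by step on the scene. Every
    # intermediate result is a plain Python value that can be inspected,
    # which is what makes the reasoning transparent and diagnosable.
    for op, *args in program:
        if op == "filter":
            objects = filter_attr(objects, *args)
        elif op == "count":
            objects = count(objects)
    return objects

# "How many red objects are there?"
program = [("filter", "color", "red"), ("count",)]
print(execute(program, scene))  # 2
```

In the full system, the question parser would emit `program` and the scene parser would emit `scene`; the executor itself is deterministic and needs no training.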
Result & Conclusion & Insight
Three advantages:
- First, executing programs on a symbolic space is more robust to long program traces; the model can solve complex reasoning tasks better, achieving an accuracy of 99.8% on the CLEVR dataset.
- Second, the model is more data- and memory-efficient: it performs well after learning on a small number of training examples; it can also encode an image into a compact representation, requiring less storage than existing methods for offline question answering.
- Third, symbolic program execution offers full transparency to the reasoning process; each execution step can thus be interpreted and diagnosed.
Limitation & Potential Direction
However, building structured representations for scenes and sentence meanings (the targets of these mappings) in ways that generalize to truly novel situations remains a challenge for many approaches, including this one.
Recent progress on unsupervised or weakly supervised representation learning, in both language and vision, offers some promise of generalization. Integrating this work with the neural-symbolic approach to visually grounded language is a promising future direction.
CLEVRER: COLLISION EVENTS FOR VIDEO REPRESENTATION AND REASONING
TODO
- Dynamics predictor
- symbolic program executor
- First limitation
Problem
CLEVRER: CoLlision Events for Video REpresentation and Reasoning
CLEVRER includes four types of question:
- descriptive (e.g., ‘what color’)
- explanatory (‘what’s responsible for’)
- predictive (‘what will happen next’)
- counterfactual (‘what if’).
While state-of-the-art models for visual reasoning thrive on the perception-based task (descriptive), they perform poorly on the causal tasks (explanatory, predictive, and counterfactual).
Previous & Current Methods
Previous
Three previous research directions:
- video understanding
- visual question answering
- physical and causal reasoning
Current
Our model, Neuro-Symbolic Dynamic Reasoning (NS-DR), combines neural nets for pattern recognition and dynamics prediction, and symbolic logic for causal reasoning. Four components:
- a video frame parser
- a neural dynamics predictor
- a question parser
- a program executor
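A toy stand-in for the dynamics-predictor component: constant-velocity extrapolation over object positions. The real NS-DR predictor is a learned neural model over object-centric representations; this sketch only illustrates the interface (past frames in, predicted frame out).

```python
import numpy as np

def predict_next_positions(positions, dt=1.0):
    # Toy dynamics predictor: extrapolate each object's next position
    # from its last two observed frames (constant-velocity assumption).
    # positions has shape (n_frames, n_objects, 2).
    velocity = positions[-1] - positions[-2]
    return positions[-1] + velocity * dt

# Two frames of (x, y) positions for two objects (made-up values).
frames = np.array([[[0.0, 0.0], [1.0, 1.0]],
                   [[0.5, 0.0], [1.0, 2.0]]])
print(predict_next_positions(frames))  # [[1. 0.] [1. 3.]]
```

Predicted frames like these are what let the executor answer predictive and counterfactual questions, by running the same symbolic programs on simulated futures instead of observed ones.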
Result & Conclusion & Insight
Contributions
- First, NS-DR incorporates a dynamics planner into the visual reasoning task
- Second, symbolic representation provides a powerful common ground for vision, language, dynamics and causality.
Limitation & Possible Direction
Training the video and question parsers relies on extra supervision such as object masks, attributes, and question programs. Even though the amount of data required for training is minimal compared to end-to-end approaches (i.e., thousands of annotated frames and programs), such data is hard to acquire in real-world applications. This constraint could be relaxed by applying unsupervised/weakly supervised methods for scene decomposition and concept discovery. Second, model performance decreases on tasks that require long-term dynamics prediction, such as the counterfactual questions. This suggests that we need a better dynamics model capable of generating more stable and accurate trajectories.