TOC
- TOC
- Model-free, Model-based and General Intelligence
- Reconciling Deep Learning with Symbolic Artificial Intelligence: Representing Objects and Relations
- Learning First-Order Symbolic Representations for Planning from the Structure of the State Space
- From Statistical Relational to Neuro-Symbolic Artificial Intelligence
- Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding
- CLEVRER: COLLISION EVENTS FOR VIDEO REPRESENTATION AND REASONING
Model-free, Model-based and General Intelligence
Background
- Learners such as ML or DL infer behaviors or functions from data and experience
- Solvers such as SAT, classical planning, Bayesian networks, and POMDPs
Open Challenges
- Learners -> big success, but black boxes
- Solvers -> require models and scalable algorithms
Main Contributions
- Review the gaps between model-free and model-based approaches
- Learners and solvers: contrast:
  - Learners: need training, which is slow, but are then fast; can only learn from experience (the intuitive mind, System 1 in psychology)
  - Solvers: no training, slow at inference; can solve a problem from scratch given a representation of the problem (the analytical mind, System 2 in psychology)
- Learners and solvers: challenges:
  - Model learning: learning models from streams of actions and partial observations remains challenging
  - Learning the relevant variables; learning features that are reusable
  - Learning finite-size abstract representations -> representation learning for planning
Reconciling Deep Learning with Symbolic Artificial Intelligence: Representing Objects and Relations
Background
Symbolic artificial intelligence was dominant in the 20th century, while the connectionist paradigm has become popular now. Both have advantages and drawbacks, and a significant challenge in this field is to effect a reconciliation.
To achieve this goal, the objective for deep learning is to develop structures that discover objects and relations in the data and learn to represent them in ways that are useful for downstream tasks. This review highlights the recent progress.
Focus on the question: how can deep learning acquire compositional representations whose elements are objects and relations?
Focus on the following topics:
- compositionality
- learning representations composed of objects and relations
Drawback | Of which paradigm | Remedy from the other paradigm |
---|---|---|
Data inefficiency | Deep learning | Symbolic representations can be reused |
Lack of generalization | Deep learning | Symbolic representations are high-level and abstract |
Lack of interpretability | Deep learning | Language-like representations are easy to understand |
Handcrafted representations | Symbolic AI | Deep learning discovers features from high-dimensional data without human intervention |
Objects and Compositionality
In symbolic AI, representations conform to the principle of compositionality: the denotation of a representation is a function of the denotations of its parts and the way those parts are combined.
- The representations most deep learning methods produce are distributed: each element has little or no meaning in isolation, and the underlying factors are entangled.
- A vanilla autoencoder learns entangled features, while a variational autoencoder can learn disentangled features
- The learned disentangled features are beneficial to downstream tasks such as reinforcement learning and learning hierarchical concepts (to be researched)
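A toy NumPy sketch of the VAE machinery behind such disentanglement (the mean/log-variance values are made up, and the beta weight follows the beta-VAE idea; this is not any paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    # Sample z = mu + sigma * eps, so gradients could flow through mu/log_var.
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    # KL(q(z|x) || N(0, I)); pushing each latent toward the factorized
    # prior is what encourages (partial) disentanglement in a beta-VAE.
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

mu = np.array([0.1, -0.3])        # encoder-predicted mean (made-up values)
log_var = np.array([-1.0, -2.0])  # encoder-predicted log-variance
z = reparameterize(mu, log_var)

beta = 4.0  # beta > 1 strengthens the disentanglement pressure (beta-VAE)
kl_term = beta * kl_to_standard_normal(mu, log_var)
print(z.shape, kl_term > 0)
```

With beta = 1 this reduces to the ordinary VAE objective's KL term; increasing beta trades reconstruction quality for more factorized latents.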
Relational Representations
How to extract relational representation:
- Relation network: concatenate each object pair and pass it through a shared MLP
- Self-attention: compute additional attention weights for each pair
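A minimal NumPy sketch of the relation-network idea: the same MLP scores every ordered object pair, and the per-pair outputs are summed into one relational embedding. The weights here are random stand-ins for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, b1, w2, b2):
    # Two-layer MLP with ReLU; stands in for the shared pairwise network.
    h = np.maximum(x @ w1 + b1, 0.0)
    return h @ w2 + b2

n_objects, d_obj, d_hidden, d_out = 4, 8, 16, 5
objects = rng.standard_normal((n_objects, d_obj))

# Random weights standing in for learned parameters.
w1 = rng.standard_normal((2 * d_obj, d_hidden))
b1 = np.zeros(d_hidden)
w2 = rng.standard_normal((d_hidden, d_out))
b2 = np.zeros(d_out)

# Relation network: apply the same MLP to every ordered object pair,
# then sum the per-pair outputs into a single relational embedding.
pair_outputs = []
for i in range(n_objects):
    for j in range(n_objects):
        pair = np.concatenate([objects[i], objects[j]])
        pair_outputs.append(mlp(pair, w1, b1, w2, b2))
relational_embedding = np.sum(pair_outputs, axis=0)
print(relational_embedding.shape)  # (5,)
```

Because the pairwise MLP is shared and the outputs are summed, the result is invariant to object ordering, which is the key inductive bias of relation networks.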
Learning First-Order Symbolic Representations for Planning from the Structure of the State Space
Background & Challenges
One obstacle to developing flexible AI is the split between data-based learners and model-based solvers. This work tries to obtain a complete first-order representation (i.e., general action schemas, relation symbols, and objects) from non-symbolic input: graphs that encode the structure of the state space
Method
The inference problem is cast as a two-level combinatorial search where the outer level looks for the right values of the hyperparameters and the inner level, formulated and solved via SAT, looks for a first-order representation that fits the hyperparameters and explains the input graphs.
- Input: labelled directed graphs
- Output: a PDDL-like representation
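The two-level search can be sketched as follows. This is a toy stand-in: `inner_sat_solve` is a hypothetical stub replacing the paper's actual SAT encoding, and the hyperparameters (number of predicate symbols, maximum arity) are illustrative.

```python
from itertools import product

def inner_sat_solve(hyperparams, graph):
    # Hypothetical stub standing in for the SAT encoding: return a
    # first-order model that fits the hyperparameters and explains the
    # input graph, or None. Here we simply pretend the target domain
    # needs at least 2 predicate symbols of arity at least 1.
    max_predicates, max_arity = hyperparams
    if max_predicates >= 2 and max_arity >= 1:
        return {"predicates": max_predicates, "arity": max_arity}
    return None

def learn_representation(graph):
    # Outer level: enumerate hyperparameter values from small to large,
    # so the first model found uses the simplest values that work.
    for hyperparams in product(range(1, 4), repeat=2):
        model = inner_sat_solve(hyperparams, graph)
        if model is not None:
            return model
    return None

print(learn_representation({"edges": []}))  # {'predicates': 2, 'arity': 1}
```

Enumerating hyperparameters from small to large gives an Occam's-razor bias: the simplest representation that explains the input graph wins.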
Experiment & Result
Correct and general first-order models for domains like Gripper, Blocksworld, and Hanoi are shown to be learned from graphs that encode the flat state-space structure of a single small instance.
From Statistical Relational to Neuro-Symbolic Artificial Intelligence
Background
The integration of learning and reasoning is one of the key challenges in AI and ML. Neuro-symbolic and statistical relational artificial intelligence both integrate frameworks for learning with logical reasoning. This work identifies seven parallels across seven dimensions between the two fields, and the survey limits itself to the logical and probabilistic perspectives.
- Neuro-symbolic: incorporates symbolic reasoning into neural networks
- Statistical relational AI: incorporates logic into probabilistic graphical models
Comparison (TO DO)
Dimension | StarAI | NeSy |
---|---|---|
Directed vs undirected | ||
Grounding vs proofs | ||
Logic vs probability vs neural | ||
Semantics | ||
Learning parameters vs structure | ||
Symbols vs sub-symbols | ||
Type of logic | ||
Open challenges for NeSy
- Probabilistic reasoning: applying probabilistic reasoning in neuro-symbolic approaches
- Structure learning
- Scaling inference
- Data efficiency
- Symbolic representation learning
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding
TODO
- symbolic program executor
- Details of REINFORCE
- how the output of the question parser is specified
- unsupervised or weakly supervised representation learning
Problem
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning
Previous & Current Methods
Previous
Two separate approaches:
- Deep learning
- Symbolic approach
Current
Current Approach, three components:
- a scene parser (de-renderer)
- a question parser (program generator)
- and a program executor.
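A minimal sketch of how such a program executor might run a functional program on a parsed scene. The scene contents, op names, and program format here are illustrative, not the paper's exact ones:

```python
# Hypothetical CLEVR-style parsed scene: a list of attribute dicts
# (in the paper this comes from the scene parser / de-renderer).
scene = [
    {"color": "red", "shape": "cube"},
    {"color": "blue", "shape": "sphere"},
    {"color": "red", "shape": "sphere"},
]

def filter_attr(objects, attr, value):
    # Keep only the objects whose attribute matches the queried value.
    return [o for o in objects if o[attr] == value]

def count(objects):
    return len(objects)

def execute(program, objects):
    # Run the program (a list of ops) step by step on the scene. Every
    # intermediate result is a plain Python value that can be inspected,
    # which is what makes the reasoning transparent and diagnosable.
    for op, *args in program:
        if op == "filter":
            objects = filter_attr(objects, *args)
        elif op == "count":
            objects = count(objects)
    return objects

# "How many red objects are there?"
program = [("filter", "color", "red"), ("count",)]
print(execute(program, scene))  # 2
```

In the full system, the question parser would emit `program` and the scene parser would emit `scene`; the executor itself is deterministic and needs no training.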
Result & Conclusion & Insight
Three advantages:
- First, executing programs on a symbolic space is more robust to long program traces; the model can solve complex reasoning tasks better, achieving an accuracy of 99.8% on the CLEVR dataset.
- Second, the model is more data- and memory-efficient: it performs well after learning on a small number of training examples; it can also encode an image into a compact representation, requiring less storage than existing methods for offline question answering.
- Third, symbolic program execution offers full transparency to the reasoning process; each execution step can thus be interpreted and diagnosed.
Limitation & Potential Direction
However, building structured representations for scenes and sentence meanings (the targets of these mappings) in ways that generalize to truly novel situations remains a challenge for many approaches, including this one.
Recent progress on unsupervised or weakly supervised representation learning, in both language and vision, offers some promise of generalization. Integrating this work with the neural-symbolic approach to visually grounded language is a promising future direction.
CLEVRER: COLLISION EVENTS FOR VIDEO REPRESENTATION AND REASONING
TODO
- Dynamics predictor
- symbolic program executor
- First limitation
Problem
CLEVRER: CoLlision Events for Video REpresentation and Reasoning
CLEVRER includes four types of question:
- descriptive (e.g., ‘what color’)
- explanatory (‘what’s responsible for’)
- predictive (‘what will happen next’)
- counterfactual (‘what if’).
While state-of-the-art models for visual reasoning thrive on the perception-based task (descriptive), they perform poorly on the causal tasks (explanatory, predictive, and counterfactual).
Previous & Current Methods
Previous
Three previous research directions:
- video understanding
- visual question answering
- physical and causal reasoning
Current
Our model, Neuro-Symbolic Dynamic Reasoning (NS-DR), combines neural nets for pattern recognition and dynamics prediction, and symbolic logic for causal reasoning. Four components:
- a video frame parser
- a neural dynamics predictor
- a question parser
- a program executor
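A toy stand-in for the dynamics-predictor component: constant-velocity extrapolation over object positions. The real NS-DR predictor is a learned neural model over object-centric representations; this sketch only illustrates the interface (past frames in, predicted frame out).

```python
import numpy as np

def predict_next_positions(positions, dt=1.0):
    # Toy dynamics predictor: extrapolate each object's next position
    # from its last two observed frames (constant-velocity assumption).
    # positions has shape (n_frames, n_objects, 2).
    velocity = positions[-1] - positions[-2]
    return positions[-1] + velocity * dt

# Two frames of (x, y) positions for two objects (made-up values).
frames = np.array([[[0.0, 0.0], [1.0, 1.0]],
                   [[0.5, 0.0], [1.0, 2.0]]])
print(predict_next_positions(frames))  # [[1. 0.] [1. 3.]]
```

Predicted frames like these are what let the executor answer predictive and counterfactual questions, by running the same symbolic programs on simulated futures instead of observed ones.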
Result & Conclusion & Insight
Contributions
- First, NS-DR incorporates a dynamics planner into the visual reasoning task
- Second, symbolic representation provides a powerful common ground for vision, language, dynamics and causality.
Limitation & Possible Direction
Training the video and question parsers relies on extra supervision such as object masks, attributes, and question programs. Even though the amount of data required for training is minimal compared to end-to-end approaches (i.e., thousands of annotated frames and programs), such data is hard to acquire in real-world applications. This constraint could be relaxed by applying unsupervised/weakly supervised methods for scene decomposition and concept discovery. Second, model performance decreases on tasks that require long-term dynamics prediction, such as the counterfactual questions. This suggests that we need a better dynamics model capable of generating more stable and accurate trajectories.