Natural Language Processing - CPSC 436N (UBC)

Author: Christina Yang

CPSC 436N (2023W) topics. Information source: The University of British Columbia.

Table of Contents

  1. Models
  2. Topics
  3. Chunking

Language models covered: count-based trigram, neural trigram, or RNN-based model

Question: Does Planda require NLP? Yes: I am parsing course syllabi. Currently relying on ChatGPT; I wonder how hard it would be to make something tailored to syllabus parsing.

I'm thinking more clustering, or some kind of partitioning algorithm.

Models

n-gram (count-based): only looks at n-1 words of context
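
A minimal sketch of a count-based trigram LM with add-alpha smoothing; the class name, smoothing choice, and toy corpus are illustrative, not from the course:

```python
from collections import defaultdict

class CountTrigramLM:
    """Count-based trigram LM: P(w | u, v) is estimated from corpus counts,
    so prediction conditions on only the previous n-1 = 2 words."""

    def __init__(self):
        self.trigram_counts = defaultdict(int)
        self.bigram_counts = defaultdict(int)
        self.vocab = set()

    def train(self, sentences):
        for sent in sentences:
            tokens = ["<s>", "<s>"] + sent + ["</s>"]
            self.vocab.update(tokens)
            for u, v, w in zip(tokens, tokens[1:], tokens[2:]):
                self.trigram_counts[(u, v, w)] += 1
                self.bigram_counts[(u, v)] += 1

    def prob(self, u, v, w, alpha=1.0):
        # Add-alpha smoothing so unseen trigrams still get nonzero probability.
        numerator = self.trigram_counts[(u, v, w)] + alpha
        denominator = self.bigram_counts[(u, v)] + alpha * len(self.vocab)
        return numerator / denominator

lm = CountTrigramLM()
lm.train([["the", "course", "covers", "parsing"],
          ["the", "course", "covers", "tagging"]])
print(lm.prob("the", "course", "covers"))  # probability of "covers" given "the course"
```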

RNN: Good for long-range dependencies. The RNN doesn’t follow the Markov assumption, and the hidden state at each time step captures the entire history
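
A tiny NumPy sketch of one vanilla RNN step (dimensions and weight initialization are illustrative), showing how a single hidden state carries the whole prefix instead of a fixed n-1 word window:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrence: the new hidden state mixes the current input with the
    previous hidden state, which already summarizes every earlier input."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

d_in, d_hid = 4, 8                       # toy dimensions
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(d_hid, d_in))
W_hh = rng.normal(size=(d_hid, d_hid))
b_h = np.zeros(d_hid)

h = np.zeros(d_hid)
for x_t in rng.normal(size=(5, d_in)):   # run over a length-5 input sequence
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
print(h.shape)                           # (8,): one state summarizing the whole prefix
```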

greedy decoding (which is equivalent to top k with k = 1):

  • predicts the most likely word at each timestep,
  • typically generates boring and repetitive sentences with very common words.

sampling from the entire distribution (which is equivalent to top k with k = |V|):

  • randomly samples a word from the vocabulary proportionally to its probability in the next-token distribution.
  • yields more diversity in the generated sentences than greedy decoding, but puts too much of the probability mass on the “long tail” of the distribution and may generate rare words.

top k with a relatively small k:

  • compromise between greedy decoding and sampling from the entire distribution
  • It first prunes the distribution to the most likely k words, then renormalizes it and samples a word from the pruned distribution proportionally to its probability.
  • It yields higher quality and moderately diverse sentences (depending on the value of k); see the sampling sketch below.
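
A minimal NumPy sketch of top-k sampling over a toy next-token distribution; the vocabulary and probabilities are made up. Greedy decoding falls out as k = 1 and full sampling as k = |V|:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_top_k(probs, k):
    """Prune the next-token distribution to its k most likely tokens,
    renormalize, then sample one token index from the pruned distribution."""
    top = np.argsort(probs)[-k:]            # indices of the k highest-probability tokens
    pruned = probs[top] / probs[top].sum()  # renormalize over the pruned set
    return top[rng.choice(k, p=pruned)]

vocab = ["the", "cat", "sat", "on", "mat"]           # toy vocabulary (illustrative)
next_token_probs = np.array([0.5, 0.2, 0.15, 0.1, 0.05])

print(vocab[sample_top_k(next_token_probs, k=1)])           # greedy decoding: always "the"
print(vocab[sample_top_k(next_token_probs, k=len(vocab))])  # sampling from the full distribution
print(vocab[sample_top_k(next_token_probs, k=3)])           # top-k compromise
```
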
Topics

Finite State Text Processing, Morphology, Pynini

  • English Morphology
  • FSA and Morphology
  • Finite State Transducers (FST) and Morphological Parsing/Generation

FSA, Regular Expressions, Conditional Probability, Bayes; Text Normalization, Pynini, and Spell Checking

  • Recap: FST + Morphology
  • Text Normalization: Tokenization, Lemmatization, Stemming, etc.
  • Probabilistic Models
  • Dealing with spelling errors
  • Noisy channel model
  • Bayes rule applied to Noisy channel model

Language Models: Traditional vs Neural

  • N-gram language models (count-based)
  • Neural language models
  • Language model Evaluation (time permitting)

Text Classification - Traditional Methods (Naive Bayes and Logistic Regression)

  • Traditional (non-neural) supervised approaches
    • Naïve Bayes
    • Relation to LM
    • Logistic Regression

Feedforward Neural Networks (MLP)

Text Classification - Neural Methods (MLP and CNN)

  • MLP classifier
  • CNN classifier

Sequence labeling: Markov Models - POS tagging and NER

  • Sequence Labeling
  • Markov Models (Chain and HMMs)
  • Fundamental inferences for HMMs and similar models
  • Two key sample Sequence Labeling NLP tasks
    • Part of Speech Tagging (POS)
    • Named Entity Recognition (NER)

Sequence labeling: RNNs, LSTMs

  • “Older” Neural Models
  • Recurrent Neural Networks (RNN):
    • Language Modelling
    • Sequence labeling
    • Text Classification
  • Long Short-Term Memory Networks (LSTMs) and other advanced RNNs

Sequence-to-Sequence: Encoder-Decoder, Attention

  • Encoder-Decoder
    • Motivation
    • Basic design
  • Attention Mechanism
  • Inference (Beam Search)
  • Training

Transformers

  • Self-Attention (key, query, value)
  • Multi-head Attention and Positional Encoding
  • Transformer Block
  • Decoder

Text Classification (BOW, CBOW, transformers).

Pre-trained language models

  • Contextual Embeddings
  • BERT (Bidirectional Encoder Representations from Transformers)
    • Architecture
    • Training
    • Complexity

Intro to syntax, Context Free Grammars and Parsing

  • Introduction to syntax
  • Context-free grammars
  • Parsing and syntactic ambiguity

Chunking, Dependency Parsing, Treebanks

  • Partial Constituency Parsing: Chunking
  • Heads in Parse Trees
  • Dependency Grammars and Parsing
  • Treebank

Sequence Modeling

Dependency/Constituency Parsing: PCFGs, Traditional CKY + Neural Models

  • Probabilistic Context Free Grammars (PCFG)
  • Statistical Parsing: Probabilistic CKY
  • Neural Constituency Parsing
  • Neural Dependency Parsing

Intro Semantics

  • What meaning is and how to represent it
  • Mapping sentences into meaning representations
  • Semantic Parsing

Semantic Role Labeling

  • Semantic Roles
  • Resources
  • Semantic Role Labeling (SRL)
  • Applications of SRL

Lexical Semantics

  • Semantic relations between words
  • Lexical Resources:
    • WordNet
    • Other ontologies
  • Word Sense Disambiguation
  • Semantic Similarity

Topic Modeling and Word Embeddings

  • Distributional count-based vectors
  • Term Frequency – Inverse Document Frequency (TF-IDF)
  • Pointwise Mutual Information (PMI)
  • Topic Modelling
  • Corpus-Scale Topic Modeling (LDA)

Discourse and Coreference

  • Discourse theories of coherence and relational structures (RST and PDTB)
  • Discourse Parsing
  • Coreference Resolution
  • Entity coreference resolution
  • Event coreference resolution

Summarization

  • Introduction
  • Evaluation
  • Simple Unsupervised Methods
  • Neural Supervised Methods

Advanced topics in NLP (prompting, commonsense, ethics)

  • Commonsense Reasoning in NLP
  • Ethics in NLP
  • Prompting

Chunking

Chunking finds basic non-recursive phrases: Noun Phrases (NP), Verb Phrases (VP), and Prepositional Phrases (PP).

Example: [NP The HD box] that [NP you] [VP ordered] [PP from] [NP Shaw] [VP never arrived]

Machine Learning Approach to Chunking

  • A case of sequential classification
  • BIO tagging: (B) beginning, (I) internal, (O) outside
  • A Beginning and an Internal tag for each chunk type, plus the single Outside tag, gives a tagset of size 2n + 1, where n is the number of chunk types (see the sketch below)
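
A small hand-written example of BIO tags for the NP chunks in the sentence above, plus a helper that reads the chunks back out of the tag sequence (the tags are written by hand here, not produced by a classifier):

```python
# Tokens of "The HD box that you ordered from Shaw never arrived",
# with BIO tags for the NP chunks only.
tokens = ["The", "HD", "box", "that", "you", "ordered", "from", "Shaw", "never", "arrived"]
tags   = ["B-NP", "I-NP", "I-NP", "O", "B-NP", "O", "O", "B-NP", "O", "O"]

def bio_to_chunks(tokens, tags):
    """Recover (chunk_type, start, end) spans from a BIO tag sequence."""
    chunks, start = [], None
    for i, tag in enumerate(tags + ["O"]):          # sentinel to close a final chunk
        if start is not None and not tag.startswith("I-"):
            chunks.append((tags[start][2:], start, i))
            start = None
        if tag.startswith("B-"):
            start = i
    return chunks

for label, s, e in bio_to_chunks(tokens, tags):
    print(label, tokens[s:e])   # NP ['The', 'HD', 'box'], NP ['you'], NP ['Shaw']
```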

Dependency Grammars

  • Syntactic structure: binary relations between words
  • Links: grammatical function or very general semantic relation
  • Directed labeled arcs from heads to dependents
  • Good approximation of semantic relations
  • Parsing can be framed as classification!
  • Output can play a role in many NLP applications (for text classification, summarization and NLG)
  • Error analysis / Interpretability of neural systems...

Dependency approach vs. CFG parsing.

  • Deals well with free word order languages where the constituent structure is quite fluid
  • Parsing is much faster than CFG-based parsers
  • Dependency structure often captures all the syntactic relations actually needed by later applications

There are two modern approaches to dependency parsing (supervised learning from Treebank data):

  • Graph / Optimization-based approach: find the maximum (highest-scoring) spanning tree that best matches the scoring criteria [McDonald, 2005]
  • Greedy Transition-based approach: define and learn a transition system for mapping a sentence to its dependency graph (e.g. MaltParser).

Transition-Based Dependency Parsing

The basic idea:

  • Define a transition system for dependency parsing
  • Train a classifier for predicting the next transition
  • Use the classifier to do parsing as greedy, deterministic search (see the arc-standard sketch below)
  • Advantages:
    • Efficient parsing (linear time complexity)
    • Robust disambiguation (discriminative classifiers)
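
A minimal sketch of an arc-standard transition system (SHIFT / LEFT-ARC / RIGHT-ARC) in the spirit of MaltParser; the oracle below stands in for a trained classifier, and the toy sentence and gold heads are illustrative:

```python
def arc_standard_parse(words, next_transition):
    """Greedy arc-standard parsing: next_transition(stack, buffer) returns
    'SHIFT', 'LEFT', or 'RIGHT'; arcs are (head_index, dependent_index)."""
    stack, buffer, arcs = [0], list(range(1, len(words))), []
    while buffer or len(stack) > 1:
        action = next_transition(stack, buffer)
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFT":                  # second-from-top is a dependent of the top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        else:                                   # RIGHT: top is a dependent of second-from-top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

# Toy sentence: ROOT She ate fish  (gold heads: ate->She, ate->fish, ROOT->ate)
words = ["ROOT", "She", "ate", "fish"]
gold_heads = {1: 2, 2: 0, 3: 2}

def oracle(stack, buffer):
    """Stand-in for a classifier: picks the correct transition from the gold heads."""
    if len(stack) >= 2 and gold_heads.get(stack[-2]) == stack[-1]:
        return "LEFT"
    if (len(stack) >= 2 and gold_heads.get(stack[-1]) == stack[-2]
            and all(gold_heads.get(b) != stack[-1] for b in buffer)):
        return "RIGHT"
    return "SHIFT"

print(arc_standard_parse(words, oracle))   # [(2, 1), (2, 3), (0, 2)]
```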