Natural Language Processing - CPSC 436N (UBC)
Author: Christina Yang
CPSC 436N (2023W) topics. Information source: The University of British Columbia.
Table of Contents
count-based trigram, neural trigram, or RNN-based model
Question: Does Planda require NLP? Yes: I am parsing course syllabi. Currently relying on ChatGPT; I wonder how hard it would be to make something tailored to syllabus parsing.
I'm thinking more of clustering, or some partitioning algorithm.
Models
n-gram (count-based): only looks at n-1 words of context
RNN: Good for long-range dependencies. The RNN doesn’t follow the Markov assumption, and the hidden state at each time step captures the entire history
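A minimal count-based trigram sketch, to make the n-1 word context window concrete (the toy corpus and function names are my own illustrative assumptions, not course code):

```python
# Count-based trigram LM: P(w3 | w1, w2) estimated from raw counts.
from collections import Counter, defaultdict

def train_trigram(tokens):
    """Collect counts of w3 for each (w1, w2) context."""
    counts = defaultdict(Counter)
    for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
        counts[(w1, w2)][w3] += 1
    return counts

def prob(counts, w1, w2, w3):
    """Maximum-likelihood estimate of P(w3 | w1, w2); no smoothing."""
    context = counts[(w1, w2)]
    total = sum(context.values())
    return context[w3] / total if total else 0.0

tokens = "the cat sat on the mat and the cat sat on the rug".split()
model = train_trigram(tokens)
print(prob(model, "on", "the", "mat"))  # 0.5 in this toy corpus
```

An RNN, by contrast, conditions on a hidden state that summarizes the whole history rather than on a fixed two-word window.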
greedy decoding (which is equivalent to top k with k = 1):
- predicts the most likely word at each timestep,
- typically generates boring and repetitive sentences with very common words.
sampling from the entire distribution (which is equivalent to top k with k = |V|):
- randomly samples a word from the vocabulary proportionally to its probability in the next token distribution.
- while it yields more diverse sentences than greedy decoding, it puts too much of the probability mass on the “long tail” of the distribution and may generate rare words.
top k with a relatively small k:
- compromise between greedy decoding and sampling from the entire distribution
- It first prunes the distribution to the most likely k words, then renormalizes it and samples a word from the pruned distribution proportionally to its probability.
- It yields higher quality and moderately diverse sentences (depending on the value of k).
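A small sketch of how greedy and top-k decoding differ, given a next-token distribution (the distribution below is made up for illustration, not taken from the course):

```python
import random

def greedy(dist):
    """Greedy decoding: always pick the most likely token (top-k with k = 1)."""
    return max(dist, key=dist.get)

def top_k_sample(dist, k):
    """Prune to the k most likely tokens, renormalize, then sample."""
    top = sorted(dist, key=dist.get, reverse=True)[:k]
    total = sum(dist[w] for w in top)
    return random.choices(top, weights=[dist[w] / total for w in top])[0]

# Toy next-token distribution
dist = {"the": 0.4, "a": 0.25, "dog": 0.2, "aardvark": 0.1, "zyzzyva": 0.05}
print(greedy(dist))           # always "the"
print(top_k_sample(dist, 3))  # samples among "the", "a", "dog"
# top_k_sample(dist, len(dist)) is equivalent to sampling the entire distribution
```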
Topics
Finite State Text Processing, Morphology, Pynini
- English Morphology
- FSA and Morphology
- Finite State Transducers (FST) and Morphological Parsing/Generation
FSA, Regular Expressions, Conditional Probability, Bayes, Text Normalization, Pynini and Spell Checking
- Recap: FST + Morphology
- Text Normalization: Tokenization, Lemmatization, Stemming, etc.
- Probabilistic Models
- Dealing with spelling errors
- Noisy channel model
- Bayes rule applied to Noisy channel model
Language Models: Traditional vs Neural
- N-gram language models (count-based)
- Neural language models
- Language model Evaluation (time permitting)
Text Classification - Traditional Methods (Naive Bayes and Logistic Regression)
- Traditional (non-neural) supervised approaches
  - Naïve Bayes
  - Relation to LM
  - Logistic Regression
Feedforward Neural Networks (MLP)
Text Classification - Neural Methods (MLP and CNN)
- MLP classifier
- CNN classifier
Sequence labeling: Markov Models - POS tagging and NER
- Sequence Labeling
- Markov Models (Chain and HMMs)
- Fundamental inferences for HMMs and similar models
- Two key sample Sequence Labeling NLP tasks
  - Part of Speech Tagging (POS)
  - Named Entity Recognition (NER)
Sequence labeling: RNNs, LSTMs
- “Older” Neural Models
- Recurrent Neural Networks (RNN):
- Language Modelling
- Sequence labeling
- Text Classification
- Long Short-Term Memory Networks (LSTMs) and other advanced RNNs
Sequence-to-Sequence: Encoder-Decoder, Attention
- Encoder-Decoder
- Motivation
- Basic design
- Attention Mechanism
- Inference (Beam Search)
- Training
Transformers
- Self-Attention (key, query, value)
- Multi-head and Position
- Transformer Block
- Decoder
Text Classification (BOW, CBOW, transformers)
Pre-trained language models
- Contextual Embeddings
- BERT (Bidirectional Encoder Representations from Transformers)
- Architecture
- Training
- Complexity
Intro to syntax, Context Free Grammars and Parsing
- Introduction to syntax
- Context-free grammars
- Parsing and syntactic ambiguity
Chunking, Dependency Parsing, Treebanks
- Partial Constituency Parsing: Chunking
- Heads in Parse Trees
- Dependency Grammars and Parsing
- Treebank
Sequence Modeling
Dependency/Constituency Parsing: PCFG, Traditional CKY + Neural Models
- Probabilistic Context Free Grammars (PCFG)
- Statistical Parsing: Probabilistic CKY
- Neural Constituency Parsing
- Neural Dependency Parsing
Intro Semantics
- What meaning is and how to represent it
- Mapping sentences into meaning representations
- Semantic Parsing
Semantic Role Labeling
- Semantic Roles
- Resources
- Semantic Role Labeling (SRL)
- Applications of SRL
Lexical Semantics
- Semantic relations between words
- Lexical Resources:
- WordNet
- Other ontologies
- Word Sense Disambiguation
- Semantic Similarity
Topic Modeling and Word Embeddings
- Distributional count-based vectors
- Term Frequency – Inverse Document Frequency (TF-IDF)
- Pointwise Mutual Information (PMI)
- Topic Modelling
- Corpus-Scale Topic Modeling (LDA)
Discourse and Coreference
- Discourse theories of coherence and relational structures (RST and PDTB)
- Discourse Parsing
- Coreference Resolution
- Entity coreference resolution
- Event coreference resolution
Summarization
- Introduction
- Evaluation
- Simple Unsupervised Methods
- Neural Supervised Methods
Advanced topics in NLP (prompting, commonsense, ethics)
- Commonsense Reasoning in NLP
- Ethics in NLP
- Prompting
Chunking
Chunking identifies basic, non-recursive phrases: Noun Phrases (NP), Verb Phrases (VP), and Prepositional Phrases (PP).
Example: [NP The HD box] that [NP you] [VP ordered] [PP from] [NP Shaw] [VP never arrived]
Machine Learning Approach to Chunking
- A case of sequential classification
- BIO tagging: (B) beginning, (I) internal, (O) outside
- One Beginning and one Internal tag per chunk type, plus a single Outside tag => tagset size is 2n + 1, where n is the number of chunk types (see the sketch below)
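A hand-worked sketch of the BIO encoding for the example sentence above, with a small decoder back to chunks (tags are assigned by hand for illustration, not produced by a trained model):

```python
# BIO tags matching the bracketed example:
# [NP The HD box] that [NP you] [VP ordered] [PP from] [NP Shaw] [VP never arrived]
tokens = ["The", "HD", "box", "that", "you", "ordered", "from", "Shaw",
          "never", "arrived"]
tags   = ["B-NP", "I-NP", "I-NP", "O", "B-NP", "B-VP", "B-PP", "B-NP",
          "B-VP", "I-VP"]

def bio_to_chunks(tokens, tags):
    """Recover (chunk_type, words) spans from a BIO tag sequence."""
    chunks, current = [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            current = (tag[2:], [tok])
            chunks.append(current)
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1].append(tok)
        else:                       # "O" or an inconsistent I- tag
            current = None
    return [(label, " ".join(words)) for label, words in chunks]

print(bio_to_chunks(tokens, tags))
# [('NP', 'The HD box'), ('NP', 'you'), ('VP', 'ordered'),
#  ('PP', 'from'), ('NP', 'Shaw'), ('VP', 'never arrived')]
```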
Dependency Grammars
- Syntactic structure: binary relations between words
- Links: grammatical function or very general semantic relation
- Directed labeled arcs from heads to dependents
- Good approximation of semantic relations
- Parsing can be framed as classification!
- Output can play a role in many NLP applications (e.g. text classification, summarization and NLG)
- Error analysis / Interpretability of neural systems...
Dependency approach vs. CFG parsing.
- Deals well with free word order languages where the constituent structure is quite fluid
- Parsing is much faster than with CFG-based parsers
- Dependency structure often captures all the syntactic relations actually needed by later applications
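To see labeled head-to-dependent arcs on the chunking example sentence, one option is spaCy (a sketch assuming spaCy and its small English model are installed; this is not part of the course materials):

```python
import spacy

# pip install spacy; python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("The HD box that you ordered from Shaw never arrived.")

for token in doc:
    # each word points to its syntactic head via a labeled dependency arc
    print(f"{token.text:10} <-{token.dep_:^10}- {token.head.text}")
```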
There are two modern approaches to dependency parsing (both based on supervised learning from Treebank data):
- Graph / Optimization-based approach: find the spanning tree over the words that best matches the learned arc scores (MST parsing) [McDonald, 2005]
- Greedy Transition-based approach: define and learn a transition system for mapping a sentence to its dependency graph (e.g. MaltParser).
Transition-Based Dependency Parsing
The basic idea:
- Define a transition system for dependency parsing
- Train a classifier for predicting the next transition
- Use the classifier to do parsing as greedy, deterministic search
- Advantages:
- Efficient parsing (linear time complexity)
- Robust disambiguation (discriminative classifiers)
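A minimal sketch of the arc-standard transition system behind this idea. Here `predict_transition` is a hypothetical stand-in for the trained classifier; a real parser (e.g. MaltParser) scores transitions with a discriminative model over stack/buffer features.

```python
# Greedy, deterministic arc-standard parsing (linear time in sentence length).
def predict_transition(stack, buffer):
    # Hypothetical placeholder policy: attach everything to the root.
    # A real system replaces this with a learned classifier.
    if len(stack) >= 2:
        return "RIGHT_ARC"
    return "SHIFT"

def parse(words):
    stack = [0]                          # token indices; 0 is the ROOT
    buffer = list(range(1, len(words) + 1))
    arcs = []                            # (head, dependent) pairs

    while buffer or len(stack) > 1:
        action = predict_transition(stack, buffer)
        if action == "SHIFT" and buffer:
            stack.append(buffer.pop(0))
        elif action == "LEFT_ARC" and len(stack) >= 2:
            dep = stack.pop(-2)          # second-from-top depends on top
            arcs.append((stack[-1], dep))
        elif action == "RIGHT_ARC" and len(stack) >= 2:
            dep = stack.pop()            # top depends on second-from-top
            arcs.append((stack[-1], dep))
        elif buffer:                     # fall back to SHIFT if possible
            stack.append(buffer.pop(0))
        else:
            break
    return arcs

print(parse(["Economic", "news", "had", "little", "effect"]))
```

Because each token is shifted and reduced at most once, the loop runs in linear time, which is where the efficiency advantage over CFG-based parsing comes from.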