Natural Language Processing - CPSC 436N (UBC)

Author: Christina Yang

CPSC 436N (2023W) topics. Information source: The University of British Columbia.

Table of Contents

  1. Models
  2. Topics
  3. Chunking

Language models covered: count-based trigram, neural trigram, or RNN-based model

Question: Does Planda require NLP? Yes: I am parsing course syllabi. Currently relying on ChatGPT; I wonder how hard it would be to make something tailored to syllabus parsing.

I'm thinking more clustering, or some kind of partitioning algorithm.

Models

n-gram (count-based): only looks at n-1 words of context
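
A minimal sketch of a count-based trigram LM with add-alpha smoothing; the class name, smoothing choice, and toy corpus are illustrative, not from the course:

```python
from collections import defaultdict

class CountTrigramLM:
    """Count-based trigram LM: P(w | u, v) is estimated from corpus counts,
    so prediction conditions on only the previous n-1 = 2 words."""

    def __init__(self):
        self.trigram_counts = defaultdict(int)
        self.bigram_counts = defaultdict(int)
        self.vocab = set()

    def train(self, sentences):
        for sent in sentences:
            tokens = ["<s>", "<s>"] + sent + ["</s>"]
            self.vocab.update(tokens)
            for u, v, w in zip(tokens, tokens[1:], tokens[2:]):
                self.trigram_counts[(u, v, w)] += 1
                self.bigram_counts[(u, v)] += 1

    def prob(self, u, v, w, alpha=1.0):
        # Add-alpha smoothing so unseen trigrams still get nonzero probability.
        numerator = self.trigram_counts[(u, v, w)] + alpha
        denominator = self.bigram_counts[(u, v)] + alpha * len(self.vocab)
        return numerator / denominator

lm = CountTrigramLM()
lm.train([["the", "course", "covers", "parsing"],
          ["the", "course", "covers", "tagging"]])
print(lm.prob("the", "course", "covers"))  # probability of "covers" given "the course"
```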

RNN: Good for long-range dependencies. The RNN doesn’t follow the Markov assumption, and the hidden state at each time step captures the entire history
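
A tiny NumPy sketch of one vanilla RNN step (dimensions and weight initialization are illustrative), showing how a single hidden state carries the whole prefix instead of a fixed n-1 word window:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrence: the new hidden state mixes the current input with the
    previous hidden state, which already summarizes every earlier input."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

d_in, d_hid = 4, 8                       # toy dimensions
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(d_hid, d_in))
W_hh = rng.normal(size=(d_hid, d_hid))
b_h = np.zeros(d_hid)

h = np.zeros(d_hid)
for x_t in rng.normal(size=(5, d_in)):   # run over a length-5 input sequence
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
print(h.shape)                           # (8,): one state summarizing the whole prefix
```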

greedy decoding (which is equivalent to top k with k = 1):

  • predicts the most likely word at each timestep,
  • typically generates boring and repetitive sentences with very common words.

sampling from the entire distribution (which is equivalent to top k with k = |V|):

  • randomly samples a word from the vocabulary proportionally to its probability in the next-token distribution.
  • yields more diversity in the generated sentences than greedy decoding, but puts too much of the probability mass on the “long tail” of the distribution and may generate rare words.

top k with a relatively small k:

  • compromise between greedy decoding and sampling from the entire distribution
  • It first prunes the distribution to the most likely k words, then renormalizes it and samples a word from the pruned distribution proportionally to its probability.
  • It yields higher quality and moderately diverse sentences (depending on the value of k); see the sampling sketch below.
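
A minimal NumPy sketch of top-k sampling over a toy next-token distribution; the vocabulary and probabilities are made up. Greedy decoding falls out as k = 1 and full sampling as k = |V|:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_top_k(probs, k):
    """Prune the next-token distribution to its k most likely tokens,
    renormalize, then sample one token index from the pruned distribution."""
    top = np.argsort(probs)[-k:]            # indices of the k highest-probability tokens
    pruned = probs[top] / probs[top].sum()  # renormalize over the pruned set
    return top[rng.choice(k, p=pruned)]

vocab = ["the", "cat", "sat", "on", "mat"]           # toy vocabulary (illustrative)
next_token_probs = np.array([0.5, 0.2, 0.15, 0.1, 0.05])

print(vocab[sample_top_k(next_token_probs, k=1)])           # greedy decoding: always "the"
print(vocab[sample_top_k(next_token_probs, k=len(vocab))])  # sampling from the full distribution
print(vocab[sample_top_k(next_token_probs, k=3)])           # top-k compromise
```
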
Topics

Finite State Text Processing, Morphology, Pynini

  • English Morphology
  • FSA and Morphology
  • Finite State Transducers (FST) and Morphological Parsing/Generation

FSA, Regular Expressions, Conditional Probability, Bayes; Text Normalization, Pynini, and Spell Checking

  • Recap: FST + Morphology
  • Text Normalization: Tokenization, Lemmatization, Stemming, etc.
  • Probabilistic Models
  • Dealing with spelling errors
  • Noisy channel model
  • Bayes rule applied to Noisy channel model

Language Models: Traditional vs Neural

  • N-gram language models (count-based)
  • Neural language models
  • Language model Evaluation (time permitting)

Text Classification - Traditional Methods (Naive Bayes and Logistic Regression)

  • Traditional (non-neural) supervised approaches
    • Naïve Bayes
    • Relation to LM
    • Logistic Regression

Feedforward Neural Networks (MLP)

Text Classification - Neural Methods (MLP and CNN)

  • MLP classifier
  • CNN classifier

Sequence labeling: Markov Models - POS tagging and NER

  • Sequence Labeling
  • Markov Models (Chain and HMMs)
  • Fundamental inferences for HMMs and similar models
  • Two key sample Sequence Labeling NLP tasks
    • Part of Speech Tagging (POS)
    • Named Entity Recognition (NER)

Sequence labeling: RNNs, LSTMs

  • “Older” Neural Models
  • Recurrent Neural Networks (RNN):
    • Language Modelling
    • Sequence labeling
    • Text Classification
  • Long Short-Term Memory Networks (LSTMs) and other advanced RNNs

Sequence-to-Sequence: Encoder-Decoder, Attention

  • Encoder-Decoder
    • Motivation
    • Basic design
  • Attention Mechanism
  • Inference (Beam Search)
  • Training

Transformers

  • Self-Attention (key, query, value)
  • Multi-head Attention and Positional Encoding
  • Transformer Block
  • Decoder

Text Classification (BOW, CBOW, transformers).

Pre-trained language models

  • Contextual Embeddings
  • BERT (Bidirectional Encoder Representations from Transformers)
    • Architecture
    • Training
    • Complexity

Intro to syntax, Context Free Grammars and Parsing

  • Introduction to syntax
  • Context-free grammars
  • Parsing and syntactic ambiguity

Chunking, Dependency Parsing, Treebanks

  • Partial Constituency Parsing: Chunking
  • Heads in Parse Trees
  • Dependency Grammars and Parsing
  • Treebank

Sequence Modeling

Dependency/Constituency Parsing: PCFGs, Traditional CKY + Neural Models

  • Probabilistic Context Free Grammars (PCFG)
  • Statistical Parsing: Probabilistic CKY
  • Neural Constituency Parsing
  • Neural Dependency Parsing

Intro Semantics

  • What meaning is and how to represent it
  • Mapping sentences into meaning representations
  • Semantic Parsing

Semantic Role Labeling

  • Semantic Roles
  • Resources
  • Semantic Role Labeling (SRL)
  • Applications of SRL

Lexical Semantics

  • Semantic relations between words
  • Lexical Resources:
    • WordNet
    • Other ontologies
  • Word Sense Disambiguation
  • Semantic Similarity

Topic Modeling and Word Embeddings

  • Distributional count-based vectors
  • Term Frequency – Inverse Document Frequency (TF-IDF)
  • Pointwise Mutual Information (PMI)
  • Topic Modelling
  • Corpus-Scale Topic Modeling (LDA)

Discourse and Coreference

  • Discourse theories of coherence and relational structures (RST and PDTB)
  • Discourse Parsing
  • Coreference Resolution
  • Entity coreference resolution
  • Event coreference resolution

Summarization

  • Introduction
  • Evaluation
  • Simple Unsupervised Methods
  • Neural Supervised Methods

Advanced topics in NLP (prompting, commonsense, ethics)

  • Commonsense Reasoning in NLP
  • Ethics in NLP
  • Prompting

Chunking

Chunking finds basic non-recursive phrases: Noun Phrases (NP), Verb Phrases (VP), and Prepositional Phrases (PP).

Example: [NP The HD box] that [NP you] [VP ordered] [PP from] [NP Shaw] [VP never arrived]

Machine Learning Approach to Chunking

  • A case of sequential classification
  • BIO tagging: (B) beginning, (I) internal, (O) outside
  • A Beginning and an Internal tag for each chunk type, plus the single Outside tag, gives a tagset of size 2n + 1, where n is the number of chunk types (see the sketch below)
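
A small hand-written example of BIO tags for the NP chunks in the sentence above, plus a helper that reads the chunks back out of the tag sequence (the tags are written by hand here, not produced by a classifier):

```python
# Tokens of "The HD box that you ordered from Shaw never arrived",
# with BIO tags for the NP chunks only.
tokens = ["The", "HD", "box", "that", "you", "ordered", "from", "Shaw", "never", "arrived"]
tags   = ["B-NP", "I-NP", "I-NP", "O", "B-NP", "O", "O", "B-NP", "O", "O"]

def bio_to_chunks(tokens, tags):
    """Recover (chunk_type, start, end) spans from a BIO tag sequence."""
    chunks, start = [], None
    for i, tag in enumerate(tags + ["O"]):          # sentinel to close a final chunk
        if start is not None and not tag.startswith("I-"):
            chunks.append((tags[start][2:], start, i))
            start = None
        if tag.startswith("B-"):
            start = i
    return chunks

for label, s, e in bio_to_chunks(tokens, tags):
    print(label, tokens[s:e])   # NP ['The', 'HD', 'box'], NP ['you'], NP ['Shaw']
```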

Dependency Grammars

  • Syntactic structure: binary relations between words
  • Links: grammatical function or very general semantic relation
  • Directed labeled arcs from heads to dependents
  • Good approximation of semantic relations
  • Parsing can be framed as classification!
  • Output can play a role in many NLP applications (for text classification, summarization and NLG)
  • Error analysis / Interpretability of neural systems...

Dependency approach vs. CFG parsing.

  • Deals well with free word order languages where the constituent structure is quite fluid
  • Parsing is much faster than CFG-based parsers
  • Dependency structure often captures all the syntactic relations actually needed by later applications

There are two modern approaches to dependency parsing (supervised learning from Treebank data):

  • Graph / Optimization-based approach: find the maximum (highest-scoring) spanning tree that best matches the scoring criteria [McDonald, 2005]
  • Greedy Transition-based approach: define and learn a transition system for mapping a sentence to its dependency graph (e.g. MaltParser).

Transition-Based Dependency Parsing

The basic idea:

  • Define a transition system for dependency parsing
  • Train a classifier for predicting the next transition
  • Use the classifier to do parsing as greedy, deterministic search (see the arc-standard sketch below)
  • Advantages:
    • Efficient parsing (linear time complexity)
    • Robust disambiguation (discriminative classifiers)
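
A minimal sketch of an arc-standard transition system (SHIFT / LEFT-ARC / RIGHT-ARC) in the spirit of MaltParser; the oracle below stands in for a trained classifier, and the toy sentence and gold heads are illustrative:

```python
def arc_standard_parse(words, next_transition):
    """Greedy arc-standard parsing: next_transition(stack, buffer) returns
    'SHIFT', 'LEFT', or 'RIGHT'; arcs are (head_index, dependent_index)."""
    stack, buffer, arcs = [0], list(range(1, len(words))), []
    while buffer or len(stack) > 1:
        action = next_transition(stack, buffer)
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFT":                  # second-from-top is a dependent of the top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        else:                                   # RIGHT: top is a dependent of second-from-top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

# Toy sentence: ROOT She ate fish  (gold heads: ate->She, ate->fish, ROOT->ate)
words = ["ROOT", "She", "ate", "fish"]
gold_heads = {1: 2, 2: 0, 3: 2}

def oracle(stack, buffer):
    """Stand-in for a classifier: picks the correct transition from the gold heads."""
    if len(stack) >= 2 and gold_heads.get(stack[-2]) == stack[-1]:
        return "LEFT"
    if (len(stack) >= 2 and gold_heads.get(stack[-1]) == stack[-2]
            and all(gold_heads.get(b) != stack[-1] for b in buffer)):
        return "RIGHT"
    return "SHIFT"

print(arc_standard_parse(words, oracle))   # [(2, 1), (2, 3), (0, 2)]
```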