CS61B - 2021 Spring

List

List的实现方式有很多种, 其中包括链表方式实现/数组方式实现

链表方式进行实现List

实现时可以使用一个 sentinel 对一些特殊情况进行平常化

#[derive(Debug)]
pub struct SList {
  size: u64,
  sentinel: Option<Box<IntNode>>
}


#[derive(Debug)]
struct IntNode {
  pub item: i64,
  pub next: Option<Box<IntNode>>
}
/// when not imp this when large tree will lead to stack overflowe
/// 
/// from [stackoverflow](https://stackoverflow.com/questions/28660362/thread-main-has-overflowed-its-stack-when-constructing-a-large-tree)
/// The problem that you are experiencing is because you have a giant linked-list of nodes. 
/// When that list is dropped, the first element tries to free all the members of the struct first. 
/// That means that the second element does the same, and so on, until the end of the list. 
/// This means that you will have a call stack that is proportional to the number of elements in your list!
/// 
impl Drop for IntNode {
  fn drop(&mut self) {
      if let Some(mut child) = self.next.take() {
          while let Some(next) = child.next.take() {
              child = next;
          }
      }
  }
}
impl IntNode {
  fn new(item: i64, next: Option<IntNode>) -> IntNode {
    IntNode{
      item,
      next: if let Some(in_next) = next {
        Some(Box::new(in_next))
      } else {
        None
      }
    }
  }
}

impl SList {

  pub fn new(item_op: Option<i64>) -> SList {
    match item_op {
      Some(item) => SList {
        size: 1,
        sentinel: Some(Box::new(IntNode { item: i64::MAX, next: Some(Box::new(IntNode{ item, next: None })) }))
      },
      None => SList {
        size: 0,
        sentinel: Some(Box::new(IntNode { item: i64::MAX, next: None }))
      }
    }
  }

  pub fn add_first(&mut self, item: i64) {
    self.size += 1;
    if let Some(head) = &mut self.sentinel {
      let next = head.as_mut().next.take();
      head.as_mut().next = Some(Box::new(IntNode {
        item,
        next
      }));
    }
  }

  fn get_first(& self) ->  Option<i64> {
    // self.sentinel.next.as_ref().unwrap().borrow().item;
    if let Some(item) = &self.sentinel.as_ref().unwrap().next {
      return Some(item.item);
    }
    None
  }

  fn get_inner(list: &Option<Box<IntNode>>, index: u64) -> Option<i64> {
    if index == 1 {
      Some(list.as_ref().unwrap().item)
    } else {
      SList::get_inner(&list.as_ref().unwrap().next, index - 1)
    }
  }

  fn get(&self, index: u64) -> Option<i64> {
    SList::get_inner(&self.sentinel.as_ref().unwrap().as_ref().next, index)
  }

  fn add_last(&mut self, item: i64) {
    self.size += 1;
    let mut p = &mut self.sentinel;
 		while let Some(in_next) = p {
      // let a = p.unwrap();
      if in_next.as_ref().next.is_some() {
        p = &mut in_next.as_mut().next;
      } else {
        in_next.as_mut().next = Some(Box::new(IntNode {
          item,
          next: None
        }));
        break;
      }
 		}
  }

  fn size(&self) -> u64 {
    self.size
  }

}

Array的方式实现List

因为 Rust 还不会创动态创建固定长度的数组, 用 Vec又有点脱裤子放屁, 所以就用 Java 实现


public class AList {
  
    private int size;
    private int[] items;
    public AList() {
        size = 0;
        items = new int[100];
    }
    
    public void addLast(int x) {
        if (size >= items.length) {
            int [] new_items = new int[items.length * 2];
            System.arraycopy(items, 0, new_items, 0, items.length);
            items = new_items;
        }
        items[size] = x;
        size += 1;
    }
    
    public int getLast() {
        return items[size - 1];
    }
    
    public int get(int i) {
        return items[i];
    }
    
    public int size() {
        return size;
    }
    
    public int removeLast() {
        size -= 1;
        return items[size];
    }
}

SET

Disjoint Set

假设我们要实现一个这样的接口

public interface DisjointSets {
    void connect(int p, int q);

    boolean isConnected(int p, int q);
}

方案

简单的解决方式

表示连接的方式: 使用一种数据类型记录两个节点之间的关系(线)
检查是否连接: 使用循环的方式进行遍历关联

一种改进方式, 使用集合进行表示关联

对两个节点进行connect时, 将两个节点所在的集合进行合并. 表示现有集合内的所有点都是connect状态
检查是否连接时, 仅遍历所在集合是否同时拥有两个元素.

实现方式

`使用List<Set<Integer>>`

[{0, 1, 2}, {3}, {4}, {5}]
是直观的想法. 在寻找一些节点时, 需要对数组进行遍历.

仅使用集合进行表示

创建一个元素数量大小的 array , 将相同集合内的元素下标置为相同的数字(一样就行).
将两个集合进行合并时, 将所有元素设置成统一的数字即可.

public class QuickFindDS implements DisjointSets{

    private int[] id;

    public QuickFindDS(int N) {
        id = new int[N];
        for (int i = 0; i <N; i++) {
            id[i] = i;
        }
    }
    @Override
    public void connect(int p, int q) {
        int pid = id[p];
        int qid = id[q];
        for (int i = 0; i < id.length; i++) {
            if (id[i] == pid) {
                id[i] = qid;
            }
        }
    }

    @Override
    public boolean isConnected(int p, int q) {
        return id[p] == id[q];
    }
}

使用父子关系来进行表示集合

public class QuickUnionDS implements DisjointSets{

    private int[] parent;

    public QuickUnionDS(int N) {
        parent = new int[N];
        for (int i = 0; i < N; i++) {
            parent[i] = -1;
        }
    }

    public int findRoot(int p) {
        int r = p;
        while (parent[r] >= 0) {
            r = parent[r];
        }
        return r;
    }

    @Override
    public void connect(int p, int q) {
        int i = findRoot(p);
        int j = findRoot(q);
        parent[i] = j;
    }

    @Override
    public boolean isConnected(int p, int q) {
        return findRoot(q) == findRoot(q);
    }
}

此时进行合并两个集合时, 仅需要改变一个数字

当需要对集合进行合并时, 先寻找到各自根. 然后让其中的一个根指向另一个根, 完成关联.

但在最差的情况下, 会导致寻找根时, 链过长

通过树的重量进行判断合并方式

public class WeightedQuickUnion implements DisjointSets {
    private int[] parent;
    private int[] size;
    private int count;

    public WeightedQuickUnion(int n) {
        count = n;
        parent = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
            size[i] = 1;
        }
    }

    public int count() {
        return count;
    }

    public int find(int p) {
        validate(p);
        while (p != parent[p])
            p = parent[p];
        return p;
    }

    @Deprecated
    public boolean isConnected(int p, int q) {
        return find(p) == find(q);
    }

    private void validate(int p) {
        int n = parent.length;
        if (p < 0 || p >= n) {
            throw new IllegalArgumentException("index " + p + " is not between 0 and " + (n-1));
        }
    }

    public void connect(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        if (rootP == rootQ) return;

        if (size[rootP] < size[rootQ]) {
            parent[rootP] = rootQ;
            size[rootQ] += size[rootP];
        }
        else {
            parent[rootQ] = rootP;
            size[rootP] += size[rootQ];
        }
        count--;
    }
}

树中的元素个数作为数量
总是将小的树合并至大树

为什么重量比较的方式能减少高度的增长

当你使用重量作为合并依据时.
当你需要从高度0增长到高度1, 你需要合并另一个高度为0的树

当你需要从高度1增长至2时, 你需要合并另一个高度为1的树. 如果合并高度为0的树, 合并后的树的高度并不增长.

以此类推

为什么不直接使用高度进行合并

最糟情况下的性能并没有提升都是Θ(log(N))
代码更加复杂

路径压缩(Path Compression)

public class QuickUnionPathCompression implements DisjointSets{

    private int[] id;
    private int count;

    public QuickUnionPathCompression(int n) {
        count = n;
        id = new int[n];
        for (int i = 0; i < n; i++) {
            id[i] = i;
        }
    }

    public int count() {
        return count;
    }

    public int find(int p) {
        int root = p;
        while (root != id[root])
            root = id[root];
        while (p != root) {
            int newp = id[p];
            id[p] = root;
            p = newp;
        }
        return root;
    }

    
    public boolean isConnected(int p, int q) {
        return find(p) == find(q);
    }

    public void connect(int p, int q) {
        int rootP = find(p);
        int rootQ = find(q);
        if (rootP == rootQ) return;
        id[rootP] = rootQ;
        count--;
    }
}

在 WeightedQuickUnion 执行 isConnected 的过程中，改变路径上所有节点的归属，这样不会改变是否在一个集合内，并加快了下次比较的速度。
变更前
变更后

Tree

树的组成

关于节点与边的集合, 但是任意两个点之间只有一条路径

BST(Binary Search Tree)

二叉搜索树是具有 BST 属性的有根二叉树.

BST 属性是指: 节点的左子树都小于节点, 节点的右子树都大于节点

二叉树的操作

插入

static BST insert(BST T, Key ik) {
  if (T == null) return new BST(ik);
  if (ik < T.key) T.left = insert(T.left, ik);
  else if (ik > T.key) T.right = insert(T.right, ik);
  return T;
}

尽量不要在递归代码中额外进行判断特殊情况 ( 指能通过递归本身进行消除的判断, 这种情况称之为 arms-length recursion )
Recursion (computer science))

if (T.left == null)
  T.left = new BST(ik);
else if (T.right == null)
  T.right = new BST(ik);

删除

删除没有child的节点: 直接进行删除
删除有一个child的节点: 将父节点指向删除节点的指针指向child
删除有两个child节点的: 使用Hibbard deletion, 选择 predecessor 和 successor . 总是只有一个 child 或者没有 child.然后对要删除节点进行替换.

BST的深度与平均深度计算

深度的意义: 决定了寻找一个节点的最糟情况
平均深度的意义: 决定了寻找一个节点的平均情况

将 BST 用于 Map 数据类型的存储

只需要在每个节点中同时保存 key/value 就行

B Tree(2-3, 2-3-4 Trees)

当一棵树越接近平衡状态时, 搜索节点的耗时接近logN.所以怎么保持平衡成了关键.
打破平衡大多是添加了新的叶子节点, 如果让叶子能存储更多的数据……

但这样可能会导致叶子存储的数据过多, 所以给叶子限制一个最大值. 当超过最大值时, 随机提升一个到父节点中.

但是又导致了一个新问题, 节点右边的子树,不总是大于节点了(16 > 17), 所以需要增加一个中间的分支

继续存入数据, 20 21.刚好到达限制,但是还没超出

继续加入25 26, 超出限制(长度3)的数据将回提升至parent

同样的, 当root也超出限制时, 会进行分裂.

当根进行分裂时, 所有节点都将降低一个层级
如果是对叶子或者是内部节点进行分裂, 将不会改变层级

B树以一种向上生长的方式保持了完美平衡.
当B树的每层限制为3时, 也可称之为: 2-3-4树/2-4树, 是因为每个节点可能有的自分支数量. 当B树的每层限制为2时, 也可称之为: 2-3树.

The origin of “B-tree” has never been explained by the authors. As we shall see, “balanced,” “broad,” or > “bushy” might apply. Others suggest that the “B” stands for Boeing. Because of his contributions, however, it seems appropriate to think of B-trees as “Bayer”-trees.
-- Douglas Corner (The Ubiquitous B-Tree)

可视化B树

不同L对于树高的影响

在不同L的情况下, B树的高度介于近似于 $Log_{L+1}(N)$ ~ $Log_{2}(N)$ 之间, 所以最糟的情况下, 高度是Θ(log N).每个节点内有L个元素, 所以运行时是O(L log N), 但是因为L是常数. 所以最后是O(log N).

树旋转

假设有一棵树, 有数字123. 通过不同顺序的添加. 将产生不同形态的树.

如果要在不同形态之间转换, 需要使用”旋转“操作.

可以通过”旋转”将一棵树变成平衡树

红黑树(Red-Black Tree)

目前我们已知:

BST: 能通过旋转进行平衡, 但是我们没有已知的算法进行检测
2-3tree: 能够自平衡, 也就是没有旋转的要求

所以红黑树就是使用BST去表现2-3tree
在2-3tree仅有2个cild时, 与二叉树一致. 但在3个cild的情况下. 需要进行转换.

方法一: 使用额外的节点”粘”在一起

方法二: 使用额外的关系”粘”在一起

所以红黑树是BST使用左连接的粘连表示2-3tree的树, 也称之为”Left Leaning Red Black Binary Search Tree” or LLRB.

如何判断合法的红黑树

没有节点有两条红色边
叶子节点到根节点的路径(红边不算)长度相同

红黑树插入规则

当我们加入数据时, 应当使用红线进行连接.
当我们加入一个新的数字, 比原有节点数字大时. 红色的关联不能出现在右边. 所以要对E进行旋转操作

虽然说红色的关联不能出现在右边, 但是在加入过程中能够短暂的违反. 方便进行下一步操作

当连续加入的两个节点都在左边时, 我们需要进行一系列翻转. 成为短暂违反的状态.

然后将短暂违反的状态经过filp操作, 使树重新合法.

同样的, 当filp操作后还是非法的树, 那么此时就要继续旋转

红黑树的运行时

将2-3tree 变为红黑树

总结

Java中的TreeMap
Red-Black Tree

Hashing

使用一个布尔数组来表示一个树是否已经在集合内了

这样算法的运行时就被提升至O(1)
但上面展示是使用数字进行的, 如果想要展示单词要怎么进行处理呢.

首字母的方式会导致冲突, 所以可以使用

这个方法类似于我们十进制表示数据的方式

这样就能保证, 在计算全小写时输出的数值是唯一的

// Implementing englishToInt
/** Converts ith character of String to a letter number.
    * e.g. 'a' -> 1, 'b' -> 2, 'z' -> 26 */
public static int letterNum(String s, int i) {
	int ithChar = s.charAt(i);
	if ((ithChar < 'a') || (ithChar > 'z'))
        { throw new IllegalArgumentException(); }
	return ithChar - 'a' + 1;
}

public static int englishToInt(String s) {
	int intRep = 0;
	for (int i = 0; i < s.length(); i += 1) {       	
        intRep = intRep * 27;
        intRep = intRep + letterNum(s, i);
	}
	return intRep;
}

但我们在处理大写字母的时候就会遇到问题, 所以我们可以扩大访问到127 因为表示英文的文字符号主要是ASCⅡ表, 大小在127

可以使用以下的转换函数

public static int asciiToInt(String s) {
	int intRep = 0;
	for (int i = 0; i < s.length(); i += 1) {       	
        intRep = intRep * 126;
        intRep = intRep + s.charAt(i);
	}
	return intRep;
}

但是如果超出了ASCⅡ的范围要怎么办. 例如中文是使用Unicode.

但是这样计算的结果会超过Java语言的integer的上限(2,147,483,647). 例如

所以我们面临两个挑战

解决冲突

当table在存储数值时, 不再是存储bool值, 而是存储另一个集合, 这个集合中存储的都是相同hash的单词.

所以这就是HashTable的运行方式

HashTable的运行时分析

当HashTable拥有5个桶进行分装内容时, 但因为5是常数, 所以在计算复杂度时,还是Θ(N)

问题出在分装的桶数量, 所以将固定的桶数量改为增长的方式.

增长的策略是, 每当N/M >= 1.5 (不是固定 3, 4 …都行)时, 将M翻倍

这样复杂度就能接近1, 但是扩张M是需要重新分配元素, 所以resize的时候, 复杂度是O(N)

使用HashTable需要注意

Heap

Min-Heap

Tree in Java

创建映射的方式实现树

1a

public class Tree1A<Key> {
   Key k; // e.g. 0
   Tree1A left;
   Tree1A middle;
   Tree1A right;
   ...

1b

public class Tree1B<Key> {
   Key k; // e.g. 0
   Tree1B[] children;
   ...

1c

public class Tree1C<Key> {
   Key k; // e.g. 0
   Tree1C favoredChild;
   Tree1C sibling;
   ...

使用数组的方式实现树

将值和父节点存储在不同的数组中

仅存储值

计算父节点的位置只要 (K -1)/2

也可以将所有的值向后移动一位

总结

Tree and Graph Traversals

树遍历的顺序

先序, 中序

Graph

有向无向

图的术语

有关图的问题

深度优先搜索

S-T链接问题

一个可行的算法:

先进行判断S是否与T相等
如果不相等, 那么对S的邻居V进行逐一判断,V是否和T相等, 相等返回true
如果都不相等, 返回false
但是算法存在一个问题, 需要对遍历过的邻居进行标记

广度优先

图的表示

运行时(有点摸不清)

最短路径

Dijkstra’s Algorithm

将所有的节点放入集合中,并将邻居距离初始化为最大值 , 从起点开始遍历邻居, 根据距离更新集合中邻居的距离. 如果更新距离也要同步更新是从那个节点得到的距离. 并根据更新后的邻居距离, 去除最近距离邻居重复操作. 直至结束.

A* (没听懂好像CS188会有更加细节的东西)

进行规划路线前, 遍历所有节点. 将可能与结果最接近的距离进行记录. 方便后续遍历计算距离时参考.
如下图, PQ中的最短距离变成了d(source, v) + h(v, goal)

h函数是根据经验来的

下图链接

Spanning Trees

MST(Minimum Spanning Trees)不一定是SPT(Shortest-path tree)

Cut Property

如果添加了e使MST形成循环, 那么必有一条边f也跨越了边界, 但是增加e又能降低权重. 所以我们可以认为, 最小生成树必须有e.

Prim’s Algorithm

Demo
从任意的节点开始, 然后从关联的边中选择最小边.

并上节点后, 从扩大的可选边中继续选择, 直至结束.

…

实现

实现有点像Dijkstra’s

只不过距离的记录并不是相对于起点, 而是变成了整棵树.PQ中存储的最小值也变成了距离树的最小值.每次都将更新所有节点与树的距离, 并选出最近的点进行合并.

Kruskal’s Algorithm

将所有的边按照大小顺序排列, 然后逐一合并.当合并不导致循环时合并点, 否则不进行合并.

QuadTrees

2D平面上的范围搜索, 通过四个分支对该节点四个方位进行定位.

Higher Dimensional Data

k-d tree

是为了更高维度而进行设计的, 以下是使用两个维度进行展示, 每个节点都控制着两片子域.

插入时, 不同维度交替进行比较加入

找到离目标节点(0, 7)最近的一个节点, 从A开始

当遍历完成所有较为接近的分支, 有时也有必要检索下那些. “可能”离得远的分支. 例如最接近节点的 Bad Side.

D节点的, Good Side 可以也看看

但是D的 Bad Side 就无需遍历了, 因为 Good Side 都不能获得更加接近的数值

对于C节点的 Bad Side 也是同理

这样的探索保证了不会错过更好的答案