Java高级主题：深度讨论高并发跳表数据结构ConcurrentSkipListMap的源代码实现（下）

文章说明：因为CSM解析内容较多，因此全文分为“深度讨论高并发跳表数据结构ConcurrentSkipListMap的源代码实现（上）”和“深度讨论高并发跳表数据结构ConcurrentSkipListMap的源代码实现（下）”两篇文章
上篇：CSM数据结构设计原理、doGet、doPut核心方法解析
下篇：doRemove核心方法解析、总结

《gitee 博客文章封面》

remove方法：

删除操作的设计原理

这里先介绍Doug Lea在源代码注释给出算法设计说明：n.helpDelete(b,f)的设计原理

上一篇文章的doGet、doPut方法中都有涉及到遇到被标记”删除“的节点时都会加入n.helpDelete(b,f)的处理逻辑，

* In addition to using deletion markers, the lists also use
* nullness of value fields to indicate deletion, in a style
* similar to typical lazy-deletion schemes.  If a node's value is
* null, then it is considered logically deleted and ignored even
* though it is still reachable. This maintains proper control of
* concurrent replace vs delete operations -- an attempted replace
* must fail if a delete beat it by nulling field, and a delete
* must return the last non-null value held in the field. (Note:
* Null, rather than some special marker, is used for value fields
* here because it just so happens to mesh with the Map API
* requirement that method get returns null if there is no
* mapping, which allows nodes to remain concurrently readable
* even when deleted. Using any other marker value here would be
* messy at best.)
* n是当前要删除的节点，b是n的前驱节点，f是n的后继节点，开始删除前，(b,n,f)的指向关系如下
* Here's the sequence of events for a deletion of node n with
* predecessor b and successor f, initially:
*
*        +------+       +------+      +------+
*   ...  |   b  |------>|   n  |----->|   f  | ...
*        +------+       +------+      +------+
*
* 1. CAS n's value field from non-null to null.
*    From this point on, no public operations encountering
*    the node consider this mapping to exist. However, other
*    ongoing insertions and deletions might still modify
*    n's next pointer.
*
* 1、 将删除节点n的value使用cas设成null，表示此刻起，n节点处于待删除状态，虽然其value为null，对于public相关方法如get等，在执行遇到此类型节点时不会将该节点视为有效节点（也即会被跳过处理），但是如果对于正在执行的插入和删除操作的方法来说，可能也可以去更改“此待删除节点n”的后继节点。
* 2. CAS n's next pointer to point to a new marker node.
*    From this point on, no other nodes can be appended to n.
*    which avoids deletion errors in CAS-based linked lists.
*
*        +------+       +------+      +------+       +------+
*   ...  |   b  |------>|   n  |----->|marker|------>|   f  | ...
*        +------+       +------+      +------+       +------+
* 2、使用cas将n节点的next字段指向一个marker节点后，从此刻起，任何节点都不会被放在n的后继节点位置，也即不可能出现n->n1、n->n2等，只有n->marker
* 3. CAS b's next pointer over both n and its marker.
*    From this point on, no new traversals will encounter n,
*    and it can eventually be GCed.
*        +------+                                    +------+
*   ...  |   b  |----------------------------------->|   f  | ...
*        +------+                                    +------+
* 3、通过cas将b的next指针（越过n节点以及n的后继marker节点）指向f节点，从此刻起，n节点不会被相关操作遍历到，n最终会被GC。

可以看出CSM删除操作方面，借用marker节点来实现，将待删除节点的value设为null值来表示该节点处在删除状态但还未真正删除，这种设计风格就像“惰性删除语义(lazy-deletion schemes)”，先标记删，等某个时机再真正执行之。如果CSM的一个节点的value是null，说明该节点在逻辑上已经被删除。

以下是remove方法的源代码解析

1
2
3

public V remove(Object key) {
    return doRemove(key, null);//注意只需要比较key即可，不需要考虑value相等才删除！
}

具体逻辑由doRemove实现

final V doRemove(Object key, Object value) {
    if (key == null)
        throw new NullPointerException();
    Comparator<? super K> cmp = comparator;
    outer: for (;;) {
    		// 1、和doGet、findNode类似设计，首先找到key的前驱节点
        for (Node<K,V> b = findPredecessor(key, cmp), n = b.next;;) {
        // 
            Object v; int c;
            if (n == null)
                break outer;
          // 经典的(b,n,f)三指针
            Node<K,V> f = n.next;
          // 2、n已不是b的后继节点，也即读n节点前后不一致，则重试
            if (n != b.next)                    // inconsistent read
                break;
          // 3、数据节点n节点被标记为删除状态，那么使用helpDelete把n节点删除，然后重试。
            if ((v = n.value) == null) {        // n is deleted
                n.helpDelete(b, f);
                break;
            }
          // 4、key的前驱节点b被标记删除状态，只能重试，读取新的b
            if (b.value == null || v == n)      // b is deleted
                break;
            //5、 给定的key不在数据链表里面，直接结束
            if ((c = cpr(cmp, key, n.key)) < 0)
                break outer;
          	//6、 给定的key比当前数据节点n还大，那么更新b、n向右继续检索
            if (c > 0) {
                b = n;
                n = f;
                continue;
            }
          	// 以下找到与key相等的n节点
            //7、doRemove的入参value默认是null，因此以下会被跳过，若指定value，则还需判断给定的value和当前找到n.value是否相等
            if (value != null && !value.equals(v))
                break outer;
            //8、执行流运行到这里，说明满足删除n节点的条件也即key=n.key,n就是要删除的目标数据节点，因此将n.value设为null，cas失败则重试，注意这是是标记删除，不是直接把n节点删除。
            if (!n.casValue(v, null))
                break;
            /*9、执行流运行到这里，说明n成功被标记删除状态
          	条件1：将 b → n → f 变成  b → n → marker → f
          	条件2：将 b → n → f 变成 b → f
          	如果条件1CAS失败或者条件2CAS失败，都会调用findNode(key)来删除数据节点n
          	*/
            if (!n.appendMarker(f) || !b.casNext(n, f))
                findNode(key);                  // retry via findNode
            else {
            /*10、不妨假设条件1成立，也即将 b → n → f 变成  b → n → marker → f
              需要n节点上方的索引节点都清除（包括清除索引节点的前后指向关系），恰好findPredecessor就是在索引层干这事。
              */ 
                findPredecessor(key, cmp);      // clean index
                if (head.right == null)
                    tryReduceLevel();
            }
            @SuppressWarnings("unchecked") V vv = (V)v;
            return vv;
        }
    }
    return null;
}

虽然在doGet方法中有给出findPredecessor，它是用在查询场景中，而在本小节中findPredecessor被用来删除n节点对应的上方索引节点场景中，现在结合上方doRmove的源代码理解，将更能掌握其中设计逻辑。

private Node<K,V> findPredecessor(Object key, Comparator<? super K> cmp) {
    if (key == null)
        throw new NullPointerException(); // don't postpone errors
    for (;;) {
      	
        for (Index<K,V> q = head, r = q.right, d;;) {
          	//在当前索引层检索，只要不是来到链表尾部，就继续在该层里面遍历
            if (r != null) {
                Node<K,V> n = r.node;
                K k = n.key;
                // 此处对应的是上方doRemove内部将n.value标记为null的情况
                if (n.value == null) {
                  	/* 既然数据层的n节点被删除，那么n节点对应上方关联的索引节点r也要删除:
                    索引层：将 q → r → r.right 变成  q → r.right 
                    */
                    if (!q.unlink(r))
                        break;           // restart
                  	// 索引节点r成功删除后，将r指向q的新right节点，此时q → r 两个索引节点都处于正常状态，继续下一轮遍历。
                    r = q.right;         // reread r
                    continue;
                }
              	// 继续在该层索引向右检索，直到cpr(cmp, key, k) =0
                if (cpr(cmp, key, k) > 0) {
                    q = r;
                    r = r.right;
                    continue;
                }
            }
          	// 当q位于第1层索引层位置时，此时q.down指向的是idx=null，因此可退出，到此从入口head到出口q.down=null沿途找到的与n对应的每层索引节点idx都被删除掉。可直接退出
            if ((d = q.down) == null) 
                return q.node;
          	// 执行流运行到这里，说明还未到达第1层索引层，以上逻辑完成当前层的idx节点删除，那么继续需要处理n节点对应的更低层的索引节点idx。
            q = d;
            r = d.right;
        }
    }
}

因此在doRemove方法的角度来看， findPredecessor(key, cmp) 实际意义是单纯用于clean index，而不是为key找到前驱节点b这么简单，这里的clean index功能就是Doug Lea提到的“side-effect”，也即下面这句话的含义：

Callers rely on this side-effect of clearing indices to deleted nodes.

doRemove作为调用方，能够从调用findPredecessor过程获得额外收益：清除那些已被标记为“删除状态”的数据节点上方对应的每层索引节点idx。

关于doRemove的图解这里不再给出，想要深入理解的同学务必自行将删除逻辑对应的图做出来，否则将难以理解其过程的动态处理。

tryReduceLevel()方法

在doRemove方法的第10个逻辑中，清理完索引节点后，还需要检查是否需要“降层处理”：

else {
/*10、不妨假设条件1成立，也即将 b → n → f 变成  b → n → marker → f
  需要n节点上方的索引节点都清除（包括清除索引节点的前后指向关系），恰好findPredecessor就是在索引层干这事。
  */ 
    findPredecessor(key, cmp);      // clean index
    if (head.right == null)
        tryReduceLevel();
}

在tryReduceLevel设计，在一个if里面塞进去8个条件，而且只有h.level超过3层才进行“降层”处理，等于3层不对索引层“减层”，以下不妨考察h.level=4的情况

private void tryReduceLevel() {
    HeadIndex<K,V> h = head; 
    HeadIndex<K,V> d;
    HeadIndex<K,V> e;
    if (h.level > 3 &&
        // h
        (d = (HeadIndex<K,V>)h.down) != null &&
        (e = (HeadIndex<K,V>)d.down) != null &&
        e.right == null &&
        d.right == null &&
        h.right == null &&
        casHead(h, d) && // try to set
        h.right != null) // recheck
        casHead(d, h);   // try to backout
}

将具有level=4的HeadIndex结构图：
h（原位于level=4的HeadIndex） → null
↓
d → null
↓
e → null
变成level=3的HeadIndex结构：
h(d) (原位于level=3的d节点作为新的HeadIndex)→ idx
↓
e → idx

关于n.helpDelete(b, f)的解析：

/**
 * Helps out a deletion by appending marker or unlinking from
 * predecessor. This is called during traversals when value
 * field seen to be null.
 * @param b predecessor
 * @param f successor
 */
// b是n的前驱节点，n是待删除节点，f是n的后继节点
void helpDelete(Node<K,V> b, Node<K,V> f) {
 
    /*
     * Rechecking links and then doing only one of the
     * help-out stages per call tends to minimize CAS
     * interference among helping threads.
     */
    // 当前线程前后读取的b、n、f保持不变时，就可以实施cas操作
    if (f == this.next && this == b.next) {
     /* 
     条件1：对于b → n → null这种情况，f=n.next,显然f为null
     条件2：对于b → n → f，且f节点不是marker节点，那么给n添加一个marker节点：
     b → n → marker → f，注意：marker指向f节点指向关系不是因为next指针，而是由marker的value指针指向f完成。
      */
        if (f == null || f.value != f) // not already marked
            casNext(f, new Node<K,V>(f));
       else
        /*否则f节点一定是marker节点，且b、n、f以及marker的指向关系如下：
          b → n → f(marker)→ f.next 变成 b → f.next
          */
             b.casNext(this, f.next);
    }
}

理解marker节点的实际设计意义

解析器设计意义需要基于图解，因为涉及到多线程并发删除的非原子性操作。

无maker节点的删除设计

假设线程的cpu时间片分配顺序如下：

结果是明显的，线程B以为自己成功插入了该节点，但对于读线程C来说，新插入节点newNode不可能被从b开始遍历到。

核心原因在于在第4个cpu时间片时，线程B使用n.casNext(f,new Node<>(k,v,next=f)),只用f节点作为CAS比较并不能感知到链表结构已经发生变化，导致线程B前后读取f是一致，然后casNext成功插入的新节点。

这种删除设计使得插入操作不能正确执行，因此不采用。

有maker节点的删除设计

只要让插入线程能根据n.casNext(cmp,new Node<>(k,v,next=cmp)) 感知cmp作为cas比较的节点已经发生改变，那么插入线程自然会cas失败，然后继续下一轮重试，直到读取的n节点不是删除状态，方可能正确插入节点。

对于图中的3和4的说明，因为cpu时间片执行顺序不确定，注意面临两种情况：

1、3执行CAS成功，4失败，迫使插入线程B的CAS插入失败后重新循环读取新的n节点

2、4执行CAS成功，3失败，迫使删除线程A的添加marker节点失败，只能重新循环读取新的n节点

只需一个marker即可实现写线程和写线程之间是线程安全的竞争。

findFirst方法

需要知道的一点是：数据层链表的HeadIndex的构成如下：

1
2
3

//ConcurrentSkipListMap()构造方法里面initialize()
new HeadIndex<K,V>(new Node<K,V>(null, BASE_HEADER, null),
                                  null, null, 1);

因此对于数据链表层的“头节点”可不是数据节点Node，而是索引头节点HeadIndex，因此构造阶段时：

HeadIndex(BASE_HEADER) → null，也即h.node → null，也即h.node =next=null

/*
 * Specialized variant of findNode to get first valid node.
 * @return first node or null if empty
 */
final Node<K,V> findFirst() {
    for (Node<K,V> b, n;;) {
      	//如上面所解析，如果当前数据层链表仅有一个HeadIndex节点，还没有任何数据节点时，显然只能返回null，这里的b=new Node<K,V>(null, BASE_HEADER, null)
        if ((n = (b = head.node).next) == null)
            return null;
      	// n作为数据节点第一个头节点没有被标记位删除，那么n节点就是所早的FirstNode
        if (n.value != null)
            return n;
      	// 如果发现n已经被标记位删除，那么当前线程就去帮助删除该n节点。
        n.helpDelete(b, n.next);
    }
}

findFirst方法一般被其他方法调用，例如size()方法，通过找到数据层链表的数据头节点，然后即可在当成链表去遍历：

不过可以确定的是，此size方法的返回值不能正确反映当前数据层链表的长度，因为是并发环境，其他删除线程可以删除数据节点节点

public int size() {
    long count = 0;
  	// 通过findFirst返回首个数据节点，然后在此头节点遍历链表去统计即可。
    for (Node<K,V> n = findFirst(); n != null; n = n.next) {
        if (n.getValidValue() != null)
            ++count;
    }
    return (count >= Integer.MAX_VALUE) ? Integer.MAX_VALUE : (int) count;
}

removeFirstEntry类似doRemove，基于(b,n,f)三节点去操作。

只要你掌握CSM的数据结构设计原理以及上面解析的doGet、doPut、doRemove、findPredecessor、findNode的具体实现逻辑，那么CSM其他方法都可以非常快的突破。

CSM的小结内容

1、空间复杂度计算

按照本文给出的图示，不妨假设当前数据层链表节点个数为$n$，根据源代码文件的注释说明，一个新插入的数据节点它被作为上方一层的索引节点的概率为0.5，因此有以下非严格数学归纳过程（考虑理想情况下的分布）

level=1，该层索引层对应的索引节点数量 $\frac{n}{2}$

level=2，该层索引层对应的索引节点数量$\frac{n}{2^2}$

level=3，该层索引层对应的索引节点数量$\frac{n}{2^2}$

…

level=k，该层索引层对应的索引节点数量$\frac{n}{2^k}$

不妨假设第k层就是该跳表的最顶层索引层且假设仅有1个索引节点

那么有$\frac{n}{2^k}=1$ ，可算出k的值为：$k=log_2n$ ，习惯性写法为$k=log(n)$

到这里应该可以联想k的含义类似二叉树的高度h，h的推算也是类似的思路。

那么k层的跳表需要额外存储多少索引节点呢？（这里说的额外是指不计数据层节点）总共k层的索引节点相加如下

$\frac{n}{2}+\frac{n}{2^2}+\frac{n}{2^3}+\frac{n}{2^4}+…+4+2+1$ 共有k项，等比为0.5，求和：$sum=n-1$，也即当成为索引节点的概率取值为0.5时，需要额外存储n-1个索引节点，例如数据层节点16个，那么对应四层索引层的索引节点数为8，4，2 ，1

同理若成为索引节点的概率取值为$\dfrac{1}{4}$时，需要存储$\frac{n-1}{3}$ 个索引节点。例如数据层节点16个，那么对应两层索引层的索引节点数为4，1 。

容易推出：当成为索引节点的概率取值为0.5时，需要额外存储n-1个索引节点，再加数据节点数量n个，也即CSM总共存储2n-1个节点，故空间复杂度为$o(n)$，当前去其他概率值也有类似结论，这里不再给出说明。一般使用概率p=0.5，作为CSM数据节点被选中作为索引节点的概率，如果概率选太小例如$\frac{1}{128}$，那么每次查询都几乎都要下沉到数据层去查找，那么查询时间复杂度就会退化到解决$o(n)$

2、时间复杂度

根据空间复杂度的索引节点：

最好情况：最高层索引节点就是目的节点，搜索次数1

最坏情况：若一个新插入的数据节点它被作为上方一层的索引节点的概率为0.5，那么每一层索引层都需要遍历两次索引节点（最高层仅有一个索引节点就1次，若有两个索引节点就两次），不妨假设现在有跳表结构如下，共4层索引：

1 （level=4）

1，9 （level=3）

1，5，9，13 （level=2）

1，3，5，7，9，11，13，15 （level=1）

1，2，3，4，，，，，，，，15，16 （数据层链表）

现在需要查找16这个节点，从level=4开始，

（1）level=4，只需查找1次：比较1，下移

（2）level=3，只需查找2次：比较1，比较9，从9垂直下移到level=2的索引9节点

（3）level=2，只需查找2次：比较9，比较13，从13垂直下移到level=1的索引13节点

（3）数据层：因为每两个节点就有一个是索引节点，因此在数据层检索比较也是最多比较2两次

由上面空间复杂度计算过程可知，若数据层节点有n个前提下，索引层高度为$k=log(n)$ ,

那么时间复杂度的计算为：$k层*（每层索引搜索次数2）+数据层搜索次数2$，也即$log (n) \times 2 +2$，简化最终可知，跳表的查询复杂度为$log(n)$

同理，插入、删除节点都需要先从最顶层索引层开始检索定位，最后再下移到数据层链表进行插入和删除操作，因此插入、删除的时间复杂度也是和查找get的复杂一致：$log(n)$

关于SkipList的严格数学证明，学有余力的同学可以去翻阅跳表发明者William Pugh原著论文，论文地址
以下是Dung Lea对SkipList性能的简要说明，可以看出其实跟TreeMap性能很接近。

* Indexing uses skip list parameters that maintain good search
* performance while using sparser-than-usual indices: The
* hardwired parameters k=1, p=0.5 (see method doPut) mean
* that about one-quarter of the nodes have indices. Of those that
* do, half have one level, a quarter have two, and so on (see
* Pugh's Skip List Cookbook, sec 3.4).  The expected total space
* requirement for a map is slightly less than for the current
* implementation of java.util.TreeMap.

3、对比ConcurrentHashMap

CSM：ConcurrentSkipListMap，CHM：ConcurrentHashMap

场景方面：

CSM适用于要求key有序存放的场景，CHM适用于key无序存放的场景

写操作性能

高并发写（插入、删除）操作方面，CSM采用CAS无锁+更改节点指向实现写操作，插入不会阻塞删除操作，而CHM采用CAS+Synchronized独占锁方式，对于同一桶位进行插入操作会阻塞删除操作（反向也成立），那么这两种数据结构的插入、删除新能如何呢？

关于插入和删除的占用内存的比较如下：对象占用比较方面，这里使用lucene的一个工具类RamUsageEstimator

<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>8.9.0</version>
</dependency>

demo代码：

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;
import org.apache.lucene.util.RamUsageEstimator;


public class SkipListMapPerf {
    static ConcurrentSkipListMap<Integer,Integer> csm=new ConcurrentSkipListMap<>();
    static ConcurrentHashMap<Integer,Integer> chm=new ConcurrentHashMap<>();
    public static void main(String[] args) throws InterruptedException {
        int[] scales={1,4,16,64,128};

        int loops=1000*1000;
        for (int scale : scales) {
            System.out.println("线程数："+scale);
            System.out.println("put:");
            csmPut(scale,loops);
            chmPut(scale,loops);
            System.out.println("ramUsage:");
            System.out.println("csm:"+RamUsageEstimator.humanReadableUnits(RamUsageEstimator.sizeOfMap(csm)));
            System.out.println("chm:"+RamUsageEstimator.humanReadableUnits(RamUsageEstimator.sizeOfMap(chm)));
            System.out.println("remove:");
            csmRemove(scale,loops);
            chmRemove(scale,loops);
            System.out.println("ramUsage:");
            System.out.println("csm"+RamUsageEstimator.humanReadableUnits(RamUsageEstimator.sizeOfMap(csm)));
            System.out.println("chm"+RamUsageEstimator.humanReadableUnits(RamUsageEstimator.sizeOfMap(chm)));

        }

    }


    public static void csmPut(int threadNums,int loops) throws InterruptedException {
        List<Thread> threadList=new ArrayList<>();
        long start= System.currentTimeMillis();
        for (int i = 0; i < threadNums; i++) {
            Thread t= new Thread(() -> {
                for (int loop = 0; loop < loops; loop++) {
                    csm.put(loop,loop);
                }
            });
            threadList.add(t);
        }
        for (Thread thread : threadList) {
            thread.start();
        }

        for (Thread thread : threadList) {
            thread.join();
        }

        long duration=System.currentTimeMillis()-start;
        String s= String.format("csmPut：线程数为%s,计算%s次,总用时%sms",threadNums,loops,duration);
        System.out.println(s);
    }


    public static void chmPut(int threadNums, int loops) throws InterruptedException {
        List<Thread> threadList=new ArrayList<>();
        long start= System.currentTimeMillis();
        for (int i = 0; i < threadNums; i++) {
            Thread t= new Thread(() -> {
                for (int loop = 0; loop < loops; loop++) {
                    chm.put(loop,loop);
                }
            });
            threadList.add(t);
        }
        for (Thread thread : threadList) {
            thread.start();
        }

        for (Thread thread : threadList) {
            thread.join();
        }

        long duration=System.currentTimeMillis()-start;
        String s= String.format("chmPut：线程数为%s,计算%s次,总用时%sms",threadNums,loops,duration);
        System.out.println(s);
    }


    public static void csmRemove(int threadNums, int loops) throws InterruptedException {
        List<Thread> threadList=new ArrayList<>();
        long start= System.currentTimeMillis();
        for (int i = 0; i < threadNums; i++) {
            Thread t= new Thread(() -> {
                for (int loop = 0; loop < loops; loop++) {
                    csm.remove(loop);
                }
            });
            threadList.add(t);
        }
        for (Thread thread : threadList) {
            thread.start();
        }

        for (Thread thread : threadList) {
            thread.join();
        }

        long duration=System.currentTimeMillis()-start;
        String s= String.format("csmRemove：线程数为%s,计算%s次,总用时%sms",threadNums,loops,duration);
        System.out.println(s);
    }


    public static void chmRemove(int threadNums, int loops) throws InterruptedException {
        List<Thread> threadList=new ArrayList<>();
        long start= System.currentTimeMillis();
        for (int i = 0; i < threadNums; i++) {
            Thread t= new Thread(() -> {
                for (int loop = 0; loop < loops; loop++) {
                    chm.remove(loop);
                }
            });
            threadList.add(t);
        }
        for (Thread thread : threadList) {
            thread.start();
        }

        for (Thread thread : threadList) {
            thread.join();
        }

        long duration=System.currentTimeMillis()-start;
        String s= String.format("chmRemove：线程数为%s,计算%s次,总用时%sms",threadNums,loops,duration);
        System.out.println(s);
    }



}

输出结果：

线程数：1
put:
csmPut：线程数为1,计算1000000次,总用时2285ms
chmPut：线程数为1,计算1000000次,总用时1446ms
ramUsage:
csm:511.2 MB
chm:511.2 MB
remove:
csmRemove：线程数为1,计算1000000次,总用时217ms
chmRemove：线程数为1,计算1000000次,总用时111ms
ramUsage:
csm48 bytes
chm64 bytes
线程数：4
put:
csmPut：线程数为4,计算1000000次,总用时676ms
chmPut：线程数为4,计算1000000次,总用时520ms
ramUsage:
csm:511.2 MB
chm:511.2 MB
remove:
csmRemove：线程数为4,计算1000000次,总用时522ms
chmRemove：线程数为4,计算1000000次,总用时296ms
ramUsage:
csm48 bytes
chm64 bytes
线程数：16
put:
csmPut：线程数为16,计算1000000次,总用时1873ms
chmPut：线程数为16,计算1000000次,总用时1687ms
ramUsage:
csm:511.2 MB
chm:511.2 MB
remove:
csmRemove：线程数为16,计算1000000次,总用时789ms
chmRemove：线程数为16,计算1000000次,总用时234ms
ramUsage:
csm48 bytes
chm64 bytes
线程数：64
put:
csmPut：线程数为64,计算1000000次,总用时5564ms
chmPut：线程数为64,计算1000000次,总用时2879ms
ramUsage:
csm:511.2 MB
chm:511.2 MB
remove:
csmRemove：线程数为64,计算1000000次,总用时2505ms
chmRemove：线程数为64,计算1000000次,总用时321ms
ramUsage:
csm48 bytes
chm64 bytes
线程数：128
put:
csmPut：线程数为128,计算1000000次,总用时9665ms
chmPut：线程数为128,计算1000000次,总用时4396ms
ramUsage:
csm:511.2 MB
chm:511.2 MB
remove:
csmRemove：线程数为128,计算1000000次,总用时4548ms
chmRemove：线程数为128,计算1000000次,总用时880ms
ramUsage:
csm48 bytes
chm64 bytes

Process finished with exit code 0

从输出结果可以得到以下推论：

1、插入1百万个节点后，CSM和CHM在jvm的大小都很接近

2、在高并发插入场景下，CHM的性能比CSM快两倍左右，解释：

CSM插入大量节点时，每插入一个数据节点都判断是否要在此数据节点上方建立多层索引节点，若需要添加索引节点，那么创建索引节点后将索引节点链入该层索引层中。而CHM插入大量节点时，如果key的hash足够分散，那么大量节点其实是分布在底层的table桶位上，或者在部分桶位形成冲突链链表，在同一桶位上需要升级为红黑树结构的概率小，因此CHM通过不断扩容底层table让节点直接插入在桶位上，显然比CSM的插入逻辑要快很多。

3、在高并发删除方面，CHM的性能比CSM快5到6倍左右。

考察CSM并发删除，虽然它是基于CAS+marker的无锁方式去删除节点，但CSM删除一个节点前，首先需要去多层索引去检索该key的前驱节点，然后再删除该节点以及它对应的索引节点（调整所在层的前后指向），还得判断是否需要“降层处理”，而CHM方面，由于key是数值型，百万的key分布在桶位上相对均匀，因此对于底层table的桶位节点删除操作时非常快速的，少量桶位会涉及到冲突链表删除、红黑树转为链表或者红黑树删除平衡调整等但这些情况不多，因此整体上CHM的删除效率高很多。

4、查找效率方面对比

不用对比也知道，CHM因为是hashMap的$o(1)$查找效率，显然比CSM的$log(n)$要快，如下面数据对比：

csmGet：线程数为64,计算1000000次,总用时3949ms
chmGet：线程数为64,计算1000000次,总用时572ms

csmGet：线程数为128,计算1000000次,总用时8276ms
chmGet：线程数为128,计算1000000次,总用时1150ms

4、CSM初始化时在new阶段还是在put阶段完成？

CSM是在new阶段完成初始化：initialize()，而CHM是在第一次put才会初始化：tab = initTable()

首先在new阶段已经完成初始化，CSM初始化逻辑如下：

private void initialize() {
    keySet = null;
    entrySet = null;
    values = null;
    descendingMap = null;
    head = new HeadIndex<K,V>(new Node<K,V>(null, BASE_HEADER, null),
                              null, null, 1);
}

可以看到这个new HeadIndex 是位于level=1的索引层，而这个new Node<K,V>(null, BASE_HEADER, null) 是位于最底层数据层链表，对应的图示如下：

5、CSM最高有多少层索引

根据size方法，CSM的数据层链表最多可以链接个节点Integer.MAX_VALUE个节点，也即$2^{31}-1$

public int size() {
    long count = 0;
    for (Node<K,V> n = findFirst(); n != null; n = n.next) {
        if (n.getValidValue() != null)
            ++count;
    }
    return (count >= Integer.MAX_VALUE) ? Integer.MAX_VALUE : (int) count;
}

根据前面的上面1提到的空间复杂度计算，p=0.5，也即新插入的数据节点上升为索引节点的概率，那么最大索引层数为$k= log_2(2^{31}-1)$，计算结果为k=31，注意这个31层是指最大索引层数量（第31层有2个索引节点），不包括最底层的数据链表这一层！

那么CSM的节点数量和底层数组最大是多少呢？

节点数量最大值也是$2^{31}-1$

public int size() {
    long n = sumCount();
    return ((n < 0L) ? 0 :
            (n > (long)Integer.MAX_VALUE) ? Integer.MAX_VALUE :
            (int)n);
}

底层数组最大为$2^{30}$


  /**
   * The largest possible table capacity.  This value must be
   * exactly 1<<30 to stay within Java array allocation and indexing
   * bounds for power of two table sizes, and is further required
   * because the top two bits of 32bit hash fields are used for
   * control purposes.
   */
// top two bits of 32bit hash fields are used for control purposes ，第31和32位被用于相关“控制”，因此底层table的桶位数量最大值为2^30次方
  private static final int MAXIMUM_CAPACITY = 1 << 30;