How Ceph scrub is implemented

0. Scrub PG registration and triggering


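After a PG is registered, the OSD tick thread decides whether to start a scrub, based on the scheduled time and the current system load.
A PG that meets these conditions is added to the scrub_wq queue, and the corresponding worker thread is woken up to process it.
Scrub processing is carried out by the disk_tp threads, which take PGs from the scrub_wq queue. The states a PG goes through are listed in the next section.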
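
The tick-time decision described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `ScrubJob`, `should_start_scrub`, `tick_maybe_queue` and the load threshold parameter are hypothetical names, not Ceph's actual API, and the real OSD checks more conditions than these two.

```cpp
#include <cstdint>
#include <deque>
#include <string>

// Hypothetical, simplified model of a registered scrub job.
struct ScrubJob {
    std::string pgid;         // PG identifier, e.g. "1.2f"
    std::int64_t sched_time;  // earliest time (seconds) the scrub may start
};

// Hypothetical tick-time check: the job must be due and the measured
// load must be below the threshold.
bool should_start_scrub(const ScrubJob& job, std::int64_t now,
                        double load_avg, double load_threshold) {
    return now >= job.sched_time && load_avg < load_threshold;
}

// Called from the tick: queue the PG when the check passes.
// Returns true if the PG was appended to the (modelled) scrub_wq.
bool tick_maybe_queue(const ScrubJob& job, std::int64_t now, double load_avg,
                      double load_threshold,
                      std::deque<std::string>& scrub_wq) {
    if (!should_start_scrub(job, now, load_avg, load_threshold)) {
        return false;
    }
    scrub_wq.push_back(job.pgid);
    return true;
}
```

A worker thread would then pop PG ids from the front of `scrub_wq`; that part is left out of the sketch.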

1. States of a PG in the scrub_wq queue:

INACTIVE: new round pg scrub
NEW_CHUNK: scrub range has blocked object
WAIT_PUSHES: wait for pushes to apply
WAIT_LAST_UPDATE: wait for writes to flush
WAIT_REPLICAS: wait for replicas to build scrub map
WAIT_DIGEST_UPDATES: waiting on digest updates
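
As a reading aid, the states listed above can be written as an enumeration. This is a minimal sketch: the state names and descriptions are copied from the list, while `ScrubState`, `describe` and `is_waiting` are names invented for this example.

```cpp
#include <string>

// State names taken from the list above; the enum type itself is illustrative.
enum class ScrubState {
    INACTIVE,
    NEW_CHUNK,
    WAIT_PUSHES,
    WAIT_LAST_UPDATE,
    WAIT_REPLICAS,
    WAIT_DIGEST_UPDATES
};

// Returns the one-line description given for each state.
std::string describe(ScrubState s) {
    switch (s) {
        case ScrubState::INACTIVE:            return "new round pg scrub";
        case ScrubState::NEW_CHUNK:           return "scrub range has blocked object";
        case ScrubState::WAIT_PUSHES:         return "wait for pushes to apply";
        case ScrubState::WAIT_LAST_UPDATE:    return "wait for writes to flush";
        case ScrubState::WAIT_REPLICAS:       return "wait for replicas to build scrub map";
        case ScrubState::WAIT_DIGEST_UPDATES: return "waiting on digest updates";
    }
    return "unknown";
}

// True for the states whose name says the PG is waiting on an external event.
bool is_waiting(ScrubState s) {
    return s == ScrubState::WAIT_PUSHES ||
           s == ScrubState::WAIT_LAST_UPDATE ||
           s == ScrubState::WAIT_REPLICAS ||
           s == ScrubState::WAIT_DIGEST_UPDATES;
}
```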

2. Scrub state machine

[Figure: scrub state machine]

3. Scrub parameters

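osd_scrub_chunk_min // minimum number of objects scrubbed in one chunk
osd_scrub_chunk_max // maximum number of objects scrubbed in one chunk
osd_max_scrubs // maximum number of PGs taking part in scrub at the same time
osd_deep_scrub_update_digest_min_age // minimum object age before deep scrub updates the whole-object digest (upstream default: 2*60*60 seconds, i.e. 2 hours). If a write does not cover the whole object (that is, it is not offset=0, length=oi.size), clear_data_digest is called to clear the digest flag, so the next deep scrub recomputes the data digest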
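
The whole-object-write rule and the chunk bounds can be illustrated with a small sketch. Everything here is an assumption made for the example: `ObjectInfo` is a trimmed-down stand-in for the real object info, `on_write` only models the effect of clear_data_digest, and `clamp_chunk` is a hypothetical helper showing how a chunk size stays within osd_scrub_chunk_min and osd_scrub_chunk_max.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical, trimmed-down object info: only the fields this sketch needs.
struct ObjectInfo {
    std::uint64_t size;
    bool has_data_digest;
    std::uint32_t data_digest;
};

// A write covers the whole object when it starts at 0 and spans oi.size.
bool is_full_object_write(const ObjectInfo& oi, std::uint64_t offset,
                          std::uint64_t length) {
    return offset == 0 && length == oi.size;
}

// A partial write invalidates the stored digest, so a later deep scrub
// has to recompute it. This stands in for clear_data_digest().
void on_write(ObjectInfo& oi, std::uint64_t offset, std::uint64_t length) {
    if (!is_full_object_write(oi, offset, length)) {
        oi.has_data_digest = false;
        oi.data_digest = 0;
    }
}

// Keep a requested chunk size within [chunk_min, chunk_max].
int clamp_chunk(int requested, int chunk_min, int chunk_max) {
    return std::max(chunk_min, std::min(requested, chunk_max));
}
```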

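4. Scrub object traversal: the BUILD_MAP phase

4.1 Collection

A Collection corresponds to a directory on the OSD; each PG maps to one Collection.
A Collection stores data: metadata, object data, and temporary object data.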

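4.2 CollectionIndex provides a set of API operations on a Collection

Object lookup
Object creation
Object deletion
Directory splitting
Listing objects according to given criteria, and so on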
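
To make the list of operations concrete, here is a toy interface with an in-memory implementation. It is a sketch only: `ToyCollectionIndex` and `MapIndex` are hypothetical names, the method signatures are not those of Ceph's CollectionIndex, and directory splitting is left out because a flat map has nothing to split.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical interface mirroring the operations listed above.
class ToyCollectionIndex {
public:
    virtual ~ToyCollectionIndex() = default;
    virtual bool lookup(const std::string& oid, std::string* path) const = 0;
    virtual bool created(const std::string& oid, const std::string& path) = 0;
    virtual bool unlink(const std::string& oid) = 0;
    virtual std::vector<std::string> list(std::size_t max_count) const = 0;
};

// Toy implementation backed by a sorted map from object id to path.
class MapIndex : public ToyCollectionIndex {
    std::map<std::string, std::string> objects_;
public:
    bool lookup(const std::string& oid, std::string* path) const override {
        auto it = objects_.find(oid);
        if (it == objects_.end()) return false;
        if (path != nullptr) *path = it->second;
        return true;
    }
    // Records a newly created object; fails if the id already exists.
    bool created(const std::string& oid, const std::string& path) override {
        return objects_.emplace(oid, path).second;
    }
    bool unlink(const std::string& oid) override {
        return objects_.erase(oid) > 0;
    }
    // Lists at most max_count object ids in sorted order.
    std::vector<std::string> list(std::size_t max_count) const override {
        std::vector<std::string> out;
        for (const auto& kv : objects_) {
            if (out.size() >= max_count) break;
            out.push_back(kv.first);
        }
        return out;
    }
};
```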

4.3 HashIndex is an implementation of CollectionIndex that uses the object's hash value to determine the object's storage directory

Directory where objects are stored
Directory hierarchy of stored objects
- Directory attributes
- subdir_info_s
```
struct subdir_info_s {
    uint64_t objs;       ///< Objects in subdir.
    uint32_t subdirs;    ///< Subdirs in subdir.
    uint32_t hash_level; ///< Hashlevel of subdir.
    ....
}
```
- Attribute contents
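
How a hash value turns into a directory path can be sketched as below. The sketch assumes that each directory level consumes one hexadecimal digit of the 32-bit hash, starting from the least significant digit, and that directories are named `DIR_<digit>`; `hash_path_components` and `join_path` are helper names made up for this example.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Returns the directory components for the first `levels` hash levels.
// Assumption: one hex digit per level, least significant digit first.
std::vector<std::string> hash_path_components(std::uint32_t hash,
                                              unsigned levels) {
    static const char digits[] = "0123456789ABCDEF";
    std::vector<std::string> out;
    for (unsigned i = 0; i < levels && i < 8; ++i) {
        out.push_back(std::string("DIR_") + digits[hash & 0xF]);
        hash >>= 4;  // move on to the next hex digit
    }
    return out;
}

// Joins components with '/' to form a relative path.
std::string join_path(const std::vector<std::string>& parts) {
    std::string path;
    for (const auto& part : parts) {
        if (!path.empty()) path += '/';
        path += part;
    }
    return path;
}
```

Under these assumptions, an object whose hash is 0xA4CEE0D2 and whose directory has been split three levels deep lands under `DIR_2/DIR_D/DIR_0`.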

4.4 Relationship between CollectionIndex, LFNIndex and HashIndex

[Figure: relationship between CollectionIndex, LFNIndex and HashIndex]

4.5 FileStore IndexManager

class IndexManager {
    Mutex lock; ///< Lock for Index Manager
    bool upgrade;
    sds::unordered_map col_indices;
    ....
    int build_index(coll_t c, const char *path, CollectionIndex index);
    int get_index(coll_t c, const string& baseDir, Index *index);
    int init_index(coll_t c, const char *path, uint32_t filestore_version);
    ....

The API provided by IndexManager includes:

build_index
get_index
init_index
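
The class excerpt suggests a cache of per-collection indexes guarded by a lock. A minimal sketch of that pattern follows, assuming get_index returns a cached index and falls back to build_index on a miss. `ToyIndex` and `ToyIndexManager` are hypothetical, collections are keyed by plain strings here, and the `builds()` counter exists only so the caching is observable.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a collection index: it only remembers its path.
struct ToyIndex {
    std::string path;
};

class ToyIndexManager {
    std::mutex lock_;  // protects col_indices_ and builds_
    std::unordered_map<std::string, std::shared_ptr<ToyIndex>> col_indices_;
    int builds_ = 0;

    // Builds a fresh index for the collection under base_dir.
    std::shared_ptr<ToyIndex> build_index(const std::string& coll,
                                          const std::string& base_dir) {
        ++builds_;
        return std::make_shared<ToyIndex>(ToyIndex{base_dir + "/" + coll});
    }

public:
    // Returns the cached index for coll, building it on first use.
    std::shared_ptr<ToyIndex> get_index(const std::string& coll,
                                        const std::string& base_dir) {
        std::lock_guard<std::mutex> guard(lock_);
        auto it = col_indices_.find(coll);
        if (it != col_indices_.end()) return it->second;
        auto index = build_index(coll, base_dir);
        col_indices_[coll] = index;
        return index;
    }

    int builds() const { return builds_; }
};
```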

4.6 Building the FileStore IndexManager index

static const uint32_t FLAT_INDEX_TAG = 0;
static const uint32_t HASH_INDEX_TAG = 1;
static const uint32_t HASH_INDEX_TAG_2 = 2;
static const uint32_t HOBJECT_WITH_POOL = 3;

collection_version

4.7 hobject_t and ghobject_t

struct hobject_t {
    object_t oid;
    snapid_t snap;
private:
    uint32_t hash;
    bool max;
    filestore_hobject_key_t filestore_key_cache;
    static const int64_t POOL_IS_TEMP ....
}
struct ghobject_t {
    hobject_t hobj;
    gen_t generation;
    shard_id_t shard_id;
    ....
}

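ghobject_t extends hobject_t with a generation field and a shard_id field. They exist mainly so that erasure-coded (EC) pools can roll back. For a replicated pool, shard_id is set to NO_SHARD (-1), and neither field is used.

When the PG is EC, two operations need to distinguish the object versions before and after a write:

An overwrite of an EC stripe
Rollback
Both of these operations require keeping the previous version (generation) of the object, so that a failed EC write can be restored to that version.
For an append, a failure only requires calling ftruncate to reclaim the space; no copy of the object has to be kept.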

4.8 RADOS object names:

name + ["head" | "snapdir" | snap_id] + underscore + hash (hexadecimal) + underscore + pool_id + [underscore + generation + underscore + shard_id]
For example:

rbd object
EC object
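
The naming pattern can be turned into a small formatting function. This sketch follows the pattern exactly as written above and nothing more: `make_object_filename` is a hypothetical helper, the hash is assumed to be printed as eight uppercase hexadecimal digits, and the escaping and exact separators of real on-disk file names are not modelled.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Builds a name following the pattern in the text:
// name + snap + "_" + hash(hex) + "_" + pool_id [+ "_" + generation + "_" + shard_id]
// `snap` is "head", "snapdir", or a snapshot id rendered as text.
std::string make_object_filename(const std::string& name,
                                 const std::string& snap,
                                 std::uint32_t hash, std::int64_t pool_id,
                                 bool has_gen_shard = false,
                                 std::uint64_t generation = 0,
                                 int shard_id = 0) {
    char hex[9];  // 8 hex digits plus the terminating NUL
    std::snprintf(hex, sizeof(hex), "%08X", static_cast<unsigned>(hash));
    std::string out = name + snap + "_" + hex + "_" + std::to_string(pool_id);
    if (has_gen_shard) {
        // The optional suffix from the pattern, e.g. for EC objects.
        out += "_" + std::to_string(generation) + "_" + std::to_string(shard_id);
    }
    return out;
}
```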

4.9 Object traversal and lookup: HashIndex::list_by_hash

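The traversal uses a depth-first algorithm and returns a list of objects. There are three cases:

If the subdir being traversed holds more than max_count objects, max_count objects are returned.
If the subdir being traversed holds fewer than min_count objects, the next subdir is traversed as well; at most max_count and at least min_count+1 objects are returned.
If all subdirs have been traversed and the total number of objects is less than min_count, all objects are returned.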
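
The min_count/max_count behaviour can be modelled with a short depth-first walk over a toy directory tree. This is a simplified sketch, not the code of HashIndex::list_by_hash: `ToyDir`, `collect` and `list_objects` are hypothetical, min_count is assumed to be at least 1, and the walk stops after finishing a directory once at least min_count objects have been gathered, so the exact lower bound may differ from the real implementation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy directory: objects stored directly in it, plus ordered child dirs.
struct ToyDir {
    std::vector<std::string> objects;
    std::vector<ToyDir> subdirs;
};

// Depth-first collection. Returns true when the caller should stop.
bool collect(const ToyDir& dir, std::size_t min_count, std::size_t max_count,
             std::vector<std::string>& out) {
    for (const auto& obj : dir.objects) {
        if (out.size() >= max_count) return true;  // hard cap reached
        out.push_back(obj);
    }
    if (out.size() >= max_count) return true;
    // Enough objects gathered once this directory is finished.
    if (out.size() >= min_count) return true;
    for (const auto& sub : dir.subdirs) {
        if (collect(sub, min_count, max_count, out)) return true;
    }
    return false;  // everything below dir was listed without reaching min_count
}

std::vector<std::string> list_objects(const ToyDir& root, std::size_t min_count,
                                      std::size_t max_count) {
    std::vector<std::string> out;
    collect(root, min_count, max_count, out);
    return out;
}
```

With three subdirs holding 3, 4 and 2 objects, a call with min_count=2 and max_count=2 stops inside the first subdir, min_count=5 and max_count=10 continues into the second subdir, and min_count=20 returns all 9 objects.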

Published by: 全栈程序员-站长. When reposting, please credit the source: https://javaforall.net/208591.html. Original link: https://javaforall.net
