Hashing

Hashing

1. Introduction

     Hashing is a technique used for performing insertions, deletions, and finds in constant average time.

Hash function

     Each key is mapped into some number in the range 0  to TableSize -1 and placed in the appropriate cell. And this mapping is called a hash function since there are a finite number of cells and a virtually inexhaustible supply of keys, this is clearly impossible, and thus we seek a hash function that distributes the keys evenly among the cells.

     The only remaining problems :

  • deal with choosing a function
  • deciding what to do when two keys hash to the same value

2. Hash Function

  if the input keys are integers, then simply returning key mod TableSize is generally a reasonable strategy, unless Key happens to have some undesirable properties. In this case, the choice of hash function needs to be carefully considered. For instance, if the table size the is 10 and the keys all end in zero, then the standard hash function is a bad choice. For reasons we shall see later, and to avoid situations like the one above, it is often a good idea to ensure that the table size is prime.

  Usually, the Keys are strings; in this case, the hash function needs to be chosen carefully. The best way use Horner’s rule. If the keys are very long, the hash function will take too long to compute. A common practice in this case is not to use all the characters. Some programmers implement their hash function by using only the characters in the odd spaces, with the idea that the time saved computing the hash function will make up for a slightly less evenly distributed function.

       The main programming detail left is collision resolution. If, when an element is inserted, it hashes to the same value as an already inserted element, then we have a collision and need to resolve it. There are several methods for dealing with this. We will discuss tow of the simplest:  separate chaining and open addressing.

 3. Separate Chaining

  The first strategy, commonly known as separate chaining, is to keep a list of all elements that hash to the same value.

  • To perform a search, we use the hash function to determine which list to traverse. we then search the appropriate list.
  • To perform a insert, we check the appropriate list to see whether the element is already in place. If the element turns out to be new, it can be inserted at the front of the list, since it is convenient and also because frequently it happens that recently inserted elements are the most likely to be accessed in the near future.

4. Hash Tables without Linked Lists

  Separate chaining hashing has the disadvantage of using linked lists. This could slow the algorithm down a bit because of the time required to allocate new cells, and also essentially requires the implementation of a second data structure.

  There are three common collision resolution strategies alternatively:

  • Linear Probing
  • Quadratic Probing
  • Double Hashing

5. Rehashing

  If the table gets too full, the running time for the operations will start taking too long and insertions might fail for open addressing hashing with quadratic resolution. This can happen if there are too many removals intermixed with insertions. A solution, then, is to build another talbe that is about twice as big and scan down the entire original hash table, computing the new hash value for each element and inserting it in the new table.

  This entire operation is called rehashing.

Rehashing can be implemented in several ways with quadratic probing:

  • rehash as soon as the table is half full
  • rehash only when an insertion fails
  • rehash when the table reaches a certain load factor, Since performance does degrade as the load factor increase, the third strategy, implemented with a good cutoff, could be best.

 

版权声明:本文内容由互联网用户自发贡献,该文观点仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请联系我们举报,一经查实,本站将立刻删除。

发布者:全栈程序员-站长,转载请注明出处:https://javaforall.net/110273.html原文链接:https://javaforall.net

(0)
全栈程序员-站长的头像全栈程序员-站长


相关推荐

  • pycharm的git_pycharm版本控制

    pycharm的git_pycharm版本控制1、createpatchcreatepatch打补丁,当连接不上git服务器时,可以先本地打补丁,生成一个文件,里面记录了文件变更信息,后面可以随时提交至git服务器2、checkoutrevisioncheckoutrevision检出版本,可以回退到任意版本,右边会显示当前检出版本与上一版本的变化3、newbranchnewbranch建立新的分…

    2022年8月29日
    3
  • Winform控件开发(1)——Label(史上最全)

    Winform控件开发(1)——Label(史上最全)作用:一般用于显示文本或者作为”按钮使用”,当作为显示文本使用时,通过设置label的Text属性实现,当作为“按钮使用时”,在lable的单击事件下注册事件即可,下面详细介绍label的属性:1、Name属性,该属性代表label类对象的名称,通过该属性可以获取到该label对象,如下图:该label对象名称为label1,当然也可以更改为其他名称2、AllowDrop属性,该属性的值是…

    2022年7月26日
    33
  • linux sendmail发送邮件_shell上传文件到服务器

    linux sendmail发送邮件_shell上传文件到服务器Sendmail是目前Linux系统下面用得最广的邮件系统之一,虽然它存在一些不足,不过,目前还是有不少公司在使用它。对它的学习,也能让我们更深的了解邮件系统的运作。下面我们就来看看sendmail邮

    2022年8月4日
    7
  • 5线上模式刷2亿bug_GTA5还想冲销量?玩家利用BUG刷钱,遭受比封号更严厉惩罚

    5线上模式刷2亿bug_GTA5还想冲销量?玩家利用BUG刷钱,遭受比封号更严厉惩罚  《GTA5》作为一款神级开放世界游戏,即便已经发售了七年,凭借其优秀的品质以及耐玩性,而今仍旧是许多玩家讨论的焦点。不过在近期,鲜有出现负面消息的R星却因为线上模式的BUG而受到了玩家们的非议。  《GTA5》说是游戏,但你在玩的过程中不自觉就代入到了主角的视角中。跟现实世界一样,美金在游戏中也非常重要。有了钱,你就可以在洛圣都这个虚拟世界中过自己想要的生活。  但玩家在游戏中无论是使用合法的…

    2022年4月26日
    347
  • 几行代码搞定画廊效果

    几行代码搞定画廊效果曲木为直终必弯,养狼当犬看家难。墨染鸬鹚黑不久,粉刷乌鸦白不坚。蜜饯黄莲终需苦,强摘瓜果不能甜。好事总得善人做,哪有凡人做神仙。当!废话不多言,上回书说道,我最近寻思干点嘛,却又无所事事,天天水群,于是心不安理不得,这天忽然看到一个画廊效果,虽然已是过时产物,但是本着劳资不会,就是比比的崇高目标,结果遭人鄙视,无人同情,令人叹惋。于是乎,奋笔疾书,瞎(说鸡不说吧,文明你我他)写,终于

    2022年5月5日
    43
  • laravel 5以后数据库插入自动转化方式

    laravel 5以后数据库插入自动转化方式

    2021年11月10日
    44

发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注

关注全栈程序员社区公众号