Hashing

Hashing

1. Introduction

     Hashing is a technique used for performing insertions, deletions, and finds in constant average time.

Hash function

     Each key is mapped into some number in the range 0  to TableSize -1 and placed in the appropriate cell. And this mapping is called a hash function since there are a finite number of cells and a virtually inexhaustible supply of keys, this is clearly impossible, and thus we seek a hash function that distributes the keys evenly among the cells.

     The only remaining problems :

  • deal with choosing a function
  • deciding what to do when two keys hash to the same value

2. Hash Function

  if the input keys are integers, then simply returning key mod TableSize is generally a reasonable strategy, unless Key happens to have some undesirable properties. In this case, the choice of hash function needs to be carefully considered. For instance, if the table size the is 10 and the keys all end in zero, then the standard hash function is a bad choice. For reasons we shall see later, and to avoid situations like the one above, it is often a good idea to ensure that the table size is prime.

  Usually, the Keys are strings; in this case, the hash function needs to be chosen carefully. The best way use Horner’s rule. If the keys are very long, the hash function will take too long to compute. A common practice in this case is not to use all the characters. Some programmers implement their hash function by using only the characters in the odd spaces, with the idea that the time saved computing the hash function will make up for a slightly less evenly distributed function.

       The main programming detail left is collision resolution. If, when an element is inserted, it hashes to the same value as an already inserted element, then we have a collision and need to resolve it. There are several methods for dealing with this. We will discuss tow of the simplest:  separate chaining and open addressing.

 3. Separate Chaining

  The first strategy, commonly known as separate chaining, is to keep a list of all elements that hash to the same value.

  • To perform a search, we use the hash function to determine which list to traverse. we then search the appropriate list.
  • To perform a insert, we check the appropriate list to see whether the element is already in place. If the element turns out to be new, it can be inserted at the front of the list, since it is convenient and also because frequently it happens that recently inserted elements are the most likely to be accessed in the near future.

4. Hash Tables without Linked Lists

  Separate chaining hashing has the disadvantage of using linked lists. This could slow the algorithm down a bit because of the time required to allocate new cells, and also essentially requires the implementation of a second data structure.

  There are three common collision resolution strategies alternatively:

  • Linear Probing
  • Quadratic Probing
  • Double Hashing

5. Rehashing

  If the table gets too full, the running time for the operations will start taking too long and insertions might fail for open addressing hashing with quadratic resolution. This can happen if there are too many removals intermixed with insertions. A solution, then, is to build another talbe that is about twice as big and scan down the entire original hash table, computing the new hash value for each element and inserting it in the new table.

  This entire operation is called rehashing.

Rehashing can be implemented in several ways with quadratic probing:

  • rehash as soon as the table is half full
  • rehash only when an insertion fails
  • rehash when the table reaches a certain load factor, Since performance does degrade as the load factor increase, the third strategy, implemented with a good cutoff, could be best.

 

版权声明:本文内容由互联网用户自发贡献,该文观点仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请联系我们举报,一经查实,本站将立刻删除。

发布者:全栈程序员-站长,转载请注明出处:https://javaforall.net/110273.html原文链接:https://javaforall.net

(0)
全栈程序员-站长的头像全栈程序员-站长


相关推荐

  • 安卓相册设置_安卓11原生相机

    安卓相册设置_安卓11原生相机前言适配前台程序员必不可少的工作之一,且可能要花大量的时间精力。何为前台程序员,是面向用户的一端,包括前端、移动端、PC等等。何为适配,适配就是当我们的开发环境、运行环境等发生变化的时候,程序依然能稳健运行。而适配中最难为程序员的就是Android了,除了开发环境、运行环境等因素之外,因为Android开源的原因,还要适配各大厂商。。而适配条件之多,经常让Android程序员为之头疼。来看看相机、相册相关的适配历程:Android6权限适配Android7文件适配Android

    2022年9月22日
    0
  • eclipse中Git的使用

    eclipse中Git的使用eclipse中Git的使用

    2022年4月23日
    47
  • HDU3572_Task Schedule(网络流最大流)[通俗易懂]

    HDU3572_Task Schedule(网络流最大流)

    2022年1月24日
    32
  • iOS的三重缓冲和微型口吃(micro stuttering)

    iOS的三重缓冲和微型口吃(micro stuttering)Instrument中的Display模块:instrument-displayiOS中采用双重缓冲和三重缓冲一起使用,从display中就可以看出来。即:双缓冲不够用了就采用三缓冲。首先看看双重缓冲:双缓冲如上,此时双缓冲很够用,每次Vsync来到之前,上一帧的framebuffer(apple叫做surface+ID),所以帧率很高,基本在…

    2022年5月11日
    43
  • python怎么表示取余_python如何实现取余操作[通俗易懂]

    python怎么表示取余_python如何实现取余操作[通俗易懂]python实现取余操作的方法:可以利用求模运算符(%)来实现。求模运算符可以将两个数相除得到其余数。我们还可以使用divmod()函数来实现取余操作,具体方法如:【divmod(10,3)】。在python中要实现取余操作可以使用求模运算符(%),该运算符可以将两个数相除得到其余数。(推荐教程:Python入门教程)如果一个数恰好能被另外一个数据整除,则余数为0,%运算后返回结果为0。可利用余数…

    2022年4月25日
    75
  • SSH简介及两种远程登录的方法「建议收藏」

    SSH简介及两种远程登录的方法「建议收藏」目录SSH的安全机制SSH的安装启动服务器的SSH服务SSH两种级别的远程登录SSH的高级应用SecureShell(SSH)是由IETF(TheInternetEngineeringTaskForce)制定的建立在应用层基础上的安全网络协议。它是专为远程登录会话(甚至可以用Windows远程登录Linux服务器进行文件互传)和其他网络服务提供安全性的协议,…

    2022年6月20日
    28

发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注

关注全栈程序员社区公众号