Hashing

Hashing

1. Introduction

     Hashing is a technique used for performing insertions, deletions, and finds in constant average time.

Hash function

     Each key is mapped into some number in the range 0  to TableSize -1 and placed in the appropriate cell. And this mapping is called a hash function since there are a finite number of cells and a virtually inexhaustible supply of keys, this is clearly impossible, and thus we seek a hash function that distributes the keys evenly among the cells.

     The only remaining problems :

  • deal with choosing a function
  • deciding what to do when two keys hash to the same value

2. Hash Function

  if the input keys are integers, then simply returning key mod TableSize is generally a reasonable strategy, unless Key happens to have some undesirable properties. In this case, the choice of hash function needs to be carefully considered. For instance, if the table size the is 10 and the keys all end in zero, then the standard hash function is a bad choice. For reasons we shall see later, and to avoid situations like the one above, it is often a good idea to ensure that the table size is prime.

  Usually, the Keys are strings; in this case, the hash function needs to be chosen carefully. The best way use Horner’s rule. If the keys are very long, the hash function will take too long to compute. A common practice in this case is not to use all the characters. Some programmers implement their hash function by using only the characters in the odd spaces, with the idea that the time saved computing the hash function will make up for a slightly less evenly distributed function.

       The main programming detail left is collision resolution. If, when an element is inserted, it hashes to the same value as an already inserted element, then we have a collision and need to resolve it. There are several methods for dealing with this. We will discuss tow of the simplest:  separate chaining and open addressing.

 3. Separate Chaining

  The first strategy, commonly known as separate chaining, is to keep a list of all elements that hash to the same value.

  • To perform a search, we use the hash function to determine which list to traverse. we then search the appropriate list.
  • To perform a insert, we check the appropriate list to see whether the element is already in place. If the element turns out to be new, it can be inserted at the front of the list, since it is convenient and also because frequently it happens that recently inserted elements are the most likely to be accessed in the near future.

4. Hash Tables without Linked Lists

  Separate chaining hashing has the disadvantage of using linked lists. This could slow the algorithm down a bit because of the time required to allocate new cells, and also essentially requires the implementation of a second data structure.

  There are three common collision resolution strategies alternatively:

  • Linear Probing
  • Quadratic Probing
  • Double Hashing

5. Rehashing

  If the table gets too full, the running time for the operations will start taking too long and insertions might fail for open addressing hashing with quadratic resolution. This can happen if there are too many removals intermixed with insertions. A solution, then, is to build another talbe that is about twice as big and scan down the entire original hash table, computing the new hash value for each element and inserting it in the new table.

  This entire operation is called rehashing.

Rehashing can be implemented in several ways with quadratic probing:

  • rehash as soon as the table is half full
  • rehash only when an insertion fails
  • rehash when the table reaches a certain load factor, Since performance does degrade as the load factor increase, the third strategy, implemented with a good cutoff, could be best.

 

版权声明:本文内容由互联网用户自发贡献,该文观点仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请联系我们举报,一经查实,本站将立刻删除。

发布者:全栈程序员-站长,转载请注明出处:https://javaforall.net/110273.html原文链接:https://javaforall.net

(0)
全栈程序员-站长的头像全栈程序员-站长


相关推荐

  • 关于radcontrols控件之Radupload「建议收藏」

    关于radcontrols控件之Radupload「建议收藏」Namespace:Telerik.Windows.ControlsAssembly:Telerik.Windows.Controls.Input(inTelerik.Windows.Controls.Input.dll)RadUpload是客户端和服务器端的一部分。在客户端执行完全在浏览器中使用Silverlight的平台。在服务器端需要处理的服务器进行处理的文件提交到客户端。检查在…

    2022年7月24日
    6
  • Jmeter面试题_软件测试的面试题及答案

    Jmeter面试题_软件测试的面试题及答案最近有个学生反馈,自己在面试的时候,遇到一个jmeter题目,要我帮忙看下,题目如下:进入http://www.weather.com.cn/网站,用jmeter编写脚本实现如下操作(下列要求在同一个测试脚本):(1)编写获取北京天气紫外线、穿衣、洗车、感冒指数的压测脚本,要求将城市参数化10个(城市名字自定义),将城市的当前实时天气>10度作为断言,并将天气数字输出打印到日志,设置2…

    2022年9月30日
    2
  • html5 a标签去下划线,css中如何去掉a标签的下划线?[通俗易懂]

    html5 a标签去下划线,css中如何去掉a标签的下划线?[通俗易懂]我们在HTML网页制作过程中,相信大家对css文本超链接这个概念并不陌生。我们都知道想要给某段文本或者指定元素添加一个锚点也就是超链接需要用到HTML中的a标签。那么有的新手可能就会发现,在使用a标签时文本超链接会自动出现下划线!从视觉美观上来说枯燥单调的文本超链接显示显然并不好看。那么该如何去掉a标签的下划线呢?下面我们来看一下css去掉a标签下划线的方法。本篇文章就给大家详细讲讲怎么去掉css…

    2022年5月2日
    61
  • TCP拥塞控制的实现

    TCP拥塞控制的实现本文只是对TCP协议做个简要的介绍。TCP协议,即传输控制协议,与UDP协议同处于传输层,同样使用相同的网络层,但TCP提供了一种可靠的、面向连接的数据传输服务,它会在两个使用TCP的应用之间建立一个TCP连接,在该连接上进行数据的传输。TCP通过以下方式提供可靠性:1、应用程序被分割成TCP认为最合适发送的数据块。这点与UDP完全不同,应用程序产生的UDP数据报长度将保持不变,加上IP首部后,才会

    2022年6月24日
    26
  • Python之queue模块

    queue模块实现了多生产者,多消费者的队列。当要求信息必须在多线程间安全交换,这个模块在同步线程编程时非常有用,Queue模块实现了所有要求的锁机制。内部实现是在抢占式线程加上临时锁,但是没有涉

    2021年12月30日
    42
  • MySQL数据库(基础)

    MySQL数据库(基础)目录1.数据库概念1.1数据库是干嘛的?1.2数据库和数据结构是啥关系?​1.3两种类型的数据库2.MySQL数据库2.1MySQL数据库概念2.2MySQL基本操作2.2.1建立数据库2.2.2查看数据库2.2.3选中数据库2.2.4删除数据库2.3MySQL数据类型1.数据库概念1.1数据库是干嘛的?数据库的功能就是用来组织数据,组织很多很多的数据。这些数据通常都是存储在外存(磁盘)数据库提供的核…

    2022年7月24日
    3

发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注

关注全栈程序员社区公众号