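When scraping data, some requests will inevitably fail for incidental reasons such as an unstable network, and these failures hurt scraping efficiency. How should we handle them? Simply wrapping each request in try/except would skip the failed requests and reduce the amount of data collected. This is where the retry decorator comes in.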
Installation is simple: pip install retry
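Next, let's go over what each parameter of retry means. If your English is good enough, you can also read the source code or the official documentation directly.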
What each parameter means
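    def retry(exceptions=Exception, tries=-1, delay=0, max_delay=None, backoff=1, jitter=0, logger=logging_logger):
        """Return a retry decorator.

        :param exceptions: an exception or a tuple of exceptions to catch. default: Exception.
        :param tries: the maximum number of attempts. default: -1 (infinite).
        :param delay: initial delay between attempts. default: 0.
        :param max_delay: the maximum value of delay. default: None (no limit).
        :param backoff: multiplier applied to delay between attempts. default: 1 (no backoff).
        :param jitter: extra seconds added to delay between attempts. default: 0.
                       fixed if a number, random if a range tuple (min, max)
        :param logger: logger.warning(fmt, error, delay) will be called on failed attempts.
                       default: retry.logging_logger. if None, logging is disabled.
        """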
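To see how delay, backoff, jitter and max_delay combine, here is a small stdlib-only sketch that lists the sleep following each failed attempt. It assumes the order of operations in the library's retry loop is: sleep for the current delay, multiply by backoff, add jitter, then cap at max_delay. `sleep_schedule` is a hypothetical helper, not part of the retry package, and it only handles a numeric jitter.

```python
def sleep_schedule(delay=0, backoff=1, jitter=0, max_delay=None, failures=5):
    """Hypothetical helper: the sleep (seconds) after each of the first `failures` failed attempts.

    Assumed order of operations: sleep for the current delay, then multiply
    by backoff, add a numeric jitter, and cap the result at max_delay.
    """
    sleeps = []
    current = delay
    for _ in range(failures):
        sleeps.append(current)                 # wait before the next attempt
        current = current * backoff + jitter   # grow the delay for the next round
        if max_delay is not None:
            current = min(current, max_delay)  # never wait longer than max_delay
    return sleeps


if __name__ == "__main__":
    print(sleep_schedule(delay=1, backoff=2))               # [1, 2, 4, 8, 16]
    print(sleep_schedule(delay=1, backoff=2, max_delay=4))  # [1, 2, 4, 4, 4]
    print(sleep_schedule(delay=1, jitter=1))                # [1, 2, 3, 4, 5]
```

With the defaults (delay=0, backoff=1, jitter=0) every sleep is 0, which is why the default decorator retries immediately.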
1. Called with no arguments, retry uses all the default parameters, so on any exception it keeps retrying until the function succeeds.
    from retry import retry

    @retry()
    def make_trouble():
        '''Retry until succeed'''
        print('retrying...')
        raise Exception('trouble')  # always fails, so this retries forever

    if __name__ == '__main__':
        make_trouble()

Output: this function never succeeds, so it retries forever:

    retrying...
    retrying...
    retrying...
    retrying...
    retrying...
    retrying...
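The example above never succeeds, so it only shows the "retry forever" half. Below is a stdlib-only sketch of a decorator with the same default behaviour (catch Exception, unlimited attempts, no delay) applied to a function that fails twice and then returns. `retry_forever` and `flaky` are hypothetical names for illustration; this is not the retry package's implementation. Because the filter is `Exception` rather than `BaseException`, Ctrl+C (KeyboardInterrupt) still breaks out of the loop.

```python
import functools


def retry_forever(exceptions=Exception):
    """Hypothetical minimal decorator: call the function again until it stops raising `exceptions`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            while True:
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    print("retrying...")  # KeyboardInterrupt is not an Exception, so Ctrl+C still stops this
        return wrapper
    return decorator


attempts = []


@retry_forever()
def flaky():
    """Hypothetical stand-in for a request that fails twice, then succeeds."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("simulated network error")
    return "ok"


if __name__ == "__main__":
    print(flaky())  # prints "retrying..." twice, then "ok"
```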
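2. The exceptions parameter (default: Exception) limits retries to the specified exception type; pass a tuple to cover several types. In this example tries=3 makes the error propagate after the third failed attempt, and delay=2 waits 2 seconds between attempts.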
    from retry import retry

    @retry(ZeroDivisionError, tries=3, delay=2)
    def make_trouble():
        '''Retry on ZeroDivisionError, raise error after 3 attempts, sleep 2 seconds between attempts.'''
        print('aaa')
        a = 1/0

    if __name__ == '__main__':
        make_trouble()

Output (Python 2.7; "aaa" is printed once per attempt, three times in total):

    aaa
    aaa
    Traceback (most recent call last):
      File "E:/WORKSPACE/document/document/test/1.py", line 20, in <module>
        make_trouble()
      File "<...>", line 2, in make_trouble
      File "D:\python27\WinPython-64bit-2.7.10.3\python-2.7.10.amd64\lib\site-packages\retry\api.py", line 74, in retry_decorator
        logger)
      File "D:\python27\WinPython-64bit-2.7.10.3\python-2.7.10.amd64\lib\site-packages\retry\api.py", line 33, in __retry_internal
        return f()
      File "E:/WORKSPACE/document/document/test/1.py", line 16, in make_trouble
        a = 1/0
    ZeroDivisionError: integer division or modulo by zero
    aaa
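The way exceptions and tries interact can be sketched without the library. The helper below is a hypothetical stand-in (not retry's code) that retries only the listed exception types, counts down `tries`, and re-raises the last error when the attempts run out; delays are left out so the sketch runs instantly. Since the filter behaves like an `except` clause, subclasses are caught too, so passing ArithmeticError would also cover ZeroDivisionError.

```python
def call_with_tries(func, exceptions=Exception, tries=-1):
    """Hypothetical helper: call func, retrying only on `exceptions`, at most `tries` times.

    tries=-1 means unlimited; when the attempts are used up, the last
    exception is re-raised. Delays are omitted to keep the sketch short.
    """
    remaining = tries
    while remaining:           # -1 counts down to -2, -3, ... and never reaches 0
        try:
            return func()
        except exceptions:
            remaining -= 1
            if not remaining:  # attempts used up: propagate the last error
                raise


calls = []


def divide():
    calls.append("divide")
    return 1 / 0


def lookup():
    calls.append("lookup")
    return {}["missing"]


if __name__ == "__main__":
    try:
        call_with_tries(divide, ZeroDivisionError, tries=3)
    except ZeroDivisionError:
        print("divide was called", calls.count("divide"), "times")  # 3
    try:
        call_with_tries(lookup, ZeroDivisionError, tries=3)
    except KeyError:
        print("lookup was called", calls.count("lookup"), "time")   # 1: KeyError is not retried
```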
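3. The backoff parameter: after each failed attempt the delay is multiplied by backoff, so the wait between attempts grows geometrically.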
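    import time
    from retry import retry

    @retry((ValueError, TypeError), delay=1, backoff=2)
    def make_trouble():
        '''Retry on ValueError or TypeError, sleep 1, 2, 4, 8, ... seconds between attempts.'''
        print(1, int(time.time()))  # print a timestamp so the growing gaps are visible
        raise ValueError('a')

    if __name__ == '__main__':
        make_trouble()

Output: one line per attempt showing 1 and the current Unix timestamp, with consecutive timestamps 1, 2, 4, 8, ... seconds apart. Since tries keeps its default of -1, it never stops on its own.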
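With jitter left at 0 and no max_delay, the sleeps form a geometric sequence. Writing d for delay and b for backoff, the sleep after the k-th failure and the total waiting time accumulated over the first n failures are:

```latex
s_k = d\,b^{\,k-1}, \qquad
T_n = \sum_{k=1}^{n} s_k = d\,\frac{b^{\,n}-1}{b-1} \quad (b \neq 1)
```

For d = 1 and b = 2 this gives sleeps of 1, 2, 4, 8 seconds and T_4 = 15 seconds; with b = 1 the total is simply T_n = n d.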
4. max_delay sets an upper bound on the delay: whenever the delay computed via backoff would exceed max_delay, max_delay is used instead.
    import time
    from retry import retry

    @retry((ValueError, TypeError), delay=1, backoff=2, max_delay=4)
    def make_trouble():
        '''Retry on ValueError or TypeError, sleep 1, 2, 4, 4, ... seconds between attempts.'''
        print(1, int(time.time()))
        raise ValueError('aa')

    if __name__ == '__main__':
        make_trouble()

Output: one line per attempt showing 1 and the current Unix timestamp; the gaps grow 1, 2, 4 seconds and then stay at 4 seconds.
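Adding a cap M = max_delay turns the geometric sequence into a clamped one. Assuming b ≥ 1, jitter = 0 and an initial delay d ≤ M, the sleep after the k-th failure is:

```latex
s_k = \min\!\left(d\,b^{\,k-1},\; M\right)
```

With d = 1, b = 2 and M = 4 this gives 1, 2, 4, 4, 4, ...: the backoff grows until it reaches the cap and stays there.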
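5. The jitter parameter adds a fixed number of seconds to the delay after every failed attempt, so the delay grows additively. This example also turns on logging, so each failed attempt emits a warning.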
    import time
    from retry import retry

    @retry(ValueError, delay=1, jitter=1)
    def make_trouble():
        '''Retry on ValueError, sleep 1, 2, 3, 4, ... seconds between attempts.'''
        print(1, int(time.time()))
        raise ValueError('e')

    if __name__ == '__main__':
        import logging
        logging.basicConfig()  # send retry's warnings to stderr
        make_trouble()

Output (warnings from the retry.api logger; between them, each attempt also prints 1 and the current timestamp):

    WARNING:retry.api:e, retrying in 1 seconds...
    WARNING:retry.api:e, retrying in 2 seconds...
    WARNING:retry.api:e, retrying in 3 seconds...
    WARNING:retry.api:e, retrying in 4 seconds...
    WARNING:retry.api:e, retrying in 5 seconds...
    WARNING:retry.api:e, retrying in 6 seconds...
    WARNING:retry.api:e, retrying in 7 seconds...
    WARNING:retry.api:e, retrying in 8 seconds...
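According to the docstring, jitter can also be a (min, max) tuple, in which case the added amount is random rather than fixed. The sketch below is an assumption of what that means for backoff=1 and no max_delay, drawing each increment with `random.uniform`; `jittered_sleeps` is a hypothetical helper, not the library's code, and it uses a seeded generator so runs are reproducible.

```python
import random


def jittered_sleeps(delay, jitter, failures, seed=None):
    """Hypothetical helper: sleeps after each failure when jitter is a (min, max) tuple.

    Assumes backoff=1 and no max_delay, so each new sleep is the previous one
    plus a random amount drawn uniformly from the jitter range.
    """
    rng = random.Random(seed)
    sleeps = [delay]
    for _ in range(failures - 1):
        sleeps.append(sleeps[-1] + rng.uniform(*jitter))
    return sleeps


if __name__ == "__main__":
    for s in jittered_sleeps(delay=1, jitter=(0.5, 1.5), failures=5, seed=42):
        print(f"retrying in {s:.2f} seconds...")
```

Random jitter is mainly useful when many clients retry at once: their retries spread out instead of hitting the server at the same moment.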
retry_call
    def retry_call(f, fargs=None, fkwargs=None, exceptions=Exception, tries=-1, delay=0, max_delay=None, backoff=1,
                   jitter=0, logger=logging_logger):
        """
        Calls a function and re-executes it if it failed.

        :param f: the function to execute.
        :param fargs: the positional arguments of the function to execute.
        :param fkwargs: the named arguments of the function to execute.
        :param exceptions: an exception or a tuple of exceptions to catch. default: Exception.
        :param tries: the maximum number of attempts. default: -1 (infinite).
        :param delay: initial delay between attempts. default: 0.
        :param max_delay: the maximum value of delay. default: None (no limit).
        :param backoff: multiplier applied to delay between attempts. default: 1 (no backoff).
        :param jitter: extra seconds added to delay between attempts. default: 0.
                       fixed if a number, random if a range tuple (min, max)
        :param logger: logger.warning(fmt, error, delay) will be called on failed attempts.
                       default: retry.logging_logger. if None, logging is disabled.
        :returns: the result of the f function.
        """
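retry_call is the non-decorator form: it calls f with fargs and fkwargs and re-executes it if it fails, taking the same retry options as the decorator. An example: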
    import requests
    from retry.api import retry_call

    def make_trouble(service, info=None):
        if not info:
            info = ''
        print('retry..., service: {}, info: {}'.format(service, info))
        r = requests.get(service + info)
        print(r.text)
        raise Exception('info')  # fail on purpose so retry_call retries

    def what_is_my_ip(approach=None):
        if approach == "optimistic":
            tries = 1
        elif approach == "conservative":
            tries = 3
        else:
            # skeptical
            tries = -1
        result = retry_call(make_trouble, fargs=["http://ipinfo.io/"], fkwargs={"info": "ip"}, tries=tries)
        print(result)

    if __name__ == '__main__':
        import logging
        logging.basicConfig()
        what_is_my_ip("conservative")

Output (tries=3, so make_trouble runs three times; delay defaults to 0, hence "retrying in 0 seconds"; after the third failure the exception is re-raised, so print(result) is never reached):

    retry..., service: http://ipinfo.io/, info: ip
    118.113.1.255
    retry..., service: http://ipinfo.io/, info: ip
    WARNING:retry.api:info, retrying in 0 seconds...
    WARNING:retry.api:info, retrying in 0 seconds...
    118.113.1.255
    retry..., service: http://ipinfo.io/, info: ip
    Traceback (most recent call last):
      File "E:/WORKSPACE/document/document/test/1.py", line 74, in <module>
        what_is_my_ip("conservative")
      File "E:/WORKSPACE/document/document/test/1.py", line 66, in what_is_my_ip
        result = retry_call(make_trouble, fargs=["http://ipinfo.io/"], fkwargs={"info": "ip"}, tries=tries)
      File "D:\python27\WinPython-64bit-2.7.10.3\python-2.7.10.amd64\lib\site-packages\retry\api.py", line 101, in retry_call
        return __retry_internal(partial(f, *args, **kwargs), exceptions, tries, delay, max_delay, backoff, jitter, logger)
      File "D:\python27\WinPython-64bit-2.7.10.3\python-2.7.10.amd64\lib\site-packages\retry\api.py", line 33, in __retry_internal
        return f()
      File "E:/WORKSPACE/document/document/test/1.py", line 54, in make_trouble
        raise Exception('info')
    Exception: info
    118.113.1.255
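The traceback above shows retry_call binding f and its arguments with functools.partial before handing the bound call to the same internal retry loop the decorator uses. Here is a stdlib-only sketch of that idea, without delays or logging and with no network access; `retry_call_sketch` and `fetch` are hypothetical names, not part of the retry package.

```python
import functools


def retry_call_sketch(f, fargs=None, fkwargs=None, exceptions=Exception, tries=-1):
    """Hypothetical stand-in for retry_call: bind the arguments, then retry the bound call."""
    bound = functools.partial(f, *(fargs or []), **(fkwargs or {}))
    remaining = tries
    while remaining:           # tries=-1 never reaches 0, i.e. unlimited attempts
        try:
            return bound()
        except exceptions:
            remaining -= 1
            if not remaining:  # out of attempts: re-raise the last error
                raise


attempts = []


def fetch(service, info=""):
    """Hypothetical request: fails on the first call, then returns the URL it would fetch."""
    attempts.append(service + info)
    if len(attempts) < 2:
        raise OSError("simulated failure")
    return service + info


if __name__ == "__main__":
    print(retry_call_sketch(fetch, fargs=["http://ipinfo.io/"], fkwargs={"info": "ip"}, tries=3))
```

Binding the arguments once means the retry loop only ever deals with a zero-argument callable, which is why the decorator and retry_call can share one loop.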
Reference blog: http://www.chenxm.cc/article/235.html
Published by: 全栈程序员-站长. Please credit the source when reposting: https://javaforall.net/224775.html (original link: https://javaforall.net)
