Welcome to hellopython’s documentation!¶
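Introduction to Python¶
This article gives a brief introduction to Python.
Overview¶
Python is a high-level, dynamic, object-oriented, interpreted scripting language. It was originally designed for writing automation scripts (much like shell scripts), but as new versions and language features have been added it has increasingly been used to build standalone, large-scale projects.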
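Notes:
- High-level language: high-level languages (Python, Java) sit closer to the developer, whereas lower-level languages (such as C, C++, and assembly) sit closer to the machine. The lower the level, the more directly a language can manipulate the hardware.
- Dynamic language: a static language requires every variable's type to be declared before use, which fixes how much memory it occupies and which methods it supports. A dynamic language does not require variables to be declared before they are used. Dynamic languages are usually slower than compiled static languages.
- Object-oriented: the counterpart of procedural programming. It is a way of thinking that developers apply throughout development, and it lies at the core of Python programming.
- Interpreted: in contrast to compiled languages such as C, Python source is not compiled ahead of time into a native executable. Each run hands the script to an interpreter, which (in CPython) translates it to bytecode and then executes it.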
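The "dynamic" and "interpreted" points above can be seen in a few lines. The sketch below is only an illustration: the helper name describe is made up for this example, and the bytecode listing printed by dis differs between Python versions.

```python
import dis


def describe(value):
    """Return the runtime type name of value."""
    return type(value).__name__


# The same name can be rebound to objects of different types;
# no type declaration is needed.
x = 42
first = describe(x)
x = "forty-two"
second = describe(x)
print(first, second)

# CPython compiles source text to bytecode before executing it.
code = compile("a + b", "<example>", "eval")
dis.dis(code)
```

The type belongs to the object rather than to the name, which is why x can refer to an int and then to a str.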
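History of Python¶
Python was created by the Dutch programmer Guido van Rossum. Over the Christmas holiday of 1989 in Amsterdam, Guido, looking for a way to pass the time, decided to write an interpreter for a new scripting language as a successor to the ABC language. The name Python (literally a large snake) was taken from the British television comedy Monty Python's Flying Circus, which first aired in 1969 and ran into the 1970s.
Python has become one of the most popular programming languages, and its usage has grown roughly linearly since 2004. Python 2 was released on October 16, 2000; its last stable series is Python 2.7. Python 3 was released on December 3, 2008 and is not fully compatible with Python 2. In January 2011 the TIOBE programming language index named Python its language of the year for 2010. JavaScript, Java, and Python were widely recommended as the best languages to learn for work in 2018, and TIOBE named Python its language of the year for 2018, which places it in the index's Programming Language Hall of Fame.
- The creator of Python: Guido van Rossum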
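Applications of Python¶
Python is used in a very wide range of areas:
- System programming: Python exposes operating-system APIs that make system maintenance and administration convenient. It is one of the signature languages on Linux and an ideal tool for many system administrators; tools such as yum and trash-put are written in it.
- Graphics: libraries such as PIL and Tkinter support image processing and GUI development.
- Mathematics: the NumPy extension provides interfaces to many standard numerical libraries.
- Text processing: the re module supports regular expressions, and the standard library also provides XML parsing modules (the SGML parser, sgmllib, existed only in Python 2). Many programmers use Python for XML processing.
- Database programming: modules that follow the Python DB-API (database application programming interface) specification let programs talk to Microsoft SQL Server, Oracle, Sybase, DB2, MySQL, SQLite, and other databases. The standard library ships the sqlite3 module, and the third-party Gadfly package provides a complete SQL environment.
- Network programming: a rich set of modules supports socket programming, which makes it quick to develop distributed applications. Large projects such as Zope, Mnet, and BitTorrent, as well as Google, use Python extensively.
- Web programming: Python is widely used as a web application language and supports current XML technologies.
- Operations automation: this is practically Python's home turf. Python is the language of choice for many operations engineers, and well-known automation platforms such as SaltStack and Ansible are written in it.
- Cloud computing: the open-source cloud platform OpenStack is developed in Python.
- Web crawling: crawlers, also called web spiders, are a core tool for collecting data in the big-data industry, and many data companies depend on crawlers that gather freely available data from the internet around the clock. Several languages can be used to write crawlers, but Python is one of the mainstream choices, and its Scrapy framework is very widely used.
- Data analysis: cleaning, de-duplicating, normalizing, and analyzing large volumes of data, combined with scientific computing and machine learning, is the foundation of the big-data industry. Python is one of the mainstream languages for data analysis.
- Artificial intelligence: Python is a mainstream language for machine learning, neural networks, and deep learning, with broad library support and adoption.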
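As a small illustration of the database item in the list above, the following sketch uses the standard library's sqlite3 module, which follows the DB-API. The table name posts and its rows are invented for the example, and an in-memory database is used so that nothing is written to disk.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The table name and contents are hypothetical.
cur.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
cur.executemany("INSERT INTO posts (title) VALUES (?)", [("hello",), ("python",)])
conn.commit()

cur.execute("SELECT title FROM posts ORDER BY id")
titles = [row[0] for row in cur.fetchall()]
print(titles)
conn.close()
```

Other DB-API drivers expose the same connect, cursor, execute, and fetch pattern, although the placeholder style (here ?) can differ between drivers.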
Python2 or Python3¶
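Guido van Rossum, the author of Python, announced on the project mailing list that support for Python 2.7 would end on January 1, 2020. Users who want continued support for Python 2.7 after that date need to pay a commercial vendor.
Because official support for Python 2 ends on that date, use Python 3 for all new Python development.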
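One way a script might enforce this recommendation is to check the interpreter version at startup. This is a minimal sketch; the function name is_python3 is an assumption made for the example.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple belongs to Python 3 or newer."""
    return version_info[0] >= 3


# Stop early with a clear message instead of failing later on Python 2.
if not is_python3():
    raise SystemExit("This script requires Python 3")

print("Running on Python %d.%d" % sys.version_info[:2])
```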
Installing Python¶
This article walks through installing Python 3.6.2.
Python download location¶
The Python 3.6.2 installation files can be downloaded from: https://www.python.org/downloads/release/python-362/
Installing on CentOS¶
Download the source archive:
[root@localhost ~/download]# wget https://www.python.org/ftp/python/3.6.2/Python-3.6.2.tgz
Extract it:
[root@localhost ~/download]# tar -zxvf Python-3.6.2.tgz
Install readline-devel so that the arrow keys and Backspace do not produce stray characters in the interpreter:
[root@localhost ~/download]# yum install readline-devel
Run configure to detect the features of the target platform:
[root@localhost ~/download]# cd Python-3.6.2
[root@localhost ~/download/Python-3.6.2]# ./configure
Compile:
[root@localhost ~/download/Python-3.6.2]# make
Install:
[root@localhost ~/download/Python-3.6.2]# make install
Check the Python versions:
[root@localhost ~/download/Python-3.6.2]# python -V
Python 2.6.6
[root@localhost ~/download/Python-3.6.2]# python3 -V
Python 3.6.2
CentOS ships with Python preinstalled, here the older Python 2.6.6. Python is a base component of the Linux system and many applications depend on it, so do not casually change the system's default Python version.
Start the interpreter:
[root@localhost ~/download/Python-3.6.2]# python3
Python 3.6.2 (default, Jun 8 2018, 22:28:47)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-18)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
To work with Python comfortably without affecting either the system's Python 2 environment or the newly installed Python 3 environment, we install virtualenv to create isolated Python environments.
Installing virtualenv, the Python virtual environment tool¶
Install:
[root@localhost ~]# pip3 install virtualenv
Collecting virtualenv
Downloading https://files.pythonhosted.org/packages/b6/30/96a02b2287098b23b875bc8c2f58071c35d2efe84f747b64d523721dc2b5/virtualenv-16.0.0-py2.py3-none-any.whl (1.9MB)
100% |████████████████████████████████| 1.9MB 5.9MB/s
Installing collected packages: virtualenv
Successfully installed virtualenv-16.0.0
[root@localhost ~]# virtualenv --version
16.0.0
Using virtualenv¶
Create a directory for the virtual environment:
[blogsystem@localhost ~]$ mkdir venv
[blogsystem@localhost ~]$ ls
blogs_home download venv
Create the virtual environment:
[blogsystem@localhost ~]$ virtualenv venv
Using base prefix '/usr/local'
New python executable in /cloud/blogsystem/venv/bin/python3.6
Also creating executable in /cloud/blogsystem/venv/bin/python
Installing setuptools, pip, wheel...done.
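With the --no-site-packages option, none of the third-party packages installed in the system Python environment are made available in the new environment, which gives a "clean" Python runtime with no third-party packages (since virtualenv 1.7 this is already the default behavior):
[blogsystem@localhost ~]$ virtualenv --no-site-packages env1
Using base prefix '/usr/local'
New python executable in /cloud/blogsystem/env1/bin/python3.6
Also creating executable in /cloud/blogsystem/env1/bin/python
Installing setuptools, pip, wheel...done.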
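For comparison, Python 3 also ships a standard-library module, venv, that can create a similarly isolated environment from code. This is a different tool from the virtualenv package used in this article, and the sketch below is only an illustration: the directory name venv_demo is made up, and pip is deliberately not bootstrapped so that the example stays quick.

```python
import os
import tempfile
import venv


def create_env(parent_dir, name="venv_demo"):
    """Create an isolated environment under parent_dir and return its path."""
    env_dir = os.path.join(parent_dir, name)
    # system_site_packages=False keeps system-wide third-party packages out;
    # with_pip=False skips installing pip into the new environment.
    venv.create(env_dir, system_site_packages=False, with_pip=False)
    return env_dir


# Build the environment in a temporary directory that is removed afterwards.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = create_env(tmp)
    created = os.path.isfile(os.path.join(env_dir, "pyvenv.cfg"))
    print(created)
```

The pyvenv.cfg file written at the top of the environment records, among other things, whether system site-packages are visible.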
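Activate the virtual environment venv:
[blogsystem@localhost ~]$ source venv/bin/activate
(venv) [blogsystem@localhost ~]$
The prompt now carries the prefix (venv), which shows that the shell is inside the Python virtual environment named venv. Packages installed with pip in this environment affect neither the system Python environment nor anyone else's environment.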
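Besides looking at the prompt, a program can check from inside Python whether it is running in a virtual environment. The sketch below rests on two assumptions worth stating: older virtualenv releases (such as the 16.x series used here) set sys.real_prefix, while the standard venv module and newer virtualenv releases make sys.base_prefix differ from sys.prefix. The function name in_virtual_env is hypothetical.

```python
import sys


def in_virtual_env():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Prefer sys.real_prefix (older virtualenv), then fall back to
    # sys.base_prefix (venv and newer virtualenv).
    base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return base != sys.prefix


print(in_virtual_env(), sys.prefix)
```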
Install a package inside the virtual environment venv:
(venv) [blogsystem@localhost ~]$ pip list
Package Version
---------- -------
pip 10.0.1
setuptools 39.2.0
wheel 0.31.1
(venv) [blogsystem@localhost ~]$ pip install pymysql
Collecting pymysql
Downloading https://files.pythonhosted.org/packages/32/e8/222d9e1c7821f935d6dba8d4c60b9985124149b35a9f93a84f0b98afc219/PyMySQL-0.8.1-py2.py3-none-any.whl (81kB)
100% |████████████████████████████████| 81kB 63kB/s
Installing collected packages: pymysql
Successfully installed pymysql-0.8.1
(venv) [blogsystem@localhost ~]$ pip list
Package Version
---------- -------
pip 10.0.1
PyMySQL 0.8.1
setuptools 39.2.0
wheel 0.31.1
Export all packages installed in the virtual environment venv to requirements.txt:
(venv) [blogsystem@localhost ~]$ pip freeze > requirements.txt
(venv) [blogsystem@localhost ~]$ ls
blogs_home download requirements.txt venv
(venv) [blogsystem@localhost ~]$ cat requirements.txt
PyMySQL==0.8.1
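The file written by pip freeze is plain text with one name==version pin per line. As a rough sketch of that format, the helper below (parse_pinned is a made-up name) reads such text into a dictionary. It handles only exact == pins plus blank lines and # comments, not the full requirements syntax that pip accepts.

```python
def parse_pinned(text):
    """Parse 'name==version' lines, skipping blank lines and '#' comments."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, version = line.partition("==")
        if sep:  # keep only exactly pinned requirements
            pins[name.strip()] = version.strip()
    return pins


pins = parse_pinned("PyMySQL==0.8.1\n")
print(pins)  # {'PyMySQL': '0.8.1'}
```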
Uninstall a package inside the virtual environment venv:
(venv) [blogsystem@localhost ~]$ pip uninstall pymysql
Uninstalling PyMySQL-0.8.1:
Would remove:
/cloud/blogsystem/venv/lib/python3.6/site-packages/PyMySQL-0.8.1.dist-info/*
/cloud/blogsystem/venv/lib/python3.6/site-packages/pymysql/*
Proceed (y/n)? y
Successfully uninstalled PyMySQL-0.8.1
(venv) [blogsystem@localhost ~]$ pip list
Package Version
---------- -------
pip 10.0.1
setuptools 39.2.0
wheel 0.31.1
Leave the virtual environment venv:
(venv) [blogsystem@localhost ~]$ deactivate
[blogsystem@localhost ~]$
To delete the virtual environment venv, simply delete the venv folder:
[blogsystem@localhost ~]$ ls
blogs_home download requirements.txt venv
[blogsystem@localhost ~]$ rm -rf venv/
[blogsystem@localhost ~]$ ls
blogs_home download requirements.txt
Install packages into a virtual environment from requirements.txt:
[blogsystem@localhost ~]$ virtualenv venv
Using base prefix '/usr/local'
New python executable in /cloud/blogsystem/venv/bin/python3.6
Also creating executable in /cloud/blogsystem/venv/bin/python
Installing setuptools, pip, wheel...done.
[blogsystem@localhost ~]$ source venv/bin/activate
(venv) [blogsystem@localhost ~]$ pip list
Package Version
---------- -------
pip 10.0.1
setuptools 39.2.0
wheel 0.31.1
(venv) [blogsystem@localhost ~]$ pip install -r requirements.txt
Collecting PyMySQL==0.8.1 (from -r requirements.txt (line 1))
Using cached https://files.pythonhosted.org/packages/32/e8/222d9e1c7821f935d6dba8d4c60b9985124149b35a9f93a84f0b98afc219/PyMySQL-0.8.1-py2.py3-none-any.whl
Installing collected packages: PyMySQL
Successfully installed PyMySQL-0.8.1
(venv) [blogsystem@localhost ~]$ pip list
Package Version
---------- -------
pip 10.0.1
PyMySQL 0.8.1
setuptools 39.2.0
wheel 0.31.1
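The installations above did not configure a package index for pip, so pip used the official index by default, which can be slow depending on the network. In addition, with plain virtualenv you have to go to the specific directory that holds an environment before you can activate it, which is inconvenient. The next sections address both problems: configuring a pip mirror hosted in China, and installing virtualenvwrapper to manage virtual environments.
Configuring a domestic pip mirror¶
- Linux
Change the default configuration in ~/.pip/pip.conf; this file usually has to be created by hand:
mkdir ~/.pip
vim ~/.pip/pip.conf
Add the following to pip.conf: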
[global]
index-url = http://mirrors.aliyun.com/pypi/simple/
[install]
trusted-host = mirrors.aliyun.com
- Windows
In the current user's home directory, create a pip folder containing a pip.ini file, and add the following to pip.ini:
[global]
index-url = http://mirrors.aliyun.com/pypi/simple/
[install]
trusted-host = mirrors.aliyun.com
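Both files use the INI layout, so the same content can be produced or checked with the standard configparser module. The sketch below only builds the text in memory; the function name build_pip_config is an assumption, and writing the result to ~/.pip/pip.conf or pip.ini is left to the reader.

```python
import configparser
import io


def build_pip_config(index_url, trusted_host):
    """Return pip configuration text with the given index and trusted host."""
    config = configparser.ConfigParser()
    config["global"] = {"index-url": index_url}
    config["install"] = {"trusted-host": trusted_host}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


text = build_pip_config("http://mirrors.aliyun.com/pypi/simple/", "mirrors.aliyun.com")
print(text)
```

Generating the file this way avoids typos in section names, since a misspelled section header is a common reason for a mirror setting to be silently ignored.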
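Common pip commands¶
- pip install package_name: install a package
- pip uninstall -y package_name: uninstall a package without asking for confirmation
- pip search package_name: search for a package by name (PyPI has since disabled its search API, so this command may fail against the official index)
- pip list: list the installed packages
- pip freeze > requirements.txt: write the list of installed packages to a requirements file
- pip install -r requirements.txt: install the packages listed in a requirements file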
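When these commands have to be driven from a Python program, a common approach is to run pip as a module of the current interpreter, so that packages land in the environment the program itself is using. The helper names pip_command and run_pip below are invented for this sketch, and run_pip is defined but not called, so nothing is installed when the example runs.

```python
import subprocess
import sys


def pip_command(*args):
    """Build the argument list for running pip under the current interpreter."""
    return [sys.executable, "-m", "pip", *args]


def run_pip(*args):
    """Run pip and return its standard output as text."""
    result = subprocess.run(
        pip_command(*args),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,  # raise CalledProcessError if pip exits with an error
    )
    return result.stdout


# Show the command that would be executed for "pip install -r requirements.txt".
print(pip_command("install", "-r", "requirements.txt"))
```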
Installing virtualenvwrapper¶
- Linux
Install it with pip; the output shows that pip is now using the Aliyun mirror:
[root@localhost ~]# pip install virtualenvwrapper
Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
Collecting virtualenvwrapper
Downloading http://mirrors.aliyun.com/pypi/packages/2b/8c/3192e10913ad945c0f0fcb17e9b2679434a28ad58ee31ce0104cba3b1154/virtualenvwrapper-4.8.2-py2.py3-none-any.whl
Requirement already satisfied: stevedore in /usr/local/lib/python3.6/site-packages (from virtualenvwrapper) (1.28.0)
Requirement already satisfied: virtualenv in /usr/local/lib/python3.6/site-packages (from virtualenvwrapper) (16.0.0)
Requirement already satisfied: virtualenv-clone in /usr/local/lib/python3.6/site-packages (from virtualenvwrapper) (0.3.0)
Requirement already satisfied: six>=1.10.0 in /usr/local/lib/python3.6/site-packages (from stevedore->virtualenvwrapper) (1.11.0)
Requirement already satisfied: pbr!=2.1.0,>=2.0.0 in /usr/local/lib/python3.6/site-packages (from stevedore->virtualenvwrapper) (4.0.4)
Installing collected packages: virtualenvwrapper
Successfully installed virtualenvwrapper-4.8.2
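Create a directory to hold the virtual environments:
[root@localhost ~]# mkdir virtual_env
Append the following settings to the end of ~/.bashrc and save the file:
export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
export WORKON_HOME=/root/virtual_env
source /usr/local/bin/virtualenvwrapper.sh
Reload ~/.bashrc so that the changes take effect:
[root@localhost ~]# source ~/.bashrc
- Windows
Install it with pip; the output shows that pip is now using the Aliyun mirror: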
E:\meichaohui\sphinx_data\meizhaohui_blog>pip install virtualenvwrapper-win
Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
Collecting virtualenvwrapper-win
Downloading http://mirrors.aliyun.com/pypi/packages/f5/23/4cba98733b9122219ce67177d745e4984b524b867cf3728eaa807ea21919/virtualenvwrapper-win-1.2.5.tar.gz
Requirement already satisfied: virtualenv in d:\program files (x86)\python3.6.2\lib\site-packages (from virtualenvwrapper-win) (16.0.0)
Installing collected packages: virtualenvwrapper-win
Running setup.py install for virtualenvwrapper-win ... done
Successfully installed virtualenvwrapper-win-1.2.5
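Create a directory to hold the virtual environments:
Create the directory virtualenv_home under D:\data.
Configure the environment variable:
Open Control Panel\System and Security\System\Advanced system settings\Advanced\Environment Variables and add the environment variable WORKON_HOME:
Variable name: WORKON_HOME
Variable value: D:\data\virtualenv_home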
Using virtualenvwrapper¶
Getting help for virtualenvwrapper on Linux:
[root@localhost ~]# virtualenvwrapper
virtualenvwrapper is a set of extensions to Ian Bicking's virtualenv tool. The extensions include wrappers for creating and deleting virtual environments and otherwise managing your development workflow, making it easier to work on more than one project at a time without introducing conflicts in their dependencies.
For more information please refer to the documentation:
http://virtualenvwrapper.readthedocs.org/en/latest/command_ref.html
Commands available:
add2virtualenv: add directory to the import path
allvirtualenv: run a command in all virtualenvs
cdproject: change directory to the active project
cdsitepackages: change to the site-packages directory
cdvirtualenv: change to the $VIRTUAL_ENV directory
cpvirtualenv: duplicate the named virtualenv to make a new one
lssitepackages: list contents of the site-packages directory
lsvirtualenv: list virtualenvs
mkproject: create a new project directory and its associated virtualenv
mktmpenv: create a temporary virtualenv
mkvirtualenv: Create a new virtualenv in $WORKON_HOME
rmvirtualenv: Remove a virtualenv
setvirtualenvproject: associate a project directory with a virtualenv
showvirtualenv: show details of a single virtualenv
toggleglobalsitepackages: turn access to global site-packages on/off
virtualenvwrapper: show this help message
wipeenv: remove all packages installed in the current virtualenv
workon: list or change working virtualenvs
Getting help for virtualenvwrapper on Windows:
D:\data> virtualenvwrapper
virtualenvwrapper is a set of extensions to Ian Bicking's virtualenv tool. The extensions include wrappers for creating and deleting virtual environments and otherwise managing your development workflow, making it easier to work on more than one project at a time without introducing conflicts in their dependencies.
virtualenvwrapper-win is a port of Dough Hellman's virtualenvwrapper to Windows batch scripts.
Commands available:
add2virtualenv: add directory to the import path
cdproject: change directory to the active project
cdsitepackages: change to the site-packages directory
cdvirtualenv: change to the $VIRTUAL_ENV directory
lssitepackages: list contents of the site-packages directory
lsvirtualenv: list virtualenvs
mkproject: create a new project directory and its associated virtualenv
mkvirtualenv: Create a new virtualenv in $WORKON_HOME
rmvirtualenv: Remove a virtualenv
setprojectdir: associate a project directory with a virtualenv
toggleglobalsitepackages: turn access to global site-packages on/off
virtualenvwrapper: show this help message
whereis: return full path to executable on path.
workon: list or change working virtualenvs
The help output shows that most virtualenvwrapper commands are the same on Linux and Windows. The examples below use virtualenvwrapper on Windows.
Common virtualenvwrapper commands:
workon: list the virtual environments
lsvirtualenv: list the virtual environments
mkvirtualenv [virtualenv_name]: create a new virtual environment
workon [virtualenv_name]: switch to a virtual environment
rmvirtualenv [virtualenv_name]: delete a virtual environment
deactivate: leave the current virtual environment
Example session:
D:\data>workon
Pass a name to activate one of the following virtualenvs:
==============================================================================
venv
D:\data>lsvirtualenv
dir /b /ad "D:\data\virtualenv_home"
==============================================================================
venv
D:\data>mkvirtualenv venv_test
Using base prefix 'd:\\program files (x86)\\python3.6.2'
New python executable in D:\data\virtualenv_home\venv_test\Scripts\python.exe
Installing setuptools, pip, wheel...done.
(venv_test) D:\data>workon
Pass a name to activate one of the following virtualenvs:
==============================================================================
venv
venv_test
(venv_test) D:\data>lsvirtualenv
dir /b /ad "D:\data\virtualenv_home"
==============================================================================
venv
venv_test
(venv_test) D:\data>workon venv_test
(venv_test) D:\data>pip install pymysql
Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
Collecting pymysql
Downloading http://mirrors.aliyun.com/pypi/packages/32/e8/222d9e1c7821f935d6dba8d4c60b9985124149b35a9f93a84f0b98afc219/PyMySQL-0.8.1-py2.py3-none-any.whl (81kB)
100% |████████████████████████████████| 81kB 989kB/s
Installing collected packages: pymysql
Successfully installed pymysql-0.8.1
(venv_test) D:\data>rmvirtualenv venv_test
Deleted D:\data\virtualenv_home\venv_test
(venv) D:\data>deactivate
D:\data>
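The listing printed by workon and lsvirtualenv is essentially the set of subdirectories of WORKON_HOME. A rough Python approximation is sketched below. It is not how virtualenvwrapper itself is implemented; the function name list_virtualenvs and the rule "a directory counts if it contains an activate script" are assumptions made for the example.

```python
import os
import tempfile


def list_virtualenvs(workon_home):
    """Return sorted names of subdirectories that look like virtual environments."""
    names = []
    for entry in sorted(os.listdir(workon_home)):
        env_dir = os.path.join(workon_home, entry)
        # bin/activate exists on Linux, Scripts\activate.bat on Windows.
        markers = (
            os.path.join(env_dir, "bin", "activate"),
            os.path.join(env_dir, "Scripts", "activate.bat"),
        )
        if any(os.path.isfile(marker) for marker in markers):
            names.append(entry)
    return names


# Build a throwaway WORKON_HOME with one fake environment and one plain folder.
with tempfile.TemporaryDirectory() as home:
    os.makedirs(os.path.join(home, "venv", "bin"))
    open(os.path.join(home, "venv", "bin", "activate"), "w").close()
    os.makedirs(os.path.join(home, "notes"))
    found = list_virtualenvs(home)
    print(found)  # ['venv']
```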
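Installing and using IPython and Jupyter Notebook on Linux¶
- IPython is an interactive Python shell that is much more convenient than the default Python shell. It supports tab completion of variable names, automatic indentation, and bash shell commands, and it has many useful built-in features and functions. Learning IPython lets us use Python more efficiently, and it is also one of the best platforms for scientific computing and interactive visualization with Python.
Install IPython and Jupyter: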
[root@hellolinux ~]# pip install ipython jupyter
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Collecting ipython
Downloading https://mirrors.aliyun.com/pypi/packages/f6/c4/a79582814bdfe92bfca4d286a729304ffdf13f5135132cfcaea13cf1b2b3/ipython-7.7.0-py3-none-any.whl (774kB)
|████████████████████████████████| 778kB 381kB/s
Collecting jupyter
Downloading https://mirrors.aliyun.com/pypi/packages/83/df/0f5dd132200728a86190397e1ea87cd76244e42d39ec5e88efd25b2abd7e/jupyter-1.0.0-py2.py3-none-any.whl
Requirement already satisfied: jedi>=0.10 in /usr/lib/python3.6/site-packages (from ipython) (0.15.1)
Requirement already satisfied: setuptools>=18.5 in /usr/lib/python3.6/site-packages (from ipython) (39.0.1)
Requirement already satisfied: backcall in /usr/lib/python3.6/site-packages (from ipython) (0.1.0)
Requirement already satisfied: traitlets>=4.2 in /usr/lib/python3.6/site-packages (from ipython) (4.3.2)
Requirement already satisfied: prompt-toolkit<2.1.0,>=2.0.0 in /usr/lib/python3.6/site-packages (from ipython) (2.0.9)
Requirement already satisfied: pexpect; sys_platform != "win32" in /usr/lib/python3.6/site-packages (from ipython) (4.7.0)
Requirement already satisfied: decorator in /usr/lib/python3.6/site-packages (from ipython) (4.4.0)
Requirement already satisfied: pickleshare in /usr/lib/python3.6/site-packages (from ipython) (0.7.5)
Requirement already satisfied: pygments in /usr/lib64/python3.6/site-packages (from ipython) (2.4.2)
Collecting ipykernel (from jupyter)
Downloading https://mirrors.aliyun.com/pypi/packages/d4/16/43f51f65a8a08addf04f909a0938b06ba1ee1708b398a9282474531bd893/ipykernel-5.1.2-py3-none-any.whl (116kB)
|████████████████████████████████| 122kB 1.9MB/s
Collecting jupyter-console (from jupyter)
Downloading https://mirrors.aliyun.com/pypi/packages/cb/ee/6374ae8c21b7d0847f9c3722dcdfac986b8e54fa9ad9ea66e1eb6320d2b8/jupyter_console-6.0.0-py2.py3-none-any.whl
Collecting nbconvert (from jupyter)
Downloading https://mirrors.aliyun.com/pypi/packages/f9/df/4505c0a7fea624cac461d0f41051f33456ae656753f65cee8c2f43121cb2/nbconvert-5.6.0-py2.py3-none-any.whl (453kB)
|████████████████████████████████| 460kB 1.7MB/s
Collecting ipywidgets (from jupyter)
Downloading https://mirrors.aliyun.com/pypi/packages/56/a0/dbcf5881bb2f51e8db678211907f16ea0a182b232c591a6d6f276985ca95/ipywidgets-7.5.1-py2.py3-none-any.whl (121kB)
|████████████████████████████████| 122kB 4.7MB/s
Collecting notebook (from jupyter)
Downloading https://mirrors.aliyun.com/pypi/packages/f3/a1/1e07cedcb554408fefe4a7d32b2a041c86517167aec6ca8251c808ef6c1e/notebook-6.0.1-py3-none-any.whl (9.0MB)
|████████████████████████████████| 9.0MB 2.1MB/s
Collecting qtconsole (from jupyter)
Downloading https://mirrors.aliyun.com/pypi/packages/21/a0/37a7b61eeac6d02cdabc45a60659297e3017f6ff7f2ca6bdec629aa248dd/qtconsole-4.5.4-py2.py3-none-any.whl (120kB)
|████████████████████████████████| 122kB 2.3MB/s
Requirement already satisfied: parso>=0.5.0 in /usr/lib/python3.6/site-packages (from jedi>=0.10->ipython) (0.5.1)
Requirement already satisfied: six in /usr/lib/python3.6/site-packages (from traitlets>=4.2->ipython) (1.12.0)
Requirement already satisfied: ipython-genutils in /usr/lib/python3.6/site-packages (from traitlets>=4.2->ipython) (0.2.0)
Requirement already satisfied: wcwidth in /usr/lib/python3.6/site-packages (from prompt-toolkit<2.1.0,>=2.0.0->ipython) (0.1.7)
Requirement already satisfied: ptyprocess>=0.5 in /usr/lib/python3.6/site-packages (from pexpect; sys_platform != "win32"->ipython) (0.6.0)
Collecting tornado>=4.2 (from ipykernel->jupyter)
Downloading https://mirrors.aliyun.com/pypi/packages/30/78/2d2823598496127b21423baffaa186b668f73cd91887fcef78b6eade136b/tornado-6.0.3.tar.gz (482kB)
|████████████████████████████████| 491kB 4.2MB/s
Collecting jupyter-client (from ipykernel->jupyter)
Downloading https://mirrors.aliyun.com/pypi/packages/af/4c/bf613864ae0644e2ac7d4a40bd209c40c8c71e3dc88d5f1d0aa92a68e716/jupyter_client-5.3.1-py2.py3-none-any.whl (91kB)
|████████████████████████████████| 92kB 6.2MB/s
Collecting jinja2>=2.4 (from nbconvert->jupyter)
Downloading https://mirrors.aliyun.com/pypi/packages/1d/e7/fd8b501e7a6dfe492a433deb7b9d833d39ca74916fa8bc63dd1a4947a671/Jinja2-2.10.1-py2.py3-none-any.whl (124kB)
|████████████████████████████████| 133kB 3.6MB/s
Collecting nbformat>=4.4 (from nbconvert->jupyter)
Downloading https://mirrors.aliyun.com/pypi/packages/da/27/9a654d2b6cc1eaa517d1c5a4405166c7f6d72f04f6e7eea41855fe808a46/nbformat-4.4.0-py2.py3-none-any.whl (155kB)
|████████████████████████████████| 163kB 3.9MB/s
Collecting entrypoints>=0.2.2 (from nbconvert->jupyter)
Downloading https://mirrors.aliyun.com/pypi/packages/ac/c6/44694103f8c221443ee6b0041f69e2740d89a25641e62fb4f2ee568f2f9c/entrypoints-0.3-py2.py3-none-any.whl
Collecting testpath (from nbconvert->jupyter)
Downloading https://mirrors.aliyun.com/pypi/packages/be/a4/162f9ebb6489421fe46dcca2ae420369edfee4b563c668d93cb4605d12ba/testpath-0.4.2-py2.py3-none-any.whl (163kB)
|████████████████████████████████| 163kB 2.7MB/s
Collecting jupyter-core (from nbconvert->jupyter)
Downloading https://mirrors.aliyun.com/pypi/packages/e6/25/6ffb0f6e57fa6ef5d2f814377133b361b42a6dd39105f4885a4f1666c2c3/jupyter_core-4.5.0-py2.py3-none-any.whl (78kB)
|████████████████████████████████| 81kB 3.0MB/s
Collecting bleach (from nbconvert->jupyter)
Downloading https://mirrors.aliyun.com/pypi/packages/ab/05/27e1466475e816d3001efb6e0a85a819be17411420494a1e602c36f8299d/bleach-3.1.0-py2.py3-none-any.whl (157kB)
|████████████████████████████████| 163kB 2.6MB/s
Collecting mistune<2,>=0.8.1 (from nbconvert->jupyter)
Downloading https://mirrors.aliyun.com/pypi/packages/09/ec/4b43dae793655b7d8a25f76119624350b4d65eb663459eb9603d7f1f0345/mistune-0.8.4-py2.py3-none-any.whl
Collecting pandocfilters>=1.4.1 (from nbconvert->jupyter)
Downloading https://mirrors.aliyun.com/pypi/packages/4c/ea/236e2584af67bb6df960832731a6e5325fd4441de001767da328c33368ce/pandocfilters-1.4.2.tar.gz
Collecting defusedxml (from nbconvert->jupyter)
Downloading https://mirrors.aliyun.com/pypi/packages/06/74/9b387472866358ebc08732de3da6dc48e44b0aacd2ddaa5cb85ab7e986a2/defusedxml-0.6.0-py2.py3-none-any.whl
Collecting widgetsnbextension~=3.5.0 (from ipywidgets->jupyter)
Downloading https://mirrors.aliyun.com/pypi/packages/6c/7b/7ac231c20d2d33c445eaacf8a433f4e22c60677eb9776c7c5262d7ddee2d/widgetsnbextension-3.5.1-py2.py3-none-any.whl (2.2MB)
|████████████████████████████████| 2.2MB 3.0MB/s
Collecting Send2Trash (from notebook->jupyter)
Downloading https://mirrors.aliyun.com/pypi/packages/49/46/c3dc27481d1cc57b9385aff41c474ceb7714f7935b1247194adae45db714/Send2Trash-1.5.0-py3-none-any.whl
Collecting prometheus-client (from notebook->jupyter)
Downloading https://mirrors.aliyun.com/pypi/packages/b3/23/41a5a24b502d35a4ad50a5bb7202a5e1d9a0364d0c12f56db3dbf7aca76d/prometheus_client-0.7.1.tar.gz
Collecting pyzmq>=17 (from notebook->jupyter)
Downloading https://mirrors.aliyun.com/pypi/packages/75/89/6f0ea51ffa9c2c00c0ab0460f137b16a5ab5b47e3b060c5b1fc9ca425836/pyzmq-18.1.0-cp36-cp36m-manylinux1_x86_64.whl (1.1MB)
|████████████████████████████████| 1.1MB 5.2MB/s
Collecting terminado>=0.8.1 (from notebook->jupyter)
Downloading https://mirrors.aliyun.com/pypi/packages/a7/56/80ea7fa66565fa75ae21ce0c16bc90067530e5d15e48854afcc86585a391/terminado-0.8.2-py2.py3-none-any.whl
Collecting python-dateutil>=2.1 (from jupyter-client->ipykernel->jupyter)
Downloading https://mirrors.aliyun.com/pypi/packages/41/17/c62faccbfbd163c7f57f3844689e3a78bae1f403648a6afb1d0866d87fbb/python_dateutil-2.8.0-py2.py3-none-any.whl (226kB)
|████████████████████████████████| 235kB 3.5MB/s
Collecting MarkupSafe>=0.23 (from jinja2>=2.4->nbconvert->jupyter)
Downloading https://mirrors.aliyun.com/pypi/packages/b2/5f/23e0023be6bb885d00ffbefad2942bc51a620328ee910f64abe5a8d18dd1/MarkupSafe-1.1.1-cp36-cp36m-manylinux1_x86_64.whl
Collecting jsonschema!=2.5.0,>=2.4 (from nbformat>=4.4->nbconvert->jupyter)
Downloading https://mirrors.aliyun.com/pypi/packages/54/48/f5f11003ceddcd4ad292d4d9b5677588e9169eef41f88e38b2888e7ec6c4/jsonschema-3.0.2-py2.py3-none-any.whl (54kB)
|████████████████████████████████| 61kB 2.7MB/s
Collecting webencodings (from bleach->nbconvert->jupyter)
Downloading https://mirrors.aliyun.com/pypi/packages/f4/24/2a3e3df732393fed8b3ebf2ec078f05546de641fe1b667ee316ec1dcf3b7/webencodings-0.5.1-py2.py3-none-any.whl
Collecting pyrsistent>=0.14.0 (from jsonschema!=2.5.0,>=2.4->nbformat>=4.4->nbconvert->jupyter)
Downloading https://mirrors.aliyun.com/pypi/packages/b9/66/b2638d96a2d128b168d0dba60fdc77b7800a9b4a5340cefcc5fc4eae6295/pyrsistent-0.15.4.tar.gz (107kB)
|████████████████████████████████| 112kB 5.0MB/s
Collecting attrs>=17.4.0 (from jsonschema!=2.5.0,>=2.4->nbformat>=4.4->nbconvert->jupyter)
Downloading https://mirrors.aliyun.com/pypi/packages/23/96/d828354fa2dbdf216eaa7b7de0db692f12c234f7ef888cc14980ef40d1d2/attrs-19.1.0-py2.py3-none-any.whl
Installing collected packages: ipython, tornado, jupyter-core, python-dateutil, pyzmq, jupyter-client, ipykernel, jupyter-console, MarkupSafe, jinja2, pyrsistent, attrs, jsonschema, nbformat, entrypoints, testpath, webencodings, bleach, mistune, pandocfilters, defusedxml, nbconvert, Send2Trash, prometheus-client, terminado, notebook, widgetsnbextension, ipywidgets, qtconsole, jupyter
Running setup.py install for tornado ... done
Running setup.py install for pyrsistent ... done
Running setup.py install for pandocfilters ... done
Running setup.py install for prometheus-client ... done
Successfully installed MarkupSafe-1.1.1 Send2Trash-1.5.0 attrs-19.1.0 bleach-3.1.0 defusedxml-0.6.0 entrypoints-0.3 ipykernel-5.1.2 ipython-7.7.0 ipywidgets-7.5.1 jinja2-2.10.1 jsonschema-3.0.2 jupyter-1.0.0 jupyter-client-5.3.1 jupyter-console-6.0.0 jupyter-core-4.5.0 mistune-0.8.4 nbconvert-5.6.0 nbformat-4.4.0 notebook-6.0.1 pandocfilters-1.4.2 prometheus-client-0.7.1 pyrsistent-0.15.4 python-dateutil-2.8.0 pyzmq-18.1.0 qtconsole-4.5.4 terminado-0.8.2 testpath-0.4.2 tornado-6.0.3 webencodings-0.5.1 widgetsnbextension-3.5.1
[root@hellolinux ~]#
Using IPython:
[root@hellolinux ~]# ipython
Python 3.6.8 (default, May 2 2019, 20:40:44)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.7.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import os
In [2]: os.name
Out[2]: 'posix'
In [3]: quit
[root@hellolinux ~]#
Switching IPython to the classic ">>>" prompt:
# Create a custom IPython configuration file
[root@hellolinux ~]# ipython profile create
[ProfileCreate] Generating default config file: '/root/.ipython/profile_default/ipython_config.py'
[ProfileCreate] Generating default config file: '/root/.ipython/profile_default/ipython_kernel_config.py'
# Edit ~/.ipython/profile_default/ipython_config.py to select the classic prompt style, ClassicPrompts
[root@hellolinux ~]# sed -i "s@#c.TerminalInteractiveShell.prompts_class = 'IPython.terminal.prompts.Prompts'@c.TerminalInteractiveShell.prompts_class = 'IPython.terminal.prompts.ClassicPrompts'@g" /root/.ipython/profile_default/ipython_config.py
[root@hellolinux ~]# cat .ipython/profile_default/ipython_config.py|sed -n '327p'
c.TerminalInteractiveShell.prompts_class = 'IPython.terminal.prompts.ClassicPrompts'
Start IPython again:
[root@hellolinux ~]# ipython
Python 3.6.8 (default, May 2 2019, 20:40:44)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.7.0 -- An enhanced Interactive Python. Type '?' for help.
>>> import os
>>> os.name
'posix'
>>> quit
[root@hellolinux ~]#
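- Jupyter Notebook lets you run Python commands in a browser. IPython Notebook is the earlier name of what is now Jupyter Notebook.
- The notebook uses the browser as its interface: it sends requests to an IPython server running in the background and displays the results. Content in the browser is stored in cells. Cells come in several types; notably, Markdown is supported, so a notebook can contain formatted Markdown text cells as well as Code cells.
- An important feature of the notebook is reproducible interactive computing: earlier inputs can be edited and re-executed at any time. A notebook can be exported to many other formats, such as a Python script, HTML, or PDF, so it serves as a record of a whole working process. Many courses, blog posts, and books are written as notebooks.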
Configuring Jupyter:
# Generate a notebook configuration file
[root@hellolinux ~]# jupyter notebook --generate-config
Writing default config to: /root/.jupyter/jupyter_notebook_config.py
# Set a password
[root@hellolinux ~]# jupyter notebook password
Enter password: <-- enter the password here, e.g. 123456
Verify password: <-- enter the same password again
[NotebookPasswordApp] Wrote hashed password to /root/.jupyter/jupyter_notebook_config.json
# Read back the hashed password just generated (jq is a tool for processing JSON)
[root@hellolinux ~]# cat .jupyter/jupyter_notebook_config.json |jq
{
"NotebookApp": {
"password": "sha1:c80b33f88f4a:2c5ea918c4eaa88119ee195d68cdefd4d61b99f2"
}
}
[root@hellolinux ~]# notebook_passwd=$(cat .jupyter/jupyter_notebook_config.json |jq '.NotebookApp.password'|sed 's/"//g')
[root@hellolinux ~]# echo $notebook_passwd
sha1:c80b33f88f4a:2c5ea918c4eaa88119ee195d68cdefd4d61b99f2
# Generate a self-signed certificate; see https://jupyter-notebook.readthedocs.io/en/stable/public_server.html?highlight=certfile#securing-a-notebook-server
[root@hellolinux ~]# cd .jupyter/
[root@hellolinux .jupyter]# openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout mykey.key -out mycert.pem
Generating a 2048 bit RSA private key
................................+++
........+++
writing new private key to 'mykey.key'
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [XX]:CN
State or Province Name (full name) []:hubei
Locality Name (eg, city) [Default City]:wuhan
Organization Name (eg, company) [Default Company Ltd]:IT
Organizational Unit Name (eg, section) []:hellolinux.com
Common Name (eg, your name or your server's hostname) []:hellolinux.com
Email Address []:mzh.whut@gmail.com
[root@hellolinux .jupyter]# ls
jupyter_notebook_config.json jupyter_notebook_config.py migrated mycert.pem mykey.key
# Create the working directory for Jupyter Notebook
[root@hellolinux ~]# mkdir /root/jupyter_data
# Edit the configuration file /root/.jupyter/jupyter_notebook_config.py; the relevant lines before editing are:
[root@hellolinux ~]# cat -n /root/.jupyter/jupyter_notebook_config.py |sed -n '82p;85p;102p;204p;223p;261p;267p;276p;287p'
82 #c.NotebookApp.allow_remote_access = False
85 #c.NotebookApp.allow_root = False
102 #c.NotebookApp.certfile = ''
204 #c.NotebookApp.ip = 'localhost'
223 #c.NotebookApp.keyfile = ''
261 #c.NotebookApp.notebook_dir = ''
267 #c.NotebookApp.open_browser = True
276 #c.NotebookApp.password = ''
287 #c.NotebookApp.port = 8888
# Allow remote access
[root@hellolinux ~]# sed -i "s@#c.NotebookApp.allow_remote_access = False@c.NotebookApp.allow_remote_access = True@g" /root/.jupyter/jupyter_notebook_config.py
# Allow Jupyter to run as root
[root@hellolinux ~]# sed -i "s@#c.NotebookApp.allow_root = False@c.NotebookApp.allow_root = True@g" /root/.jupyter/jupyter_notebook_config.py
# Set the self-signed certificate and its key
[root@hellolinux ~]# sed -i "s@#c.NotebookApp.certfile = ''@c.NotebookApp.certfile = '/root/.jupyter/mycert.pem'@g" /root/.jupyter/jupyter_notebook_config.py
[root@hellolinux ~]# sed -i "s@#c.NotebookApp.keyfile = ''@c.NotebookApp.keyfile = '/root/.jupyter/mykey.key'@g" /root/.jupyter/jupyter_notebook_config.py
# Set the working directory
[root@hellolinux ~]# sed -i "s@#c.NotebookApp.notebook_dir = ''@c.NotebookApp.notebook_dir = '/root/jupyter_data'@g" /root/.jupyter/jupyter_notebook_config.py
# Listen on all network interfaces
[root@hellolinux ~]# sed -i "s@#c.NotebookApp.ip = 'localhost'@c.NotebookApp.ip = '*'@g" /root/.jupyter/jupyter_notebook_config.py
# Do not open a browser on startup
[root@hellolinux ~]# sed -i "s@#c.NotebookApp.open_browser = True@c.NotebookApp.open_browser = False@g" /root/.jupyter/jupyter_notebook_config.py
# Set the password for remote access
[root@hellolinux ~]# sed -i "s@#c.NotebookApp.password = ''@c.NotebookApp.password = '${notebook_passwd}'@g" /root/.jupyter/jupyter_notebook_config.py
# Set the web service port
[root@hellolinux ~]# sed -i "s@#c.NotebookApp.port = 8888@c.NotebookApp.port = 8888@g" /root/.jupyter/jupyter_notebook_config.py
# The lines after editing
[root@hellolinux ~]# cat -n .jupyter/jupyter_notebook_config.py |sed -n '82p;85p;102p;204p;223p;267p;276p;287p'
82 c.NotebookApp.allow_remote_access = True
85 c.NotebookApp.allow_root = True
102 c.NotebookApp.certfile = '/root/.jupyter/mycert.pem'
204 c.NotebookApp.ip = '*'
223 c.NotebookApp.keyfile = '/root/.jupyter/mykey.key'
267 c.NotebookApp.open_browser = False
276 c.NotebookApp.password = 'sha1:c80b33f88f4a:2c5ea918c4eaa88119ee195d68cdefd4d61b99f2'
287 c.NotebookApp.port = 8888
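The sed commands above all follow one pattern: find a commented-out c.NotebookApp setting and replace it with an active line. For readers who prefer to script this in Python, the same idea might look like the sketch below. It works on a string rather than on the real file, and the function name apply_settings is hypothetical.

```python
import re


def apply_settings(config_text, settings):
    """Uncomment and set 'c.NotebookApp.<name> = ...' lines in config_text."""
    lines = []
    for line in config_text.splitlines():
        match = re.match(r"#\s*c\.NotebookApp\.(\w+)\s*=", line)
        if match and match.group(1) in settings:
            name = match.group(1)
            # %r renders strings with quotes and booleans as True/False.
            line = "c.NotebookApp.%s = %r" % (name, settings[name])
        lines.append(line)
    return "\n".join(lines) + "\n"


sample = (
    "#c.NotebookApp.ip = 'localhost'\n"
    "#c.NotebookApp.port = 8888\n"
    "#c.NotebookApp.open_browser = True\n"
)
updated = apply_settings(sample, {"ip": "*", "open_browser": False})
print(updated)
```

Settings that are not named in the dictionary, such as port in this sample, are left commented out.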
Open port 8888 in the server firewall:
[root@hellolinux ~]# firewall-cmd --permanent --add-port=8888/tcp
success
[root@hellolinux ~]# firewall-cmd --reload
success
[root@hellolinux ~]# firewall-cmd --list-all|grep 8888
ports: 21/tcp 6379/tcp 8888/tcp
Restart Jupyter Notebook:
[root@hellolinux ~]# jupyter notebook > jupyter.log 2>&1 &
[1] 12174
[root@hellolinux ~]# ps -ef|grep jupyter
root 12174 7909 20 17:19 pts/0 00:00:02 /usr/bin/python3.6 /usr/bin/jupyter-notebook
root 12179 7909 0 17:19 pts/0 00:00:00 grep --color=auto jupyter
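Once the process is running and the firewall rule is in place, it can be useful to confirm that the port actually accepts TCP connections before opening a browser. The following sketch is one way to do that; the function name port_is_open is made up, and the self-check at the bottom uses a temporary local listener rather than the real notebook server.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Self-check against a temporary local listener instead of a real server.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
free_port = listener.getsockname()[1]
reachable = port_is_open("127.0.0.1", free_port)
listener.close()
print(reachable)
```

Against the setup in this article, the call would be port_is_open("192.168.56.103", 8888), run from the client machine.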
Open Jupyter Notebook in a browser at: https://hellolinux.com:8888
Note: I configured Jupyter Notebook in a virtual machine and mapped the domain name to its IP address:
[root@hellolinux ~]# cat /etc/hosts|grep hellolinux.com
127.0.0.1 hellolinux.com
[root@hellolinux ~]# ip a show|grep 192|awk -F'[ /]+' '{print $3}'
192.168.56.103
The same name resolution is configured on the remote client:
$ type C:\Windows\System32\drivers\etc\hosts|findstr hellolinux.com
192.168.56.103 hellolinux.com
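This lets me reach Jupyter Notebook by domain name. If you have not configured the domain name mapping, you can also use the IP address (https://192.168.56.103:8888):

Enter the password 123456 to log in to the Jupyter Notebook home page:

Click "New" -> "Python3" in the upper-right corner to create a Python 3 notebook:

Type a Python command in the new page and press Ctrl + Enter to run it: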

You can use %timeit to measure how long code takes to run, for example: %timeit [i*i for i in range(1000)]
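Outside a notebook, the standard-library timeit module gives a similar measurement. The sketch below times the same list comprehension; the repeat count of 1000 is an arbitrary choice made for illustration, not something the original text specifies.

```python
import timeit

# Time the same expression that the %timeit example uses.
# runs=1000 is an assumption chosen only to keep the sketch fast.
runs = 1000
total_seconds = timeit.timeit("[i*i for i in range(1000)]", number=runs)

# Convert the total into an average per execution, in microseconds.
per_loop_us = total_seconds / runs * 1e6
print(f"{per_loop_us:.1f} microseconds per loop")
```

The absolute number depends on the machine, so it is mainly useful for comparing two variants of the same code.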

Save the file:

After saving, the notebook file first_notebook.ipynb appears in the /root/jupyter_data directory on the server:
[root@hellolinux ~]# ls -lah jupyter_data|grep first
-rw-r--r-- 1 root root 703 Aug 24 17:37 first_notebook.ipynb
References
- Writing documentation with Sphinx + reST: https://www.cnblogs.com/zzqcn/p/5096876.html#_label7_4
- Python virtual environments (virtualenv): https://www.cnblogs.com/technologylife/p/6635631.html
- Securing a notebook server: https://jupyter-notebook.readthedocs.io/en/stable/public_server.html?highlight=certfile#securing-a-notebook-server
- Running the Notebook: https://jupyter.readthedocs.io/en/latest/running.html#running
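Learning to use the built-in help¶
Overview¶
At the Python prompt, when you do not know how to use something or cannot remember a name or its parameters, turn to the built-in help. Python's built-in documentation is detailed and usually answers the question, so it is worth learning to use it properly:
- Use the built-in function help(object), e.g. help(list)
- Use dir(object) to list the attributes and methods of an object, e.g. dir(list)
- Use pydoc.help(request) to view the help text, e.g. pydoc.help('re')
Each of these is described below.
Usage¶
Viewing help with help()¶
For objects that need no import: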
>>> help(list)
Help on class list in module builtins:
class list(object)
| list() -> new empty list
| list(iterable) -> new list initialized from iterable's items
|
| Methods defined here:
|
| __add__(self, value, /)
| Return self+value.
|
| __contains__(self, key, /)
| Return key in self.
|
| __delitem__(self, key, /)
| Delete self[key].
|
| __eq__(self, value, /)
| Return self==value.
|
| __ge__(self, value, /)
| Return self>=value.
|
| __getattribute__(self, name, /)
| Return getattr(self, name).
|
| __getitem__(...)
| x.__getitem__(y) <==> x[y]
-- More --
For modules that must be imported first:
>>> import os
>>> help(os)
Help on module os:
NAME
os - OS routines for NT or Posix depending on what system we're on.
DESCRIPTION
This exports:
- all functions from posix or nt, e.g. unlink, stat, etc.
- os.path is either posixpath or ntpath
- os.name is either 'posix' or 'nt'
- os.curdir is a string representing the current directory (always '.')
- os.pardir is a string representing the parent directory (always '..')
- os.sep is the (or a most common) pathname separator ('/' or '\\')
- os.extsep is the extension separator (always '.')
- os.altsep is the alternate pathname separator (None or '/')
- os.pathsep is the component separator used in $PATH etc
- os.linesep is the line separator in text files ('\r' or '\n' or '\r\n')
- os.defpath is the default search path for executables
- os.devnull is the file path of the null device ('/dev/null', etc.)
Programs that import and use 'os' stand a better chance of being
portable between different platforms. Of course, they must then
only use functions that are defined by all platforms (e.g., unlink
and opendir), and leave all pathname manipulation to os.path
(e.g., split and join).
CLASSES
builtins.Exception(builtins.BaseException)
builtins.OSError
-- More --
Listing attributes with dir()¶
For objects that need no import:
>>> dir(list)
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
For modules that must be imported first:
>>> import os
>>> dir(os)
['DirEntry', 'F_OK', 'MutableMapping', 'O_APPEND', 'O_BINARY', 'O_CREAT', 'O_EXCL', 'O_NOINHERIT', 'O_RANDOM', 'O_RDONLY', 'O_RDWR', 'O_SEQUENTIAL', 'O_SHORT_LIVED', 'O_TEMPORARY', 'O_TEXT', 'O_TRUNC', 'O_WRONLY', 'P_DETACH', 'P_NOWAIT', 'P_NOWAITO', 'P_OVERLAY', 'P_WAIT', 'PathLike', 'R_OK', 'SEEK_CUR', 'SEEK_END', 'SEEK_SET', 'TMP_MAX', 'W_OK', 'X_OK', '_Environ', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_execvpe', '_exists', '_exit', '_fspath', '_get_exports_list', '_putenv', '_unsetenv', '_wrap_close', 'abc', 'abort', 'access', 'altsep', 'chdir', 'chmod', 'close', 'closerange', 'cpu_count', 'curdir', 'defpath', 'device_encoding', 'devnull', 'dup', 'dup2', 'environ', 'errno', 'error', 'execl', 'execle', 'execlp', 'execlpe', 'execv', 'execve', 'execvp', 'execvpe', 'extsep', 'fdopen', 'fsdecode', 'fsencode', 'fspath', 'fstat', 'fsync', 'ftruncate', 'get_exec_path', 'get_handle_inheritable', 'get_inheritable', 'get_terminal_size', 'getcwd', 'getcwdb', 'getenv', 'getlogin', 'getpid', 'getppid', 'isatty', 'kill', 'linesep', 'link', 'listdir', 'lseek', 'lstat', 'makedirs', 'mkdir', 'name', 'open', 'pardir', 'path', 'pathsep', 'pipe', 'popen', 'putenv', 'read', 'readlink', 'remove', 'removedirs', 'rename', 'renames', 'replace', 'rmdir', 'scandir', 'sep', 'set_handle_inheritable', 'set_inheritable', 'spawnl', 'spawnle', 'spawnv', 'spawnve', 'st', 'startfile', 'stat', 'stat_float_times', 'stat_result', 'statvfs_result', 'strerror', 'supports_bytes_environ', 'supports_dir_fd', 'supports_effective_ids', 'supports_fd', 'supports_follow_symlinks', 'symlink', 'sys', 'system', 'terminal_size', 'times', 'times_result', 'truncate', 'umask', 'uname_result', 'unlink', 'urandom', 'utime', 'waitpid', 'walk', 'write']
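The raw dir() output is long because it includes every double-underscore name. A minimal sketch, assuming you only care about the public methods, filters the list and pairs each name with the first line of its docstring (the variable names are illustrative):

```python
import inspect

# Keep only the names that do not start with an underscore.
public_names = [name for name in dir(list) if not name.startswith("_")]
print(public_names)

# Show a one-line summary for each public method, taken from its docstring.
for name in public_names:
    doc = inspect.getdoc(getattr(list, name)) or ""
    summary = doc.splitlines()[0] if doc else ""
    print(f"{name:8} {summary}")
```

The same filter works for modules, for example dir(os).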
Viewing help with pydoc¶
Call pydoc.help() to enter the interactive help prompt and look up documentation there:
>>> import pydoc
>>> pydoc.help()
Welcome to Python 3.6's help utility!
If this is your first time using Python, you should definitely check out
the tutorial on the Internet at http://docs.python.org/3.6/tutorial/.
Enter the name of any module, keyword, or topic to get help on writing
Python programs and using Python modules. To quit this help utility and
return to the interpreter, just type "quit".
To get a list of available modules, keywords, symbols, or topics, type
"modules", "keywords", "symbols", or "topics". Each module also comes
with a one-line summary of what it does; to list the modules whose name
or summary contain a given string such as "spam", type "modules spam".
help> list
Help on class list in module builtins:
class list(object)
| list() -> new empty list
| list(iterable) -> new list initialized from iterable's items
|
| Methods defined here:
|
| __add__(self, value, /)
| Return self+value.
|
| __contains__(self, key, /)
| Return key in self.
|
| __delitem__(self, key, /)
| Delete self[key].
|
| __eq__(self, value, /)
| Return self==value.
|
| __ge__(self, value, /)
| Return self>=value.
|
| __getattribute__(self, name, /)
| Return getattr(self, name).
|
| __getitem__(...)
| x.__getitem__(y) <==> x[y]
|
| __gt__(self, value, /)
| Return self>value.
|
| __iadd__(self, value, /)
| Implement self+=value.
|
| __imul__(self, value, /)
| Implement self*=value.
|
| __init__(self, /, *args, **kwargs)
| Initialize self. See help(type(self)) for accurate signature.
|
| __iter__(self, /)
| Implement iter(self).
|
| __le__(self, value, /)
| Return self<=value.
|
| __len__(self, /)
| Return len(self).
|
| __lt__(self, value, /)
| Return self<value.
|
| __mul__(self, value, /)
| Return self*value.
|
| __ne__(self, value, /)
| Return self!=value.
|
| __new__(*args, **kwargs) from builtins.type
| Create and return a new object. See help(type) for accurate signature.
|
| __repr__(self, /)
| Return repr(self).
|
| __reversed__(...)
| L.__reversed__() -- return a reverse iterator over the list
|
| __rmul__(self, value, /)
| Return self*value.
|
| __setitem__(self, key, value, /)
| Set self[key] to value.
|
| __sizeof__(...)
| L.__sizeof__() -- size of L in memory, in bytes
|
| append(...)
| L.append(object) -> None -- append object to end
|
| clear(...)
| L.clear() -> None -- remove all items from L
|
| copy(...)
| L.copy() -> list -- a shallow copy of L
|
| count(...)
| L.count(value) -> integer -- return number of occurrences of value
|
| extend(...)
| L.extend(iterable) -> None -- extend list by appending elements from the iterable
|
| index(...)
| L.index(value, [start, [stop]]) -> integer -- return first index of value.
| Raises ValueError if the value is not present.
|
| insert(...)
| L.insert(index, object) -- insert object before index
|
| pop(...)
| L.pop([index]) -> item -- remove and return item at index (default last).
| Raises IndexError if list is empty or index is out of range.
|
| remove(...)
| L.remove(value) -> None -- remove first occurrence of value.
| Raises ValueError if the value is not present.
|
| reverse(...)
| L.reverse() -- reverse *IN PLACE*
|
| sort(...)
| L.sort(key=None, reverse=False) -> None -- stable sort *IN PLACE*
|
| ----------------------------------------------------------------------
| Data and other attributes defined here:
|
| __hash__ = None
help> re
Help on module re:
NAME
re - Support for regular expressions (RE).
DESCRIPTION
This module provides regular expression matching operations similar to
those found in Perl. It supports both 8-bit and Unicode strings; both
the pattern and the strings being processed can contain null bytes and
characters outside the US ASCII range.
Regular expressions can contain both special and ordinary characters.
Most ordinary characters, like "A", "a", or "0", are the simplest
regular expressions; they simply match themselves. You can
concatenate ordinary characters, so last matches the string 'last'.
The special characters are:
"." Matches any character except a newline.
"^" Matches the start of the string.
"$" Matches the end of the string or just before the newline at
the end of the string.
"*" Matches 0 or more (greedy) repetitions of the preceding RE.
Greedy means that it will match as many repetitions as possible.
"+" Matches 1 or more (greedy) repetitions of the preceding RE.
"?" Matches 0 or 1 (greedy) of the preceding RE.
*?,+?,?? Non-greedy versions of the previous three special characters.
{m,n} Matches from m to n repetitions of the preceding RE.
{m,n}? Non-greedy version of the above.
"\\" Either escapes special characters or signals a special sequence.
-- More --
Viewing help in IPython¶
Install IPython:
pip install ipython
Start IPython and use help() or ? to view documentation:
$ ipython
Python 3.6.2 (v3.6.2:5fd33b5, Jul 8 2017, 04:57:36) [MSC v.1900 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import os
In [2]: help(os)
Help on module os:
NAME
os - OS routines for NT or Posix depending on what system we're on.
DESCRIPTION
This exports:
- all functions from posix or nt, e.g. unlink, stat, etc.
- os.path is either posixpath or ntpath
- os.name is either 'posix' or 'nt'
- os.curdir is a string representing the current directory (always '.')
- os.pardir is a string representing the parent directory (always '..')
- os.sep is the (or a most common) pathname separator ('/' or '\\')
- os.extsep is the extension separator (always '.')
- os.altsep is the alternate pathname separator (None or '/')
- os.pathsep is the component separator used in $PATH etc
- os.linesep is the line separator in text files ('\r' or '\n' or '\r\n')
- os.defpath is the default search path for executables
- os.devnull is the file path of the null device ('/dev/null', etc.)
Programs that import and use 'os' stand a better chance of being
portable between different platforms. Of course, they must then
only use functions that are defined by all platforms (e.g., unlink
and opendir), and leave all pathname manipulation to os.path
(e.g., split and join).
In [3]: os?
Type: module
String form: <module 'os' from 'd:\\program files (x86)\\python3.6.2\\lib\\os.py'>
File: d:\program files (x86)\python3.6.2\lib\os.py
Docstring:
OS routines for NT or Posix depending on what system we're on.
This exports:
- all functions from posix or nt, e.g. unlink, stat, etc.
- os.path is either posixpath or ntpath
- os.name is either 'posix' or 'nt'
- os.curdir is a string representing the current directory (always '.')
- os.pardir is a string representing the parent directory (always '..')
- os.sep is the (or a most common) pathname separator ('/' or '\\')
- os.extsep is the extension separator (always '.')
- os.altsep is the alternate pathname separator (None or '/')
- os.pathsep is the component separator used in $PATH etc
- os.linesep is the line separator in text files ('\r' or '\n' or '\r\n')
- os.defpath is the default search path for executables
- os.devnull is the file path of the null device ('/dev/null', etc.)
Programs that import and use 'os' stand a better chance of being
portable between different platforms. Of course, they must then
only use functions that are defined by all platforms (e.g., unlink
and opendir), and leave all pathname manipulation to os.path
(e.g., split and join).
In [4]:
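Exporting help text to a file¶
The code of help2file.py is as follows:
import sys
import pydoc


def output_help_to_file(filepath, request):
    """Write the pydoc help text for request to filepath."""
    with open(filepath, 'w') as f:
        # pydoc.help prints to sys.stdout, so point it at the file temporarily
        sys.stdout = f
        try:
            pydoc.help(request)
        finally:
            # always restore the real stdout, even if pydoc raises
            sys.stdout = sys.__stdout__


if __name__ == '__main__':
    # export the help text of the re module
    output_help_to_file('re.txt', 're')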
re.txt now contains the complete help text:
Help on module re:
NAME
re - Support for regular expressions (RE).
DESCRIPTION
This module provides regular expression matching operations similar to
those found in Perl. It supports both 8-bit and Unicode strings; both
the pattern and the strings being processed can contain null bytes and
characters outside the US ASCII range.
Regular expressions can contain both special and ordinary characters.
Most ordinary characters, like "A", "a", or "0", are the simplest
regular expressions; they simply match themselves. You can
concatenate ordinary characters, so last matches the string 'last'.
The special characters are:
"." Matches any character except a newline.
"^" Matches the start of the string.
"$" Matches the end of the string or just before the newline at
the end of the string.
"*" Matches 0 or more (greedy) repetitions of the preceding RE.
Greedy means that it will match as many repetitions as possible.
"+" Matches 1 or more (greedy) repetitions of the preceding RE.
"?" Matches 0 or 1 (greedy) of the preceding RE.
*?,+?,?? Non-greedy versions of the previous three special characters.
{m,n} Matches from m to n repetitions of the preceding RE.
{m,n}? Non-greedy version of the above.
"\\" Either escapes special characters or signals a special sequence.
[] Indicates a set of characters.
A "^" as the first character indicates a complementing set.
"|" A|B, creates an RE that will match either A or B.
(...) Matches the RE inside the parentheses.
The contents can be retrieved or matched later in the string.
(?aiLmsux) Set the A, I, L, M, S, U, or X flag for the RE (see below).
(?:...) Non-grouping version of regular parentheses.
(?P<name>...) The substring matched by the group is accessible by name.
(?P=name) Matches the text matched earlier by the group named name.
(?#...) A comment; ignored.
(?=...) Matches if ... matches next, but doesn't consume the string.
(?!...) Matches if ... doesn't match next.
(?<=...) Matches if preceded by ... (must be fixed length).
(?<!...) Matches if not preceded by ... (must be fixed length).
(?(id/name)yes|no) Matches yes pattern if the group with id/name matched,
the (optional) no pattern otherwise.
The special sequences consist of "\\" and a character from the list
below. If the ordinary character is not on the list, then the
resulting RE will match the second character.
\number Matches the contents of the group of the same number.
\A Matches only at the start of the string.
\Z Matches only at the end of the string.
\b Matches the empty string, but only at the start or end of a word.
\B Matches the empty string, but not at the start or end of a word.
\d Matches any decimal digit; equivalent to the set [0-9] in
bytes patterns or string patterns with the ASCII flag.
In string patterns without the ASCII flag, it will match the whole
range of Unicode digits.
\D Matches any non-digit character; equivalent to [^\d].
\s Matches any whitespace character; equivalent to [ \t\n\r\f\v] in
bytes patterns or string patterns with the ASCII flag.
In string patterns without the ASCII flag, it will match the whole
range of Unicode whitespace characters.
\S Matches any non-whitespace character; equivalent to [^\s].
\w Matches any alphanumeric character; equivalent to [a-zA-Z0-9_]
in bytes patterns or string patterns with the ASCII flag.
In string patterns without the ASCII flag, it will match the
range of Unicode alphanumeric characters (letters plus digits
plus underscore).
With LOCALE, it will match the set [0-9_] plus characters defined
as letters for the current locale.
\W Matches the complement of \w.
\\ Matches a literal backslash.
This module exports the following functions:
match Match a regular expression pattern to the beginning of a string.
fullmatch Match a regular expression pattern to all of a string.
search Search a string for the presence of a pattern.
sub Substitute occurrences of a pattern found in a string.
subn Same as sub, but also return the number of substitutions made.
split Split a string by the occurrences of a pattern.
findall Find all occurrences of a pattern in a string.
finditer Return an iterator yielding a match object for each match.
compile Compile a pattern into a RegexObject.
purge Clear the regular expression cache.
escape Backslash all non-alphanumerics in a string.
Some of the functions in this module takes flags as optional parameters:
A ASCII For string patterns, make \w, \W, \b, \B, \d, \D
match the corresponding ASCII character categories
(rather than the whole Unicode categories, which is the
default).
For bytes patterns, this flag is the only available
behaviour and needn't be specified.
I IGNORECASE Perform case-insensitive matching.
L LOCALE Make \w, \W, \b, \B, dependent on the current locale.
M MULTILINE "^" matches the beginning of lines (after a newline)
as well as the string.
"$" matches the end of lines (before a newline) as well
as the end of the string.
S DOTALL "." matches any character at all, including the newline.
X VERBOSE Ignore whitespace and comments for nicer looking RE's.
U UNICODE For compatibility only. Ignored for string patterns (it
is the default), and forbidden for bytes patterns.
This module also defines an exception 'error'.
CLASSES
builtins.Exception(builtins.BaseException)
sre_constants.error
class error(builtins.Exception)
| Exception raised for invalid regular expressions.
|
| Attributes:
|
| msg: The unformatted error message
| pattern: The regular expression pattern
| pos: The index in the pattern where compilation failed (may be None)
| lineno: The line corresponding to pos (may be None)
| colno: The column corresponding to pos (may be None)
|
| Method resolution order:
| error
| builtins.Exception
| builtins.BaseException
| builtins.object
|
| Methods defined here:
|
| __init__(self, msg, pattern=None, pos=None)
| Initialize self. See help(type(self)) for accurate signature.
|
| ----------------------------------------------------------------------
| Data descriptors defined here:
|
| __weakref__
| list of weak references to the object (if defined)
|
| ----------------------------------------------------------------------
| Methods inherited from builtins.Exception:
|
| __new__(*args, **kwargs) from builtins.type
| Create and return a new object. See help(type) for accurate signature.
|
| ----------------------------------------------------------------------
| Methods inherited from builtins.BaseException:
|
| __delattr__(self, name, /)
| Implement delattr(self, name).
|
| __getattribute__(self, name, /)
| Return getattr(self, name).
|
| __reduce__(...)
| helper for pickle
|
| __repr__(self, /)
| Return repr(self).
|
| __setattr__(self, name, value, /)
| Implement setattr(self, name, value).
|
| __setstate__(...)
|
| __str__(self, /)
| Return str(self).
|
| with_traceback(...)
| Exception.with_traceback(tb) --
| set self.__traceback__ to tb and return self.
|
| ----------------------------------------------------------------------
| Data descriptors inherited from builtins.BaseException:
|
| __cause__
| exception cause
|
| __context__
| exception context
|
| __dict__
|
| __suppress_context__
|
| __traceback__
|
| args
FUNCTIONS
compile(pattern, flags=0)
Compile a regular expression pattern, returning a pattern object.
escape(pattern)
Escape all the characters in pattern except ASCII letters, numbers and '_'.
findall(pattern, string, flags=0)
Return a list of all non-overlapping matches in the string.
If one or more capturing groups are present in the pattern, return
a list of groups; this will be a list of tuples if the pattern
has more than one group.
Empty matches are included in the result.
finditer(pattern, string, flags=0)
Return an iterator over all non-overlapping matches in the
string. For each match, the iterator returns a match object.
Empty matches are included in the result.
fullmatch(pattern, string, flags=0)
Try to apply the pattern to all of the string, returning
a match object, or None if no match was found.
match(pattern, string, flags=0)
Try to apply the pattern at the start of the string, returning
a match object, or None if no match was found.
purge()
Clear the regular expression caches
search(pattern, string, flags=0)
Scan through string looking for a match to the pattern, returning
a match object, or None if no match was found.
split(pattern, string, maxsplit=0, flags=0)
Split the source string by the occurrences of the pattern,
returning a list containing the resulting substrings. If
capturing parentheses are used in pattern, then the text of all
groups in the pattern are also returned as part of the resulting
list. If maxsplit is nonzero, at most maxsplit splits occur,
and the remainder of the string is returned as the final element
of the list.
sub(pattern, repl, string, count=0, flags=0)
Return the string obtained by replacing the leftmost
non-overlapping occurrences of the pattern in string by the
replacement repl. repl can be either a string or a callable;
if a string, backslash escapes in it are processed. If it is
a callable, it's passed the match object and must return
a replacement string to be used.
subn(pattern, repl, string, count=0, flags=0)
Return a 2-tuple containing (new_string, number).
new_string is the string obtained by replacing the leftmost
non-overlapping occurrences of the pattern in the source
string by the replacement repl. number is the number of
substitutions that were made. repl can be either a string or a
callable; if a string, backslash escapes in it are processed.
If it is a callable, it's passed the match object and must
return a replacement string to be used.
template(pattern, flags=0)
Compile a template pattern, returning a pattern object
DATA
A = <RegexFlag.ASCII: 256>
ASCII = <RegexFlag.ASCII: 256>
DOTALL = <RegexFlag.DOTALL: 16>
I = <RegexFlag.IGNORECASE: 2>
IGNORECASE = <RegexFlag.IGNORECASE: 2>
L = <RegexFlag.LOCALE: 4>
LOCALE = <RegexFlag.LOCALE: 4>
M = <RegexFlag.MULTILINE: 8>
MULTILINE = <RegexFlag.MULTILINE: 8>
S = <RegexFlag.DOTALL: 16>
U = <RegexFlag.UNICODE: 32>
UNICODE = <RegexFlag.UNICODE: 32>
VERBOSE = <RegexFlag.VERBOSE: 64>
X = <RegexFlag.VERBOSE: 64>
__all__ = ['match', 'fullmatch', 'search', 'sub', 'subn', 'split', 'fi...
VERSION
2.2.1
FILE
d:\program files (x86)\python3.6.2\lib\re.py
Copy help2file.py into the Lib directory of the Python installation, D:\Program Files (x86)\python3.6.2\Lib, and the help2file module can then be imported from any other location:
D:\tmp>python
Python 3.6.2 (v3.6.2:5fd33b5, Jul 8 2017, 04:57:36) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import help2file
>>> help2file.output_help_to_file('dict.txt','dict')
A new file dict.txt is created in D:\tmp; open it to read the full help text for dict.
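An alternative that avoids touching sys.stdout is to ask pydoc for the help text as a string and write that string yourself. The sketch below is one way to do it, not the approach used by help2file.py; the helper name write_help and the temporary directory are assumptions made for illustration.

```python
import os
import pydoc
import tempfile


def write_help(path, thing):
    """Render the help text for thing as plain text and save it to path."""
    # pydoc.plaintext renders without the overstrike sequences used for bold.
    text = pydoc.render_doc(thing, renderer=pydoc.plaintext)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return text


# Write into a throwaway directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "dict.txt")
    help_text = write_help(out_path, "dict")
    file_exists = os.path.exists(out_path)

print(help_text.splitlines()[0])
```

Because nothing global is redirected, an exception inside pydoc cannot leave the interpreter printing into a closed file.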
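Quotes and escape characters¶
Overview¶
In Python, quotes delimit strings and docstrings.
- Single quotes '
- A string in single quotes keeps all whitespace exactly as written
- Double quotes "
- Strings in double quotes behave exactly like strings in single quotes
- Triple quotes ''' or """
- Triple quotes delimit a multi-line string, inside which single and double quotes can be used freely
Using quotes¶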
>>> 'Quote me on this'
'Quote me on this'
>>> "Quote me on this"
'Quote me on this'
>>> '''Quote me on this
... the second line
... the third line
... '''
'Quote me on this\nthe second line\nthe third line\n'
>>> """Quote me on this
... the second line
... the third line
... """
'Quote me on this\nthe second line\nthe third line\n'
>>> print('Quote me on this\nthe second line\nthe third line\n')
Quote me on this
the second line
the third line
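The points about quoting can be checked directly. This short sketch (variable names are illustrative) shows that single and double quotes produce equal strings, and that a triple-quoted string can contain both kinds of quote as well as line breaks:

```python
single = 'Quote me on this'
double = "Quote me on this"
# The quote style is not part of the value, so the two strings are equal.
print(single == double)

# Inside triple quotes, ' and " need no escaping.
mixed = '''He said "it's fine"'''
print(mixed)

# Line breaks inside triple quotes become \n characters in the string.
multi = """first line
second line
"""
print(repr(multi))
```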
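Escape characters¶
The backslash \ is the escape character:
\" # a double quote
\\ # a backslash
\n # a newline
\ # a lone backslash at the end of a line is not an escape; it means the string continues on the next line and does not start a new line.
Without escaping you will sometimes get errors, because Python cannot tell where the string begins and ends.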
>>> print('this's is a line')
File "<stdin>", line 1
print('this's is a line')
^
SyntaxError: invalid syntax
>>> print('this\'s is a line')
this's is a line
>>> print("this's is a line.\n"this is the other line"")
File "<stdin>", line 1
print("this's is a line.\n"this is the other line"")
^
SyntaxError: invalid syntax
>>> print("this's is a line.\n\"this is the other line\"")
this's is a line.
"this is the other line"
>>>
>>> print("this's is a line.\
... 'this is not the second line. this is the first line too'"
... )
this's is a line.'this is not the second line. this is the first line too'
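A quick way to see what an escape sequence does is to compare string lengths. The sketch below also uses a raw string (the r prefix), which the text above does not cover; it is included only as a convenient way to switch escape processing off, for example in Windows paths.

```python
newline = "\n"    # one character: a line feed
literal = "\\n"   # two characters: a backslash followed by n
raw = r"\n"       # raw string: the backslash is kept as written
print(len(newline), len(literal), len(raw))
print(literal == raw)

# A raw string keeps every backslash in a Windows path intact.
windows_path = r"D:\tmp\dict.txt"
print(windows_path)
```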
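Naming rules for variable identifiers¶
Variables, names, and objects¶
- In Python all data (Booleans, integers, floats, strings, even large data structures, functions, and programs) exists as objects. This gives the language a high degree of consistency.
- The type of an object determines which operations can be performed on it.
- The type of an object also determines whether the data it holds can be modified (mutable) or not (immutable).
- Python is strongly typed: the type of an existing object never changes, even if the value it holds is mutable.
- To find out the type of an object, use type(thing).
- A class is the definition of an object; in Python, "class" and "type" are generally used interchangeably.
Identifier naming rules¶
- The first character of an identifier must be a letter (a-z, A-Z) or an underscore _
- The remaining characters may be letters (a-z, A-Z), underscores _, or digits (0-9)
- Identifiers are case-sensitive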
Examples:
>>> /a='1' # invalid
File "<stdin>", line 1
/a='1'
^
SyntaxError: invalid syntax
>>> ?a='1' # invalid
File "<stdin>", line 1
?a='1'
^
SyntaxError: invalid syntax
>>> 1a=2 # invalid
File "<stdin>", line 1
1a=2
^
SyntaxError: invalid syntax
>>> _a='1' # valid
>>> _a
'1'
>>> a1='a1' # valid
>>> a1
'a1'
>>> a_1="a_1" # valid
>>> a_1
'a_1'
>>> a_a_1="a_a_1" # valid
>>> a_a_1
'a_a_1'
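The naming rules can also be checked from code rather than by triggering a SyntaxError. A minimal sketch: str.isidentifier() tests the character rules, and keyword.iskeyword() rules out reserved words such as class, which the list above does not mention but which also cannot be used as names.

```python
import keyword

candidates = ["/a", "?a", "1a", "_a", "a1", "a_1", "a_a_1", "class"]

# A name is usable when it follows the character rules and is not reserved.
results = {
    name: name.isidentifier() and not keyword.iskeyword(name)
    for name in candidates
}

for name, usable in results.items():
    print(f"{name!r:10} {usable}")
```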
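Number bases¶
In Python, integers are written in decimal (base 10) by default, unless you add a prefix to the number to specify another base explicitly.
- The base is the number of distinct digits available before you have to carry. In base 2 (binary) the only digits are 0 and 1; 0 means the same as decimal 0 and 1 means the same as decimal 1. In base 2, however, 1 plus 1 gives 10 (one two plus zero ones).
Besides decimal, Python accepts integer literals in three other bases:
- 0b or 0B for binary (base 2)
- 0o or 0O for octal (base 8)
- 0x or 0X for hexadecimal (base 16)
Examples: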
>>> 10
10
>>> 0b10
2
>>> 0o10
8
>>> 0x10
16
>>> 0xa
10
>>> 0xb
11
>>> 0xc
12
>>> 0xd
13
>>> 0xe
14
>>> 0xf
15
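The prefixes only affect how a literal is written; the value is an ordinary int. Going the other way, the built-ins bin(), oct(), and hex() produce the prefixed text, and int() with an explicit base parses digits in that base. A short sketch:

```python
n = 10

# Convert an int to its prefixed binary, octal, and hexadecimal text.
print(bin(n), oct(n), hex(n))

# Parse the digits "10" in base 2, 8, and 16.
parsed = (int("10", 2), int("10", 8), int("10", 16))
print(parsed)

# format() gives the same digits without the prefix.
print(format(255, "b"), format(255, "o"), format(255, "x"))
```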
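Type conversion¶
- int() converts other data types to an integer.
- float() converts other data types to a floating-point number.
- str() converts other data types to a string.
Examples: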
>>> int(True)
1
>>> int(False)
0
>>> int(98.6)
98
>>> int(1.1e3)
1100
>>> float(True)
1.0
>>> float(False)
0.0
>>> float(98)
98.0
>>> float(98.6)
98.6
>>> float('1.1e3')
1100.0
>>> str(98.6)
'98.6'
>>> str(1.1e3)
'1100.0'
>>> str(True)
'True'
>>>
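One case the examples above do not show is that int() rejects a string holding a decimal number, such as '98.6', with a ValueError. A minimal sketch of one workaround, using a hypothetical helper named to_int that falls back to float() first:

```python
def to_int(text):
    """Convert numeric text to int, truncating any fractional part."""
    try:
        return int(text)
    except ValueError:
        # int('98.6') fails, so parse as a float first and then truncate.
        return int(float(text))


print(to_int("42"), to_int("98.6"), to_int("1.1e3"))
```

Text that is not numeric at all still raises ValueError, which is usually the behaviour you want.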
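Python operators¶
Common operators¶
The common operators are described in the table below:
Operator | Name | Description | Example |
---|---|---|---|
+ | Addition | Adds two objects | >>> 1+2 gives 3; >>> 'ab'+'cd' gives 'abcd' |
- | Subtraction | Subtracts one object from another | >>> 2-1 gives 1; >>> 1-2 gives -1 |
* | Multiplication | Multiplies two numbers or repeats a string | >>> 3*4 gives 12; >>> "-"*6 gives '------' |
** | Power | x**y returns x to the power y | >>> 3**4 gives 81, the same as pow(3,4) |
/ | Division | x/y divides x by y | >>> 4/2 gives 2.0; >>> 4.0/2 gives 2.0 |
// | Floor division | Returns the quotient rounded down to a whole number | >>> 4//2 gives 2; >>> 4//3 gives 1; >>> 4//3.0 gives 1.0; >>> 4//2.0 gives 2.0 |
% | Modulo | Returns the remainder of the division | >>> 4%2.0 gives 0.0; >>> 4%3.0 gives 1.0; >>> 4%3 gives 1; >>> 4%2 gives 0 |
<< | Left shift | x<<y shifts the bits of x left by y places | >>> 2<<1 gives 4; >>> 2<<2 gives 8 |
>> | Right shift | x>>y shifts the bits of x right by y places | >>> 16>>1 gives 8; >>> 16>>2 gives 4; >>> 16>>3 gives 2; >>> 16>>4 gives 1 |
& | Bitwise AND | Each bit is 1 only if both bits are 1, otherwise 0 | >>> 3&5 gives 1 |
| | Bitwise OR | Each bit is 1 if either bit is 1, otherwise 0 | >>> 3|5 gives 7 |
^ | Bitwise XOR | Each bit is 1 if the two bits differ, 0 if they are the same | >>> 3^5 gives 6 |
~ | Bitwise NOT | ~x equals -(x+1) | >>> ~3 gives -4 |
Operators have precedence, but it is better to group operators and operands with parentheses so that the program is as readable as possible.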
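Two details of the arithmetic operators are easy to miss: // rounds toward negative infinity rather than toward zero, and % returns a remainder with the sign of the divisor. The sketch below illustrates both with negative operands and also shows how parentheses change the result of an expression:

```python
# Floor division rounds down, so -7 // 2 is -4, not -3.
floor_results = (7 // 2, -7 // 2)
print(floor_results)

# The remainder takes the sign of the divisor.
mod_results = (7 % 2, -7 % 2)
print(mod_results)

# divmod returns both at once; quotient * divisor + remainder gives back -7.
q, r = divmod(-7, 2)
print(q, r, q * 2 + r)

# Multiplication binds tighter than addition unless parentheses say otherwise.
precedence = (2 + 3 * 4, (2 + 3) * 4)
print(precedence)
```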
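Bitwise AND, OR, XOR, and NOT¶
The evaluation of bitwise AND, OR, XOR, and NOT is worked through in detail below, using 8 bits for illustration.
Compute the bitwise AND, OR, and XOR of 3 and 5, 3 and -5, -3 and 5, and -3 and -5.
Positive numbers:
3: sign-magnitude = ones' complement = two's complement [0000 0011]
5: sign-magnitude = ones' complement = two's complement [0000 0101]
Negative numbers:
Sign-magnitude:
Sign-magnitude of -3 [1000 0011]
Sign-magnitude of -5 [1000 0101]
Ones' complement: keep the sign bit of the sign-magnitude form and invert all other bits
Ones' complement of -3 [1111 1100]
Ones' complement of -5 [1111 1010]
Two's complement: keep the sign bit of the sign-magnitude form, invert all other bits, then add 1 (that is, ones' complement + 1)
Two's complement of -3 [1111 1101]
Two's complement of -5 [1111 1011]
The two's complement forms side by side, for easy comparison:
3 [0000 0011]
5 [0000 0101]
-3 [1111 1101]
-5 [1111 1011]
Evaluation:
Operate on the two's complement forms; the result is also in two's complement:
Operands AND OR XOR
3 and 5 [0000 0001] [0000 0111] [0000 0110]
3 and -5 [0000 0011] [1111 1011] [1111 1000]
-3 and 5 [0000 0101] [1111 1101] [1111 1000]
-3 and -5 [1111 1001] [1111 1111] [0000 0110]
Convert the two's complement results back to sign-magnitude. First take the ones' complement: positive results stay unchanged; for negative results keep the sign bit and invert the other bits:
Operands AND OR XOR
3 and 5 [0000 0001] [0000 0111] [0000 0110]
3 and -5 [0000 0011] [1000 0100] [1000 0111]
-3 and 5 [0000 0101] [1000 0010] [1000 0111]
-3 and -5 [1000 0110] [1000 0000] [0000 0110]
Then obtain the sign-magnitude form: positive results stay unchanged, negative results have 1 added:
Operands AND OR XOR
3 and 5 [0000 0001]=1 [0000 0111]=7 [0000 0110]=6 correct
3 and -5 [0000 0011]=3 [1000 0101]=-5 [1000 1000]=-8 correct
-3 and 5 [0000 0101]=5 [1000 0011]=-3 [1000 1000]=-8 correct
-3 and -5 [1000 0111]=-7 [1000 0001]=-1 [0000 0110]=6 correct
Verify with Python:
Bitwise AND: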
>>> 3&5
1
>>> 3&-5
3
>>> -3&5
5
>>> -3&-5
-7
Bitwise OR:
>>> 3|5
7
>>> 3|-5
-5
>>> -3|5
-3
>>> -3|-5
-1
Bitwise XOR:
>>> 3^5
6
>>> 3^-5
-8
>>> -3^5
-8
>>> -3^-5
6
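Now compute the bitwise NOT of 3, 5, -3, and -5.
The two's complement forms of the four numbers were obtained above:
3 [0000 0011]
5 [0000 0101]
-3 [1111 1101]
-5 [1111 1011]
Two's complement after inverting every bit:
~3 [1111 1100]
~5 [1111 1010]
~-3 [0000 0010]
~-5 [0000 0100]
Take the ones' complement: positive results stay unchanged; for negative results keep the sign bit and invert the other bits:
~3 [1000 0011]
~5 [1000 0101]
~-3 [0000 0010]
~-5 [0000 0100]
Obtain the sign-magnitude form: positive results stay unchanged; negative results keep the sign bit and have 1 added:
~3 [1000 0100]=-4 correct
~5 [1000 0110]=-6 correct
~-3 [0000 0010]=2 correct
~-5 [0000 0100]=4 correct
Verify with Python: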
>>> ~3
-4
>>> ~5
-6
>>> ~-3
2
>>> ~-5
4
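The hand-worked bit patterns can be reproduced in code. Python integers have no fixed width, so this sketch masks each value to 8 bits to match the illustration above; the helper name to_bits and the 8-bit width are assumptions made for the example.

```python
def to_bits(x, width=8):
    """Return the two's complement bit pattern of x in width bits."""
    # Masking with 2**width - 1 maps a negative int to its two's complement form.
    return format(x & ((1 << width) - 1), "0{}b".format(width))


# AND, OR, and XOR for the four operand pairs used above.
for a, b in [(3, 5), (3, -5), (-3, 5), (-3, -5)]:
    print(a, b, to_bits(a & b), to_bits(a | b), to_bits(a ^ b))

# Bitwise NOT always equals -(x + 1).
not_checks = [~x == -(x + 1) for x in (3, 5, -3, -5)]
for x in (3, 5, -3, -5):
    print(x, to_bits(~x), ~x)
```

Masking is only needed for display; the operators themselves work on integers of any size.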
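Built-in data structures: list, tuple, dict¶
Introduction to the built-in data structures¶
The built-in data structures of Python include the list, the tuple, and the dictionary (dict).
A list is enclosed in square brackets [], for example:
list1=['a','b']
list1=[{'name':'a','value':2},{'name':'d','value':1},{'name':'c','value':3},{'name':'b','value':4}]
Lists are mutable and cannot be used as dictionary keys.
By convention, lists are homogeneous sequences.
A tuple is enclosed in parentheses (), for example:
tp1=(1,2) # the point x=1, y=2 on the coordinate axes
Tuples are immutable and can be used as dictionary keys, provided all their elements are hashable;
Tuples are heterogeneous data structures, i.e., their entries have different meanings.
A dictionary (dict) is enclosed in braces {}, for example:
>>> dict1={'name':'Mei','lang':'python'}
>>> type(dict1)
<class 'dict'>
A dictionary consists of key-value pairs; each key is unique and immutable;
A key and its value are joined by a colon :;
Key-value pairs are separated by commas ,;
Before Python 3.7 the order of key-value pairs is not guaranteed; from Python 3.7 on, a dictionary preserves insertion order.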
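The difference between lists and tuples as dictionary keys is easy to demonstrate. In this sketch (names are illustrative) a tuple works as a key, while a list raises TypeError because it is mutable and therefore unhashable. The final line relies on insertion order, which assumes Python 3.7 or later.

```python
# Tuples are hashable, so they can serve as dictionary keys.
points = {(1, 2): "A", (3, 4): "B"}
print(points[(1, 2)])

# Lists are not hashable, so using one as a key fails.
try:
    bad = {[1, 2]: "A"}
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# On Python 3.7+, keys come back in the order they were inserted.
keys_in_order = list(points)
print(keys_in_order)
```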
Using lists¶
A list is a data structure that holds an ordered collection of items, written inside square brackets [].
Common methods and examples:
list() # create an empty list
# e.g. after list1=list(), list1=[]
list.append(obj) # append one object to the end of the list
# e.g. with list1=['a','b'] and list2=['c','d'],
# list1.append(list2) leaves list1=['a', 'b', ['c', 'd']]
list.extend(iterable) # extend the list by appending the elements of an iterable
# e.g. with list1=['a','b'] and list2=['c','d'],
# list1.extend(list2) leaves list1=['a', 'b', 'c', 'd']
list.clear() # remove all items, leaving an empty list
# e.g. with list1=['a','b'], list1.clear() leaves list1=[]
list.copy() # return a shallow copy of the list
# e.g. with list1=['a','b'], L=list1.copy() gives L=['a','b']
list.count(value) # return the number of times value occurs in the list
# >>> list1=['a','b','a']
# >>> list1.count('a')
# 2
# >>> list1.count('b')
# 1
list.index(value, [start, [stop]]) # return the index of the first occurrence of value in the list.
# If start and stop are given, the search runs from index start up to, but not including, index stop.
# e.g. list1=['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd', 'e']
# list1.index('a')=0,list1.index('b')=1,list1.index('c')=2,list1.index('d')=3
# list1.index('e')=8
# list1.index('a',0,5)=0,list1.index('a',1,5)=4,
# list1.index('a',2,5)=4,list1.index('a',3,5)=4,
# list1.index('a',4,5)=4,
# list1.index('b',5,6)=5
list.insert(index, object) # insert object before the given index
# e.g. with list1=['a','b'] and list2=['c','d'], list1.insert(1,list2) leaves list1=['a',['c', 'd'],'b']
# e.g. with list1=['a','b','d'], list1.insert(2,'c') leaves list1=['a','b','c','d']
list.pop([index]) # remove the element at index (the last element if index is omitted) and return it.
# e.g. with list1=['a','b','c','d'], list1.pop() leaves list1=['a','b','c']
# then list1.pop(1) leaves list1=['a','c']
# then list1.pop(0) leaves list1=['c']
# then list1.pop(-1) leaves list1=[]
list.remove(value) # remove the first occurrence of value from the list, e.g.:
# >>> list1=['a','b','a','c','b','d']
# >>> list1.remove('a')
# >>> list1
# ['b', 'a', 'c', 'b', 'd']
# >>> list1.remove('b')
# >>> list1
# ['a', 'c', 'b', 'd']
list.reverse() # reverse the list in place, e.g.:
# >>> list1
# ['a', 'c', 'b', 'd']
# >>> list1.reverse()
# >>> list1
# ['d', 'b', 'c', 'a']
list.sort(key=None, reverse=False) # sort the list in place; key sets the sort key and reverse sets the direction.
# reverse=False means ascending order;
# reverse=True means descending order, e.g.:
# >>> list1=['d','b','c','a','e']
# >>> list1
# ['d', 'b', 'c', 'a', 'e']
# >>> list1.sort() # sort without arguments
# >>> list1
# ['a', 'b', 'c', 'd', 'e']
# >>> list1=['d','b','c','a','e']
# >>> list1.sort(reverse=False) # sort ascending with an explicit argument
# >>> list1
# ['a', 'b', 'c', 'd', 'e']
# >>> list1=['d','b','c','a','e'] # sort descending with an explicit argument
# >>> list1.sort(reverse=True)
# >>> list1
# ['e', 'd', 'c', 'b', 'a']
# Sort using the key argument
>>> list1=[{'name':'a','value':2},{'name':'d','value':1},{'name':'c','value':3},{'name':'b','value':4}]
>>> list1
[{'name': 'a', 'value': 2}, {'name': 'd', 'value': 1}, {'name': 'c', 'value': 3}, {'name': 'b', 'value': 4}]
# Sort in ascending order by the first key, name
>>> list1.sort(key=lambda obj:obj.get('name'),reverse=False)
>>> list1
[{'name': 'a', 'value': 2}, {'name': 'b', 'value': 4}, {'name': 'c', 'value': 3}, {'name': 'd', 'value': 1}]
# Sort in ascending order by the second key, value
>>> list1.sort(key=lambda obj:obj.get('value'),reverse=False)
>>> list1
[{'name': 'd', 'value': 1}, {'name': 'a', 'value': 2}, {'name': 'c', 'value': 3}, {'name': 'b', 'value': 4}]
>>> import operator
# Use operator.itemgetter('name') to take the name field as the key, then sort in ascending order
>>> list1.sort(key=operator.itemgetter('name'))
>>> list1
[{'name': 'a', 'value': 2}, {'name': 'b', 'value': 4}, {'name': 'c', 'value': 3}, {'name': 'd', 'value': 1}]
# Use operator.itemgetter('name') to take the name field as the key, then sort in descending order
>>> list1.sort(key=operator.itemgetter('name'),reverse=True)
>>> list1
[{'name': 'd', 'value': 1}, {'name': 'c', 'value': 3}, {'name': 'b', 'value': 4}, {'name': 'a', 'value': 2}]
# Use operator.itemgetter('value') to take the value field as the key, then sort in ascending order
>>> list1.sort(key=operator.itemgetter('value'))
>>> list1
[{'name': 'd', 'value': 1}, {'name': 'a', 'value': 2}, {'name': 'c', 'value': 3}, {'name': 'b', 'value': 4}]
# Use operator.itemgetter('value') to take the value field as the key, then sort in descending order
>>> list1.sort(key=operator.itemgetter('value'),reverse=True)
>>> list1
[{'name': 'b', 'value': 4}, {'name': 'c', 'value': 3}, {'name': 'a', 'value': 2}, {'name': 'd', 'value': 1}]
list.__len__() # Return the length of the list
# e.g. with list1=['a','b'], list1.__len__() = 2
list.__add__(list1) # Concatenate two lists into a new list
# e.g. with list1=['a','b'] and list2=['c','d'], list1.__add__(list2)=['a', 'b', 'c', 'd']
# list1 is unchanged: list1=['a','b']
list.__contains__(value) # Whether the list contains value
# >>> list1.__contains__('c')
# False
# >>> list1.__contains__('a')
# True
list.__delitem__(index) # Delete the element at index; unlike list.pop(index), it does not return the removed element
>>> list1=[{'name':'a','value':2},{'name':'d','value':1},{'name':'c','value':3},{'name':'b','value':4}]
>>> list1.__delitem__(1)
>>> list1
[{'name': 'a', 'value': 2}, {'name': 'c', 'value': 3}, {'name': 'b', 'value': 4}]
list.__getitem__(index) Return the element at index
>>> list1.__getitem__(1)
{'name': 'c', 'value': 3}
list.__imul__(int_value) Repeat the list int_value times in place; if int_value=0, the list is emptied
e.g.:
>>> list1=['a','b']
>>> list1.__imul__(2)
['a', 'b', 'a', 'b']
>>> list1
['a', 'b', 'a', 'b']
>>> list1.__imul__(0)
[]
>>> list1
[]
Building lists with map() and with a list comprehension:
>>> squares = list(map(lambda x: x**2, range(10)))
>>> squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> a=[1,2,3,4]
>>> b=[i**2 for i in a ]
>>> b
[1, 4, 9, 16]
Converting other data types to a list with list()
>>> list('cat')
['c', 'a', 't']
>>> list(('ab','cd','ef'))
['ab', 'cd', 'ef']
>>>
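The double-underscore methods listed above are rarely called by name; Python invokes them when the matching operator or built-in is used. The sketch below pairs each one with its everyday spelling. The variable names are illustrative only.

```python
# Operator forms of the special methods shown above.
letters = ['a', 'b']

length = len(letters)           # calls letters.__len__()
joined = letters + ['c', 'd']   # calls letters.__add__(['c', 'd'])
has_a = 'a' in letters          # calls letters.__contains__('a')
first = letters[0]              # calls letters.__getitem__(0)

doubled = ['a', 'b']
doubled *= 2                    # calls doubled.__imul__(2), in place

trimmed = ['a', 'b', 'c']
del trimmed[1]                  # calls trimmed.__delitem__(1)

print(length, joined, has_a, first, doubled, trimmed)
```

In ordinary code the operator form is usually preferred because it reads more naturally.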
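Using tuples (tuple)¶
A tuple is written as elements separated by commas, usually enclosed in parentheses (). For example: my_location = (42, 11) # page number, line number
Common methods and examples:
tuple.count(value) Return the number of times value occurs in the tuple
e.g.: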
>>> tu1=(1,2,3,4,5,1,2,1,2,3)
>>> tu1
(1, 2, 3, 4, 5, 1, 2, 1, 2, 3)
>>> tu1.count(1)
3
>>> tu1.count(2)
3
>>> tu1.count(3)
2
>>> tu1.count(4)
1
>>> tu1.count(5)
1
>>> tu1.count(0)
0
tuple.index(value,[start, [stop]]) Return the index of the first occurrence of value in the tuple.
>>> tu1.index(1,0,8)
0
>>> tu1.index(1,1,8)
5
>>> tu1.index(1,5,8)
5
>>> tu1.index(1,6,8)
7
>>> tu1.index(3)
2
tuple.__add__(other_tuple) Concatenate the tuple with another tuple into a new tuple; tuple and other_tuple are unchanged
>>> tu1=(1,2,3)
>>> tu1
(1, 2, 3)
>>> tu1.__add__(tu1)
(1, 2, 3, 1, 2, 3)
>>> tu1
(1, 2, 3)
>>> tu2=(4,5)
>>> tu2
(4, 5)
>>> tu1.__add__(tu2)
(1, 2, 3, 4, 5)
>>> tu1
(1, 2, 3)
>>> tu2
(4, 5)
tuple.__contains__(value) Whether the tuple contains an element equal to value; returns True or False
>>> tu1=('a','b','c')
>>> tu1.__contains__('a')
True
>>> tu1.__contains__('b')
True
>>> tu1.__contains__('c')
True
>>> tu1.__contains__('d')
False
tuple.__eq__(other_tuple) Whether the tuple equals other_tuple; returns True or False
>>> tu2=('a','b','c')
>>> tu3=(1,2,3,4)
>>> tu1
('a', 'b', 'c')
>>> tu2
('a', 'b', 'c')
>>> tu3
(1, 2, 3, 4)
>>> tu1.__eq__(tu2)
True
>>> tu1.__eq__(tu3)
False
tuple.__getitem__(index) Return the element of the tuple at index
>>> tu1
('a', 'b', 'c')
>>> tu1.__getitem__(0)
'a'
>>> tu1.__getitem__(1)
'b'
>>> tu1.__getitem__(2)
'c'
>>> tu1.__getitem__(3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: tuple index out of range
tuple.__len__() Return the length of the tuple
>>> tu1
('a', 'b', 'c')
>>> tu1.__len__()
3
>>> tu2
('a', 'b', 'c')
>>> tu2.__len__()
3
>>> tu3
(1, 2, 3, 4)
>>> tu3.__len__()
4
tuple.__mul__(n) Return a new tuple that repeats the tuple n times
>>> tu1
('a', 'b', 'c')
>>> tu1.__mul__(2)
('a', 'b', 'c', 'a', 'b', 'c')
>>> tu1.__mul__(3)
('a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c')
>>> tu1
('a', 'b', 'c')
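Single-element tuples: if a tuple contains only one element, the element must be followed by a comma; without the comma the parentheses are just grouping.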
>>> tu1=(1,)
>>> tu1
(1,)
>>> type(tu1)
<class 'tuple'>
>>> tu2=(1)
>>> tu2
1
>>> type(tu2)
<class 'int'>
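Printing with tuples: use %s placeholders and the % operator to insert variables into the output string.
When several values have to be printed, packing them into a tuple as below is more convenient.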
>>> name='mei'
>>> lang='python'
>>> print('hi,%s,you love to learn the language %s' % (name,lang))
hi,mei,you love to learn the language python
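As a small companion to the %-formatting example above, the sketch below shows the same sentence built with str.format() and with an f-string (Python 3.6+), plus one pitfall: when the value to print is itself a tuple, it has to be wrapped in a one-element tuple. The names point and wrapped are illustrative.

```python
name = 'mei'
lang = 'python'

# %-formatting takes a tuple when there is more than one placeholder.
percent_style = 'hi,%s,you love to learn the language %s' % (name, lang)

# str.format() and f-strings build the same text.
format_style = 'hi,{},you love to learn the language {}'.format(name, lang)
fstring_style = f'hi,{name},you love to learn the language {lang}'

# A tuple that is itself the value must be wrapped in a one-element tuple,
# otherwise % would try to spread its items over the placeholders.
point = (42, 11)
wrapped = 'location: %s' % (point,)

print(percent_style)
print(wrapped)
```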
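Using dictionaries (dict)¶
- A dictionary (dict) is enclosed in braces {}, e.g. dict1={'name':'Mei','lang':'python'}.
- A dictionary is similar to a list, but the order of its elements is not significant, because they cannot be accessed by an offset such as 0 or 1. Each element has its own unique key, and elements are accessed by key.
- Keys are usually strings, but a key can be any other immutable Python type: boolean, integer, float, tuple and so on.
Common methods and examples:
dict.get(key) Return the value stored under key in the dictionary
>>> dict1={'name':'Mei','lang':'python'}
>>> dict1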
{'name': 'Mei', 'lang': 'python'}
>>> dict1.get('name')
'Mei'
>>> dict1.get('lang')
'python'
dict.items() Return a view object of the dictionary's (key, value) tuples, which can be iterated to visit every key and value
>>> dict1.items()
dict_items([('name', 'Mei'), ('lang', 'python')])
>>> for x,y in dict1.items():
... print('key is',x,',value is',y)
...
key is name ,value is Mei
key is lang ,value is python
dict.keys() Return a view object of the dictionary's keys, which can be iterated to visit every key
>>> dict1.keys()
dict_keys(['name', 'lang'])
>>> for x in dict1.keys():
... print('key is',x)
...
key is name
key is lang
dict.values() Return a view object of the dictionary's values, which can be iterated to visit every value
>>> dict1={'name':'Mei','lang':'python'}
>>> dict1.values()
dict_values(['Mei', 'python'])
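dict.pop(key[,returnValue]) Remove the pair stored under key from the dictionary and return its value
If returnValue is given, it is returned only when key is not found; without returnValue, a missing key raises KeyError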
>>> dict1={'a':3,'b':1,'c':2}
>>> dict1
{'a': 3, 'b': 1, 'c': 2}
>>> dict1.pop('a')
3
>>> dict1
{'b': 1, 'c': 2}
>>> dict1.pop('b')
1
>>> dict1
{'c': 2}
>>> dict1.pop('b','b is not the key')
'b is not the key'
>>> dict1
{'c': 2}
>>> dict1.pop('c','c is not the key')
2
>>> dict1
{}
dict.popitem() Remove the last key-value pair from the dictionary and return it as a (key, value) tuple;
calling popitem() on an empty dictionary raises KeyError
>>> dict1={'a':3,'b':1,'c':2,'d':5,'e':4}
>>> dict1
{'a': 3, 'b': 1, 'c': 2, 'd': 5, 'e': 4}
>>> dict1.popitem()
('e', 4)
>>> dict1
{'a': 3, 'b': 1, 'c': 2, 'd': 5}
>>> dict1.popitem()
('d', 5)
>>> dict1
{'a': 3, 'b': 1, 'c': 2}
>>> dict1.popitem()
('c', 2)
>>> dict1
{'a': 3, 'b': 1}
>>> dict1.popitem()
('b', 1)
>>> dict1
{'a': 3}
>>> dict1.popitem()
('a', 3)
>>> dict1
{}
>>> dict1.popitem()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'popitem(): dictionary is empty'
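dict.setdefault(key[,set_value]) Return the value stored under key in the dictionary
If key is missing and set_value is not given, the pair key:None is added and None is returned
If key is missing and set_value is given, the pair key:set_value is added and set_value is returned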
>>> dict1={'a':1,'b':2}
>>> dict1
{'a': 1, 'b': 2}
>>> dict1.setdefault('a')
1
>>> dict1.setdefault('a',3)
1
>>> dict1
{'a': 1, 'b': 2}
>>> dict1.setdefault('c')
>>> dict1
{'a': 1, 'b': 2, 'c': None}
>>> dict1.setdefault('c')
>>> dict1
{'a': 1, 'b': 2, 'c': None}
>>> dict1.setdefault('d','add_by_sedefault')
'add_by_sedefault'
>>> dict1
{'a': 1, 'b': 2, 'c': None, 'd': 'add_by_sedefault'}
>>> dict1.setdefault('b','b is not the key')
2
>>> dict1
{'a': 1, 'b': 2, 'c': None, 'd': 'add_by_sedefault'}
dict1.update(dict2) Update dict1 from dict2
If a key of dict2 already exists in dict1, its value in dict2 is assigned to dict1[key], i.e. dict1[key]=dict2[key];
if a key of dict2 does not exist in dict1, the key-value pair is added to dict1, again as dict1[key]=dict2[key].
>>> dict1={'a':1,'b':4,'c':2,'d':3,'f':5}
>>> dict1
{'a': 1, 'b': 4, 'c': 2, 'd': 3, 'f': 5}
>>> dict2={'a':5,'f':1}
>>> dict2
{'a': 5, 'f': 1}
>>> dict3={'d':2}
>>> dict3
{'d': 2}
>>> dict4={'b':'four','c':3}
>>> dict4
{'b': 'four', 'c': 3}
>>> dict1.update(dict2)
>>> dict1
{'a': 5, 'b': 4, 'c': 2, 'd': 3, 'f': 1}
>>> dict1.update(dict3)
>>> dict1
{'a': 5, 'b': 4, 'c': 2, 'd': 2, 'f': 1}
>>> dict1.update(dict3)
>>> dict1
{'a': 5, 'b': 4, 'c': 2, 'd': 2, 'f': 1}
>>> dict1.update(dict4)
>>> dict1
{'a': 5, 'b': 'four', 'c': 3, 'd': 2, 'f': 1}
>>> dict5={'g':6,'h':7}
>>> dict5
{'g': 6, 'h': 7}
>>> dict1.update(dict5)
>>> dict1
{'a': 5, 'b': 'four', 'c': 3, 'd': 2, 'f': 1, 'g': 6, 'h': 7}
>>> dict6={'b':4,'i':8}
>>> dict6
{'b': 4, 'i': 8}
>>> dict1.update(dict6)
>>> dict1
{'a': 5, 'b': 4, 'c': 3, 'd': 2, 'f': 1, 'g': 6, 'h': 7, 'i': 8}
>>> dict1.update({'j':9})
>>> dict1
{'a': 5, 'b': 4, 'c': 3, 'd': 2, 'f': 1, 'g': 6, 'h': 7, 'i': 8, 'j': 9}
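dict.fromkeys(iterable, value=None) Build a new dictionary
The iterable can be a string, tuple, list or dictionary; its items become the keys of the new dictionary.
Every key is given the same initial value, value; if value is omitted, it defaults to None.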
>>> dict1={'a':1,'b':2,'c':3}
>>> dict1
{'a': 1, 'b': 2, 'c': 3}
>>> dict1.fromkeys('123')
{'1': None, '2': None, '3': None}
>>> dict1.fromkeys('123','string字符串')
{'1': 'string字符串', '2': 'string字符串', '3': 'string字符串'}
>>> dict1.fromkeys((1,2,3),'string字符串')
{1: 'string字符串', 2: 'string字符串', 3: 'string字符串'}
>>> dict1.fromkeys((1,2,3),'tuple元组')
{1: 'tuple元组', 2: 'tuple元组', 3: 'tuple元组'}
>>> dict1.fromkeys(('1','2','3'),'tuple元组')
{'1': 'tuple元组', '2': 'tuple元组', '3': 'tuple元组'}
>>> dict1.fromkeys(['1','2','3'],'list列表')
{'1': 'list列表', '2': 'list列表', '3': 'list列表'}
>>> dict1.fromkeys([1,2,3],'list列表')
{1: 'list列表', 2: 'list列表', 3: 'list列表'}
>>> dict1.fromkeys({1:'a',2:'b',3:'c'},'dict字典')
{1: 'dict字典', 2: 'dict字典', 3: 'dict字典'}
>>> dict1.fromkeys({'1':'a','2':'b','3':'c'},'dict字典')
{'1': 'dict字典', '2': 'dict字典', '3': 'dict字典'}
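dict.copy() Shallow copy of the dictionary, equivalent to copy.copy() from the copy module
Shallow copy: the top-level dictionary is copied, but nested objects are not; they are still shared by reference
copy.deepcopy() from the copy module makes a deep copy: the top-level object and all nested objects are copied.
>>> import copy
>>> dict1={'a':1,'b':(1,2),'c':[3,['a','b'],5]}
>>> dict2=dict1 # plain assignment: no copy at all, dict2 is another name for the same object
>>> dict3=dict1.copy() # shallow copy with dict.copy()
>>> dict4=copy.copy(dict1) # shallow copy with the copy module
>>> dict5=copy.deepcopy(dict1) # deep copy with the copy module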
>>> dict1
{'a': 1, 'b': (1, 2), 'c': [3, ['a', 'b'], 5]}
>>> dict2
{'a': 1, 'b': (1, 2), 'c': [3, ['a', 'b'], 5]}
>>> dict4
{'a': 1, 'b': (1, 2), 'c': [3, ['a', 'b'], 5]}
>>> dict3
{'a': 1, 'b': (1, 2), 'c': [3, ['a', 'b'], 5]}
>>> dict5
{'a': 1, 'b': (1, 2), 'c': [3, ['a', 'b'], 5]}
>>> dict1['c']
[3, ['a', 'b'], 5]
>>> dict1['c'][1]
['a', 'b']
>>> dict1['c'][1].remove('b')
>>> dict1
{'a': 1, 'b': (1, 2), 'c': [3, ['a'], 5]}
>>> dict2
{'a': 1, 'b': (1, 2), 'c': [3, ['a'], 5]}
>>> dict3
{'a': 1, 'b': (1, 2), 'c': [3, ['a'], 5]}
>>> dict4
{'a': 1, 'b': (1, 2), 'c': [3, ['a'], 5]}
>>> dict5
{'a': 1, 'b': (1, 2), 'c': [3, ['a', 'b'], 5]}
>>> dict1.pop('c')
[3, ['a'], 5]
>>> dict1
{'a': 1, 'b': (1, 2)}
>>> dict2
{'a': 1, 'b': (1, 2)}
>>> dict3
{'a': 1, 'b': (1, 2), 'c': [3, ['a'], 5]}
>>> dict4
{'a': 1, 'b': (1, 2), 'c': [3, ['a'], 5]}
>>> dict5
{'a': 1, 'b': (1, 2), 'c': [3, ['a', 'b'], 5]}
>>> adict={'姓名':'zhang','性别':['男','女']}
>>> adict
{'姓名': 'zhang', '性别': ['男', '女']}
>>> bdict=adict
>>> cdict=adict.copy()
>>> import copy
>>> ddict=copy.copy(adict)
>>> edict=copy.deepcopy(adict)
>>> adict
{'姓名': 'zhang', '性别': ['男', '女']}
>>> bdict
{'姓名': 'zhang', '性别': ['男', '女']}
>>> cdict
{'姓名': 'zhang', '性别': ['男', '女']}
>>> ddict
{'姓名': 'zhang', '性别': ['男', '女']}
>>> edict
{'姓名': 'zhang', '性别': ['男', '女']}
>>> adict['性别']
['男', '女']
>>> adict['性别'].remove('女')
>>> adict
{'姓名': 'zhang', '性别': ['男']}
>>> bdict
{'姓名': 'zhang', '性别': ['男']}
>>> cdict
{'姓名': 'zhang', '性别': ['男']}
>>> edict
{'姓名': 'zhang', '性别': ['男', '女']}
dict.__setitem__(key,value) Assign value to key in the dictionary, or add a new key:value pair
>>> dict1={'a':1,'b':(1,2)}
>>> dict1
{'a': 1, 'b': (1, 2)}
>>> dict1.__setitem__('c',3)
>>> dict1
{'a': 1, 'b': (1, 2), 'c': 3}
>>> dict1.__setitem__('b',None)
>>> dict1
{'a': 1, 'b': None, 'c': 3}
dict.clear() Remove every item from the dictionary
>>> dict1={'a':1,'b':(1,2)}
>>> dict1
{'a': 1, 'b': (1, 2)}
>>> dict1.clear()
>>> dict1
{}
dict(other) Convert a sequence of two-item subsequences to a dictionary with dict()
# A list of two-item lists
>>> list1 = [['a',1],['b',2],['c',3]]
>>> list1
[['a', 1], ['b', 2], ['c', 3]]
>>> dict(list1)
{'a': 1, 'b': 2, 'c': 3}
>>>
# A list of two-item tuples
>>> list2 = [('a','b'),('c','d'),('e','f')]
>>> dict(list2)
{'a': 'b', 'c': 'd', 'e': 'f'}
>>>
# A tuple of two-item lists
>>> tuple1 = (['a','b'],['c','d'],['e','f'])
>>> dict(tuple1)
{'a': 'b', 'c': 'd', 'e': 'f'}
>>>
# A list of two-character strings
>>> list1 = ['ab','cd','ef']
>>> dict(list1)
{'a': 'b', 'c': 'd', 'e': 'f'}
>>>
# A tuple of two-character strings
>>> tuple1 = ('ab','cd','ef')
>>> dict(tuple1)
{'a': 'b', 'c': 'd', 'e': 'f'}
>>>
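Two details of the dictionary methods above are easy to miss, so here is a hedged sketch of them: dict.get() accepts a second argument that is returned for a missing key (whereas square-bracket access raises KeyError), and dict.fromkeys() stores the very same value object under every key, which matters when that value is mutable. The key 'age' and the names shared and separate are made up for the example.

```python
dict1 = {'name': 'Mei', 'lang': 'python'}

# get() never raises for a missing key.
missing = dict1.get('age')        # None
fallback = dict1.get('age', 0)    # the supplied default

# Square-bracket access raises KeyError instead.
try:
    dict1['age']
except KeyError as exc:
    error_key = exc.args[0]

# fromkeys() reuses one list object for all keys ...
shared = dict.fromkeys(['x', 'y'], [])
shared['x'].append(1)

# ... while a dict comprehension creates a fresh list per key.
separate = {key: [] for key in ['x', 'y']}
separate['x'].append(1)

print(shared, separate)
```

If each key needs its own list or dict, the comprehension form is the safer choice.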
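Using sets (set)¶
A set (set) is enclosed in braces {}, e.g. set1={'name','lang'}.
A set is like a dictionary with the values thrown away and only the keys left. Keys cannot be repeated.
If all you want to know is whether an element exists, a set is a very good choice.
Common methods and examples:
Creating a set:
Method 1: create a set with the set() function
Method 2: enclose a series of comma-separated values in braces. Empty braces {} create an empty dictionary, so an empty set must be created with set().
e.g.: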
>>> set1=set()
>>> set1
set()
>>> type(set1)
<class 'set'>
>>> set2={'Mon','Tue','Wed','Thu','Fri','Sat','Sun'}
>>> set2
{'Thu', 'Sun', 'Mon', 'Tue', 'Fri', 'Wed', 'Sat'}
>>> set('string')
{'i', 'n', 's', 'g', 't', 'r'}
>>> set(['One','Two','Three'])
{'Three', 'One', 'Two'}
>>> set(('One','Two','Three'))
{'Three', 'One', 'Two'}
>>> set({'name':'mei','lang':'python'})
{'lang', 'name'}
Set operations:
Intersection, & or intersection(): the set of elements that appear in both sets.
>>> set1= set(('One','Two','Three'))
>>> set1
{'Three', 'One', 'Two'}
>>> set2= set(('Two','Three','Four'))
>>> set2
{'Four', 'Three', 'Two'}
>>> set1 & set2
{'Three', 'Two'}
>>> set1.intersection(set2)
{'Three', 'Two'}
Union, | or union(): the set of elements that appear in at least one of the sets.
>>> set1 | set2
{'Three', 'One', 'Two', 'Four'}
>>> set2 | set1
{'Four', 'Three', 'Two', 'One'}
>>> set1.union(set2)
{'Three', 'One', 'Two', 'Four'}
>>> set2.union(set1)
{'Four', 'Three', 'Two', 'One'}
Difference, - or difference(): the set of elements that appear in the first set but not in the second.
>>> set1 - set2
{'One'}
>>> set1.difference(set2)
{'One'}
>>> set2 - set1
{'Four'}
>>> set2.difference(set1)
{'Four'}
Symmetric difference, ^ or symmetric_difference(): the set of elements that appear in exactly one of the two sets.
>>> set1 ^ set2
{'Four', 'One'}
>>> set2 ^ set1
{'Four', 'One'}
>>> set1.symmetric_difference(set2)
{'Four', 'One'}
>>> set2.symmetric_difference(set1)
{'Four', 'One'}
Use <= or issubset() to test whether one set is a subset of another, i.e. every element of the first set appears in the second.
>>> set1 <= set2
False
>>> set3={'One','Two','Three','Four','Five'}
>>> set1 <= set3
True
>>> set2 <= set3
True
>>> set1.issubset(set3)
True
>>> set2.issubset(set3)
True
Use >= or issuperset() to test whether one set is a superset of another, i.e. every element of the second set appears in the first.
>>> set1 >= set3
False
>>> set3 >= set1
True
>>> set3.issuperset(set1)
True
>>> set3.issuperset(set2)
True
>>> set3.issuperset(set3)
True
Proper subset (<): every element of the first set appears in the second, and the second set has other elements as well.
>>> set1 < set3
True
Proper superset (>): every element of the second set appears in the first, and the first set has other elements as well.
>>> set3 > set1
True
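The operations above all build new sets. As a complement, the sketch below shows the in-place methods add(), discard() and remove(), and the two uses the introduction mentions: membership tests and removing duplicates. The sample values are illustrative.

```python
days = {'Mon', 'Tue'}
days.add('Wed')        # new element is stored
days.add('Mon')        # already present, the set is unchanged
days.discard('Sun')    # missing element: silently ignored

# remove() is stricter than discard(): a missing element raises KeyError.
try:
    days.remove('Sun')
except KeyError:
    removed_missing = False
else:
    removed_missing = True

# Membership test and de-duplication.
is_member = 'Tue' in days
unique_sorted = sorted(set([3, 1, 3, 2, 1]))

print(sorted(days), is_member, unique_sorted)
```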
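Comprehensions¶
List comprehensions:
A list comprehension builds a new list very concisely: a single short expression transforms each element that is produced.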
[ expression for item in iterable ]
[ expression for item in iterable if condition ]
>>> [x**2 for x in range(10)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> [x**2 for x in range(10) if x%2==0]
[0, 4, 16, 36, 64]
>>> [(x,y) for x in range(5) if x%2==0 for y in range(5) if y %2==1]
[(0, 1), (0, 3), (2, 1), (2, 3), (4, 1), (4, 3)]
>>> list1=['x','y','z']
>>> list2=[1,2,3]
>>> list3=[ (i,j) for i in list1 for j in list2 ]
>>> list3
[('x', 1), ('x', 2), ('x', 3), ('y', 1), ('y', 2), ('y', 3), ('z', 1), ('z', 2), ('z', 3)]
>>> [[1 if i == j else 0 for i in range(5)] for j in range(5)]
[[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 0, 0], [0, 0, 0, 1, 0], [0, 0, 0, 0, 1]]
A list comprehension that builds lambdas:
>>> func1 = [lambda x:x*i for i in range(10)]
>>> [f1(2) for f1 in func1]
[18, 18, 18, 18, 18, 18, 18, 18, 18, 18]
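Anonymous functions (lambda):
The general form of a lambda is the keyword lambda, followed by one or more parameters, then a colon, then an expression.
A lambda is an expression, not a statement.
A lambda can appear in places where Python syntax does not allow a def.
As an expression, a lambda returns a value (a new function).
lambda is meant for simple functions, while def handles larger tasks.
The main use of lambda is to specify short callback functions.
>>> add=lambda x,y:x+y
>>> add(1,2)
3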
>>> f=lambda x,y,z:x+pow(y,2)+pow(z,3)
>>> f(1,2,3)
32
Conditional expressions:
lambda: a if some_condition() else b
>>> f=lambda x: 'big' if x > 100 else 'small'
>>> f(101)
'big'
>>> f(100)
'small'
>>> f(99)
'small'
>>> f1=lambda x:print(x)
>>> f1(1)
1
>>> f1('str')
str
A generator expression that builds lambdas:
>>> func1 = (lambda x:x*i for i in range(10))
>>> [f1(2) for f1 in func1]
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
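A generator expression is evaluated lazily: each lambda is produced only when it is requested. That is the difference from the list comprehension. The list comprehension runs to completion at once, so i is already 9 by the time any of its lambdas is called. The generator produces the first lambda, which is called while i still has the matching value, and then pauses until the next one is requested.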
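The all-18 result of the list comprehension comes from late binding: every lambda looks up i when it is called, not when it is created. One common way to freeze the value per lambda, shown here as a sketch rather than the only option, is to bind it as a default argument.

```python
# Late binding: all ten lambdas share the loop variable i, which ends at 9.
late = [lambda x: x * i for i in range(10)]
late_results = [f(2) for f in late]

# Binding i as a default argument captures its current value for each lambda.
bound = [lambda x, i=i: x * i for i in range(10)]
bound_results = [f(2) for f in bound]

print(late_results)
print(bound_results)
```

With the default-argument form the lambdas can be stored and called later and still give one result per value of i.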
Dict comprehensions:
{key_expression: value_expression for item in iterable if condition}
>>> {-i:i for i in range(6)}
{0: 0, -1: 1, -2: 2, -3: 3, -4: 4, -5: 5}
Set comprehensions:
{expression for item in iterable if condition}
>>> a_set={ num for num in range(10) if num % 2 == 1 }
>>> a_set
{1, 3, 5, 7, 9}
Parallel iteration with zip():
The zip() function iterates over several sequences in parallel.
>>> Eng=['Mon','Tue','Wed','Thu','Fri','Sat','Sun']
>>> Eng
['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
>>> Num=[1,2,3,4,5,6,7]
>>> Num
[1, 2, 3, 4, 5, 6, 7]
>>> list(zip(Eng,Num))
[('Mon', 1), ('Tue', 2), ('Wed', 3), ('Thu', 4), ('Fri', 5), ('Sat', 6), ('Sun', 7)]
>>> dict(zip(Eng,Num))
{'Mon': 1, 'Tue': 2, 'Wed': 3, 'Thu': 4, 'Fri': 5, 'Sat': 6, 'Sun': 7}
>>> type(zip(Eng,Num))
<class 'zip'>
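Python control flow¶
Python has three control flow statements: if, for and while.
- Comments are marked with #; everything from # to the end of the line is a comment.
- The recommended maximum line length is 80 characters.
- If a line of code is too long, it can be continued on the next line with the continuation character \.
- Indent code blocks with 4 spaces.
- Avoid mixing tabs and spaces for indentation.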
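if syntax¶
The if statement:
Syntax:
if exp1:
    cmds
[elif exp2:
    cmds]
[else:
    cmds]
The elif and else clauses are optional.
Comparison operators:
equal ==
not equal !=
less than <
greater than >
not less than >=
not greater than <=
membership in
Truth values (True and False). The following values are treated as False; everything else is treated as True:
boolean False
null type None
integer 0
float 0.0
empty string ''
empty tuple ()
empty list []
empty dict {}
empty set set()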
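The list of false values can be checked directly with bool(). The sketch below also shows the usual idiom of testing a container itself instead of comparing its length with zero. The helper name describe is made up for the example.

```python
# Every value in the table above converts to False.
falsy_values = [False, None, 0, 0.0, '', (), [], {}, set()]
all_falsy = not any(bool(v) for v in falsy_values)

# Non-zero numbers and non-empty containers convert to True,
# even when their contents are themselves false.
truthy_values = [True, 1, -1, 0.1, ' ', (0,), [0], {'k': None}, {0}]
all_truthy = all(bool(v) for v in truthy_values)

def describe(items):
    # "if items:" is true exactly when the container is non-empty.
    if items:
        return 'has items'
    return 'empty'

print(all_falsy, all_truthy, describe([]), describe([0]))
```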
for syntax¶
The for statement:
Syntax:
for <variable> in <sequence>:
    <statements>
else:
    <statements>
The else clause is optional.
Example:
>>> for i in range(10):
...     print(i)
...
0
1
2
3
4
5
6
7
8
9
>>> for i in 'abcdef':
...     print(i)
... else:
...     print("haha")
...
a
b
c
d
e
f
haha
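The else clause of a for loop runs only when the loop finishes without hitting break, which makes it a natural place for the "nothing found" case of a search. A minimal sketch, with find_first_even as a made-up helper:

```python
def find_first_even(numbers):
    for n in numbers:
        if n % 2 == 0:
            result = 'found {}'.format(n)
            break               # leaving via break skips the else clause
    else:
        # Reached only if the loop ran to the end without break,
        # including the case of an empty sequence.
        result = 'no even number'
    return result

print(find_first_even([1, 3, 4, 5]))
print(find_first_even([1, 3]))
```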
The break and continue statements¶
- The break statement terminates the loop.
- The continue statement skips the rest of the current loop body and continues with the next iteration.
Example:
>>> while True:
...     guess = input('please input a string:')
...     if guess == 'good':
...         print('You guess right!')
...         break
...     else:
...         print('You guess wrong!')
...         continue
...
please input a string:a
You guess wrong!
please input a string:b
You guess wrong!
please input a string:c
You guess wrong!
please input a string:perfect
You guess wrong!
please input a string:good
You guess right!
Parallel iteration with zip()¶
When iterating, the zip() function can walk over several sequences in parallel.
zip() steps through the sequences together and builds a tuple from the items at the same offset.
Using zip() together with list() and dict():
In [1]: days=['Monday','Tuesday','Wednesday']
In [2]: chinese=['星期一','星期二','星期三']
In [3]: for day,china in zip(days,chinese):
   ...:     print(day,'\t',china)
   ...:
Monday    星期一
Tuesday   星期二
Wednesday 星期三
In [4]: list(zip(days,chinese))
Out[4]: [('Monday', '星期一'), ('Tuesday', '星期二'), ('Wednesday', '星期三')]
In [5]: dict(zip(days,chinese))
Out[5]: {'Monday': '星期一', 'Tuesday': '星期二', 'Wednesday': '星期三'}
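Generating integer sequences with range()¶
- The range() function returns a sequence of integers within a given interval.
- In range(start,stop,step), start defaults to 0, stop is the upper bound, and step defaults to 1.
- The sequence returned by range includes start but does not include stop.
Example:
In [1]: range(6)
Out[1]: range(0, 6)
In [2]: for i in range(6):
   ...:     print(i)
   ...: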
0
1
2
3
4
5
In [3]: for i in range(10,20,2):
   ...:     print(i)
   ...:
10
12
14
16
18
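A range object is lazy, so converting it with list() is the quickest way to see which values it contains. The sketch below repeats the two ranges from the example and adds a negative step and an empty range; the variable names are illustrative.

```python
first_six = list(range(6))             # start defaults to 0, stop is excluded
evens = list(range(10, 20, 2))         # step of 2
countdown = list(range(5, 0, -1))      # a negative step counts down
empty = list(range(3, 3))              # start == stop gives no values

# A range supports len() and membership tests without building a list.
big = range(0, 10**6, 2)
size = len(big)
contains_18 = 18 in range(10, 20, 2)

print(first_six, evens, countdown, empty, size, contains_18)
```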
Functions¶
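This section covers the use of functions in Python.
Creating functions¶
- A function is a reusable piece of program. It lets you give a block of statements a name and then call it from elsewhere.
- The first step towards code reuse is the function: a named piece of code, separate from all others.
- Functions are defined with the def keyword.
- A function name must start with an underscore _ or a letter and may contain only letters, digits and underscores; lowercase letters and underscores are recommended.
- If a function does not explicitly execute a return statement, it returns None by default.
Syntax:
def function_name([arg1,][arg2,][arg3]):
    cmds
function_name is the function name
arg1/arg2/arg3 are the formal parameters
Run in PyCharm:
# Function definition
def sayhi(name):
    print('hi! ',name,'. I am Python. How are you?')
# Call the function, passing the argument 'meizhaohui'
sayhi('meizhaohui')
# Output:
# hi!  meizhaohui . I am Python. How are you?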
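Local and global variables¶
- Local variables
A variable defined inside a function has nothing to do with any variable of the same name outside the function; the name is local to the function. This is called the scope of the variable.
The scope of a variable is the block in which it is defined, starting from the point where its name is defined.
- Global variables
To assign to a global variable inside a function, that is, a variable that can also be referenced outside the function, declare it with the global keyword. A global declaration inside a function has the following restrictions:
* The declaration is the keyword global followed by the variable name VAR_NAME (i.e. global VAR_NAME)
* A value cannot be assigned in the global statement itself
* A global variable cannot also be a parameter of the function, i.e. one name cannot be both a parameter and a global
* Uppercase letters and underscores are recommended for global variable names.
The following definition is correct:
def saylang(lang):
    global LOVE_LANG # declare the global variable LOVE_LANG
The following definition is wrong:
def saylang(lang):
    global LOVE_LANG='python'
The following definition is also wrong:
def saylang(LOVE_LANG):
    global LOVE_LANG
Here is an example that uses a global variable: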
def saylang(lang):
    global LOVE_LANG
    LOVE_LANG = "Python"
    print('I think you will more love to learn',LOVE_LANG)
LOVE_LANG = 'Java'
print('Before running the function,you love to learn',LOVE_LANG)
saylang(LOVE_LANG)
print('After running the function,you love to learn',LOVE_LANG)
# Output in PyCharm:
# Before running the function,you love to learn Java
# I think you will more love to learn Python
# After running the function,you love to learn Python
Here is another example:
def earnmoney():
    global MONEY
    MONEY = MONEY + 2000
    print('You did good job. You earned more money! now you have $%s' % MONEY)
MONEY = 2000
print('You have $%s' % MONEY,'at first.',end='\n\n')
# print('You have ${} at first.\n'.format(MONEY))
earnmoney()
earnmoney()
earnmoney()
# Output in PyCharm:
# You have $2000 at first.
#
# You did good job. You earned more money! now you have $4000
# You did good job. You earned more money! now you have $6000
# You did good job. You earned more money! now you have $8000
# earnmoney() is called three times and adds $2000 each time, so the final amount is $8000.
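The global declaration is needed only for assignment. Reading a global inside a function works without it, while assigning without the declaration makes the name local and fails if the name is read first. The sketch below contrasts the three cases; COUNTER and the function names are made up for the example.

```python
COUNTER = 10

def read_only():
    # Reading a global needs no declaration.
    return COUNTER + 1

def broken_increment():
    # The assignment makes COUNTER local to this function, so the read on
    # the right-hand side fails with UnboundLocalError.
    COUNTER = COUNTER + 1
    return COUNTER

def working_increment():
    global COUNTER
    COUNTER = COUNTER + 1
    return COUNTER

before = read_only()
try:
    broken_increment()
except UnboundLocalError:
    failed = True
else:
    failed = False
after = working_increment()

print(before, failed, after, COUNTER)
```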
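Positional arguments¶
- Positional arguments are passed according to the positions of the parameters in the function definition. The number of arguments in the call must then match the number of parameters in the definition, otherwise an error is raised.
- One drawback of positional arguments is that you must remember what each position means.
See the following example:
def print_love_lang(name,lang):
    print('Hi,{},You love the language {}'.format(name,lang))
print_love_lang('mei','Python')
print_love_lang('mei')
# Output in PyCharm:
# Hi,mei,You love the language Python
# Traceback (most recent call last):
#   File "D:/data/python_scripts/test.py", line 5, in <module>
#     print_love_lang('mei')
# TypeError: print_love_lang() missing 1 required positional argument: 'lang'
#
# Process finished with exit code 1
Note: the function print_love_lang in the example defines two parameters, name and lang. The call print_love_lang('mei','Python') passes two arguments: 'mei' goes to name and 'Python' goes to lang, so the result is printed normally. The call print_love_lang('mei') passes only one argument, and Python reports that the positional argument 'lang' is missing.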
Keyword arguments¶
- When a function has many parameters and you want to specify only some of them, you can assign values to those parameters by name. These are called keyword arguments: the name (keyword) is used to specify the argument.
- This has two advantages: the order of the arguments no longer matters, and, provided the other parameters have default values, only the parameters you care about need to be given.
- In a function call, positional arguments must come before keyword arguments, otherwise the error "positional argument follows keyword argument" is raised.
See the following example:
def print_love_lang(name,lang,year=3):
    print('Hi,',name,'. You love the language',lang,'. You have learnt it',year,'years!')
print_love_lang('mei','Python',2) # all values passed by position
print_love_lang('mei','Python') # passed by position; year is not given and takes its default value 3
print_love_lang(name='mei',lang='Python',year=4) # all values passed by keyword
print_love_lang('mei','Python',year=5) # positional plus keyword arguments; positional arguments must come first
print_love_lang('mei',lang='Python',year=6) # positional plus keyword arguments; positional arguments must come first
# print_love_lang(name='mei','Python',year=7) # this form is wrong and raises "positional argument follows keyword argument"
print_love_lang(year=7,name='mei',lang='Python') # passed by keyword; keyword arguments need not follow the parameter order
# Output in PyCharm:
# Hi, mei . You love the language Python . You have learnt it 2 years!
# Hi, mei . You love the language Python . You have learnt it 3 years!
# Hi, mei . You love the language Python . You have learnt it 4 years!
# Hi, mei . You love the language Python . You have learnt it 5 years!
# Hi, mei . You love the language Python . You have learnt it 6 years!
# Hi, mei . You love the language Python . You have learnt it 7 years!
# print_love_lang(name='mei','Python',year=7) # this form is wrong: positional arguments must come before keyword arguments
# If that line is uncommented, the error is:
#     print_love_lang(name='mei','Python',year=7)
#                                ^
# SyntaxError: positional argument follows keyword argument
#
# Process finished with exit code 1
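Default argument values¶
- For some functions, when the caller does not supply a value for a parameter, the function can use a default value instead.
- In the parameter list, parameters with default values must come after those without: you cannot declare a parameter with a default value (think of it as a keyword parameter) and then a parameter without one (think of it as a positional parameter).
- Parameters without default values must be declared first, followed by parameters with default values.
- A default value is evaluated once, when the function is defined, not each time the function is called. A common mistake among Python programmers is to use a mutable data type (such as a list or dictionary) as a default value.
A default value is written as parameter=default_value; see the following example:
# Define the print_message function
def print_message(message,times=10):
    print(message * times)
print('打印20个*')
print_message('*',20) # two arguments are passed to print_message()
print('打印10个#')
print_message('#') # only one argument is passed, so times takes its default value 10
# Output in PyCharm:
# D:\ProgramFiles\Python3.6.2\python.exe D:/data/python_project/python_basic/basic_learning.py
# 打印20个*
# ********************
# 打印10个#
# ##########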
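The next example uses a mutable data type as a default value. The problem is that the list is empty only on the first call; on later calls it still holds the values appended by earlier calls:
In [1]: def testerr(arg,result=[]):
   ...:     result.append(arg)
   ...:     print(result)
   ...:
In [2]: testerr('a')
['a']
In [3]: testerr('b')
['a', 'b']
In [4]: testerr('c')
['a', 'b', 'c']
The correct approach is to use None as the default and create a new list inside the function only when no list was passed in:
In [1]: def testerr(arg,result=None):
   ...:     if result is None:
   ...:         result=[]
   ...:     result.append(arg)
   ...:     print(result)
   ...:
In [2]: testerr('a')
['a']
In [3]: testerr('b')
['b']
In [4]: testerr('c')
['c']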
Variable arguments¶
- Variable arguments let a function receive a tuple (tuple) and a dictionary (dict) of arguments.
- Usage in an ordinary function: def function_name(*args, **kwargs):
- Usage in a method of a class: def method_name(self, *args, **kwargs):
- When the number of arguments is not known in advance, *args or **kwargs can receive them as a tuple or a dictionary.
- * collects positional arguments and ** collects keyword arguments.
- The tuple is stored in args and the dictionary in kwargs.
- *args is the tuple of the variable positional arguments.
- **kwargs is the dictionary of the variable keyword arguments.
- *args must come before **kwargs, and positional arguments must come before keyword arguments.
- Parameter order: positional parameters, default parameters, *args, **kwargs.
- The name after * or ** is arbitrary; it does not have to be args or kwargs. *name, **lang and so on are all valid.
See the following example:
def print_love_lang(*args, **kwargs):
    print('args:', args, 'type(args):', type(args))
    for value in args:
        print("positional argument:", value)
    print('kwargs:', kwargs, 'type(kwargs):', type(kwargs))
    for key in kwargs:
        print("keyword argument:\t{}:{}".format(key, kwargs[key]))
print_love_lang(1, 2, 3, name='mei', lang='Python')
# Output:
# args: (1, 2, 3) type(args): <class 'tuple'>
# positional argument: 1
# positional argument: 2
# positional argument: 3
# kwargs: {'name': 'mei', 'lang': 'Python'} type(kwargs): <class 'dict'>
# keyword argument: name:mei
# keyword argument: lang:Python
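The parameter order listed above (positional, default, *args, **kwargs) can be seen in a single signature. In this sketch the function returns what each parameter received so that the routing of arguments is visible; describe is a made-up name.

```python
def describe(name, lang='Python', *args, **kwargs):
    # name: required positional parameter
    # lang: parameter with a default value
    # args: tuple of any further positional arguments
    # kwargs: dict of any further keyword arguments
    return {'name': name, 'lang': lang, 'args': args, 'kwargs': kwargs}

only_required = describe('mei')
with_extras = describe('mei', 'Go', 1, 2, year=3)

print(only_required)
print(with_extras)
```

Note that extra positional arguments reach *args only after name and lang have been filled, so 'Go' overrides the default here.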
Unpacking arguments¶
- The *args and **kwargs syntax can be used not only in a function definition but also in a function call.
- The difference is that in a function definition *args and **kwargs pack the arguments,
- whereas in a function call they unpack them.
- When unpacking a dict, its keys must match the parameter names of the function and supply every required parameter; the order of the keys does not have to match the order in the definition.
The following example illustrates this:
def test_args(first, second, third, fourth, fifth):
    print('First argument: ', first)
    print('Second argument: ', second)
    print('Third argument: ', third)
    print('Fourth argument: ', fourth)
    print('Fifth argument: ', fifth)
# Use *args
args = [1, 2, 3, 4, 5]
print('Use *args')
test_args(*args)
# results:
# Use *args
# First argument:  1
# Second argument:  2
# Third argument:  3
# Fourth argument:  4
# Fifth argument:  5
# Use **kwargs
kwargs = {
    'first': 1,
    'second': 2,
    'third': 3,
    'fourth': 4,
    'fifth': 5
}
print('Use **kwargs')
test_args(**kwargs)
# results:
# Use **kwargs
# First argument:  1
# Second argument:  2
# Third argument:  3
# Fourth argument:  4
# Fifth argument:  5
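Documentation strings (DocStrings)¶
Readability matters. It is good practice to put a description of the function at the start of its body; this is the documentation string.
A DocString is enclosed in triple quotes.
By convention a DocString is a multi-line string that follows these rules:
- The first line starts with a capital letter and ends with a period.
- The second line is blank.
- The detailed description starts on the third line.
The docstring of a function is available through its __doc__ attribute.
For example:
def print_love_lang(name, lang, year=3):
    """
    Print how many years you have been learning a programming language.
    :param name: define the name
    :param lang: define the program language
    :param year: define the time you have learned the language
    :return: None
    """
    print('Hi,', name, '. You love the language', lang, '. You have learn it', year, 'years!')
print(print_love_lang.__doc__)
# Output in PyCharm:
#
#     Print how many years you have been learning a programming language.
#     :param name: define the name
#     :param lang: define the program language
#     :param year: define the time you have learned the language
#     :return: None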
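The raw __doc__ string keeps the indentation of the source code. The standard library function inspect.getdoc() returns the same text with that indentation removed, which is usually what a reader wants. The sketch below uses a small made-up function, greet, whose docstring follows the three-part convention described above.

```python
import inspect

def greet(name):
    """Return a greeting for name.

    The summary line is followed by a blank line and then the details.
    """
    return 'Hi, {}'.format(name)

raw = greet.__doc__               # indentation of the source is preserved
clean = inspect.getdoc(greet)     # common leading whitespace is stripped

print(repr(raw))
print(clean)
```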
The return statement¶
- The return statement returns from a function, i.e. leaves the function. It can also return a value.
- A return statement without a value is equivalent to return None.
- None is the special value in Python that represents nothing.
- If a function does not end with a return statement, Python implicitly adds return None at the end.
See the following example:
# With an explicit return value
def print_love_lang(name, lang, year=3):
    print('Hi,', name, '. You love the language', lang, '. You have learn it', year, 'years!')
    return 'nice'
result = print_love_lang('mei', 'Python', 2) # values passed by position
print("return is:{}".format(result))
# Output:
# Hi, mei . You love the language Python . You have learn it 2 years!
# return is:nice
# Without an explicit return value
def print_love_lang(name, lang, year=3):
    print('Hi,', name, '. You love the language', lang, '. You have learn it', year, 'years!')
result = print_love_lang('mei', 'Python', 2) # values passed by position
print("return is:{}".format(result))
# Output:
# Hi, mei . You love the language Python . You have learn it 2 years!
# return is:None
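None in Python¶
If a function does not define a return value, it returns None by default.
- None is a special value in Python that represents no data.
- As a boolean, None behaves like False, but it differs from False in many other ways.
- Zero-valued integers and floats, the empty string (''), the empty list ([]), the empty tuple (()), the empty dictionary ({}) and the empty set (set()) are all treated as False, but none of them is equal to None.
See the following example for details:
>>> def is_none(thing):
...     if thing is None:
...         print("It's None")
...     elif thing:
...         print("It's True")
...     else:
...         print("It's False")
...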
>>> is_none(None)
It's None
>>> is_none(True)
It's True
>>> is_none(False)
It's False
>>> is_none(1)
It's True
>>> is_none(0)
It's False
>>> is_none(-1)
It's True
>>> is_none('')
It's False
>>> is_none('string')
It's True
>>> is_none([])
It's False
>>> is_none(['list'])
It's True
>>> is_none({})
It's False
>>> is_none({'key':'value'})
It's True
>>> is_none((),)
It's False
>>> type((),)
<class 'tuple'>
>>> is_none(('tuple',))
It's True
>>> empty_set=set()
>>> type(empty_set)
<class 'set'>
>>> is_none(empty_set)
It's False
>>> is_none(set('One'))
It's True
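Inner functions¶
A function can be defined inside another function.
- Inner functions are useful when a complex task has to be performed several times inside a function, because they avoid loops and duplicated code.
Example:
In [1]: def outer(a, b):
   ...:     def inner(c, d):
   ...:         return c + d
   ...:     return inner(a, b)
   ...: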
In [2]: outer(4, 7)
Out[2]: 11
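Closures¶
- An inner function can act as a closure.
- A closure is a function that is generated dynamically by another function and that can store and change the values of variables created outside it.
Example:
In [1]: def outer2(num1, num2):
   ...:     def inner2():
   ...:         return num1 + num2
   ...:     return inner2
   ...: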
In [2]: outer2(4, 7)
Out[2]: <function __main__.outer2.<locals>.inner2()>
In [3]: outer2(4, 7)()
Out[3]: 11
In [4]: a = outer2(2, 3)
In [5]: b = outer2(4, 7)
In [6]: a()
Out[6]: 5
In [7]: b()
Out[7]: 11
In [8]: a
Out[8]: <function __main__.outer2.<locals>.inner2()>
In [9]: b
Out[9]: <function __main__.outer2.<locals>.inner2()>
In [10]: type(a)
Out[10]: function
In [11]: type(b)
Out[11]: function
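- inner2() uses the outer variables num1 and num2 directly instead of receiving them through parameters of its own.
- outer2() returns the inner2 function itself rather than calling it.
- Each call to outer2 creates a new inner2 function object, and return inner2 hands that object back to the caller.
- inner2 is a closure: a dynamically created function that remembers variables from the enclosing scope.
- a and b are functions, and also closures. Calling them computes the sum of the outer arguments num1 and num2 that each one remembers.
- inner2 can access the variables (such as local variables and function parameters) in the namespaces of outer2 and of any functions enclosing it.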
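The closures above only read the outer variables. To change a variable of the enclosing function, the inner function has to declare it with nonlocal. A minimal sketch, with make_counter and increment as made-up names, showing that each closure keeps its own state:

```python
def make_counter(start=0):
    count = start

    def increment(step=1):
        # nonlocal lets the inner function rebind count in make_counter's scope.
        nonlocal count
        count += step
        return count

    return increment

counter_a = make_counter()
counter_b = make_counter(100)

# The two closures remember separate count variables.
results = [counter_a(), counter_a(), counter_b(5), counter_a()]
print(results)
```

Without the nonlocal line, count += step would make count a local variable of increment and fail with UnboundLocalError.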
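Namespaces and scope¶
- A name can refer to different things depending on where it is used. A Python program has various namespaces: sections of the program within which a particular name is unique and unrelated to the same name in other namespaces.
- Every function defines its own namespace.
- The main part of a program defines the global namespace; the variables in it are global variables. Names made of uppercase letters and underscores, GLOBAL_VAR_NAME, are recommended for global variables, e.g. LOVE_LANG = 'Python'.
- Variables defined inside a function are local variables. Names made of lowercase letters and underscores, local_var_name, are recommended, e.g. this_is_a_variable = 1.
- The locals() function returns a dictionary of the contents of the local namespace.
- The globals() function returns a dictionary of the contents of the global namespace.
Example:
#Filename:locals_globals.py
LOVE_LANG = 'Python'
def change_lang():
    author = 'Guido van Rossum'
    print('locals_in_function:', locals())
    global LOVE_LANG
    LOVE_LANG = 'GO'
    print('globals_in_function:', globals())
print('locals_before:', locals())
print('globals_before:', globals())
change_lang()
print('locals_after:', locals())
print('globals_after:', globals())
Run it with python3 locals_globals.py: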
[meizhaohui@localhost ~]$ python3 locals_globals.py
locals_before: {'change_lang': <function change_lang at 0x7f67f611a048>, '__loader__': <_frozen_importlib_external.SourceFileLoader object at 0x7f67f60cca58>, '__builtins__': <module 'builtins' (built-in)>, '__name__': '__main__', 'LOVE_LANG': 'Python', '__cached__': None, '__spec__': None, '__doc__': None, '__file__': 'locals_globals.py', '__package__': None}
globals_before: {'change_lang': <function change_lang at 0x7f67f611a048>, '__loader__': <_frozen_importlib_external.SourceFileLoader object at 0x7f67f60cca58>, '__builtins__': <module 'builtins' (built-in)>, '__name__': '__main__', 'LOVE_LANG': 'Python', '__cached__': None, '__spec__': None, '__doc__': None, '__file__': 'locals_globals.py', '__package__': None}
locals_in_function: {'author': 'Guido van Rossum'}
globals_in_function: {'change_lang': <function change_lang at 0x7f67f611a048>, '__loader__': <_frozen_importlib_external.SourceFileLoader object at 0x7f67f60cca58>, '__builtins__': <module 'builtins' (built-in)>, '__name__': '__main__', 'LOVE_LANG': 'GO', '__cached__': None, '__spec__': None, '__doc__': None, '__file__': 'locals_globals.py', '__package__': None}
locals_after: {'change_lang': <function change_lang at 0x7f67f611a048>, '__loader__': <_frozen_importlib_external.SourceFileLoader object at 0x7f67f60cca58>, '__builtins__': <module 'builtins' (built-in)>, '__name__': '__main__', 'LOVE_LANG': 'GO', '__cached__': None, '__spec__': None, '__doc__': None, '__file__': 'locals_globals.py', '__package__': None}
globals_after: {'change_lang': <function change_lang at 0x7f67f611a048>, '__loader__': <_frozen_importlib_external.SourceFileLoader object at 0x7f67f60cca58>, '__builtins__': <module 'builtins' (built-in)>, '__name__': '__main__', 'LOVE_LANG': 'GO', '__cached__': None, '__spec__': None, '__doc__': None, '__file__': 'locals_globals.py', '__package__': None}
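Comparing the state before and after the function call:

- Only the LOVE_LANG variable differs: after change_lang runs, LOVE_LANG changes from Python to GO.
- Before change_lang runs, locals() and globals() return the same values, because at module level the local namespace is the global namespace.
- While change_lang runs, locals() sees only the variables inside the function: {'author': 'Guido van Rossum'}.
- To modify a global variable inside a function, first declare it with a statement such as global LOVE_LANG, then assign to it.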
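The effect of the global statement can be sketched as follows. This is a minimal illustration, not the original locals_globals.py; the function names change_without_global and change_with_global are hypothetical and only mirror the variable used above.

```python
# Hypothetical names; only LOVE_LANG comes from the script above.
LOVE_LANG = 'Python'


def change_without_global():
    # A plain assignment creates a new local name;
    # the module-level LOVE_LANG is left untouched.
    LOVE_LANG = 'GO'
    return locals()


def change_with_global():
    # The global statement makes the assignment rebind the module-level name.
    global LOVE_LANG
    LOVE_LANG = 'GO'
    return locals()


local_names = change_without_global()
value_after_local_assignment = LOVE_LANG
names_in_global_version = change_with_global()
value_after_global_assignment = LOVE_LANG
print(local_names, value_after_local_assignment, value_after_global_assignment)
```

Without the global statement the assignment only shows up in the function's locals(); with it, the function has no local names at all and the module-level value changes.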
References:
[1] Differences between positional, default, keyword, and variable arguments in Python https://www.cnblogs.com/bingabcd/p/6671368.html
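Decorators¶
This section gives a full introduction to decorators in Python.
Introducing decorators¶
- A decorator is a function that extends the behavior of an existing function; its return value is also a function.
- A decorator is essentially a closure: it takes a function as its argument and returns a replacement function.
- The benefit of a decorator is that it adds new behavior to a function without changing the original function's code.
- Decorators can add logging, performance testing, timing, transaction handling, caching, permission checks, and similar features to the original function.
- A decorator loses the original function's metadata; the wraps decorator from the functools module removes this drawback.
Define a function that prints a message:
In [1]: def print_hello():
   ...:     print('message:hello')
   ...:
In [2]: print_hello()
message:hello
Now there is a new requirement: log each execution so that it shows which function ran. So logging code is added inside the function (assume print stands in for logging.info):
In [1]: def print_hello():
   ...:     print('print_hello is running')  # invasive change: modifies the original function
   ...:     print('message:hello')
   ...:
In [2]: print_hello()
print_hello is running
message:hello
If other functions such as foo1() and foo2() have the same requirement, should the same print logging be written inside foo1 and foo2 as well? That would produce a lot of duplicated code. To reduce the repetition, we can define a new function dedicated to logging, which runs the real business code once the logging is done:
In [3]: def logit(func):
   ...:     print('{} is running'.format(func.__name__))
   ...:     func()
   ...:
In [4]: def print_hello():
   ...:     print('message:hello')
   ...:
In [5]: logit(print_hello)
print_hello is running
message:hello
This is logically fine and the feature works, but the call site no longer calls the real business function print_hello; it calls logit instead. This breaks the original code structure, because print_hello now has to be passed to logit as an argument every time. Is there a better way? Yes: use a decorator function.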
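A simple decorator¶
Define a logit decorator:
In [6]: def logit(func):
   ...:     def wrapper():
   ...:         print('{} is running'.format(func.__name__))
   ...:         return func()
   ...:     return wrapper
   ...:
In [7]: def print_hello():
   ...:     print('message:hello')
   ...:
In [8]: print_hi=logit(print_hello) # logit(print_hello) returns the function object wrapper, so this statement is equivalent to print_hi = wrapper
In [9]: print_hi() # calling print_hi() is equivalent to calling wrapper()
print_hello is running
message:hello
In [10]: type(print_hi)
Out[10]: function
logit is a decorator. It wraps func, the function that runs the real business logic, so print_hello looks as if it were decorated by logit. logit also returns a function, named wrapper. The points where a function is entered and exited are called cross-cutting points (aspects), and this style of programming is called aspect-oriented programming.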
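The @ syntactic sugar¶
- The @ symbol is the syntactic sugar for decorators. It is placed directly above the function definition, which removes the need for the final reassignment step.
Continuing with the logit decorator defined in In [6] above, decorate print_hello with the @ syntax:
In [11]: @logit
    ...: def print_hello():
    ...:     print('message:hello')
    ...:
In [12]: print_hello()
print_hello is running
message:hello
As shown above, with @ we can omit the statement print_hi=logit(print_hello) and call print_hello() directly to get the desired result. The print_hello() function itself needs no modification; the decorator is simply added at the definition, and the call looks the same as before. Other similar functions can reuse the same decorator without being modified or wrapped again. This improves both the reusability and the readability of the program.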
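The equivalence between the @ form and the manual reassignment can be checked directly. The sketch below reuses the logit decorator from above; the functions plain_hello and sugar_hello are hypothetical and return their message instead of printing it so the results can be compared.

```python
def logit(func):
    def wrapper():
        print('{} is running'.format(func.__name__))
        return func()
    return wrapper


def plain_hello():
    return 'message:hello'


# Manual decoration: bind a name to the wrapper returned by logit.
manual_hello = logit(plain_hello)


# The @ form performs the same call and rebinds sugar_hello itself.
@logit
def sugar_hello():
    return 'message:hello'


manual_result = manual_hello()
sugar_result = sugar_hello()
print(manual_result == sugar_result)
```

In both cases the name ends up bound to the inner wrapper function, which is why sugar_hello.__name__ reports 'wrapper'.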
Using *args and **kwargs¶
- In a function definition, when the parameters are not known in advance, *args or **kwargs can receive them as a tuple or a dictionary;
- * collects positional arguments and ** collects keyword arguments;
- the tuple is stored in args and the dictionary in kwargs.
Suppose the business logic no longer prints a fixed hello but takes a parameter message and prints its content:
def print_message(message):
    print('message:{}'.format(message))
In that case, the parameter can be declared when the wrapper function is defined:
#Filename: print_message.py
def logit(func):
    def wrapper(message):
        print("%s is running" % func.__name__)
        return func(message)
    return wrapper
@logit
def print_message(message):
    print('message:{}'.format(message))
print_message('new message1')
print_message('new message2')
Run it with python3 print_message.py:
[meizhaohui@localhost ~]$ python3 print_message.py
print_message is running
message:new message1
print_message is running
message:new message2
This way, a parameter defined by print_message, such as message, can be declared on the wrapper function.
If print_message defines several parameters, some of them with default values, the wrapper function can use *args, **kwargs instead, which gives a new decorator:
#Filename: print_message.py
def logit(func):
    def wrapper(*args, **kwargs):
        print("%s is running" % func.__name__)
        return func(*args, **kwargs)
    return wrapper
@logit
def print_message(name, message=None, lang='Python'):
    print('Hi,{},you said message:{}.You are the father of {}'.format(name, message, lang))
print_message('Guido van Rossum','The Zen of Python')
print_message('Rob Pike','Go makes it easy to build simple, reliable, and efficient software',lang='Go')
Run it with python3 print_message.py:
[meizhaohui@localhost ~]$ python3 print_message.py
print_message is running
Hi,Guido van Rossum,you said message:The Zen of Python.You are the father of Python
print_message is running
Hi,Rob Pike,you said message:Go makes it easy to build simple, reliable, and efficient software.You are the father of Go
The logit decorator now works no matter how many parameters print_message has. A decorator acts like an injection marker: it extends the original function without intruding into its code, and the call sites do not have to change.
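The same *args, **kwargs pattern carries over to the other uses listed earlier, such as timing. The following is a minimal sketch; the names timeit and add are hypothetical and not part of the original scripts.

```python
import time


def timeit(func):
    def wrapper(*args, **kwargs):
        # Forward whatever positional and keyword arguments were given.
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print('{} took {:.6f} seconds'.format(func.__name__, elapsed))
        return result
    return wrapper


@timeit
def add(a, b=0, *, scale=1):
    return (a + b) * scale


total = add(1, 2, scale=10)
single = add(5)
print(total, single)
```

Because wrapper passes its arguments through unchanged and returns the result, the decorated function behaves like the original for any call signature.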
Decorators with arguments¶
Decorators can be made even more flexible by giving them arguments. In the decorators above, the only argument the decorator receives is func, the business function. The decorator syntax also allows other arguments to be supplied at the point of use, for example @logit(level). This makes decorators more flexible to write and to use. For example, the log level can be specified in the decorator, because different business functions may need different log levels.
To match a real-world scenario, we rewrite the logging decorator with the logging module:
#Filename: print_logs.py
import logging
def logit(level):
    def decorator(func):
        def wrapper(*args, **kwargs):
            logging.basicConfig(level = logging.INFO,format = '%(asctime)s - %(name)s - %(levelname)s - %(message)s')
            if level == 'warning':
                logging.warning("%s is running" % func.__name__)
            elif level == 'info':
                logging.info("%s is running" % func.__name__)
            return func(*args, **kwargs)
        return wrapper
    return decorator
@logit(level='info')
def print_hello():
    print('message:hello')
@logit(level='warning')
def print_message(name, message=None, lang='Python'):
    print('Hi,{},you said message:{}.You are the father of {}'.format(name, message, lang))
print_hello()
print_message('Guido van Rossum','The Zen of Python')
Run it with python3 print_logs.py:
[meizhaohui@localhost ~]$ python3 print_logs.py
2019-03-19 22:48:53,455 - root - INFO - print_hello is running
message:hello
2019-03-19 22:48:53,455 - root - WARNING - print_message is running
Hi,Guido van Rossum,you said message:The Zen of Python.You are the father of Python
The logit above is a decorator that accepts arguments. It is really a function that wraps the original decorator and returns it, and it can be understood as a closure with parameters. When we write @logit(level='warning'), Python first calls logit(level='warning') and then applies the decorator it returns to the function, so the argument is available in the decorator's environment. In other words, @logit(level='warning') is equivalent to @decorator.
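The two-step evaluation of a decorator with arguments can be written out by hand. This sketch uses a hypothetical prefix decorator rather than logging, so that the result is a plain string that is easy to compare.

```python
def prefix(tag):
    # Outer function: receives the decorator's own argument.
    def decorator(func):
        # Middle function: receives the function being decorated.
        def wrapper(*args, **kwargs):
            return '[{}] {}'.format(tag, func(*args, **kwargs))
        return wrapper
    return decorator


@prefix('info')
def greet(name):
    return 'hello ' + name


def greet_plain(name):
    return 'hello ' + name


# What the @ line does, step by step: call prefix, then call its result.
greet_manual = prefix('info')(greet_plain)

print(greet('Guido'))
print(greet_manual('Guido'))
```

Both calls produce the same string, which shows that @prefix('info') is simply prefix('info') applied to the function.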
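Class decorators¶
A decorator can be a class as well as a function. Compared with function decorators, class decorators offer more flexibility, high cohesion, and encapsulation. A class decorator relies mainly on the class's __call__ method: attaching the decorator with @ creates an instance through __init__, and each call to the decorated function then invokes __call__.
Example:
#Filename: class_decorator.py
class Foo(object):
    def __init__(self, func):
        self._func = func
    def __call__(self):
        print('class decorator running')
        self._func()
        print('class decorator ending')
@Foo
def bar():
    print('bar')
bar()
Run it with python3 class_decorator.py:
[meizhaohui@localhost ~]$ python3 class_decorator.py
class decorator running
bar
class decorator ending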
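Because a class decorator is an ordinary object, it can keep state between calls. The sketch below is one possible illustration under that assumption; the class name CountCalls and the function add are hypothetical.

```python
class CountCalls(object):
    def __init__(self, func):
        # Runs once, when the @ line is evaluated.
        self._func = func
        self.calls = 0

    def __call__(self, *args, **kwargs):
        # Runs on every call of the decorated function.
        self.calls += 1
        return self._func(*args, **kwargs)


@CountCalls
def add(a, b):
    return a + b


first = add(1, 2)
second = add(3, b=4)
print(first, second, add.calls)
```

After decoration, the name add refers to a CountCalls instance, so the counter is available as add.calls.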
The drawback of decorators¶
Decorators make code highly reusable, but they have one drawback: the metadata of the original function is lost, such as its docstring, __name__, and parameter list.
After adding docstrings to print_logs.py and printing each function's __name__ and docstring at the end, the file looks like this:
#Filename: print_logs.py
import logging
def logit(level):
    def decorator(func):
        def wrapper(*args, **kwargs):
            '''decorator docs'''
            logging.basicConfig(level = logging.INFO,format = '%(asctime)s - %(name)s - %(levelname)s - %(message)s')
            if level == 'warning':
                logging.warning("%s is running" % func.__name__)
            elif level == 'info':
                logging.info("%s is running" % func.__name__)
            return func(*args, **kwargs)
        return wrapper
    return decorator
@logit(level='info')
def print_hello():
    '''print_hello docs'''
    print('message:hello')
@logit(level='warning')
def print_message(name, message=None, lang='Python'):
    '''print_message docs'''
    print('Hi,{},you said message:{}.You are the father of {}'.format(name, message, lang))
print_hello()
print_message('Guido van Rossum','The Zen of Python')
print(print_hello.__name__, print_hello.__doc__)
print(print_message.__name__, print_message.__doc__)
Run it with python3 print_logs.py:
[meizhaohui@localhost ~]$ python3 print_logs.py
2019-03-19 23:06:29,019 - root - INFO - print_hello is running
message:hello
2019-03-19 23:06:29,019 - root - WARNING - print_message is running
Hi,Guido van Rossum,you said message:The Zen of Python.You are the father of Python
wrapper decorator docs
wrapper decorator docs
Both print_hello and print_message have been replaced by wrapper, so their docstring and __name__ are now those of the wrapper function.
Removing the drawback¶
To remove this drawback, Python's functools module provides a decorator called wraps. When writing a decorator, it is best to apply functools.wraps to the inner wrapper function; it preserves the original function's name and docstring.
The improved print_logs.py looks like this:
#Filename: print_logs.py
import logging
from functools import wraps
def logit(level):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            '''decorator docs'''
            logging.basicConfig(level = logging.INFO,format = '%(asctime)s - %(name)s - %(levelname)s - %(message)s')
            if level == 'warning':
                logging.warning("%s is running" % func.__name__)
            elif level == 'info':
                logging.info("%s is running" % func.__name__)
            return func(*args, **kwargs)
        return wrapper
    return decorator
@logit(level='info')
def print_hello():
    '''print_hello docs'''
    print('message:hello')
@logit(level='warning')
def print_message(name, message=None, lang='Python'):
    '''print_message docs'''
    print('Hi,{},you said message:{}.You are the father of {}'.format(name, message, lang))
print_hello()
print_message('Guido van Rossum','The Zen of Python')
print(print_hello.__name__, print_hello.__doc__)
print(print_message.__name__, print_message.__doc__)
Run it with python3 print_logs.py:
[meizhaohui@localhost ~]$ python3 print_logs.py
2019-03-19 23:14:45,636 - root - INFO - print_hello is running
message:hello
2019-03-19 23:14:45,636 - root - WARNING - print_message is running
Hi,Guido van Rossum,you said message:The Zen of Python.You are the father of Python
print_hello print_hello docs
print_message print_message docs
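Built-in decorators¶
Built-in decorators work on the same principle as ordinary decorators, except that what they return is not a plain function but an object of a built-in class, which makes them somewhat harder to understand. Examples are @property, @staticmethod, and @classmethod; see the chapter on object-oriented programming for details.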
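A short sketch of the three built-in decorators named above, assuming a hypothetical Circle class that is not part of the original document. Looking in the class dictionary shows the objects that the decorators return.

```python
class Circle(object):
    def __init__(self, radius):
        self._radius = radius

    @property
    def area(self):
        # Accessed as an attribute: circle.area, without parentheses.
        return 3.14159 * self._radius ** 2

    @staticmethod
    def is_valid_radius(value):
        # Receives neither the instance nor the class.
        return value > 0

    @classmethod
    def unit(cls):
        # Receives the class and can act as an alternative constructor.
        return cls(1)


unit_circle = Circle.unit()
print(unit_circle.area)
print(Circle.is_valid_radius(2))
print(type(Circle.__dict__['area']).__name__)
```

Each decorated name in the class body is replaced by a property, staticmethod, or classmethod object rather than by a plain function.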
Exceptions¶
This section introduces exceptions in Python.
What an exception is¶
- An exception is an event that occurs during program execution and disrupts its normal flow.
- In general, an exception is raised when Python cannot handle a situation normally.
- An exception is a Python object that represents an error.
- When an exception occurs in a Python script, it must be caught and handled; otherwise the program terminates.
- Code that may fail needs an appropriate exception handler so that a potential error does not crash the program.
- Adding handlers where exceptions may occur is a good way to give users a clear error message.
Common exceptions¶
ZeroDivisionError, division by zero:
In [1]: 1/0
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
<ipython-input-1-9e1622b385b6> in <module>
----> 1 1/0
ZeroDivisionError: division by zero
AttributeError, missing attribute:
In [2]: import os
In [3]: os.name
Out[3]: 'posix'
In [4]: os.Name
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-4-edb55bc87dba> in <module>
----> 1 os.Name
AttributeError: module 'os' has no attribute 'Name'
ImportError, failed import:
In [5]: import maths
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-8-6e25cba24411> in <module>
----> 1 import maths
ImportError: No module named 'maths'
In [6]: import math
IndexError, index out of range:
In [7]: list1=['a','b']
In [8]: list1[3]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-8-831b15cbf272> in <module>
----> 1 list1[3]
IndexError: list index out of range
SyntaxError, invalid syntax:
In [9]: print 'hello'
  File "<ipython-input-9-5a1ef41e7057>", line 1
    print 'hello'
                ^
SyntaxError: Missing parentheses in call to 'print'
IndentationError, inconsistent indentation:
In [10]: a = 1
In [11]: if a > 0:
    ...:     print(a)
    ...:   print(a + 1)
  File "<tokenize>", line 3
    print(a + 1)
    ^
IndentationError: unindent does not match any outer indentation level
Built-in exceptions¶
All Python errors derive from the BaseException class. The built-in exception hierarchy is as follows:
BaseException
 +-- SystemExit
 +-- KeyboardInterrupt
 +-- GeneratorExit
 +-- Exception
      +-- StopIteration
      +-- StopAsyncIteration
      +-- ArithmeticError
      |    +-- FloatingPointError
      |    +-- OverflowError
      |    +-- ZeroDivisionError
      +-- AssertionError
      +-- AttributeError
      +-- BufferError
      +-- EOFError
      +-- ImportError
      |    +-- ModuleNotFoundError
      +-- LookupError
      |    +-- IndexError
      |    +-- KeyError
      +-- MemoryError
      +-- NameError
      |    +-- UnboundLocalError
      +-- OSError
      |    +-- BlockingIOError
      |    +-- ChildProcessError
      |    +-- ConnectionError
      |    |    +-- BrokenPipeError
      |    |    +-- ConnectionAbortedError
      |    |    +-- ConnectionRefusedError
      |    |    +-- ConnectionResetError
      |    +-- FileExistsError
      |    +-- FileNotFoundError
      |    +-- InterruptedError
      |    +-- IsADirectoryError
      |    +-- NotADirectoryError
      |    +-- PermissionError
      |    +-- ProcessLookupError
      |    +-- TimeoutError
      +-- ReferenceError
      +-- RuntimeError
      |    +-- NotImplementedError
      |    +-- RecursionError
      +-- SyntaxError
      |    +-- IndentationError
      |         +-- TabError
      +-- SystemError
      +-- TypeError
      +-- ValueError
      |    +-- UnicodeError
      |         +-- UnicodeDecodeError
      |         +-- UnicodeEncodeError
      |         +-- UnicodeTranslateError
      +-- Warning
           +-- DeprecationWarning
           +-- PendingDeprecationWarning
           +-- RuntimeWarning
           +-- SyntaxWarning
           +-- UserWarning
           +-- FutureWarning
           +-- ImportWarning
           +-- UnicodeWarning
           +-- BytesWarning
           +-- ResourceWarning
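The hierarchy matters in practice because an except clause that names a base class also catches every subclass. A minimal sketch, with a hypothetical helper safe_lookup:

```python
def safe_lookup(container, key):
    try:
        return container[key]
    except LookupError as exc:
        # LookupError is the common parent of IndexError and KeyError.
        return type(exc).__name__


list_result = safe_lookup(['a', 'b'], 3)
dict_result = safe_lookup({'a': 1}, 'b')

# The inheritance chain of one exception class, read from its __mro__.
mro_names = [cls.__name__ for cls in ZeroDivisionError.__mro__]
print(list_result, dict_result, mro_names)
```

The __mro__ attribute walks the same path as the tree above, from the concrete exception up to BaseException.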
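Exception handling syntax¶
The exception handling syntax is as follows:
try:
    <statements>    # run the try block and try to catch exceptions
except <ExceptionErrorName1>:
    <statements>    # runs if ExceptionErrorName1 is raised
except (ExceptionErrorName2,ExceptionErrorName3):
    <statements>    # runs if any exception in the tuple is raised
except <ExceptionErrorName4> as <variable>:
    <statements>    # runs if ExceptionErrorName4 is raised; the exception instance is bound to variable
except:
    <statements>    # runs for any exception not listed above
else:
    <statements>    # runs if no exception was raised
finally:
    <statements>    # always runs, whether or not an exception was raised
Notes:
- else and finally are optional, and there may be zero or more except clauses; however, if an else clause is present, there must be at least one except clause.
- However an exception is specified, it is always identified by an instance object, and at most one exception is active at any given time. Once an exception is caught by an except clause somewhere in the program, it dies, unless it is raised again by another raise statement or by an error.
- If the code in try raises an exception, the exception is caught and the matching except block runs; otherwise the except blocks are skipped and the else block runs.
- The code in the finally block always runs, whether or not an exception occurs.
- When handling exceptions, name the specific exception after except rather than using a bare except. A bare except matches every exception type, so a single clause can catch everything, but such handling is too broad.
- Use as to bind the exception instance to a variable, and then print the exception information stored in that variable.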
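The order in which the clauses run can be traced with a small function. This is a sketch under stated assumptions; the function name run and the step labels are hypothetical.

```python
def run(text):
    steps = []
    try:
        value = int(text)
        steps.append('try')
    except (TypeError, ValueError) as exc:
        # as binds the exception instance; its class name is recorded here.
        steps.append('except:' + type(exc).__name__)
    else:
        # Only reached when the try block raised nothing.
        steps.append('else')
    finally:
        # Reached on every path.
        steps.append('finally')
    return steps


print(run('42'))
print(run('x'))
print(run(None))
```

A successful conversion passes through try, else, and finally; a failing one passes through the matching except clause and finally.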
Exception handling examples¶
- Example 1
Handling division by zero:
# Filename: try_except_else_finally.py
# Author: meizhaohui
def expt1(a, b):
    try:
        c = a/b
        print('the value is:{}'.format(c))
    except ZeroDivisionError:
        print('An exception occurred: the divisor is 0')
expt1(4,0)
Run:
[meizhaohui@localhost python_scripts]$ python3 try_except_else_finally.py
An exception occurred: the divisor is 0
The program above catches the ZeroDivisionError, and the handling looks complete. Now change expt1(4,0) to expt1(4,'') and run it again to see what happens.
Run:
[meizhaohui@localhost python_scripts]$ python3 try_except_else_finally.py
Traceback (most recent call last):
  File "try_except_else_finally.py", line 10, in <module>
    expt1(4,'')
  File "try_except_else_finally.py", line 5, in expt1
    c = a/b
TypeError: unsupported operand type(s) for /: 'int' and 'str'
This time a TypeError is raised and not caught; the message says that division between an int and a str is not supported. A handler for this exception needs to be added as well.
- Example 2
Modify the script:
# Filename: try_except_else_finally.py
# Author: meizhaohui
def expt2(a, b):
    try:
        c = a/b
        print('the value is:{}'.format(c))
    except ZeroDivisionError:
        print('An exception occurred: the divisor is 0')
    except TypeError:
        print('An exception occurred: unsupported type for a or b, only float or int is supported')
expt2(4,'')
Run:
[meizhaohui@localhost python_scripts]$ python3 try_except_else_finally.py
An exception occurred: unsupported type for a or b, only float or int is supported
Check that division by zero is still caught:
# Filename: try_except_else_finally.py
# Author: meizhaohui
def expt2(a, b):
    try:
        c = a/b
        print('the value is:{}'.format(c))
    except ZeroDivisionError:
        print('An exception occurred: the divisor is 0')
    except TypeError:
        print('An exception occurred: unsupported type for a or b, only float or int is supported')
expt2(4,0)
Run:
[meizhaohui@localhost python_scripts]$ python3 try_except_else_finally.py
An exception occurred: the divisor is 0
Both the division-by-zero exception and the type exception are now caught.
- Example 3
Use else and finally together; modify the script:
# Filename: try_except_else_finally.py
# Author: meizhaohui
def expt3(a, b):
    try:
        c = a/b
        print('the value is:{}'.format(c))
    except ZeroDivisionError:
        print('An exception occurred: the divisor is 0')
    except TypeError:
        print('An exception occurred: unsupported type for a or b, only float or int is supported')
    else:
        print('No exception')
    finally:
        print('always display')
expt3(4, 2)
Run:
[meizhaohui@localhost python_scripts]$ python3 try_except_else_finally.py
the value is:2.0
No exception
always display
The program runs without an exception and prints 2.0 as the value of 4/2. Because no exception occurs, the except blocks are not executed but the else block is, so "No exception" is printed. The finally block always runs, so "always display" is printed as well.
The return statement in exception handling¶
What happens when a return statement is used inside exception handling?
- No exception, and no return anywhere
Example 4:
# Filename: try_except_else_finally.py
# Author: meizhaohui
def expt4(a, b):
    try:
        c = a/b
        print('the value is:{}'.format(c))
        # return 'try'
    except ZeroDivisionError:
        print('An exception occurred: the divisor is 0')
        # return 'exceptZero'
    except TypeError:
        print('An exception occurred: unsupported type for a or b, only float or int is supported')
        # return 'exceptType'
    else:
        print('No exception')
        # return 'else'
    finally:
        print('always display')
        # return 'finally'
if __name__ == '__main__':
    return_string = expt4(4, 2)
    print('return_string:{}'.format(return_string))
Run:
[meizhaohui@localhost python_scripts]$ python3 try_except_else_finally.py
the value is:2.0
No exception
always display
return_string:None
Because there is no return statement, the function returns the implicit value None.
- No exception, with return in try
Example 5:
# Filename: try_except_else_finally.py
# Author: meizhaohui
def expt4(a, b):
    try:
        c = a/b
        print('the value is:{}'.format(c))
        return 'try'
    except ZeroDivisionError:
        print('An exception occurred: the divisor is 0')
        # return 'exceptZero'
    except TypeError:
        print('An exception occurred: unsupported type for a or b, only float or int is supported')
        # return 'exceptType'
    else:
        print('No exception')
        # return 'else'
    finally:
        print('always display')
        # return 'finally'
if __name__ == '__main__':
    return_string = expt4(4, 2)
    print('return_string:{}'.format(return_string))
Run:
[meizhaohui@localhost python_scripts]$ python3 try_except_else_finally.py
the value is:2.0
always display
return_string:try
When only try contains a return and no exception occurs, the else block is skipped, the finally block still runs, and the function returns the value "try" from the try block.
- No exception, with return in try, except, else, and finally
Example 6:
# Filename: try_except_else_finally.py
# Author: meizhaohui
def expt4(a, b):
    try:
        c = a/b
        print('the value is:{}'.format(c))
        return 'try'
    except ZeroDivisionError:
        print('An exception occurred: the divisor is 0')
        return 'exceptZero'
    except TypeError:
        print('An exception occurred: unsupported type for a or b, only float or int is supported')
        return 'exceptType'
    else:
        print('No exception')
        return 'else'
    finally:
        print('always display')
        return 'finally'
if __name__ == '__main__':
    return_string = expt4(4, 2)
    print('return_string:{}'.format(return_string))
Run:
[meizhaohui@localhost python_scripts]$ python3 try_except_else_finally.py
the value is:2.0
always display
return_string:finally
When try, except, else, and finally all contain return, the function returns the value "finally" from the finally block. The except and else blocks are not executed, and the value returned by try is overridden.
- No exception, with return in except, else, and finally
Example 7:
# Filename: try_except_else_finally.py
# Author: meizhaohui
def expt4(a, b):
    try:
        c = a/b
        print('the value is:{}'.format(c))
        # return 'try'
    except ZeroDivisionError:
        print('An exception occurred: the divisor is 0')
        return 'exceptZero'
    except TypeError:
        print('An exception occurred: unsupported type for a or b, only float or int is supported')
        return 'exceptType'
    else:
        print('No exception')
        return 'else'
    finally:
        print('always display')
        return 'finally'
if __name__ == '__main__':
    return_string = expt4(4, 2)
    print('return_string:{}'.format(return_string))
Run:
[meizhaohui@localhost python_scripts]$ python3 try_except_else_finally.py
the value is:2.0
No exception
always display
return_string:finally
Here the except blocks are skipped and the else block runs, printing "No exception", but the final return value is still "finally" from the finally block.
- No exception, with return in except and else
Example 8:
# Filename: try_except_else_finally.py
# Author: meizhaohui
def expt4(a, b):
    try:
        c = a/b
        print('the value is:{}'.format(c))
        # return 'try'
    except ZeroDivisionError:
        print('An exception occurred: the divisor is 0')
        return 'exceptZero'
    except TypeError:
        print('An exception occurred: unsupported type for a or b, only float or int is supported')
        return 'exceptType'
    else:
        print('No exception')
        return 'else'
    finally:
        print('always display')
        # return 'finally'
if __name__ == '__main__':
    return_string = expt4(4, 2)
    print('return_string:{}'.format(return_string))
Run:
[meizhaohui@localhost python_scripts]$ python3 try_except_else_finally.py
the value is:2.0
No exception
always display
return_string:else
With no exception and return only in except and else, the value "else" from the else block becomes the function's return value.
- An exception occurs, with return in try, except, else, and finally
Example 9:
# Filename: try_except_else_finally.py
# Author: meizhaohui
def expt4(a, b):
    try:
        c = a/b
        print('the value is:{}'.format(c))
        return 'try'
    except ZeroDivisionError:
        print('An exception occurred: the divisor is 0')
        return 'exceptZero'
    except TypeError:
        print('An exception occurred: unsupported type for a or b, only float or int is supported')
        return 'exceptType'
    else:
        print('No exception')
        return 'else'
    finally:
        print('always display')
        return 'finally'
if __name__ == '__main__':
    return_string = expt4(4, 0)
    print('return_string:{}'.format(return_string))
Run:
[meizhaohui@localhost python_scripts]$ python3 try_except_else_finally.py
An exception occurred: the divisor is 0
always display
return_string:finally
With an exception, the else block is not executed and the except block is. Because finally contains a return, the final return value is "finally".
- An exception occurs, with return in try, except, and else
Example 10:
# Filename: try_except_else_finally.py
# Author: meizhaohui
def expt4(a, b):
    try:
        c = a/b
        print('the value is:{}'.format(c))
        return 'try'
    except ZeroDivisionError:
        print('An exception occurred: the divisor is 0')
        return 'exceptZero'
    except TypeError:
        print('An exception occurred: unsupported type for a or b, only float or int is supported')
        return 'exceptType'
    else:
        print('No exception')
        return 'else'
    finally:
        print('always display')
        # return 'finally'
if __name__ == '__main__':
    return_string = expt4(4, 0)
    print('return_string:{}'.format(return_string))
Run:
[meizhaohui@localhost python_scripts]$ python3 try_except_else_finally.py
An exception occurred: the divisor is 0
always display
return_string:exceptZero
With an exception and no return in finally, the return in try is never reached; the except block runs, and the return value is "exceptZero" from the division-by-zero handler.
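Summary:
- When try contains a return and no exception occurs, the else block is skipped; the finally block still runs.
- When try has no return and raises no exception, the except blocks are skipped and the else block runs.
- A return statement never prevents the finally block from running.
- When both try and finally contain return, the return in finally determines the final return value, whether or not an exception occurs.
- When finally has no return: without an exception, the return value comes from try, or from else if try has no return; with an exception, it comes from the matching except block.
- If no exception occurs and try contains a return, the else block can never be reached, but a return in finally still changes the final return value. The value of the return in try is evaluated first and held pending; the finally block then runs before the function actually returns, and a return in finally replaces the pending value.
- If an exception occurs, the return in try is never reached. If the matching except block contains a return, the finally block still runs first, and a return in finally replaces the value from except, consistent with the previous point.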
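The overriding behavior of a return in finally can be reduced to two short functions. This is a minimal sketch with hypothetical names; note that recent Python versions may emit a SyntaxWarning for a return inside finally, which does not change the result.

```python
def overridden():
    try:
        return 'try'          # evaluated first, then held pending
    finally:
        return 'finally'      # replaces the pending value


def swallowed():
    try:
        raise ValueError('lost')
    finally:
        # Returning here also discards the ValueError that was propagating.
        return 'finally'


print(overridden())
print(swallowed())
```

The second function shows a side effect worth knowing: the caller never sees the ValueError, because the return in finally ends the function normally.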
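As the summary shows, handling return statements inside try-based exception handling is quite tricky. Three recommendations:
- Do not write return statements inside try, except, or else. If there is no finally block, put the return statement at the very end of the function; otherwise it can go in finally, bearing in mind that a return in finally also discards any exception that is still propagating.
- Use try, except, and else to do work, not to produce return values.
- Keep the code inside try as small as possible, so that fewer unrelated exceptions can arise there.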
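One way to follow these recommendations is to assign the result inside try and return once at the end of the function. The function name divide below is hypothetical; the sketch assumes that returning None on failure is acceptable to the caller.

```python
def divide(a, b):
    result = None
    try:
        # Only the operation that can fail lives inside try.
        result = a / b
    except ZeroDivisionError:
        print('the divisor is 0')
    except TypeError:
        print('unsupported type for a or b')
    # Single exit point, outside the try statement.
    return result


ok_value = divide(4, 2)
zero_value = divide(4, 0)
type_value = divide(4, '')
print(ok_value, zero_value, type_value)
```

With a single return after the try statement there is no interaction between return and finally to reason about.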
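Raising exceptions manually with raise¶
Use the raise statement to raise an exception manually:
- raise ExceptionErrorName # raise the ExceptionErrorName exception
- raise ExceptionErrorName('info') # raise the ExceptionErrorName exception with the extra message info
Example: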
In [1]: raise NameError
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-1-42b67b2fc75d> in <module>
----> 1 raise NameError
NameError:
In [2]: raise NameError('名称异常')
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-2-daa6ee1852fc> in <module>
----> 1 raise NameError('名称异常')
NameError: 名称异常
In [3]: raise ZeroDivisionError('除零异常')
---------------------------------------------------------------------------
ZeroDivisionError Traceback (most recent call last)
<ipython-input-3-02e66f4d0766> in <module>
----> 1 raise ZeroDivisionError('除零异常')
ZeroDivisionError: 除零异常
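Custom exceptions¶
- Users can define their own exceptions for cases that Python's built-in exceptions do not cover.
- A custom exception must inherit from the Exception class, directly or indirectly.
- By convention, the name of a custom exception ends with "Error", which tells users explicitly that it is an exception.
- A custom exception can only be raised manually with raise.
Define a custom exception for network errors:
# Filename: networkerror.py
# Author: meizhaohui
class NetworkError(RuntimeError):
    def __init__(self, host):
        super().__init__(host)  # keep the argument in self.args
        self._host = host
    def __str__(self):
        return 'Unknown host:{}'.format(self._host)
if __name__ == '__main__':
    try:
        raise NetworkError('python.org')
    except NetworkError as e:
        print('NetworkError: %s' % e)
Run:
[meizhaohui@localhost python_scripts]$ python3 networkerror.py
NetworkError: Unknown host:python.org
Notes:
- The custom exception class NetworkError inherits from RuntimeError, and the built-in exception hierarchy shows that RuntimeError inherits from Exception, so NetworkError inherits from Exception indirectly.
- The NetworkError exception is raised manually with raise, caught in the except clause, and its message is printed.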
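Because a custom exception is an ordinary class, it can be caught through any of its base classes and can itself be subclassed. In this sketch HostTimeoutError and the helper connect are hypothetical additions for illustration only.

```python
class NetworkError(RuntimeError):
    def __init__(self, host):
        super().__init__(host)
        self.host = host

    def __str__(self):
        return 'Unknown host:{}'.format(self.host)


class HostTimeoutError(NetworkError):
    # Hypothetical subclass: inherits __init__ and overrides the message.
    def __str__(self):
        return 'Timed out:{}'.format(self.host)


def connect(host, slow=False):
    # Hypothetical helper that always fails, to demonstrate the handlers.
    if slow:
        raise HostTimeoutError(host)
    raise NetworkError(host)


messages = []
for slow in (False, True):
    try:
        connect('python.org', slow=slow)
    except RuntimeError as e:
        # Catching the built-in base class also catches both custom classes.
        messages.append('{}: {}'.format(type(e).__name__, e))
print(messages)
```

Handlers can therefore be as specific or as general as the caller needs, following the same rules as for built-in exceptions.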
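Modules: the sys module¶
Basic introduction to modules and packages¶
- A module is a code file that can be reused.
- A module file name must end with .py; a module is simply a file of Python code.
- A module can be imported by other programs so that they can use its functionality.
- To use code from another module, use the import statement; the code and variables of the imported module then become visible to the program.
- The module name is the Python file name without the .py extension.
- An import statement can be placed inside a function, or all import statements can be placed at the top of the file, which makes the dependencies between pieces of code clear.
- If the imported code is used in many places, import it outside the functions; if it is used only in a limited place, it can be imported inside the function.
- A module can be imported under an alias.
- The module search path is stored in sys.path.
- To make Python applications more scalable, multiple modules can be organized into a file hierarchy called a package.
- A package can be thought of as a folder that holds several modules. The folder name is the package name, and an __init__.py file is added to the folder to mark the directory as a package.
Import syntax: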
import module_name
import module_name as alias_name
from module_name import *
from module_name import fun_name
from module_name import fun_name as alias_name
from module_name import var_name
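Applied to a real module, the forms above look as follows. The sketch uses the standard library's math module; the alias names m and square_root are arbitrary choices for illustration.

```python
import math                             # whole module, used as math.<name>
import math as m                        # whole module under an alias
from math import sqrt                   # a single function
from math import sqrt as square_root    # a single function under an alias
from math import pi                     # a single variable

print(math.sqrt(16))
print(m.sqrt(16))
print(sqrt(16))
print(square_root(9))
print(pi)
```

All of these names refer to the same underlying objects; only the name under which they are reachable differs.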
Using the Tab key¶
Modules often define many methods, and it is hard to remember exactly how each one is spelled. We can therefore build a tab.py module on our own computer so that the interpreter offers completion when the Tab key is pressed, listing all related methods, constants, and variables and making development more efficient.
The content of tab.py is as follows:
import sys
import readline
import rlcompleter
import atexit
import os
# requires the readline module
# windows install: pip install pyreadline
# linux install: pip install readline
# tab completion
readline.parse_and_bind('tab: complete')
# history file
# windows
histfile = os.path.join(os.environ['HOMEPATH'], '.pythonhistory')
# linux
# histfile = os.path.join(os.environ['HOME'], '.pythonhistory')
try:
    readline.read_history_file(histfile)
except IOError:
    pass
atexit.register(readline.write_history_file, histfile)
del os, histfile, readline, rlcompleter
Save tab.py in the Lib directory under your Python installation directory, for example D:\Program Files (x86)\python3.6.2\Lib. Then open a command-line window and run python to enter the interactive interpreter.
How to use the Tab key:
C:\>python
Python 3.6.2 (v3.6.2:5fd33b5, Jul 8 2017, 04:57:36) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tab
>>> import os
>>> os. [type the dot, then press the Tab key]
os.DirEntry( os.chdir( os.getcwd( os.set_inheritable(
os.F_OK os.chmod( os.getcwdb( os.spawnl(
os.MutableMapping( os.close( os.getenv( os.spawnle(
os.O_APPEND os.closerange( os.getlogin( os.spawnv(
os.O_BINARY os.cpu_count( os.getpid( os.spawnve(
os.O_CREAT os.curdir os.getppid( os.st
os.O_EXCL os.defpath os.isatty( os.startfile(
os.O_NOINHERIT os.device_encoding( os.kill( os.stat(
os.O_RANDOM os.devnull os.linesep os.stat_float_times(
os.O_RDONLY os.dup( os.link( os.stat_result(
os.O_RDWR os.dup2( os.listdir( os.statvfs_result(
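Using the sys module¶
- The sys module is the system module; sys is short for system. It contains functions related to the Python interpreter and its environment.
Use the method above to list the members of the sys module: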
C:\>python
Python 3.6.2 (v3.6.2:5fd33b5, Jul 8 2017, 04:57:36) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tab
>>> import sys
>>> sys.
sys.api_version sys.getcheckinterval( sys.modules
sys.argv sys.getdefaultencoding( sys.path
sys.base_exec_prefix sys.getfilesystemencodeerrors( sys.path_hooks
sys.base_prefix sys.getfilesystemencoding( sys.path_importer_cache
sys.builtin_module_names sys.getprofile( sys.platform
sys.byteorder sys.getrecursionlimit( sys.prefix
sys.call_tracing( sys.getrefcount( sys.ps1
sys.callstats( sys.getsizeof( sys.ps2
sys.copyright sys.getswitchinterval( sys.set_asyncgen_hooks(
sys.displayhook( sys.gettrace( sys.set_coroutine_wrapper(
sys.dllhandle sys.getwindowsversion( sys.setcheckinterval(
sys.dont_write_bytecode sys.hash_info sys.setprofile(
sys.exc_info( sys.hexversion sys.setrecursionlimit(
sys.excepthook( sys.implementation sys.setswitchinterval(
sys.exec_prefix sys.int_info sys.settrace(
sys.executable sys.intern( sys.stderr
sys.exit( sys.is_finalizing( sys.stdin
sys.flags sys.last_traceback sys.stdout
sys.float_info sys.last_type( sys.thread_info
sys.float_repr_style sys.last_value sys.version
sys.get_asyncgen_hooks( sys.maxsize sys.version_info
sys.get_coroutine_wrapper( sys.maxunicode sys.warnoptions
sys.getallocatedblocks( sys.meta_path sys.winver
>>> sys.
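Commonly used members of the sys module:
sys.argv                      the list of command-line arguments of the running script
sys.argv[0]                   the pathname of the script
sys.argv[1]                   the first argument passed to the script
sys.argv[2]                   the second argument passed to the script
sys.path                      the list of directories Python searches when importing modules
sys.platform                  the current platform: 'linux' on Linux, 'win32' on Windows
sys.stdin                     standard input
sys.stdout                    standard output
sys.stderr                    standard error
sys.getdefaultencoding():     the default string encoding; 'utf-8' in python3.6.2
sys.getfilesystemencoding():  the encoding used for file names; 'utf-8' in python3.6.2
sys.exit(N)                   exit with return code N; 0 means a normal exit, non-zero an abnormal one, e.g. sys.exit(-1)
sys.ps1                       the primary prompt of the interactive interpreter
sys.ps2                       the continuation (block) prompt of the interactive interpreter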
>>> sys.path
['', 'D:\\Program Files (x86)\\python3.6.2\\python36.zip', 'D:\\Program Files (x86)\\python3.6.2\\DLLs', 'D:\\Program Files (x86)\\python3.6.2\\lib', 'D:\\Program Files (x86)\\python3.6.2', 'D:\\Program Files (x86)\\python3.6.2\\lib\\site-packages']
>>> sys.platform
'win32'
>>> sys.stdin
<_io.TextIOWrapper name='<stdin>' mode='r' encoding='utf-8'>
>>> sys.stdout
<_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf-8'>
>>> sys.stderr
<_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>
>>> sys.getdefaultencoding()
'utf-8'
>>> sys.getfilesystemencoding()
'utf-8'
>>> sys.ps1
'>>> '
>>> sys.ps2
'... '
The following example helps to reinforce these points:
#!/usr/bin/python3
# -*- coding: utf-8 -*-
"""
# @Time : 2018/6/30 14:58
# @Author : 梅朝辉(meizhaohui)
# @Email : mzh.whut@gmail.com
# @Filename : sys_arguments.py
# @Description : test the sys module: read external arguments and set the exit code
# @Software : PyCharm
# @Python Version: python3.6.2
"""
import sys
def sys_arguments():
    if len(sys.argv) == 3:
        print("You are great!")
        print("the script pathname is: {}".format(sys.argv[0]))
        print("the first argument is: {}".format(sys.argv[1]))
        print("the second argument is: {}".format(sys.argv[2]))
        sys.exit()
    else:
        print("Use method: python {} arg1 arg2".format(sys.argv[0]))
        sys.exit(-1)
sys_arguments()
Run it in a command-line window:
D:\data\python_scripts>python sys_arguments.py first second
You are great!
the script pathname is: sys_arguments.py
the first argument is: first
the second argument is: second
D:\data\python_scripts>echo %errorlevel% [ Note: on Windows, this shows the exit code of the previous cmd command ]
0
D:\data\python_scripts>python sys_arguments.py
Use method: python sys_arguments.py arg1 arg2
D:\data\python_scripts>echo %errorlevel%
-1
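sys.exit works by raising the built-in SystemExit exception, which is why it appears at the top of the exception hierarchy. The sketch below catches it only to inspect the code; the helper name run_and_capture is hypothetical, and real scripts normally let the exception end the process.

```python
import sys


def run_and_capture(code):
    try:
        sys.exit(code)
    except SystemExit as exc:
        # The value passed to sys.exit is stored in the code attribute.
        return exc.code


default_code = run_and_capture(None)
error_code = run_and_capture(-1)
print(default_code, error_code)
```

Calling sys.exit() without an argument leaves the code as None, which the interpreter treats as exit status 0.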
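The difference between import and from … import¶
- import module only loads a module, like saying "give me the car". Every use of a function or variable in the module must be written as "module.function" or "module.var_name".
- from … import … loads the module and binds specific classes, functions, or other members from it, like saying "give me the bottle of water in the car". Because several modules may define members or classes with the same name, this form should be used sparingly.
- from … import … as new_name imports a member of a module and renames it to new_name.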
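The name clash mentioned above can be shown with two standard library modules that both define a function called join. This is a small sketch; posixpath and ntpath are the platform-specific implementations behind os.path and can be imported on any platform.

```python
from posixpath import join
posix_style = join('a', 'b')

# A second from-import with the same name silently rebinds join.
from ntpath import join
nt_style = join('a', 'b')

# With plain import, both functions stay reachable under distinct names.
import posixpath
import ntpath
qualified = (posixpath.join('a', 'b'), ntpath.join('a', 'b'))

print(posix_style, nt_style, qualified)
```

After the second from-import, the first join is no longer reachable by that name, which is the risk the recommendation above refers to.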
For example, when the sys module is imported with import, the sys prefix is required to use platform from the module:
>>> import sys
>>> print(sys.platform)
win32
For example, when members of the sys module are imported with from … import, constants such as ps1, ps2, and platform can be used directly without the sys prefix:
from sys import path,argv,platform
>>> from sys import platform,ps1,ps2
>>> ps1
'>>> '
>>> ps2
'... '
>>> platform
'win32'
For example, the constant copyright from the sys module is imported with from … import … as new_name and renamed to RT; entering RT then prints the copyright information:
>>> from sys import copyright as RT
>>> RT
'Copyright (c) 2001-2017 Python Software Foundation.\nAll Rights Reserved.\n\nCopyright (c) 2000 BeOpen.com.\nAll Rights Reserved.\n\nCopyright (c) 1995-2001 Corporation for National Research Initiatives.\nAll Rights Reserved.\n\nCopyright (c) 1991-1995 Stichting Mathematisch Centrum, Amsterdam.\nAll Rights Reserved.'
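Modules: the os module¶
Basic introduction to the os module¶
- The os module contains operating system functionality.
- The os module provides a portable way of using operating-system-dependent functionality.
- If you just want to read or write a file, see open().
- If you want to manipulate paths, see the os.path module.
- If you want to read all the lines in all the files given on the command line, see the fileinput module.
- For creating temporary files and directories, see the tempfile module.
- For high-level file and directory handling, see the shutil module.
- For spawning new processes and retrieving their results, see the subprocess module.
Syntax for importing the os module: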
import os
Using the Tab key¶
With the tab.py module described in the sys module chapter in place, use the Tab key as follows:
C:\>python
Python 3.6.2 (v3.6.2:5fd33b5, Jul 8 2017, 04:57:36) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tab
>>> import os
>>> os.
os.DirEntry( os.R_OK os.errno os.getcwdb( os.readlink( os.supports_dir_fd
os.F_OK os.SEEK_CUR os.error( os.getenv( os.remove( os.supports_effective_ids
os.MutableMapping( os.SEEK_END os.execl( os.getlogin( os.removedirs( os.supports_fd
os.O_APPEND os.SEEK_SET os.execle( os.getpid( os.rename( os.supports_follow_symlinks
os.O_BINARY os.TMP_MAX os.execlp( os.getppid( os.renames( os.symlink(
os.O_CREAT os.W_OK os.execlpe( os.isatty( os.replace( os.sys
os.O_EXCL os.X_OK os.execv( os.kill( os.rmdir( os.system(
os.O_NOINHERIT os.abc os.execve( os.linesep os.scandir( os.terminal_size(
os.O_RANDOM os.abort( os.execvp( os.link( os.sep os.times(
os.O_RDONLY os.access( os.execvpe( os.listdir( os.set_handle_inheritable( os.times_result(
os.O_RDWR os.altsep os.extsep os.lseek( os.set_inheritable( os.truncate(
os.O_SEQUENTIAL os.chdir( os.fdopen( os.lstat( os.spawnl( os.umask(
os.O_SHORT_LIVED os.chmod( os.fsdecode( os.makedirs( os.spawnle( os.uname_result(
os.O_TEMPORARY os.close( os.fsencode( os.mkdir( os.spawnv( os.unlink(
os.O_TEXT os.closerange( os.fspath( os.name os.spawnve( os.urandom(
os.O_TRUNC os.cpu_count( os.fstat( os.open( os.st os.utime(
os.O_WRONLY os.curdir os.fsync( os.pardir os.startfile( os.waitpid(
os.P_DETACH os.defpath os.ftruncate( os.path os.stat( os.walk(
os.P_NOWAIT os.device_encoding( os.get_exec_path( os.pathsep os.stat_float_times( os.write(
os.P_NOWAITO os.devnull os.get_handle_inheritable( os.pipe( os.stat_result(
os.P_OVERLAY os.dup( os.get_inheritable( os.popen( os.statvfs_result(
os.P_WAIT os.dup2( os.get_terminal_size( os.putenv( os.strerror(
os.PathLike( os.environ os.getcwd( os.read( os.supports_bytes_environ
Using the os module¶
Commonly used functions and attributes of the os module:
>>> import os
>>> os.
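os.abort() # Abort the Python interpreter immediately; the interactive session exits at once
os.chdir(path) # Change the current working directory
os.cpu_count() # Return the number of CPUs (None if it cannot be determined)
os.getcwd() # Return the current working directory
os.environ # A mapping of the environment variables of the process
os.getenv('key') # Return the value of the environment variable key, or None if it is not set
os.putenv('key','value') # Set an environment variable for child processes; os.environ is not updated, so assigning to os.environ is preferred
os.getlogin() # Return the name of the logged-in user
os.getpid() # Return the ID of the current process
os.getppid() # Return the ID of the parent process
os.name # Name of the platform-specific module in use. Windows -> 'nt'; Linux -> 'posix'
os.curdir # String that refers to the current directory ('.')
os.pardir # String that refers to the parent directory ('..')
os.sep # Path component separator. Windows -> '\\'; Linux -> '/'
os.altsep # Alternative path separator ('/' on Windows, None on Linux)
os.linesep # Line terminator of the current platform. Windows -> '\r\n'; Linux -> '\n'
os.extsep # Separator between a file name and its extension ('.')
os.path # The path module for the current platform (posixpath or ntpath)
os.pathsep # Separator between search path entries such as those in PATH. Windows -> ';'; Linux -> ':'
os.defpath # Default search path for executables
os.devnull # Path of the null device
os.get_terminal_size() # Size of the terminal window
os.get_exec_path() # Return the list of directories searched for a named executable (as a shell does) when launching a process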
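os.link(src, dst) # Create a hard link named dst that points to src
# src: the source path, which must already exist
# dst: the link path to create, which must not exist yet
os.readlink(path) # Return the path that the symbolic link points to
os.symlink(src, dst) # Create a symbolic link named dst that points to src
os.access(path,mode) # Test whether path can be accessed with mode; returns True/False
# mode can be os.F_OK/os.R_OK/os.W_OK/os.X_OK
os.F_OK # mode value for os.access(): test whether path exists
os.R_OK # mode value for os.access(): test whether path is readable
os.W_OK # mode value for os.access(): test whether path is writable
os.X_OK # mode value for os.access(): test whether path is executable
os.kill(pid, signal) # Send the signal signal to the process with ID pid
import signal # the signal constants come from the signal module
# On Windows only signal.CTRL_C_EVENT and signal.CTRL_BREAK_EVENT are delivered as events; any other value, such as signal.SIGABRT or signal.SIGILL, terminates the process unconditionally and becomes its exit code
os.system(cmd) # Run the command cmd in a subshell; the subprocess module is the recommended replacement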
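os.listdir(path) # Return a list of the names of the files and directories in the directory path
os.scandir(path) # Return an iterator of DirEntry objects for the entries in path. (Note: only the top level of path is listed; subdirectories are not descended into.)
# Usually faster than os.listdir() when file type or attribute information is also needed,
# because DirEntry exposes the extra data the operating system already returned, for example:
# entry.inode() # Return the inode number of the entry
# entry.is_dir(follow_symlinks=True)
# Return True if the entry is a directory or a symbolic link pointing to a directory;
# return False if it is, or points to, any other kind of file, or if it no longer exists.
# With follow_symlinks=False, return True only if the entry is a directory (symbolic links are not followed);
# return False if it is any other kind of file or if it no longer exists.
# entry.is_file(follow_symlinks=True)
# Return True if the entry is a file or a symbolic link pointing to a file;
# return False if it is, or points to, a directory or other non-file entry, or if it no longer exists.
# With follow_symlinks=False, return True only if the entry is a file (symbolic links are not followed);
# return False if it is a directory or other non-file entry, or if it no longer exists.
# entry.is_symlink()
# Return True if the entry is a symbolic link (even a broken one);
# return False if it is a directory or any kind of file, or if it no longer exists.
# entry.name # Base file name of the entry
# entry.path # Full path name of the entry
# entry.stat() # Return a stat_result object for the entry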
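os.mkdir(path[, mode]) # Create a single directory
os.makedirs(path[, mode]) # Create a directory recursively, including any missing intermediate directories
os.remove(file_path) # Remove a file; cannot remove a directory
os.rmdir(dir_path) # Remove the empty directory dir_path; fails on a non-empty directory
os.removedirs(dir_path) # Remove empty directories recursively: the leaf first, then each parent that has become empty. Note: on Windows the result may differ while the directory is open in another window
os.rename(src, dst) # Rename a file or directory
os.renames(old, new) # Rename recursively: create the intermediate directories needed for new, then prune the empty directories left behind in old
os.replace(src, dst) # Rename a file or directory, silently replacing dst if it is an existing file
os.unlink(file_path) # Remove a file (same as os.remove())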
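os.stat(path) # Return the status of path as a stat_result with these fields:
# st_mode - file mode bits: the permission bits and the file type (regular file, pipe, or another type)
# st_ino - inode number of the file
# st_dev - identifier of the device the file resides on
# st_nlink - number of hard links
# st_uid - user ID of the owner
# st_gid - group ID of the owner
# st_size - file size in bytes
# st_atime - time of the most recent access
# st_mtime - time of the most recent content modification
# st_ctime - on Unix, time of the most recent metadata change (not a content change); on Windows, the creation time
os.utime(path,times) # Set the access and modification times of a file.
# If times is None, both are set to the current time.
# Otherwise times must be a 2-tuple (atime, mtime) used to set st_atime and st_mtime.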
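os.walk(top[, topdown=True[, onerror=None[, followlinks=False]]])
# Walk a directory tree top-down or bottom-up, yielding a 3-tuple (dirpath, dirnames, filenames) for each directory.
# The 3-tuple (dirpath, dirnames, filenames):
# dirpath : path of the directory being visited, a string
# dirnames : list of the names of the subdirectories in dirpath, excluding '.' and '..'
# filenames : list of the names of the non-directory files in dirpath
# If topdown is True or not given, the tuple for a directory is generated before the tuples for its subdirectories (top-down);
# if topdown is False, the tuple for a directory is generated after the tuples for all of its subdirectories (bottom-up).
# When topdown is True, the caller may modify dirnames in place (with del or slice assignment), and os.walk() only descends into the subdirectories whose names remain in dirnames. This can prune the search or impose a specific visiting order. When topdown is False, modifying dirnames has no effect, because the tuples for the subdirectories are generated before the tuple for their parent.
# By default, errors from the underlying directory listing call are ignored. If onerror is given, it must be a function taking one argument, an OSError instance; it can report the error and let the walk continue, or raise the exception to abort the walk.
# By default, os.walk() does not descend into symbolic links that point to directories; set followlinks=True to follow them.
# Note: followlinks=True can lead to infinite recursion if a link points to one of its own parent directories; os.walk() does not keep track of the directories it has already visited.
# Note: if a relative path is passed, do not change the current working directory while the walk is in progress; os.walk() never changes the working directory itself and assumes its caller does not either.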
Examples of using the os module:
>>> os.environ
environ({'ALLUSERSPROFILE': 'C:\\ProgramData', 'ANDROID': 'D:\\Program Files\\ADB\\adb', 'COMMONPROGRAMFILES': 'C:\\Program Files\\Common Files', 'COMMONPROGRAMFILES(X86)': 'C:\\Program Files (x86)\\Common Files', 'COMMONPROGRAMW6432': 'C:\\Program Files\\Common Files', 'COMSPEC': 'C:\\Windows\\system32\\cmd.exe', 'FP_NO_HOST_CHECK': 'NO', 'HOMEDRIVE': 'C:', 'NUMBER_OF_PROCESSORS': '4', 'OS': 'Windows_NT', 'PATH': 'C:\\Windows\\system32;C:\\Windows', 'PATHEXT': '.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC', 'PROCESSOR_ARCHITECTURE': 'AMD64', 'PROCESSOR_IDENTIFIER': 'Intel64 Family 6 Model 42 Stepping 7, GenuineIntel', 'PROCESSOR_LEVEL': '6', 'SYSTEMDRIVE': 'C:', 'SYSTEMROOT': 'C:\\Windows'})
>>> os.getenv('SYSTEMROOT')
'C:\\Windows'
>>> os.getlogin()
'meizhaohui'
>>> os.getpid() # process ID of python.exe
6524
>>> os.getppid() # process ID of cmd.exe
6120
>>> os.getcwd()
'D:\\'
>>> os.getcwdb()
b'D:\\'
>>> os.name
'nt'
>>> os.curdir
'.'
>>> os.pardir
'..'
>>> os.cpu_count()
4
>>> os.sep
'\\'
>>> os.altsep
'/'
>>> os.linesep
'\r\n'
>>> os.extsep
'.'
>>> os.path
<module 'ntpath' from 'D:\\ProgramFiles\\Python3.6.2\\lib\\ntpath.py'>
>>> os.pathsep
';'
>>> os.defpath
'.;C:\\bin'
>>> os.devnull
'nul'
>>> os.get_terminal_size()
os.terminal_size(columns=145, lines=40)
>>> os.get_exec_path() # directories searched for a named executable when launching a process
['D:\\Program Files (x86)\\python3.6.2\\Scripts', 'D:\\Program Files (x86)\\python3.6.2\\', 'C:\\Windows\\system32', 'C:\\Windows', 'C:\\WINDOWS\\system32', 'C:\\WINDOWS', 'C:\\WINDOWS\\System32\\Wbem', 'C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\', 'D:\\Program Files\\Git\\cmd', 'D:\\Program Files (x86)\\Pandoc\\', 'D:\\mei_softs\\jdk_8u172\\jre\\bin', 'C:\\WINDOWS\\System32\\OpenSSH\\', 'D:\\Softs\\adb1.0.32\\adb', '']
>>> import signal
>>> os.kill(1388,signal.SIGABRT) # terminate a process (on Windows the signal value becomes its exit code)
>>> os.kill(5948,signal.SIGILL) # terminate another process the same way
>>> os.chdir('tmp')
>>> os.getcwd()
'D:\\tmp'
>>> os.listdir()
['dir1', 'dir1_symlink', 'sys.txt', 'test1.txt', 'test2.txt', 'test3.txt']
>>> os.mkdir('dir2')
>>> os.listdir()
['dir1', 'dir1_symlink', 'dir2', 'sys.txt', 'test1.txt', 'test2.txt', 'test3.txt']
>>> os.makedirs('dir3/dir3_1')
>>> os.listdir()
['dir1', 'dir1_symlink', 'dir2', 'dir3', 'sys.txt', 'test1.txt', 'test2.txt', 'test3.txt']
>>> os.makedirs('dir4/dir4_1/dir4_11')
>>> os.remove('dir5/test5.txt')
>>> os.rmdir('dir5')
>>> os.rmdir('dir5/dir5_1')
>>> os.makedirs('dir5/dir5_2/dir5_2_1')
>>> os.removedirs('dir5/dir5_2/dir5_2_1')
>>> os.listdir()
['dir1', 'dir1_symlink', 'dir2', 'dir3', 'dir4', 'dir5', 'sys.txt', 'test1.txt', 'test2.txt', 'test3.txt']
>>> os.rename('test3.txt','test33.txt')
>>> os.listdir()
['dir1', 'dir1_symlink', 'dir2', 'dir3', 'dir4', 'dir5', 'sys.txt', 'test1.txt', 'test2.txt', 'test33.txt']
>>> os.renames('dir5/dir5_2/dir5_2_1','dir5/dir52/dir521')
>>> os.rename('dir4/dir4_1','dir4/dir41')
>>> os.replace('dir4/dir41','dir4/dir441')
>>> os.unlink('test33.txt')
>>> os.unlink('dir3/dir3_1/test3_1.txt')
>>> os.stat('test1.txt')
os.stat_result(st_mode=33206, st_ino=2814749767125765, st_dev=120385, st_nlink=1, st_uid=0, st_gid=0, st_size=39, st_atime=1, st_mtime=3, st_ctime=1513519788)
>>> os.utime('test1.txt')
>>> os.stat('test1.txt')
os.stat_result(st_mode=33206, st_ino=2814749767125765, st_dev=120385, st_nlink=1, st_uid=0, st_gid=0, st_size=39, st_atime=1514211306, st_mtime=1514211306, st_ctime=1513519788)
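The file and directory calls from the session above can be tried safely inside a temporary directory. The following is a minimal sketch rather than part of the original session: the function name demo_file_ops and all file names are made up for illustration, and keep.txt exists only so that os.removedirs() stops before it deletes the temporary directory itself.

```python
import os
import tempfile


def demo_file_ops():
    """Exercise makedirs/replace/utime/remove/removedirs in a throwaway directory."""
    with tempfile.TemporaryDirectory() as top:
        # Keep the top directory non-empty so removedirs() stops there.
        with open(os.path.join(top, 'keep.txt'), 'w') as f:
            f.write('keep')

        nested = os.path.join(top, 'dir3', 'dir3_1')
        os.makedirs(nested)                       # creates dir3 and dir3_1

        src = os.path.join(nested, 'new.txt')
        dst = os.path.join(nested, 'old.txt')
        for path, text in ((src, 'new'), (dst, 'old')):
            with open(path, 'w') as f:
                f.write(text)

        os.replace(src, dst)                      # silently overwrites old.txt
        with open(dst) as f:
            content = f.read()

        os.utime(dst, (0, 0))                     # times is (atime, mtime) in seconds
        mtime = os.stat(dst).st_mtime

        os.remove(dst)
        os.removedirs(nested)                     # removes dir3_1, then the empty dir3
        return content, mtime, sorted(os.listdir(top))


result = demo_file_ops()
print(result)
```

Because os.replace() overwrites an existing file, the content read back is the text written to new.txt, and after os.removedirs() only keep.txt is left under the temporary directory.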
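Creating hard and symbolic links with the os module¶
Use os.link to create a hard link and os.symlink to create a symbolic (soft) link.
- os.link(src, dst) # Create a hard link named dst that points to src
- os.symlink(src, dst) # Create a symbolic link named dst that points to src
src: the source path, which must already exist
dst: the link path to create, which must not exist yet
If Python raises "OSError: symbolic link privilege not held", the process lacks the required privilege. Open the cmd window with "Run as administrator", start python again, and retry creating the links.
Example of creating hard and symbolic links: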
>>> import os
>>> os.getcwd()
'D:\\tmp'
>>> os.listdir()
['a.txt', 'data.csv', 'dir1', 'dir2']
>>> os.link('a.txt','a.hard')
>>> os.listdir()
['a.hard', 'a.txt', 'data.csv', 'dir1', 'dir2']
>>> os.symlink('a.txt','a.soft')
>>> os.listdir()
['a.hard', 'a.soft', 'a.txt', 'data.csv', 'dir1', 'dir2']
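A directory listing alone does not show which name is a hard link and which is a symbolic link. As a hedged sketch (the function inspect_links and the file names are illustrative assumptions), the following uses os.stat(), os.path.samefile(), os.path.islink() and os.readlink() to tell them apart. Creating the symbolic link is wrapped in try/except because it can fail without the required privilege on Windows.

```python
import os
import tempfile


def inspect_links():
    """Create a hard link and a symbolic link, then report how they differ."""
    with tempfile.TemporaryDirectory() as top:
        target = os.path.join(top, 'a.txt')
        hard = os.path.join(top, 'a.hard')
        soft = os.path.join(top, 'a.soft')
        with open(target, 'w') as f:
            f.write('data')

        os.link(target, hard)
        info = {
            'nlink': os.stat(target).st_nlink,             # two names, one file
            'same_file': os.path.samefile(target, hard),
            'hard_is_link': os.path.islink(hard),          # a hard link is not a symlink
        }
        try:
            os.symlink(target, soft)
        except (OSError, NotImplementedError):
            info['soft_is_link'] = None                    # not permitted or not supported
        else:
            info['soft_is_link'] = os.path.islink(soft)
            info['soft_points_to'] = os.path.basename(os.readlink(soft))
        return info


link_info = inspect_links()
print(link_info)
```

A hard link is just a second name for the same file, so st_nlink rises to 2 and os.path.islink() stays False; only the symbolic link stores a path that os.readlink() can return.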
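In the Windows cmd shell, the MKLINK command can also create hard and symbolic links:
C:\>mklink
Creates a symbolic link.
MKLINK [[/D] | [/H] | [/J]] Link Target
/D Creates a directory symbolic link. Default is a file symbolic link.
/H Creates a hard link instead of a symbolic link.
/J Creates a Directory Junction.
Link Specifies the new symbolic link name.
Target Specifies the path (relative or absolute) that the new link refers to.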
Iterating over directory entries with os.scandir()¶
Use the os.scandir(path) iterator to get the files and directories under path and print their attributes:
#!/usr/bin/python3
# -*- coding: utf-8 -*-
"""
# @Time : 2018/6/30 20:48
# @Author : 梅朝辉(meizhaohui)
# @Email : mzh.whut@gmail.com
# @Filename : subdirs.py
# @Description : Iterate over the files and directories under path and print their attributes
# @Software : PyCharm
# @Python Version: python3.6.2
"""
import os


def subdirs(path):
    """Iterate over the files and directories under path and print their attributes."""
    for entry in os.scandir(path):
        # Skip hidden entries whose names start with a dot
        if not entry.name.startswith('.'):
            print("name:", entry.name)
            print("path:", entry.path)
            print("is_file:", entry.is_file(follow_symlinks=True))
            print("is_dir:", entry.is_dir(follow_symlinks=False))
            print("is_symlink:", entry.is_symlink())
            print("stat:", entry.stat())
            print("="*50, '\n')


if __name__ == '__main__':
    subdirs('D:\\tmp')
The output is as follows:
"D:\Program Files (x86)\python3.6.2\python.exe" D:/data/python_scripts/subdirs.py
name: a.hard
path: D:\tmp\a.hard
is_file: True
is_dir: False
is_symlink: False
stat: os.stat_result(st_mode=33206, st_ino=0, st_dev=0, st_nlink=0, st_uid=0, st_gid=0, st_size=4, st_atime=1530364793, st_mtime=1530366353, st_ctime=1530363449)
==================================================
name: a.soft
path: D:\tmp\a.soft
is_file: True
is_dir: False
is_symlink: True
stat: os.stat_result(st_mode=33206, st_ino=562949953471395, st_dev=2661556261, st_nlink=3, st_uid=0, st_gid=0, st_size=4, st_atime=1530364793, st_mtime=1530366353, st_ctime=1530363449)
==================================================
name: a.txt
path: D:\tmp\a.txt
is_file: True
is_dir: False
is_symlink: False
stat: os.stat_result(st_mode=33206, st_ino=0, st_dev=0, st_nlink=0, st_uid=0, st_gid=0, st_size=4, st_atime=1530364793, st_mtime=1530366353, st_ctime=1530363449)
==================================================
name: data.csv
path: D:\tmp\data.csv
is_file: True
is_dir: False
is_symlink: False
stat: os.stat_result(st_mode=33206, st_ino=0, st_dev=0, st_nlink=0, st_uid=0, st_gid=0, st_size=67, st_atime=1520171736, st_mtime=1520178110, st_ctime=1520171736)
==================================================
name: dir1
path: D:\tmp\dir1
is_file: False
is_dir: True
is_symlink: False
stat: os.stat_result(st_mode=16895, st_ino=0, st_dev=0, st_nlink=0, st_uid=0, st_gid=0, st_size=0, st_atime=1530363233, st_mtime=1530363233, st_ctime=1530363233)
==================================================
name: dir2
path: D:\tmp\dir2
is_file: False
is_dir: True
is_symlink: False
stat: os.stat_result(st_mode=16895, st_ino=0, st_dev=0, st_nlink=0, st_uid=0, st_gid=0, st_size=0, st_atime=1530363286, st_mtime=1530363286, st_ctime=1530363286)
==================================================
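When only some of the entry attributes are needed, os.scandir() can be used as a context manager (available since Python 3.6) so that the iterator is closed promptly. The following is a small sketch under assumed names (file_sizes and the sample files are illustrative), collecting the sizes of the regular files directly under a directory.

```python
import os
import tempfile


def file_sizes(path):
    """Return {name: size_in_bytes} for the regular files directly under path."""
    sizes = {}
    with os.scandir(path) as entries:          # closes the iterator on exit
        for entry in entries:
            # Do not follow symbolic links, so a link is never counted as a file
            if entry.is_file(follow_symlinks=False):
                sizes[entry.name] = entry.stat(follow_symlinks=False).st_size
    return sizes


with tempfile.TemporaryDirectory() as top:
    with open(os.path.join(top, 'a.txt'), 'w') as f:
        f.write('data')                        # 4 bytes
    os.mkdir(os.path.join(top, 'dir1'))        # directories are skipped
    sizes = file_sizes(top)

print(sizes)
```

On Windows the st_ino, st_dev and st_nlink fields of the result of DirEntry.stat() are always zero, which is why most entries in the output above show 0 for them; call os.stat(entry.path) when those fields are needed.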
Walking a directory tree with os.walk()¶
- Recursively walk a directory tree and print the path of every file and directory in it
The code of walkdir.py:
#!/usr/bin/python3
# -*- coding: utf-8 -*-
"""
# @Time : 2018/6/30 22:03
# @Author : 梅朝辉(meizhaohui)
# @Email : mzh.whut@gmail.com
# @Filename : walkdir.py
# @Description : Recursively walk a directory tree and print every path found
# @Software : PyCharm
# @Python Version: python3.6.2
"""
import os


def walkdir(path):
    """Print the files, then the subdirectories, of every directory under path."""
    for root, dirs, files in os.walk(path, followlinks=False):
        for name in files:
            print(os.path.join(root, name))
        for name in dirs:
            print(os.path.join(root, name))


if __name__ == '__main__':
    walkdir("D:\\tmp")
After running it, the output is as follows:
"D:\Program Files (x86)\python3.6.2\python.exe" D:/data/python_scripts/walkdir.py
D:\tmp\a.hard
D:\tmp\a.soft
D:\tmp\a.txt
D:\tmp\data.csv
D:\tmp\dir1
D:\tmp\dir2
D:\tmp\dir1\1.txt
D:\tmp\dir2\2.txt
D:\tmp\dir2\dir22
D:\tmp\dir2\dir22\22.txt
D:\tmp\dir2\dir22\dir222
D:\tmp\dir2\dir22\dir222\222.txt
Process finished with exit code 0
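With topdown=True the caller can prune the walk by editing the dirs list in place. The following is a minimal sketch of that technique; the function name list_files and the skipped directory names are assumptions chosen for illustration.

```python
import os
import tempfile


def list_files(top, skip_dirs=('.git', '__pycache__')):
    """Return the relative paths of all files under top, skipping some directories."""
    found = []
    for root, dirs, files in os.walk(top, topdown=True):
        # Slice assignment modifies the very list os.walk() will descend into;
        # rebinding with "dirs = ..." would have no effect on the walk.
        dirs[:] = sorted(d for d in dirs if d not in skip_dirs)
        for name in sorted(files):
            found.append(os.path.relpath(os.path.join(root, name), top))
    return found


with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, 'dir1'))
    os.makedirs(os.path.join(top, '.git'))
    for rel in ('a.txt', os.path.join('dir1', '1.txt'), os.path.join('.git', 'config')):
        with open(os.path.join(top, rel), 'w') as f:
            f.write('x')
    files_found = list_files(top)

print(files_found)
```

The file under .git is never visited, because its directory name was removed from dirs before os.walk() descended any further.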
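- Delete a directory together with all the files and directories inside it
Use os.walk to collect the files and directories under the top directory recursively and work bottom-up, from the innermost level to the top: delete the files in a directory first, and remove the directory itself only once it is empty.
The code of removeOneDir.py: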
#!/usr/bin/python3
# -*- coding: utf-8 -*-
"""
# @Time : 2018/6/30 22:13
# @Author : 梅朝辉(meizhaohui)
# @Email : mzh.whut@gmail.com
# @Filename : removeOneDir.py
# @Description : Delete a directory together with all files and directories inside it
# @Software : PyCharm
# @Python Version: python3.6.2
"""
import os


def remove_one_dir(top_path):
    """Delete everything under the top-level directory top_path, then top_path itself."""
    if not os.path.exists(top_path):
        print(top_path, 'not exists')
        return
    if not os.path.isdir(top_path):
        print(top_path, 'not a dirpath')
        return
    # A directory can only be removed once it is empty,
    # so walk bottom-up: delete its files first, then the directory
    for dir_path, dirs, files in os.walk(top_path, topdown=False, followlinks=False):
        print('the first for dir_path:{} dirs:{} files:{}'.format(dir_path, dirs, files))
        for file in files:
            file_path = os.path.join(dir_path, file)
            print("delete file:", file_path)
            os.remove(file_path)
        print("delete folder:", dir_path)
        os.rmdir(dir_path)
    print(top_path, "have been deleted successfully!")


if __name__ == '__main__':
    remove_one_dir("D:\\tmp")
After running it, the output is as follows:
"D:\Program Files (x86)\python3.6.2\python.exe" D:/data/python_scripts/removeOneDir.py
the first for dir_path:D:\tmp\dir1 dirs:[] files:['1.txt']
delete file: D:\tmp\dir1\1.txt
delete folder: D:\tmp\dir1
the first for dir_path:D:\tmp\dir2\dir22\dir222 dirs:[] files:['222.txt']
delete file: D:\tmp\dir2\dir22\dir222\222.txt
delete folder: D:\tmp\dir2\dir22\dir222
the first for dir_path:D:\tmp\dir2\dir22 dirs:['dir222'] files:['22.txt']
delete file: D:\tmp\dir2\dir22\22.txt
delete folder: D:\tmp\dir2\dir22
the first for dir_path:D:\tmp\dir2 dirs:['dir22'] files:['2.txt']
delete file: D:\tmp\dir2\2.txt
delete folder: D:\tmp\dir2
the first for dir_path:D:\tmp dirs:['dir1', 'dir2'] files:['a.hard', 'a.soft', 'a.txt', 'data.csv']
delete file: D:\tmp\a.hard
delete file: D:\tmp\a.soft
delete file: D:\tmp\a.txt
delete file: D:\tmp\data.csv
delete folder: D:\tmp
D:\tmp have been deleted successfully!
Process finished with exit code 0
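The bottom-up walk above shows how the deletion works step by step. For everyday use the standard library already offers shutil.rmtree(), which removes a whole directory tree in one call. The following is a short sketch on a temporary directory; the directory and file names are illustrative.

```python
import os
import shutil
import tempfile

# Build a small tree: <top>/dir2/dir22/22.txt
top = tempfile.mkdtemp()
os.makedirs(os.path.join(top, 'dir2', 'dir22'))
with open(os.path.join(top, 'dir2', 'dir22', '22.txt'), 'w') as f:
    f.write('x')

shutil.rmtree(top)                 # deletes top and everything below it
still_there = os.path.exists(top)
print(still_there)
```

Like the hand-written version, shutil.rmtree() is destructive and asks for no confirmation, so it is worth double-checking the path before calling it.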
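Working with paths using os.path¶
The os.path module mainly deals with file paths and file attributes.
- Functions in the os.path module
Use the Tab key to list the names in os.path: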
C:\Users>python
Python 3.6.2 (v3.6.2:5fd33b5, Jul 8 2017, 04:57:36) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os.path
>>> os.path.
os.path.abspath( os.path.getctime( os.path.realpath(
os.path.altsep os.path.getmtime( os.path.relpath(
os.path.basename( os.path.getsize( os.path.samefile(
os.path.commonpath( os.path.isabs( os.path.sameopenfile(
os.path.commonprefix( os.path.isdir( os.path.samestat(
os.path.curdir os.path.isfile( os.path.sep
os.path.defpath os.path.islink( os.path.split(
os.path.devnull os.path.ismount( os.path.splitdrive(
os.path.dirname( os.path.join( os.path.splitext(
os.path.exists( os.path.lexists( os.path.splitunc(
os.path.expanduser( os.path.normcase( os.path.stat
os.path.expandvars( os.path.normpath( os.path.supports_unicode_filenames
os.path.extsep os.path.os os.path.sys
os.path.genericpath os.path.pardir
os.path.getatime( os.path.pathsep
>>> os.path.
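Commonly used os.path functions:
os.path.abspath(path) # Return the absolute (full) path of path
os.path.basename(path) # Return the base name of path, i.e. its last component:
# if path is a directory, the name of the last directory level;
# if path is a file, the file name.
os.path.dirname(path) # Return the directory name of path
os.path.commonpath(paths) # Return the longest common sub-path of the paths in the sequence paths
os.path.commonprefix(paths) # Return the longest common string prefix of the paths, compared character by character (the result may not be a valid path)
os.path.exists(path) # Whether the path exists
os.path.isabs(path) # Whether the path is absolute
os.path.isdir(path) # Whether the path is a directory
os.path.isfile(path) # Whether the path is a regular file
os.path.islink(path) # Whether the path is a symbolic link
os.path.ismount(path) # Whether the path is a mount point
os.path.join(path,*paths) # Join one or more path components into a single path
os.path.split(path) # Split the path and return a tuple (directory name, base name)
os.path.splitext(path) # Split the path and return a tuple (root, extension)
os.path.realpath(path) # Return the canonical path of the file, eliminating any symbolic links encountered in the path
os.path.getatime(filename) # Return the time of last access, in seconds since 1970-01-01
os.path.getctime(filename) # Return the time of the last metadata change on Unix, or the creation time on Windows, in seconds since 1970-01-01
os.path.getmtime(filename) # Return the time of last modification, in seconds since 1970-01-01
os.path.getsize(filename) # Return the size of the file in bytes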
Examples of using os.path:
>>> os.path.abspath('dir3')
'D:\\tmp\\dir3'
>>> os.path.abspath('3.txt')
'D:\\tmp\\3.txt'
>>> os.path.basename('dir3')
'dir3'
>>> os.path.dirname('dir3')
''
>>> os.path.dirname('D:\\tmp\\dir3')
'D:\\tmp'
>>> os.path.basename('D:\\tmp\\dir3')
'dir3'
>>> os.path.commonpath(['D:\\tmp\\dir1','D:\\tmp\\dir2','D:\\tmp\\dir3','D:\\tmp\\test1.txt'])
'D:\\tmp'
>>> os.path.commonpath(['D:\\tmp\\dir1','D:\\tmp\\dir2','D:\\tmp\\dir3','D:\\test1.txt'])
'D:\\'
>>> os.path.commonprefix(['D:\\tmp\\dir1','D:\\tmp\\dir2','D:\\tmp\\dir3'])
'D:\\tmp\\dir'
>>> os.path.commonprefix(['D:\\tmp\\dir1','D:\\tmp\\dir2','D:\\tmp\\dir3','D:\\tmp\\test1.txt'])
'D:\\tmp\\'
>>> os.path.exists('dir2')
True
>>> os.path.exists('D:\\tmp\\dir2')
True
>>> os.path.isabs('D:\\tmp\\dir2')
True
>>> os.path.isabs('dir2')
False
>>> os.path.isdir('dir2')
True
>>> os.path.isdir('test2.txt')
False
>>> os.path.isfile('test2.txt')
True
>>> os.path.isfile('dir2')
False
>>> os.symlink('test2.txt','test2_symlink.txt')
>>> os.path.islink('test2_symlink.txt')
True
>>> os.path.islink('test2.txt')
False
>>> os.path.ismount('/boot') # the three ismount() calls use Linux paths
True
>>> os.path.ismount('/')
True
>>> os.path.ismount('/tmp')
False
>>> os.path.join('D:\\','tmp\\dir2')
'D:\\tmp\\dir2'
>>> os.path.split("D:\\tmp\\test1.txt")
('D:\\tmp', 'test1.txt')
>>> os.path.split("D:\\tmp")
('D:\\', 'tmp')
>>> os.path.splitext("D:\\tmp\\test1.txt")
('D:\\tmp\\test1', '.txt')
>>> os.path.splitext("test1.txt")
('test1', '.txt')
>>> os.path.realpath('test2.txt')
'D:\\tmp\\test2.txt'
>>> os.getcwd()
'D:\\tmp'
>>> os.path.realpath('test2.txt')
'D:\\tmp\\test2.txt'
>>> os.path.relpath("D:\\")
'..'
>>> os.path.relpath("D:\\tmp")
'.'
>>> os.path.relpath("D:\\tmp\\test2.txt")
'test2.txt'
>>> os.path.relpath("D:\\tmp\\dir2\\22.txt")
'dir2\\22.txt'
>>> os.path.getatime('test2.txt')
1514796590.3527365
>>> os.path.getctime('test2.txt')
1514796590.3527365
>>> os.path.getmtime('test2.txt')
1515160582.7379878
>>> os.path.getsize('test2.txt')
5
>>> import time
>>> time.ctime(os.path.getatime('test2.txt'))
'Mon Jan 1 16:49:50 2018'
>>> time.ctime(os.path.getctime('test2.txt'))
'Mon Jan 1 16:49:50 2018'
>>> time.ctime(os.path.getmtime('test2.txt'))
'Fri Jan 5 21:56:22 2018'
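The results above depend on the platform, because os.path is ntpath on Windows and posixpath on Linux. As a hedged sketch, both modules can also be imported directly, which gives the same results on any platform; the sample paths are illustrative.

```python
import ntpath      # Windows path rules, usable on any platform
import posixpath   # POSIX path rules, usable on any platform

win = ntpath.join('D:\\', 'tmp', 'test1.txt')
nix = posixpath.join('/tmp', 'dir2', '22.txt')

win_parts = ntpath.split(win)          # (directory name, base name)
nix_ext = posixpath.splitext(nix)      # (root, extension)

paths = ['/usr/lib', '/usr/local/bin']
prefix = posixpath.commonprefix(paths)   # character-wise: may cut a name in half
common = posixpath.commonpath(paths)     # component-wise: always a valid path

print(win, nix, win_parts, nix_ext, prefix, common)
```

commonprefix() returns '/usr/l' here, which is not an existing directory, while commonpath() returns '/usr'; this is the practical difference between the two functions.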
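Python string handling¶
Python strings¶
- In Python 3 a string is a sequence of Unicode characters rather than an array of bytes; this is the biggest difference from Python 2.
- Python 2 distinguishes between ordinary byte-oriented strings and Unicode strings.
- UTF-8 is the standard text encoding in Python (it is the default source encoding in Python 3). It is simple and fast, covers the whole Unicode character set, and is robust against errors.
- UTF-8 is a variable-length encoding:
- 1 byte for ASCII characters;
- 2 bytes for most Latin-derived alphabets beyond ASCII, as well as Greek, Cyrillic, Hebrew and Arabic;
- 3 bytes for the remaining characters of the Basic Multilingual Plane, including most Chinese, Japanese and Korean characters;
- 4 bytes for everything else, such as emoji, rarer CJK characters and other symbols outside the Basic Multilingual Plane.
- If you know the Unicode code point of a character, you can refer to it directly in a string literal with \u followed by four hex digits, or \U followed by eight hex digits.
- You can also refer to a character as \N{name}, where name is the standard name of the character; standard names can be looked up in the Unicode Character Name Index.
- The unicodedata module provides functions for converting in both directions:
- lookup() takes a case-insensitive standard name and returns the Unicode character.
- name() takes a Unicode character and returns its name in upper case.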
An excerpt of the character name table from the official site:
Unicode® Character Name Index
A
Name, Alias, or Category Chart Link
A WITH ACUTE, LATIN CAPITAL LETTER 00C1
A WITH ACUTE, LATIN SMALL LETTER 00E1
A WITH BREVE, LATIN SMALL LETTER 0103
A WITH CARON, LATIN SMALL LETTER 01CE
A WITH CIRCUMFLEX, LATIN CAPITAL LETTER 00C2
A WITH CIRCUMFLEX, LATIN SMALL LETTER 00E2
A WITH DIAERESIS, LATIN CAPITAL LETTER 00C4
A WITH DIAERESIS, LATIN SMALL LETTER 00E4
A WITH DOT ABOVE, LATIN SMALL LETTER 0227
A WITH DOT BELOW, LATIN SMALL LETTER 1EA1
A WITH DOUBLE GRAVE, LATIN SMALL LETTER 0201
A WITH GRAVE, LATIN CAPITAL LETTER 00C0
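Note: for ease of lookup, the names on the Unicode Character Name Index page are rearranged, so they differ from the names returned by unicodedata.name(). To convert an index entry into the real Unicode name (the one Python uses), drop the comma and move the text after it to the front.
Attributes and functions of the unicodedata module: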
In [1]: import unicodedata
In [2]: unicodedata.
bidirectional() decomposition() mirrored() UCD
category() digit() name() ucd_3_2_0
combining() east_asian_width() normalize() ucnhash_CAPI
decimal() lookup() numeric() unidata_version
In [3]: unicodedata.lookup('A WITH ACUTE, LATIN CAPITAL LETTER')
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-11-1bf6d86503ae> in <module>
----> 1 unicodedata.lookup('A WITH ACUTE, LATIN CAPITAL LETTER')
KeyError: "undefined character name 'A WITH ACUTE, LATIN CAPITAL LETTER'"
In [4]: unicodedata.lookup('LATIN CAPITAL LETTER A WITH ACUTE')
Out[4]: 'Á'
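The renaming rule described in the note can be written as a small helper. The following is a sketch under stated assumptions: the function index_entry_to_name is hypothetical, and it only handles the simple "tail, head" form shown in the table excerpt, not aliases or category entries of the index.

```python
import unicodedata


def index_entry_to_name(entry):
    """Turn 'A WITH ACUTE, LATIN CAPITAL LETTER' into the real Unicode name."""
    head, sep, tail = entry.partition(', ')
    # Move the text after the comma to the front; leave other entries unchanged
    return '{} {}'.format(tail, head) if sep else entry


name = index_entry_to_name('A WITH ACUTE, LATIN CAPITAL LETTER')
char = unicodedata.lookup(name)

# The same character written with \N{name} and with \u escapes
same = char == '\N{LATIN CAPITAL LETTER A WITH ACUTE}' == '\u00c1'
print(name, char, same)
```

The converted name can be passed to unicodedata.lookup() or used inside a \N{...} escape; both produce the same character as the \u00c1 escape.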
Using the unicodedata module: the check_unicode function below takes a Unicode character, looks up its name, and then uses that name to look up the character again:
In [1]: import unicodedata
In [2]: def check_unicode(value):
   ...:     name = unicodedata.name(value)  # look up the name of the character
   ...:     value2 = unicodedata.lookup(name)  # look up the character for that name
   ...:     print('value="{}",name="{}",value2="{}"'.format(value, name, value2))
   ...:
In [3]: check_unicode('A') # plain ASCII letter
value="A",name="LATIN CAPITAL LETTER A",value2="A"
In [4]: check_unicode('$') # ASCII punctuation
value="$",name="DOLLAR SIGN",value2="$"
In [5]: check_unicode('\u00a2') # Unicode currency character
value="¢",name="CENT SIGN",value2="¢"
In [6]: check_unicode('\u20ac') # euro sign
value="€",name="EURO SIGN",value2="€"
In [7]: check_unicode('\uffe5') # fullwidth yen sign, also used for the Chinese yuan
value="¥",name="FULLWIDTH YEN SIGN",value2="¥"
In [8]: check_unicode('\u2630') # miscellaneous symbol
value="☰",name="TRIGRAM FOR HEAVEN",value2="☰"
In [9]: check_unicode('\u2603') # the SNOWMAN character
value="☃",name="SNOWMAN",value2="☃"
In [10]: check_unicode('\u00e9') # Latin letter é
value="é",name="LATIN SMALL LETTER E WITH ACUTE",value2="é"
Encoding (encode) and decoding (decode) in Python¶
- Encoding converts a string into a sequence of bytes.
- Decoding converts a sequence of bytes into a Unicode string.
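As a brief illustration of both directions, the following sketch encodes a few characters to UTF-8 to show the variable byte lengths, and then decodes bytes back to text. The sample characters are chosen for illustration and written as escapes so that the code points are unambiguous.

```python
# One character from each UTF-8 length class (written as escapes)
samples = ['A', '\u00e9', '\u044f', '\u20ac', '\u4e2d', '\U0001F600']
lengths = [len(ch.encode('utf-8')) for ch in samples]

data = 'caf\u00e9'.encode('utf-8')     # str -> bytes
text = data.decode('utf-8')            # bytes -> str

# Decoding with the wrong codec fails unless an error handler is given
try:
    data.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
lossy = data.decode('ascii', errors='replace')   # invalid bytes become U+FFFD

print(lengths, data, text, ascii_ok, lossy)
```

The lengths come out as 1, 2, 2, 3, 3 and 4 bytes: ASCII, a Latin letter with an accent, a Cyrillic letter, the euro sign, a Chinese character, and an emoji outside the Basic Multilingual Plane.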
Common string methods in Python¶
Python strings have the following methods:
>>> str='string'
>>> str.
str.capitalize( str.endswith( str.index( str.isidentifier( str.istitle( str.lstrip( str.rindex( str.split( str.title(
str.casefold( str.expandtabs( str.isalnum( str.islower( str.isupper( str.maketrans( str.rjust( str.splitlines( str.translate(
str.center( str.find( str.isalpha( str.isnumeric( str.join( str.partition( str.rpartition( str.startswith( str.upper(
str.count( str.format( str.isdecimal( str.isprintable( str.ljust( str.replace( str.rsplit( str.strip( str.zfill(
str.encode( str.format_map( str.isdigit( str.isspace( str.lower( str.rfind( str.rstrip( str.swapcase(
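They can be grouped into the following categories:
- Case conversion
- Predicates
- Padding
- Index and count
- Splitting and joining
- Replacement
- Searching
- Translation
- Formatting
- Encoding
Case conversion¶
The case conversion methods are:
str.capitalize() Uppercase the first character and lowercase the rest; the original string is not changed, a new string is returned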
>>> str1='abcdef'
>>> str1.capitalize()
'Abcdef'
>>> str1='abCdE'
>>> str1.capitalize()
'Abcde'
>>> str1
'abCdE'
str.title() Title-case the string: the first letter of each word is uppercased and the remaining letters are lowercased
>>> str1='abCdE'
>>> str1.title()
'Abcde'
>>> str2='2sadDddE'
>>> str2.title()
'2Sadddde'
str.upper() Convert the string to all upper case
>>> str1.upper()
'ABCDE'
str.lower() Convert the string to all lower case; this is sufficient for Chinese and English text
>>> str1.lower()
'abcde'
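str.casefold() Return a case-folded copy of the string for caseless matching; it is more aggressive than lower() and also handles other languages (for example German)
The German lowercase letter 'ß' is case-folded to 'ss'
>>> str1='One'
>>> str2='2Two'
>>> str1.casefold()
'one'
>>> str2.casefold()
'2two'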
>>> s = 'ß'
>>> s
'ß'
>>> s.lower()
'ß'
>>> s.casefold()
'ss'
str.swapcase() Swap the case of the string: upper case becomes lower case and lower case becomes upper case
>>> str2='2Two'
>>> str1='One'
>>> str1
'One'
>>> str2
'2Two'
>>> str1.swapcase()
'oNE'
>>> str2.swapcase()
'2tWO'
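Two practical consequences of the methods above are worth a short sketch: casefold() is the method to use for caseless comparison, and title() treats every non-letter, including an apostrophe, as a word boundary. The helper name same_ignoring_case is an assumption made for illustration.

```python
def same_ignoring_case(a, b):
    """Compare two strings without regard to case, using case folding."""
    return a.casefold() == b.casefold()


folded_equal = same_ignoring_case('Stra\u00dfe', 'STRASSE')     # 'Straße' vs 'STRASSE'
lowered_equal = 'Stra\u00dfe'.lower() == 'STRASSE'.lower()      # lower() keeps the 'ß'
titled = "they're here".title()                                 # apostrophe starts a new word

print(folded_equal, lowered_equal, titled)
```

With casefold() the two spellings compare equal, with lower() they do not, and title() yields "They'Re Here" because the letter after the apostrophe is treated as the start of a word.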
Predicates¶
The predicate methods are:
str.startswith(string) Test whether the string starts with the given string
>>> str1='One'
>>> str1
'One'
>>> str1.startswith('o')
False
>>> str1.startswith('O')
True
>>> str1.startswith('On')
True
str.endswith(string) Test whether the string ends with the given string
>>> str1='One'
>>> str1
'One'
>>> str1.endswith('e')
True
>>> str1.endswith('ne')
True
>>> str1.endswith('One')
True
>>> str1.endswith('one')
False
str.isidentifier() Test whether the string is a valid identifier (the first character must be a letter or an underscore, not a digit or a special symbol)
>>> str1
'One'
>>> str2
'2Two'
>>> str3
'123'
>>> str1.isidentifier()
True
>>> str2.isidentifier()
False
>>> str3.isidentifier()
False
>>> str4='_ab'
>>> str4.isidentifier()
True
>>> str5='&adg'
>>> str5.isidentifier()
False
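str.istitle() Test whether the string is title-cased (each word starts with an uppercase letter followed only by lowercase letters, and there is at least one letter)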
>>> str1
'One'
>>> str2
'2Two'
>>> str3
'123'
>>> str4='&adg'
>>> str5='abcd'
>>> str1.istitle()
True
>>> str2.istitle()
True
>>> str3.istitle()
False
>>> str4.istitle()
False
>>> str5.istitle()
False
str.isalnum() Test whether all characters are letters or digits
>>> str1
'One'
>>> str2
'2Two'
>>> str3
'123'
>>> str4
'&adg'
>>> str5
'abcd'
>>> str1.isalnum()
True
>>> str2.isalnum()
True
>>> str3.isalnum()
True
>>> str4.isalnum()
False
>>> str5.isalnum()
True
str.islower() Test whether all cased characters are lower case (the string must contain at least one cased character)
>>> str1
'One'
>>> str2
'2Two'
>>> str3
'123'
>>> str4
'&adg'
>>> str5
'abcd'
>>> str1.islower()
False
>>> str2.islower()
False
>>> str3.islower()
False
>>> str4.islower()
True
>>> str5.islower()
True
str.isupper() Test whether all cased characters are upper case (the string must contain at least one cased character)
>>> str1='abcde'
>>> str2='ABCDE'
>>> str3='1$abc'
>>> str4='1$ABC'
>>> str1.isupper()
False
>>> str2.isupper()
True
>>> str3.isupper()
False
>>> str4.isupper()
True
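str.isnumeric() Test whether all characters are numeric; the broadest of the three numeric tests, it also accepts fractions such as '½' and numerals such as '一'. A decimal point is not accepted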
>>> str1='123.456'
>>> str2='123456'
>>> str1.isnumeric()
False
>>> str2.isnumeric()
True
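str.isdecimal() Test whether all characters are decimal digits that can form a base-10 number; the strictest of the three numeric tests. A decimal point is not accepted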
>>> str1
'123.456'
>>> str2
'123456'
>>> str1.isdecimal()
False
>>> str2.isdecimal()
True
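str.isdigit() Test whether all characters are digits; accepts the decimal digits plus digit characters such as superscript '²'. A decimal point is not accepted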
>>> str1
'123.456'
>>> str2
'123456'
>>> str1.isdigit()
False
>>> str2.isdigit()
True
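For ASCII input the three numeric tests agree, which is why the examples above look identical. They differ on other Unicode characters, as the following hedged sketch shows; the sample characters and the helper is_float are chosen for illustration.

```python
def numeric_flags(s):
    """Return (isdecimal, isdigit, isnumeric) for the string s."""
    return s.isdecimal(), s.isdigit(), s.isnumeric()


flags = {
    'ascii digits': numeric_flags('123'),
    'superscript two': numeric_flags('\u00b2'),        # '²'
    'one half': numeric_flags('\u00bd'),               # '½'
    'chinese one': numeric_flags('\u4e00'),            # '一'
    'with decimal point': numeric_flags('123.456'),
}


def is_float(s):
    """Test whether s can be parsed as a float (none of the three methods does this)."""
    try:
        float(s)
        return True
    except ValueError:
        return False


print(flags, is_float('123.456'))
```

Every string accepted by isdecimal() is accepted by isdigit(), and every string accepted by isdigit() is accepted by isnumeric(); to accept a value such as '123.456', try the conversion with float() instead.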
str.isspace() Test whether all characters are whitespace, such as space, tab or newline
>>> str1
'123.456'
>>> str2
'123456'
>>> str1.isspace()
False
>>> str2.isspace()
False
>>> strspace=' '
>>> strspace.isspace()
True
>>> strtab='\t'
>>> strtab.isspace()
True
str.isprintable() Test whether all characters are printable. A tab is not printable, so the result is False
>>> str1
'123.456'
>>> str2
'123456'
>>> str1.isprintable()
True
>>> str2.isprintable()
True
>>> strspace=' '
>>> strspace.isprintable()
True
>>> strtab='\t'
>>> strtab.isprintable()
False
str.isalpha() Test whether all characters are letters
>>> str1
'123.456'
>>> str2
'123456'
>>> str1.isalpha()
False
>>> str2.isalpha()
False
>>> strspace=' '
>>> strspace.isalpha()
False
>>> str4='abcd'
>>> str5='ABCD'
>>> str6='abcd32'
>>> str4.isalpha()
True
>>> str5.isalpha()
True
>>> str6.isalpha()
False
Padding¶
The padding methods are:
str.rjust(width[, fillchar]) Right-justify: pad on the left so that the new string has length width
If fillchar is not given, spaces are used; fillchar must be a single character
>>> str1
'123.456'
>>> str2
'123456'
>>> str3
'III'
>>> str1.rjust(7)
'123.456'
>>> str1.rjust(8)
' 123.456'
>>> str1.rjust(9)
'  123.456'
>>> str1.rjust(9,'*')
'**123.456'
>>> str3.rjust(6)
'   III'
>>> str3.rjust(6,'*')
'***III'
>>> str3.rjust(7,'*')
'****III'
str.ljust(width[, fillchar]) Left-justify: pad on the right so that the new string has length width
If fillchar is not given, spaces are used; fillchar must be a single character
>>> str1
'123.456'
>>> str2
'123456'
>>> str3
'III'
>>> str1.ljust(7)
'123.456'
>>> str1.ljust(8)
'123.456 '
>>> str1.ljust(9)
'123.456  '
>>> str1.ljust(9,'*')
'123.456**'
>>> str3.ljust(6)
'III   '
>>> str3.ljust(6,'*')
'III***'
>>> str3.ljust(7,'*')
'III****'
str.center(width[, fillchar]) Center the string: pad on both sides so that the new string has length width
If fillchar is not given, spaces are used; fillchar must be a single character
>>> str1.center(8)
'123.456 '
>>> str1.center(9)
' 123.456 '
>>> str2.center(9)
'  123456 '
>>> str2.center(8)
' 123456 '
>>> str3.center(6,'*')
'*III**'
>>> str3.center(7,'*')
'**III**'
>>> str3.center(7,'*&')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: The fill character must be exactly one character long
>>> str3.center(7,'&')
'&&III&&'
>>> str3.center(8,'&')
'&&III&&&'
>>> str3.center(9,'&')
'&&&III&&&'
>>> str3.center(10,'&')
'&&&III&&&&'
str.zfill(width) Pad the string on the left with '0' so that its length becomes width
>>> c1='abcde'
>>> c1.zfill(5)
'abcde'
>>> c1.zfill(6)
'0abcde'
>>> c1.zfill(7)
'00abcde'
>>> c1.zfill(8)
'000abcde'
>>> c1.zfill(9)
'0000abcde'
>>> c2='abc ed'
>>> c2.zfill(10)
'0000abc ed'
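Two details are not visible in the examples above: zfill() keeps a leading sign in front of the zeros, and the same padding can be produced with the format specification mini-language. The following is a short sketch with illustrative values.

```python
# zfill() inserts the zeros after a leading '+' or '-' sign
negative = '-42'.zfill(5)
positive = '+7'.zfill(4)

# format() alignment: fill character, then '>' (right), '<' (left) or '^' (center)
right = format('III', '*>6')     # same as 'III'.rjust(6, '*')
left = format('III', '*<6')      # same as 'III'.ljust(6, '*')
middle = format('III', '*^7')    # same as 'III'.center(7, '*')

print(negative, positive, right, left, middle)
```

The format specification is convenient inside str.format() templates, for example '{:*>6}'.format('III'), where several values are padded in one call.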
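Index and count¶
The index and count methods are:
str.index(sub[, start[, end]]) Return the lowest index in str where the substring sub is found; raise ValueError if it is not found
If start and end are given, only the range from index start to index end (end excluded) is searched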
>>> c1
'1122333'
>>> c2
'ababcabab'
>>> c3
'AAAA'
>>> c1.index('1')
0
>>> c1.index('1',2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: substring not found
>>> c1.index('1',1)
1
>>> c2.index('ab')
0
>>> c2.index('ab',2)
2
>>> c2.index('ab',2,3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: substring not found
>>> c2.index('ab',2,4)
2
>>> c3.index('A')
0
>>> c3.index('A',1,4)
1
>>> c3.index('A',2,4)
2
>>> c3.index('A',3,4)
3
>>> c3.index('A',4,4)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: substring not found
str.rindex(sub[, start[, end]]) Return the highest index in str where the substring sub is found; raise ValueError if it is not found
>>> c3
'AAAA'
>>> c3.rindex('A')
3
>>> c3.rindex('A',0,3)
2
>>> c3.rindex('A',0,-1)
2
>>> c3.rindex('A',0,2)
1
>>> c3.rindex('A',0,1)
0
>>> c3.rindex('A',0,0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: substring not found
str.count(sub[, start[, end]]) Count the non-overlapping occurrences of the substring sub in str
If start and end are given, only the range from index start to index end (end excluded) is counted
>>> c1='1122333'
>>> c2='ababcabab'
>>> c3='AAAA'
>>> c1
'1122333'
>>> c2
'ababcabab'
>>> c3
'AAAA'
>>> c1.count('1')
2
>>> c1.count('2')
2
>>> c1.count('3')
3
>>> c2.count('a')
4
>>> c2.count('b')
4
>>> c2.count('c')
1
>>> c3.count('A')
4
>>> c1.count('1',1)
1
>>> c1.count('1',0,0)
0
>>> c1.count('1',0,1)
1
>>> c1.count('1',0,2)
2
>>> c1.count('2',0,2)
0
>>> c2.count('ab')
4
>>> c2.count('abc')
1
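Splitting and joining¶
The splitting and joining methods are:
str.partition(sep) Split at the first occurrence of sep, searching from the left, and return a 3-tuple (head, sep, tail): the part before the separator, the separator itself, and the part after it.
If sep is not found, return (str, '', '')
str.rpartition(sep) Split at the last occurrence of sep, searching from the right, and return (head, sep, tail);
If sep is not found, return ('', '', str)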
>>> c1='abcdcba'
>>> c1.partition('a') # the first character is a, so head=''
('', 'a', 'bcdcba')
>>> c1.rpartition('a') # searching from the right, the first character met is a, so tail=''
('abcdcb', 'a', '')
>>> c1.partition('b')
('a', 'b', 'cdcba')
>>> c1.rpartition('b')
('abcdc', 'b', 'a')
>>> c1.partition('c')
('ab', 'c', 'dcba')
>>> c1.rpartition('c')
('abcd', 'c', 'ba')
>>> c1.partition('x') # x is not found, so two empty strings are returned
('abcdcba', '', '')
>>> c1.rpartition('x')
('', '', 'abcdcba')
str.join(seq) Join the items of the iterable seq into a new string, using str as the separator
>>> str = "-";
>>> seq = ("a", "b", "c");
>>> str.join(seq)
'a-b-c'
>>> c3
'AAAA'
>>> "_".join(c3)
'A_A_A_A'
>>> str1='###'
>>> c3
'AAAA'
>>> str1.join(c3)
'A###A###A###A'
>>> str2="_*_"
>>> str2
'_*_'
>>> str2.join(c3)
'A_*_A_*_A_*_A'
>>> ';'.join(str1)
'#;#;#'
# Join the items of a list with an underscore
>>> li = ['alex','eric','rain']
>>> li
['alex', 'eric', 'rain']
>>> '_'.join(li)
'alex_eric_rain'
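str.strip([chars]) Remove characters from both ends of str (by default whitespace such as spaces, tabs and newlines)
If chars is given, it is treated as a set of characters: every leading and trailing character that appears in chars is removed
# Define five strings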
>>> str1=' abc '
>>> str1
' abc '
>>> str2='\t abc \t'
>>> str2
'\t abc \t'
>>> str3='\t abc \t'
>>> str3
'\t abc \t'
>>> str4='000abcde000'
>>> str4
'000abcde000'
>>> str5='000 abc 000'
>>> str5
'000 abc 000'
# Strip both ends with the default (whitespace)
>>> str1.strip()
'abc'
>>> str2.strip()
'abc'
>>> str3.strip()
'abc'
>>> str4.strip()
'000abcde000'
>>> str5.strip()
'000 abc 000'
# With chars set to '0', only the '0' characters at both ends are removed
>>> str1.strip('0')
' abc '
>>> str2.strip('0')
'\t abc \t'
>>> str3.strip('0')
'\t abc \t'
>>> str4.strip('0')
'abcde'
>>> str5.strip('0')
' abc '
# With chars set to '0' and ' ' (space),
# the '0' and space characters at both ends are removed, but the '\t' tab characters are not
>>> str1.strip('0 ')
'abc'
>>> str2.strip('0 ')
'\t abc \t'
>>> str3.strip('0 ')
'\t abc \t'
>>> str4.strip('0 ')
'abcde'
>>> str5.strip('0 ')
'abc'
# With chars set to '0', ' ' (space) and '\t' (tab),
# the '0' and space characters at both ends are removed, and so are the '\t' tab characters
>>> str1.strip('0 \t')
'abc'
>>> str2.strip('0 \t')
'abc'
>>> str3.strip('0 \t')
'abc'
>>> str4.strip('0 \t')
'abcde'
>>> str5.strip('0 \t')
'abc'
str.lstrip([chars]) Remove leading characters; works like str.strip() but only on the left side
>>> str1.lstrip()
'abc '
>>> str2.lstrip()
'abc \t'
>>> str3.lstrip()
'abc \t'
>>> str4.lstrip()
'000abcde000'
>>> str5.lstrip()
'000 abc 000'
>>> str1.lstrip('0')
' abc '
>>> str2.lstrip('0')
'\t abc \t'
>>> str3.lstrip('0')
'\t abc \t'
>>> str4.lstrip('0')
'abcde000'
>>> str5.lstrip('0')
' abc 000'
>>> str1.lstrip('0 ')
'abc '
>>> str2.lstrip('0 ')
'\t abc \t'
>>> str3.lstrip('0 ')
'\t abc \t'
>>> str4.lstrip('0 ')
'abcde000'
>>> str5.lstrip('0 ')
'abc 000'
str.rstrip([chars]) Remove trailing characters; works like str.strip() but only on the right side
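str.rstrip() has no example above, so here is a brief hedged sketch. It also shows a common pitfall: chars is a set of characters, not a suffix. The helper strip_suffix is a hypothetical name introduced for illustration.

```python
trimmed = 'abc \t\n'.rstrip()                  # trailing whitespace removed
zeros = '000abcde000'.rstrip('0')              # only the right side is stripped

# Pitfall: '.txt' means the characters '.', 't' and 'x', not the suffix '.txt'
surprising = 'test.txt'.rstrip('.txt')


def strip_suffix(text, suffix):
    """Remove suffix from the end of text if it is present."""
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text


expected = strip_suffix('test.txt', '.txt')
print(trimmed, zeros, surprising, expected)
```

rstrip('.txt') also eats the final 't' of 'test' and returns 'tes', while the endswith() check removes exactly one suffix and returns 'test'.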
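str.split(sep=None, maxsplit=-1) Split str on the separator sep, performing at most maxsplit splits
If sep is not given, runs of whitespace (space, newline \n, tab \t) act as a single separator;
If maxsplit is not given, there is no limit on the number of splits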
>>> str1='0a\t b\tcb a0'
>>> str1
'0a\t b\tcb a0'
>>> str1.split()
['0a', 'b', 'cb', 'a0']
>>> str1.split(None,2)
['0a', 'b', 'cb a0']
>>> str1.split(None,1)
['0a', 'b\tcb a0']
>>> str1.split(None,0)
['0a\t b\tcb a0']
>>> str1.split(None,3)
['0a', 'b', 'cb', 'a0']
>>> str1.split('0')
['', 'a\t b\tcb a', '']
>>> str1.split('0',1)
['', 'a\t b\tcb a0']
>>> str1.split('0',2)
['', 'a\t b\tcb a', '']
>>> str1.split('a')
['0', '\t b\tcb ', '0']
>>> str1.split('b')
['0a\t ', '\tc', ' a0']
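str.rsplit(sep=None, maxsplit=-1) Split str on the separator sep starting from the end, performing at most maxsplit splits
If sep is not given, runs of whitespace (space, newline \n, tab \t) act as a single separator;
If maxsplit is not given, there is no limit on the number of splits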
>>> str1.rsplit()
['0a', 'b', 'cb', 'a0']
>>> str1.rsplit('0')
['', 'a\t b\tcb a', '']
>>> str1.rsplit('0',1)
['0a\t b\tcb a', '']
>>> str1.rsplit('a',1)
['0a\t b\tcb ', '0']
>>> str1.split('a',1)
['0', '\t b\tcb a0']
>>> str1.split('b',1)
['0a\t ', '\tcb a0']
>>> str1.rsplit('b',1)
['0a\t b\tc', ' a0']
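str.splitlines([keepends]) Split the string at line boundaries ('\r', '\r\n', '\n', among others)
and return a list with one element per line.
If keepends is False (the default), the line breaks are not included;
if it is True, the line breaks are kept.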
>>> str2='a\n\rb\nc\rd\r\ne'
>>> str2
'a\n\rb\nc\rd\r\ne'
>>> str2.split()
['a', 'b', 'c', 'd', 'e']
>>> str2.splitlines()
['a', '', 'b', 'c', 'd', 'e']
>>> str2.splitlines(True)
['a\n', '\r', 'b\n', 'c\r', 'd\r\n', 'e']
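A short sketch of how splitlines() differs from split('\n'); the sample strings are assumptions made for illustration.

```python
# Illustrative sample text ending with a line break.
text = 'a\nb\n'

by_newline = text.split('\n')      # the trailing separator yields a final empty string
by_lines = text.splitlines()       # no empty item for the trailing line break
mixed = 'x\r\ny\rz'.splitlines()   # \r\n and \r are recognized as line boundaries too

print(by_newline)
print(by_lines)
print(mixed)
```

When the input may come from different platforms, splitlines() is usually the more convenient of the two.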
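Character replacement methods¶
The character replacement methods are as follows:
str.expandtabs(tabsize=8) Expand each tab into spaces up to the next tab stop. Tab stops occur every tabsize columns (8 by default), so a tab becomes between 1 and tabsize spaces rather than a fixed number
strtab = 'ab\tc'
strspace = strtab.expandtabs()
print(strspace)
ab      c
print(strtab.expandtabs(tabsize=4))
ab  c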
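The tab-stop behaviour of expandtabs can be checked with a small sketch; the sample string is an assumption made for illustration.

```python
# Illustrative row with two tabs; not from the original text.
row = 'a\tbc\tdef'

expanded = row.expandtabs(4)
# 'a' occupies column 0, so the first tab adds 3 spaces (up to column 4);
# 'bc' occupies columns 4-5, so the second tab adds 2 spaces (up to column 8).
print(repr(expanded))
```

The number of spaces inserted depends on the column at which each tab occurs.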
str.replace(old, new[, count]) Return a copy of the string with occurrences of old replaced by new
If count is given, only the first count occurrences are replaced
>>> c1='abcdcbadcba'
>>> c1.replace('a','A')
'AbcdcbAdcbA'
>>> c1.replace('a','A',2)
'AbcdcbAdcba'
>>> c1.replace('a','A',1)
'Abcdcbadcba'
>>> c1.replace('a','A',0)
'abcdcbadcba'
>>> c1.replace('a','A',3)
'AbcdcbAdcbA'
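Character search methods¶
The character search methods are as follows:
str.find(sub[, start[, end]]) Return the lowest index at which sub is found within str[start:end], searching from the left; return -1 if sub is not found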
>>> str1='0123456543210'
>>> str1
'0123456543210'
>>> str1.find('0')
0
>>> str1.find('1')
1
>>> str1.find('2')
2
>>> str1.find('3')
3
>>> str1.find('3',1)
3
>>> str1.find('3',5)
9
>>> str1.find('3',5,6)
-1
>>> str1.find('34',5,-1)
-1
>>> str1.find('32',5,-1)
9
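str.rfind(sub[, start[, end]]) Return the highest index at which sub is found within str[start:end], searching from the right; return -1 if sub is not found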
>>> str1.rfind('0')
12
>>> str1.rfind('1')
11
>>> str1.rfind('2')
10
>>> str1.rfind('3')
9
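find and rfind signal a miss by returning -1. The closely related index and rindex methods raise ValueError instead, as this brief sketch shows; it reuses the sample string from the session above.

```python
digits = '0123456543210'

missing = digits.find('9')       # find returns -1 when the substring is absent

try:
    digits.index('9')            # index raises ValueError in the same situation
    index_raised = False
except ValueError:
    index_raised = True

found_at = digits.index('3')     # same result as find when the substring exists
last_at = digits.rindex('3')     # same result as rfind when the substring exists

print(missing, index_raised, found_at, last_at)
```

Use find when a miss is an expected outcome, and index when a miss should be treated as an error.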
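Translation methods¶
The translation methods are as follows:
str.translate(trantab) Translate the string using the translation table trantab
str.maketrans(intab, outtab) or str.maketrans(dicttab) Build a translation table that maps each character of intab to the character at the same position in outtab, or build one from the dictionary dicttab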
>>> intab='aeiou'
>>> outtab='12345'
>>> trantab = str.maketrans(intab,outtab)
>>> trantab
{97: 49, 101: 50, 105: 51, 111: 52, 117: 53}
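>>> text = "this is string example... wow!!"  # renamed from str, which would shadow the built-in type
>>> print(text.translate(trantab))
th3s 3s str3ng 2x1mpl2... w4w!!
>>> str2='abcdefabc'
>>> tr = str.maketrans({'a': 1, 'b': 2, 'c': 3})  # tr was not defined in the original; this mapping is an assumption that matches the output below
>>> str2.translate(tr)
'\x01\x02\x03def\x01\x02\x03'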
>>> dict1={'a':'1','b':2,'c':'3','d':'4'}
>>> ttab=str.maketrans(dict1)
>>> ttab
{97: '1', 98: 2, 99: '3', 100: '4'}
>>> str2.translate(ttab)
'1\x0234ef1\x023'
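str.maketrans also accepts a third argument listing characters to delete. A minimal sketch, with sample strings chosen purely for illustration:

```python
# Third argument: characters that translate() should remove.
remove_vowels = str.maketrans('', '', 'aeiou')
no_vowels = 'hello world'.translate(remove_vowels)

# Mapping and deletion can be combined in a single table.
swap_and_drop = str.maketrans('ab', 'xy', 'c')
swapped = 'aabbcc'.translate(swap_and_drop)

print(no_vowels)
print(swapped)
```

In the resulting table, deleted characters are mapped to None.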
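Formatting methods¶
The formatting methods are as follows:
str.format(*args, **kwargs) Format a string for output
# Fill the fields by manual or automatic numbering
>>> print('{0}+{1}={2}'.format('A','B','C')) # manual numbering: each index selects the matching argument of format
A+B=C
>>> print('{}+{}={}'.format('A','B','C')) # automatic numbering: the arguments fill the {} fields in order
A+B=C
>>> print('{1}+{0}={2}'.format('A','B','C')) # manual numbering can change the order in which the arguments appear
B+A=C
>>> print('{1}+{2}={0}'.format('A','B','C')) # manual numbering can change the order in which the arguments appear
B+C=A
# Manual and automatic numbering cannot be mixed; doing so raises an error: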
>>> print('{1}+{0}={}'.format('A','B','C'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: cannot switch from manual field specification to automatic field numbering
# Output a string with each value filled into its position
>>> print('{} love to learn {}'.format('I','Python'))
I love to learn Python
>>> print('{0} love to learn {1}'.format('I','Python'))
I love to learn Python
# Output the value of a string variable
>>> str1='string'
>>> str1
'string'
>>> print('The length of {0} is {1}'.format(str1,len(str1)))
The length of string is 6
# Select output values by list index
>>> list1=['a','b','c']
>>> list1
['a', 'b', 'c']
>>> print('The string is {0[0]}+{0[1]}+{0[2]}'.format(list1))
The string is a+b+c
>>> print('The string is {0}{0}{0}'.format(list1))
The string is ['a', 'b', 'c']['a', 'b', 'c']['a', 'b', 'c']
# Select output values from a dictionary
>>> dict1={'name':'Mei','lang':'Python'}
>>> dict1
{'name': 'Mei', 'lang': 'Python'}
>>> print('Your name is {0[name]} and you love to learn {0[lang]}'.format(dict1))
Your name is Mei and you love to learn Python
Note: a dictionary can also be passed as keyword arguments, as follows
# Pass the dictionary as keyword arguments by prefixing it with **
>>> dict1={'name':'Mei','lang':'Python'}
>>> dict1
{'name': 'Mei', 'lang': 'Python'}
>>> print('Your name is {name} and you love to learn {lang}'.format(**dict1))
Your name is Mei and you love to learn Python
# Pass the values directly as keyword arguments
>>> print('Your name is {name} and you love to learn {lang}'.format(name='Mei',lang='Python'))
Your name is Mei and you love to learn Python
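# Fill characters and alignment
:[fill character][alignment <^>][width]
^, < and > mean centered, left-aligned and right-aligned respectively, and are followed by the width. The fill character goes right after the colon and must be a single character; it defaults to a space. Without an explicit alignment, numbers are right-aligned and most other objects, such as strings, are left-aligned.
# Right-aligned (the default for numbers), width 1, no padding needed
>>> print('{0:1}'.format(3))
3
# Right-aligned, width 2, padded on the left with spaces
>>> print('{0:2}'.format(3))
 3
# Right-aligned, width 3, padded on the left with spaces
>>> print('{0:3}'.format(3))
  3
# Without an alignment character, # is not a fill character but the alternate-form flag, which has no effect here
>>> print('{0:#3}'.format(3))
  3
# Right-aligned, width 3, padded on the left with the fill character #
>>> print('{0:#>3}'.format(4))
##4
# Right-aligned, width 3, padded on the left with the fill character @
>>> print('{0:@>3}'.format(4))
@@4
# Right-aligned, width 3, padded on the left with the fill character !
>>> print('{0:!>3}'.format(4))
!!4
# Right-aligned, width 3, padded on the left with the fill character 0
>>> print('{0:0>3}'.format(4))
004
# Right-aligned, width 3, padded on the left with the fill character %
>>> print('{0:%>3}'.format(4))
%%4
# Right-aligned, width 3, padded on the left with the fill character *
>>> print('{0:*>3}'.format(4))
**4
# Right-aligned, width 6, padded on the left with the fill character *
>>> print('{0:*>6}'.format(4))
*****4
# Left-aligned, width 6, padded on the right with the fill character *
>>> print('{0:*<6}'.format(4))
4*****
# Centered, width 6, padded on both sides with the fill character * (the odd extra character goes on the right)
>>> print('{0:*^6}'.format(4))
**4***
# Centered, width 7, padded on both sides with the fill character *
>>> print('{0:*^7}'.format(4))
***4***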
# Numeric formatting
>>> import math
>>> math.pi
3.141592653589793
>>> pi=math.pi
>>> pi
3.141592653589793
# Keep two digits after the decimal point
>>> print('{:.2f}'.format(pi))
3.14
>>> print('{0:.2f}'.format(pi))
3.14
# Keep three digits after the decimal point
>>> print('{:.3f}'.format(pi))
3.142
>>> print('{0:.3f}'.format(pi))
3.142
# Show the sign and keep three digits after the decimal point
>>> print('{0:+.3f}'.format(-pi))
-3.142
>>> print('{0:+.3f}'.format(pi))
+3.142
# Round to zero decimal places, leaving no fractional part
>>> print('{0:.0f}'.format(pi))
3
# Use a comma as the thousands separator
>>> num=1234567890
>>> num
1234567890
>>> print('{0:,}'.format(num))
1,234,567,890
# Percentage format
>>> per = 0.6645
>>> print('{0:.2%}'.format(per))
66.45%
>>> print('{0:.1%}'.format(per))
66.5%
# Exponent (scientific) notation
>>> bignum=pow(10,9)
>>> bignum
1000000000
>>> print('{0:.1e}'.format(bignum))
1.0e+09
>>> print('{0:.2e}'.format(bignum))
1.00e+09
# Base conversion
# b, d, o and x are binary, decimal, octal and hexadecimal respectively
# Adding # makes the output carry the matching base prefix (0b, 0o, 0x or 0X)
>>> x=12
>>> print('{0:b}'.format(x))
1100
>>> print('{0:d}'.format(x))
12
>>> print('{0:o}'.format(x))
14
>>> print('{0:x}'.format(x))
c
>>> print('{0:X}'.format(x))
C
>>> print('{0:#o}'.format(x))
0o14
>>> print('{0:#b}'.format(x))
0b1100
>>> print('{0:#d}'.format(x))
12
>>> print('{0:#o}'.format(x))
0o14
>>> print('{0:#x}'.format(x))
0xc
>>> print('{0:#X}'.format(x))
0XC
# To output literal braces, escape them by doubling: {{ and }}
>>> print("{0:#X}{{'abc'}}".format(x))
0XC{'abc'}
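Alignment, width and precision can be combined in one format specification. The sketch below lays out a small two-column table; the rows are made-up sample data.

```python
# Made-up sample data for illustration.
rows = [('apple', 1.5), ('kiwi', 12.25)]

# {:<8}   -> left-align the name in 8 columns
# {:>8.2f} -> right-align the price in 8 columns with 2 decimal places
lines = ['{:<8}|{:>8.2f}'.format(name, price) for name, price in rows]

for line in lines:
    print(line)
```

Using the same widths on every row keeps the columns aligned.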
str.format_map(dict1) Format using the mapping dict1 directly, without unpacking it into keyword arguments. In the single-run measurement below this form ran faster than format(**dict1), although one run is too little to generalize from.
# Pass the dictionary as keyword arguments with **, or pass it directly to format_map
>>> dict1={'name':'Mei','lang':'Python'}
>>> dict1
{'name': 'Mei', 'lang': 'Python'}
>>> print('Your name is {name} and you love to learn {lang}'.format(**dict1))
Your name is Mei and you love to learn Python
>>> print('Your name is {name} and you love to learn {lang}'.format_map(dict1))
Your name is Mei and you love to learn Python
# Measure the time taken by each approach
import timeit
dict1 = {'name': 'Mei', 'lang': 'Python'}
start = timeit.default_timer()
print('Your name is {name} and you love to learn {lang}'.format(**dict1))
end1 = timeit.default_timer()
print('Your name is {name} and you love to learn {lang}'.format_map(dict1))
end2 = timeit.default_timer()
print(str(end1-start))
print(str(end2-end1))
The output is as follows:
Your name is Mei and you love to learn Python
Your name is Mei and you love to learn Python
3.202066400183586e-05
1.0673554667278617e-05
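A single timed call that also includes print() is easily dominated by noise. A somewhat more robust sketch repeats each call many times with timeit.timeit; the repetition count is an arbitrary assumption, and the numbers will vary by machine and Python version.

```python
import timeit

dict1 = {'name': 'Mei', 'lang': 'Python'}
template = 'Your name is {name} and you love to learn {lang}'

runs = 100000  # arbitrary repetition count, chosen for illustration

# Time only the formatting call itself, without print().
t_format = timeit.timeit(lambda: template.format(**dict1), number=runs)
t_format_map = timeit.timeit(lambda: template.format_map(dict1), number=runs)

print('format(**dict1): {:.4f}s for {} runs'.format(t_format, runs))
print('format_map(dict1): {:.4f}s for {} runs'.format(t_format_map, runs))
```

Whichever form turns out faster in a given environment, the two produce the same string.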
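Encoding methods¶
- Encoding converts a string into a sequence of bytes.
- Decoding converts a sequence of bytes into a Unicode string.
The encoding methods are as follows:
str.encode(encoding='utf-8', errors='strict') Encode the string with the given encoding and return a bytes object
Python 3 encodes strings as utf-8 by default; encode() encodes and decode() decodes.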
>>> str1='我爱python'
>>> str1
'我爱python'
>>> str1.encode()
b'\xe6\x88\x91\xe7\x88\xb1python'
>>> str1.encode(encoding='utf-8')
b'\xe6\x88\x91\xe7\x88\xb1python'
>>> byte_code1 = str1.encode('utf-8')
>>> byte_code1
b'\xe6\x88\x91\xe7\x88\xb1python'
>>> byte_code1.decode('gb2312')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'gb2312' codec can't decode byte 0xe6 in position 0: illegal multibyte sequence
>>> byte_code1.decode('utf-8')
'我爱python'
>>> str2 = byte_code1.decode('utf-8')
>>> str2
'我爱python'
>>> str2.encode('gb2312')
b'\xce\xd2\xb0\xaepython'
>>> byte_code2 = str2.encode('gb2312')
>>> byte_code1
b'\xe6\x88\x91\xe7\x88\xb1python'
>>> byte_code2
b'\xce\xd2\xb0\xaepython'
>>> byte_code2 = str2.encode('gb2312')
>>> str3 = byte_code2.decode('utf-8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xce in position 0: invalid continuation byte
>>> str3 = byte_code2.decode('gb2312')
>>> str3
'我爱python'
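The errors parameter controls what happens when a character cannot be encoded. A minimal sketch using the ascii codec, which cannot represent the two Chinese characters:

```python
text = '我爱python'

# errors='strict' (the default) raises UnicodeEncodeError.
try:
    text.encode('ascii')
    strict_raised = False
except UnicodeEncodeError:
    strict_raised = True

ignored = text.encode('ascii', errors='ignore')            # drop unencodable characters
replaced = text.encode('ascii', errors='replace')          # substitute '?' for each one
escaped = text.encode('ascii', errors='backslashreplace')  # keep them as \uXXXX escapes

print(strict_raised, ignored, replaced, escaped)
```

Apart from strict, these handlers lose or alter information, so they suit display and logging better than round-tripping data.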
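The Python string module¶
- The Python string module predefines a number of string constants that are handy for tests and checks.
Functions and attributes of the string module: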
In [1]: import string
In [2]: string?
Type: module
String form: <module 'string' from '/usr/lib/python3.5/string.py'>
File: /usr/lib/python3.5/string.py
Docstring:
A collection of string constants.
Public module variables:
whitespace -- a string containing all ASCII whitespace
ascii_lowercase -- a string containing all ASCII lowercase letters
ascii_uppercase -- a string containing all ASCII uppercase letters
ascii_letters -- a string containing all ASCII letters
digits -- a string containing all ASCII decimal digits
hexdigits -- a string containing all ASCII hexadecimal digits
octdigits -- a string containing all ASCII octal digits
punctuation -- a string containing all ASCII punctuation characters
printable -- a string containing all ASCII characters considered printable
In [3]: string.
ascii_letters capwords hexdigits punctuation
ascii_lowercase digits octdigits Template
ascii_uppercase Formatter printable whitespace
Using the string module:
In [1]: import string
In [2]: string.ascii_letters
Out[2]: 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
In [3]: string.ascii_lowercase
Out[3]: 'abcdefghijklmnopqrstuvwxyz'
In [4]: string.ascii_uppercase
Out[4]: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
In [5]: string.capwords
Out[5]: <function string.capwords(s, sep=None)>
In [6]: string.capwords('word')
Out[6]: 'Word'
In [7]: string.digits
Out[7]: '0123456789'
In [8]: string.Formatter
Out[8]: string.Formatter
In [9]: string.hexdigits
Out[9]: '0123456789abcdefABCDEF'
In [10]: string.octdigits
Out[10]: '01234567'
In [11]: string.printable
Out[11]: '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
In [12]: string.punctuation
Out[12]: '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
In [13]: string.Template
Out[13]: string.Template
In [14]: string.whitespace
Out[14]: ' \t\n\r\x0b\x0c'
In [15]: s = string.Template('$who like $what')
In [16]: s.substitute(who='I',what='Python')
Out[16]: 'I like Python'
In [17]: s.safe_substitute(who='I')
Out[17]: 'I like $what'
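Two details that the session above does not show: capwords collapses runs of whitespace, and substitute raises KeyError for a missing placeholder where safe_substitute leaves it in place. A small sketch with made-up sample values:

```python
import string

# capwords splits on whitespace, capitalizes each word and joins with single spaces.
title = string.capwords('hello   python world')

# ${name} delimits a placeholder that is followed directly by other text.
tmpl = string.Template('$who likes ${what}s')
filled = tmpl.substitute({'who': 'Mei', 'what': 'snake'})

# substitute() raises KeyError when a placeholder has no value.
try:
    tmpl.substitute(who='Mei')
    missing_key = None
except KeyError as exc:
    missing_key = exc.args[0]

print(title)
print(filled)
print(missing_key)
```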
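Module: itertools iterator functions¶
Introduction to the itertools module¶
- The built-in itertools module provides very useful functions for working with iterable objects.
- itertools contains special-purpose iterator functions.
- When such a function is used in a for .. in loop, it returns one item at a time and remembers its state between calls.
Help information for itertools: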
In [1]: import itertools
In [2]: itertools?
Type: module
String form: <module 'itertools' (built-in)>
Docstring:
Functional tools for creating and using iterators.
Infinite iterators:
count(start=0, step=1) --> start, start+step, start+2*step, ...
cycle(p) --> p0, p1, ... plast, p0, p1, ...
repeat(elem [,n]) --> elem, elem, elem, ... endlessly or up to n times
Iterators terminating on the shortest input sequence:
accumulate(p[, func]) --> p0, p0+p1, p0+p1+p2
chain(p, q, ...) --> p0, p1, ... plast, q0, q1, ...
chain.from_iterable([p, q, ...]) --> p0, p1, ... plast, q0, q1, ...
compress(data, selectors) --> (d[0] if s[0]), (d[1] if s[1]), ...
dropwhile(pred, seq) --> seq[n], seq[n+1], starting when pred fails
groupby(iterable[, keyfunc]) --> sub-iterators grouped by value of keyfunc(v)
filterfalse(pred, seq) --> elements of seq where pred(elem) is False
islice(seq, [start,] stop [, step]) --> elements from
seq[start:stop:step]
starmap(fun, seq) --> fun(*seq[0]), fun(*seq[1]), ...
tee(it, n=2) --> (it1, it2 , ... itn) splits one iterator into n
takewhile(pred, seq) --> seq[0], seq[1], until pred fails
zip_longest(p, q, ...) --> (p[0], q[0]), (p[1], q[1]), ...
Combinatoric generators:
product(p, q, ... [repeat=1]) --> cartesian product
permutations(p[, r])
combinations(p, r)
combinations_with_replacement(p, r)
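Infinite iterators in itertools¶
- itertools.count(start=0, step=1) Return an iterator that starts at start and advances by step. It iterates forever unless stopped, so add a terminating condition to the loop or press Ctrl+C to stop the program.
Example: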
In [3]: for i in itertools.count(1,2):
...: print(i)
...: if i>20:
...: break
...:
1
3
5
7
9
11
13
15
17
19
21
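Instead of breaking out of the loop by hand, an infinite iterator can be bounded with itertools.islice. A minimal sketch that reproduces the eleven values printed above:

```python
import itertools

# Take the first 11 items of the infinite sequence 1, 3, 5, ...
first_odds = list(itertools.islice(itertools.count(1, 2), 11))
print(first_odds)
```

islice stops after the requested number of items, so count() is never asked for more.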
- itertools.cycle(p) Cycle through the items of the iterable p indefinitely.
Example:
In [4]: count = 0
In [5]: for i in itertools.cycle('abcdefg'):
...: print(i)
...: count += 1
...: if count > 20:
...: break
...:
a
b
c
d
e
f
g
a
b
c
d
e
f
g
a
b
c
d
e
f
g
In [6]: count = 0
In [7]: for i in itertools.cycle(['one','two','three']):
...: print(i)
...: count += 1
...: if count > 20:
...: break
...:
one
two
three
one
two
three
one
two
three
one
two
three
one
two
three
one
two
three
one
two
three
- itertools.repeat(elem [,n]) Repeat the element elem n times, or endlessly if n is not given.
Example:
In [8]: count = 0
In [9]: for i in itertools.repeat(['one','two','three']):
...: print(i)
...: count += 1
...: if count > 20:
...: break
...:
['one', 'two', 'three']
['one', 'two', 'three']
['one', 'two', 'three']
['one', 'two', 'three']
['one', 'two', 'three']
['one', 'two', 'three']
['one', 'two', 'three']
['one', 'two', 'three']
['one', 'two', 'three']
['one', 'two', 'three']
['one', 'two', 'three']
['one', 'two', 'three']
['one', 'two', 'three']
['one', 'two', 'three']
['one', 'two', 'three']
['one', 'two', 'three']
['one', 'two', 'three']
['one', 'two', 'three']
['one', 'two', 'three']
['one', 'two', 'three']
['one', 'two', 'three']
In [10]: count = 0
In [11]: for i in itertools.repeat('abcdefg'):
...: print(i)
...: count += 1
...: if count > 20:
...: break
...:
...:
abcdefg
abcdefg
abcdefg
abcdefg
abcdefg
abcdefg
abcdefg
abcdefg
abcdefg
abcdefg
abcdefg
abcdefg
abcdefg
abcdefg
abcdefg
abcdefg
abcdefg
abcdefg
abcdefg
abcdefg
abcdefg
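The examples above rely on a manual counter to stop. As a sketch, passing n bounds repeat directly, and an unbounded repeat is handy as a constant argument stream for map, which stops at its shortest input:

```python
import itertools

# Bounded repetition: exactly three copies.
three = list(itertools.repeat('ab', 3))

# Unbounded repeat(2) supplies the exponent; map stops when range(5) is exhausted.
squares = list(map(pow, range(5), itertools.repeat(2)))

print(three)
print(squares)
```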
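Iterators over input sequences in itertools¶
- itertools.accumulate(p[, func]) –> p0, p0+p1, p0+p1+p2 Return the accumulated results over the sequence.
- If a function func is given, it is used in place of addition to combine the running result with each new element.
Example: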
In [12]: for i in itertools.accumulate(['a','b','c']):
...: print(i)
...:
a
ab
abc
In [13]: for i in itertools.accumulate('abc'):
...: print(i)
...:
a
ab
abc
In [14]: for i in itertools.accumulate(('a','b','c')):
...: print(i)
...:
a
ab
abc
In [15]: for i in itertools.accumulate(('a','b','c'), lambda x,y:x*2+y*3):
...: print(i)
...:
a
aabbb # ==> 'a' * 2 + 'b' * 3
aabbbaabbbccc # ==> 'aabbb' * 2 + 'c' * 3
In [16]: for i in itertools.accumulate((1,2,3,4), lambda x,y:x*2+y*2):
...: print(i)
...:
1
6 # ==> 1 * 2 + 2 * 2
18 # ==> 6 * 2 + 3 * 2
44 # ==> 18 * 2 + 4 * 2
In [17]: import operator
In [18]: for i in itertools.accumulate((1,2,3,4,5,6), operator.mul):
...: print(i)
...:
1 # ==> running product, i.e. the factorials
2 # ==> 1 * 2
6 # ==> 2 * 3
24 # ==> 6 * 4
120 # ==> 24 * 5
720 # ==> 120 * 6
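Any two-argument function can serve as func. For instance, the built-in max yields a running maximum; the readings below are made-up sample data.

```python
import itertools

readings = [3, 1, 4, 1, 5, 9, 2]  # made-up sample data

running_max = list(itertools.accumulate(readings, max))  # largest value seen so far
running_sum = list(itertools.accumulate(readings))       # default: running sum

print(running_max)
print(running_sum)
```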
- itertools.chain(p, q, …) –> p0, p1, … plast, q0, q1, … Take several iterables as arguments and chain them together into a single iterator.
Example:
In [19]: for i in itertools.chain('abcd',('a1','b1','c1','d1'),{'a2':1,'b2':2}):
...: print(i)
...:
a
b
c
d
a1
b1
c1
d1
a2
b2
- itertools.chain.from_iterable([p, q, …]) –> p0, p1, … plast, q0, q1, … Take a single argument, an iterable whose items are themselves iterables, and chain those items together.
Example:
In [20]: for i in itertools.chain.from_iterable(['abc']):
...: print(i)
...:
a
b
c
In [21]: for i in itertools.chain.from_iterable(['abc','def']):
...: print(i)
...:
a
b
c
d
e
f
In [22]: for i in itertools.chain.from_iterable(('abc','def')):
...: print(i)
...:
a
b
c
d
e
f
In [23]: for i in itertools.chain.from_iterable({'abc':1,'def':2}):
...: print(i)
...:
a
b
c
d
e
f
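A common use of chain.from_iterable is flattening one level of nesting. A minimal sketch with a made-up nested list:

```python
import itertools

nested = [[1, 2], [3], [], [4, 5]]  # made-up sample data

flat = list(itertools.chain.from_iterable(nested))
same = list(itertools.chain(*nested))  # equivalent, but unpacks the outer list eagerly

print(flat)
```

from_iterable consumes the outer iterable lazily, so it also works when the outer iterable is a generator.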
- itertools.compress(data, selectors) –> (d[0] if s[0]), (d[1] if s[1]), … compress('ABCDEF', [1,0,1,0,1,1]) –> A C E F
- A selector-based filter over the original data: an element is kept when the corresponding item of selectors is true and skipped when it is false.
Example:
In [24]: for i in itertools.compress('abcdef',[True,[],1,(),2,{'a':1}]):
...: print(i)
...:
a
c
e
f
In [25]: for i in itertools.compress('ABCDEF', [1,0,1,0,1,1]):
...: print(i)
...:
A
C
E
F
In [26]: for i in itertools.compress('ABCDEF', (1,'0',1,0,1,1)):
...: print(i)
...:
A
B
C
E
F
- itertools.dropwhile(pred, seq) –> seq[n], seq[n+1], starting when pred fails dropwhile(lambda x: x<5, [1,4,6,4,1]) –> 6 4 1
- Drop elements while pred is true; once pred returns False for the first time, yield that element and all elements after it.
Example:
In [27]: for i in itertools.dropwhile(lambda x: x<5, [1,4,6,4,1]):
...: print(i)
...:
6
4
1
In [28]: def should_drop(x):
...: print('Testing: {}'.format(x))
...: return x<5
...:
In [28]: for i in itertools.dropwhile(should_drop, [1,4,6,4,1]):
...: print(i)
...:
Testing: 1
Testing: 4
Testing: 6
6
4
1
- itertools.groupby(iterable[, keyfunc]) –> sub-iterators grouped by value of keyfunc(v)
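- Return an iterator over consecutive groups keyed by keyfunc(v). A group is formed whenever consecutive items of iterable produce the same key. keyfunc computes the key; if it is not given, the key is the element itself. If keyfunc is given, the elements are grouped by the result of calling it on each element, and each group is a sub-iterator. The returned iterator yields (key, group) pairs; to see the contents of a group, materialize it with list(group).
Example: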
In [29]: for k,v in itertools.groupby('AAABBBCCCDDEAAA'):
...: print(k, '\tvalue:', v)
...:
A value: <itertools._grouper object at 0x7feff3ab82e8>
B value: <itertools._grouper object at 0x7feff3ab8ac8>
C value: <itertools._grouper object at 0x7feff3ab82e8>
D value: <itertools._grouper object at 0x7feff3ab8cf8>
E value: <itertools._grouper object at 0x7feff3ab82e8>
A value: <itertools._grouper object at 0x7feff3ab8cf8>
# Note: printing value directly shows that each value is an iterator
In [30]: for k,v in itertools.groupby('AAABBBCCCDDEAAA'):
...: print(k, '\tvalue:', list(v))
...:
A value: ['A', 'A', 'A']
B value: ['B', 'B', 'B']
C value: ['C', 'C', 'C']
D value: ['D', 'D']
E value: ['E']
A value: ['A', 'A', 'A']
# Note: storing value in a list and printing it shows the elements of each group
In [31]: def keyfunc(key):
...: return key + '*' + key
...:
In [32]: for k,v in itertools.groupby('AAABBBCCCDDEAAA', keyfunc):
...: print(k, '\tvalue:', list(v))
...:
A*A value: ['A', 'A', 'A']
B*B value: ['B', 'B', 'B']
C*C value: ['C', 'C', 'C']
D*D value: ['D', 'D']
E*E value: ['E']
A*A value: ['A', 'A', 'A']
# Note: with keyfunc defined, the generated keys are different
In [33]: for k,v in itertools.groupby(['aa','ab','abc','def','abcde'], len):
...: print(k, '\tvalue:', list(v))
...:
2 value: ['aa', 'ab']
3 value: ['abc', 'def']
5 value: ['abcde']
# Note: the len function turns the length of each element into the key
In [34]: def keyfunc(key):
...: import random
...: return key + '*' + key + str(random.randint(0,100))
...:
In [35]: for k,v in itertools.groupby('AAABBBCCCDDEAAA', keyfunc):
...: print(k, '\tvalue:', list(v))
...:
A*A79 value: ['A']
A*A95 value: ['A']
A*A70 value: ['A']
B*B21 value: ['B']
B*B61 value: ['B']
B*B99 value: ['B']
C*C99 value: ['C']
C*C28 value: ['C']
C*C85 value: ['C']
D*D96 value: ['D']
D*D90 value: ['D']
E*E5 value: ['E']
A*A87 value: ['A']
A*A25 value: ['A']
A*A50 value: ['A']
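# Note: a random number is appended to each key here, so adjacent keys differ (barring a coincidence) and no grouping takes place. The result changes from run to run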
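Because groupby only merges consecutive items, the two separate runs of 'A' in the examples above end up in different groups. A sketch of the usual remedy, sorting by the same key first:

```python
import itertools

letters = 'AAABBBCCCDDEAAA'

# Without sorting: 'A' appears as two separate groups.
unsorted_groups = [(k, len(list(g))) for k, g in itertools.groupby(letters)]

# Sorting first brings equal items together, so each key appears once.
sorted_groups = [(k, len(list(g))) for k, g in itertools.groupby(sorted(letters))]

print(unsorted_groups)
print(sorted_groups)
```

When a key function is used, the data should be sorted with that same key function.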
- itertools.filterfalse(pred, seq) –> elements of seq where pred(elem) is False, filterfalse(lambda x: x%2, range(10)) –> 0 2 4 6 8
- Produce an iterator over only the items for which pred(elem) is False
Example:
In [36]: for i in itertools.filterfalse(lambda x: x%2, range(10)):
...: print(i)
...:
0
2
4
6
8
# Note: returns the numbers whose remainder is 0 (i.e. False), that is, the even numbers
In [37]: def predicate(x):
...: return len(x) > 2
...:
In [38]: for i in itertools.filterfalse(predicate, ['a','ab','abc','abcd']):
...: print(i)
...:
a
ab
# Note: returns the elements whose length is not greater than 2.
- itertools.islice(seq, [start,] stop [, step]) –> elements from seq[start:stop:step]
- Return an iterator over the elements of seq from start up to stop in steps of step; if only one number follows seq, it is taken as stop.
Example:
In [39]: for i in itertools.islice('ABCDEFG', 2):
...: print(i)
...:
A
B
In [40]: for i in itertools.islice('ABCDEFG', 2, 4):
...: print(i)
...:
C
D
In [41]: for i in itertools.islice('ABCDEFG', 2, None):
...: print(i)
...:
C
D
E
F
G
In [42]: for i in itertools.islice('ABCDEFG', 0, None, 2):
...: print(i)
...:
A
C
E
G
- itertools.starmap(fun, seq) –> fun(*seq[0]), fun(*seq[1]), itertools.starmap(pow, [(2,5), (3,2), (10,3)]) –> 32 9 1000
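- Return an iterator that calls fun(*elem) for each element, i.e. each element is unpacked into the arguments of fun.
Example: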
In [43]: for i in itertools.starmap(pow,[(2,5),(3,2),(10,3)]):
...: print(i)
...:
32
9
1000
In [44]: for i in itertools.starmap(lambda x:2*x, ('1','2','3','4')):
...: print(i)
...:
11
22
33
44
- itertools.tee(it, n=2) –> (it1, it2 , … itn) splits one iterator into n
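- Return a tuple of n independent iterators based on the original input. To clone the original iterator, the generated items are cached. Once it has been split into n iterators, the original iterator should no longer be used, otherwise the clones will miss the items it consumed.
- If one of the iterators consumes most or all of the data before another one starts, using list() is faster than tee().
Example: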
In [45]: x = itertools.tee(('a','ab','abc'), 3)
In [46]: x
Out[46]:
(<itertools._tee at 0x7feff83a5448>,
<itertools._tee at 0x7feff397ee08>,
<itertools._tee at 0x7feff3ad8dc8>)
In [47]: for i in x:
...: print(list(i))
...:
['a', 'ab', 'abc']
['a', 'ab', 'abc']
['a', 'ab', 'abc']
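A typical use of tee is walking an iterable in overlapping pairs. The helper name pairwise below is an illustrative choice for this sketch (newer Python versions ship a similar itertools.pairwise).

```python
import itertools

def pairwise(iterable):
    """Yield overlapping pairs (s0, s1), (s1, s2), ... from iterable."""
    a, b = itertools.tee(iterable)  # two independent iterators over the same data
    next(b, None)                   # advance the second iterator by one item
    return zip(a, b)

pairs = list(pairwise('ABCD'))
print(pairs)
```

The two iterators stay only one item apart here, so tee caches very little.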
- itertools.takewhile(pred, seq) –> seq[0], seq[1], until pred fails
- itertools.takewhile(lambda x: x<5, [1,4,6,4,1]) –> 1 4
- Keep elements of the sequence until the condition fails. The opposite of dropwhile.
Example:
In [48]: for i in itertools.takewhile(lambda x: x<5, [1,4,6,4,1]):
...: print(i)
...:
1
4
- itertools.zip_longest(p, q, …) –> (p[0], q[0]), (p[1], q[1]), …
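- Create an iterator that aggregates elements from each of the iterables. If the iterables are of uneven length, missing values are filled with fillvalue. Iteration continues until the longest iterable is exhausted.
Example: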
In [49]: for i in itertools.zip_longest('ABCD','xy',fillvalue='-'):
...: print(i)
...:
('A', 'x')
('B', 'y')
('C', '-')
('D', '-')
In [50]: for i in itertools.zip_longest('ABCD','xy',['a','b','c','d','e'],fillvalue='*'*3):
...: print(i)
...:
('A', 'x', 'a')
('B', 'y', 'b')
('C', '***', 'c')
('D', '***', 'd')
('***', '***', 'e')
# Note: the longest input is the list ['a','b','c','d','e']; inputs with fewer than 5 elements are padded with the fill value '***'
Combinatoric iterators in itertools¶
- itertools.product(p, q, … [repeat=1]) –> cartesian product
- Generate the tuples of the Cartesian product.
Example:
In [51]: for i in itertools.product('AB','xy'):
...: print(i)
...:
('A', 'x')
('A', 'y')
('B', 'x')
('B', 'y')
In [52]: for i in itertools.product('AB','xy',repeat=2):
...: print(i)
...:
('A', 'x', 'A', 'x')
('A', 'x', 'A', 'y')
('A', 'x', 'B', 'x')
('A', 'x', 'B', 'y')
('A', 'y', 'A', 'x')
('A', 'y', 'A', 'y')
('A', 'y', 'B', 'x')
('A', 'y', 'B', 'y')
('B', 'x', 'A', 'x')
('B', 'x', 'A', 'y')
('B', 'x', 'B', 'x')
('B', 'x', 'B', 'y')
('B', 'y', 'A', 'x')
('B', 'y', 'A', 'y')
('B', 'y', 'B', 'x')
('B', 'y', 'B', 'y')
In [53]: for i in itertools.product('AB','xy',repeat=3):
...: print(i)
...:
('A', 'x', 'A', 'x', 'A', 'x')
('A', 'x', 'A', 'x', 'A', 'y')
('A', 'x', 'A', 'x', 'B', 'x')
('A', 'x', 'A', 'x', 'B', 'y')
('A', 'x', 'A', 'y', 'A', 'x')
('A', 'x', 'A', 'y', 'A', 'y')
('A', 'x', 'A', 'y', 'B', 'x')
('A', 'x', 'A', 'y', 'B', 'y')
('A', 'x', 'B', 'x', 'A', 'x')
('A', 'x', 'B', 'x', 'A', 'y')
('A', 'x', 'B', 'x', 'B', 'x')
('A', 'x', 'B', 'x', 'B', 'y')
('A', 'x', 'B', 'y', 'A', 'x')
('A', 'x', 'B', 'y', 'A', 'y')
('A', 'x', 'B', 'y', 'B', 'x')
('A', 'x', 'B', 'y', 'B', 'y')
('A', 'y', 'A', 'x', 'A', 'x')
('A', 'y', 'A', 'x', 'A', 'y')
('A', 'y', 'A', 'x', 'B', 'x')
('A', 'y', 'A', 'x', 'B', 'y')
('A', 'y', 'A', 'y', 'A', 'x')
('A', 'y', 'A', 'y', 'A', 'y')
('A', 'y', 'A', 'y', 'B', 'x')
('A', 'y', 'A', 'y', 'B', 'y')
('A', 'y', 'B', 'x', 'A', 'x')
('A', 'y', 'B', 'x', 'A', 'y')
('A', 'y', 'B', 'x', 'B', 'x')
('A', 'y', 'B', 'x', 'B', 'y')
('A', 'y', 'B', 'y', 'A', 'x')
('A', 'y', 'B', 'y', 'A', 'y')
('A', 'y', 'B', 'y', 'B', 'x')
('A', 'y', 'B', 'y', 'B', 'y')
('B', 'x', 'A', 'x', 'A', 'x')
('B', 'x', 'A', 'x', 'A', 'y')
('B', 'x', 'A', 'x', 'B', 'x')
('B', 'x', 'A', 'x', 'B', 'y')
('B', 'x', 'A', 'y', 'A', 'x')
('B', 'x', 'A', 'y', 'A', 'y')
('B', 'x', 'A', 'y', 'B', 'x')
('B', 'x', 'A', 'y', 'B', 'y')
('B', 'x', 'B', 'x', 'A', 'x')
('B', 'x', 'B', 'x', 'A', 'y')
('B', 'x', 'B', 'x', 'B', 'x')
('B', 'x', 'B', 'x', 'B', 'y')
('B', 'x', 'B', 'y', 'A', 'x')
('B', 'x', 'B', 'y', 'A', 'y')
('B', 'x', 'B', 'y', 'B', 'x')
('B', 'x', 'B', 'y', 'B', 'y')
('B', 'y', 'A', 'x', 'A', 'x')
('B', 'y', 'A', 'x', 'A', 'y')
('B', 'y', 'A', 'x', 'B', 'x')
('B', 'y', 'A', 'x', 'B', 'y')
('B', 'y', 'A', 'y', 'A', 'x')
('B', 'y', 'A', 'y', 'A', 'y')
('B', 'y', 'A', 'y', 'B', 'x')
('B', 'y', 'A', 'y', 'B', 'y')
('B', 'y', 'B', 'x', 'A', 'x')
('B', 'y', 'B', 'x', 'A', 'y')
('B', 'y', 'B', 'x', 'B', 'x')
('B', 'y', 'B', 'x', 'B', 'y')
('B', 'y', 'B', 'y', 'A', 'x')
('B', 'y', 'B', 'y', 'A', 'y')
('B', 'y', 'B', 'y', 'B', 'x')
('B', 'y', 'B', 'y', 'B', 'y')
- itertools.permutations(p[, r])
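- Create an iterator that returns all length-r orderings of items from iterable; if r is omitted, the length equals the number of items in iterable. In other words, it yields a tuple for every arrangement of r elements taken from p.
Example: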
In [54]: for i in itertools.permutations('ABCD',r=2):
...: print(i)
...:
('A', 'B')
('A', 'C')
('A', 'D')
('B', 'A')
('B', 'C')
('B', 'D')
('C', 'A')
('C', 'B')
('C', 'D')
('D', 'A')
('D', 'B')
('D', 'C')
In [55]: for i in itertools.permutations('ABC'):
...: print(list(i))
...:
['A', 'B', 'C']
['A', 'C', 'B']
['B', 'A', 'C']
['B', 'C', 'A']
['C', 'A', 'B']
['C', 'B', 'A']
- itertools.combinations(p, r)
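- Create an iterator that returns all length-r subsequences of iterable, with the items of each subsequence in the same order as in the input iterable (no element is repeated).
Example: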
In [56]: for i in itertools.combinations('ABC',r=2):
...: print(i)
...:
('A', 'B')
('A', 'C')
('B', 'C')
In [57]: for i in itertools.combinations('ABC',r=1):
...: print(i)
...:
('A',)
('B',)
('C',)
In [58]: for i in itertools.combinations('ABC',r=3):
...: print(i)
...:
('A', 'B', 'C')
- itertools.combinations_with_replacement(p, r)
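- Create an iterator that returns all length-r subsequences of iterable, with the items of each subsequence in the same order as in the input iterable (elements may be repeated)
Example: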
In [59]: for i in itertools.combinations_with_replacement('ABC',r=3):
...: print(i)
...:
('A', 'A', 'A')
('A', 'A', 'B')
('A', 'A', 'C')
('A', 'B', 'B')
('A', 'B', 'C')
('A', 'C', 'C')
('B', 'B', 'B')
('B', 'B', 'C')
('B', 'C', 'C')
('C', 'C', 'C')
In [60]: for i in itertools.combinations_with_replacement('ABC',r=2):
...: print(i)
...:
('A', 'A')
('A', 'B')
('A', 'C')
('B', 'B')
('B', 'C')
('C', 'C')
In [61]: for i in itertools.combinations_with_replacement('ABC',r=1):
...: print(i)
...:
('A',)
('B',)
('C',)
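The number of results from each combinatoric iterator follows the familiar counting formulas. A small sketch that checks them for n = 4 items and r = 2; the input string is an arbitrary choice.

```python
import itertools
from math import factorial

items = 'ABCD'  # arbitrary sample input
n, r = len(items), 2

n_perm = len(list(itertools.permutations(items, r)))                  # n! / (n-r)!
n_comb = len(list(itertools.combinations(items, r)))                  # n! / (r! (n-r)!)
n_cwr = len(list(itertools.combinations_with_replacement(items, r)))  # (n+r-1)! / (r! (n-1)!)
n_prod = len(list(itertools.product(items, repeat=r)))                # n ** r

print(n_perm, n_comb, n_cwr, n_prod)
```

These counts grow quickly, so it is usually better to iterate lazily than to build the full list as done here for counting.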
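Reading and writing files¶
The built-in open function for file I/O¶
- Use the built-in function open (<built-in function open>) to read and write files.
Commonly used methods are as follows: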
>>> file_object=open('sys.txt','r')
>>> file_object.
file_object.buffer file_object.encoding file_object.isatty( file_object.newlines file_object.readlines( file_object.truncate(
file_object.close( file_object.errors file_object.line_buffering file_object.read( file_object.seek( file_object.writable(
file_object.closed file_object.fileno( file_object.mode file_object.readable( file_object.seekable( file_object.write(
file_object.detach( file_object.flush( file_object.name file_object.readline( file_object.tell( file_object.writelines(
>>> file_object
<_io.TextIOWrapper name='sys.txt' mode='r' encoding='cp936'>
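>>> file_object.buffer # the underlying binary buffer
<_io.BufferedReader name='sys.txt'>
>>> file_object.encoding # the encoding of the file
'cp936'
>>> file_object.errors # how encoding and decoding errors are handled (strict, ignore, replace)
'strict'
>>> file_object.mode # the mode the file was opened in
'r'
>>> file_object.name # the name of the file
'sys.txt'
>>> file_object.readable() # whether the file object can be read
True
>>> file_object.writable() # whether the file object can be written
False
>>> file_object.seekable() # whether the file supports random access; if False, seek(), tell() and truncate() raise an error
True
>>> file_object.isatty() # True if the file is connected to a tty(-like) terminal device,
False # otherwise False.
>>> file_object.fileno() # return the integer file descriptor (FD),
4 # usable for low-level operating system I/O; it must be called while the file is still open.
>>> file_object.close() # close the file object
>>> file_object.closed # whether the file is closed
True
>>> file_object=open('sys.txt','r',1)
>>> file_object.buffer
<_io.BufferedReader name='sys.txt'>
>>> file_object.line_buffering # whether the file object is line-buffered
True
>>> file_object.tell() # current position of the file pointer, counted from the start of the file
0
>>> file_object.seek(3,1) # moving 3 bytes forward from the current position raises an error!
# Reason:
# a file opened in text mode (without the b option) only allows nonzero offsets relative to the start of the file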
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
io.UnsupportedOperation: can't do nonzero cur-relative seeks
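>>> del file_object # delete the name file_object
>>> file_object=open('sys.txt','rb') # reopen the file, this time in binary mode
>>> file_object
<_io.BufferedReader name='sys.txt'>
>>> file_object.readline() # read one line
b'a1\r\n'
>>> file_object.tell() # current position of the file pointer, in bytes from the start of the file
4
>>> file_object.seek(2,1) # move 2 bytes forward from the current position
6
>>> file_object.tell() # get the current position
6
>>> file_object.read() # read from the current position to the end of the file
b'2\r\nabc3\r\nabcd4'
>>> file_object.seek(-2,2) # move to 2 bytes before the end of the file
18
>>> file_object.tell() # get the current position (note: the Windows line ending \r\n counts as two bytes)
18
>>> file_object.read() # read from the current position to the end of the file
b'd4'
>>> file_object.seek(0,0) # move 0 bytes from the start, i.e. return to the start of the file
0
>>> file_object.tell() # get the current position
0
>>> file_object.readlines() # read the whole file into a list
[b'a1\r\n', b'ab2\r\n', b'abc3\r\n', b'abcd4']
>>> file_object.tell() # the position is now at the end of the file
20
>>> file_object.seek(-19,2) # move to 19 bytes before the end of the file
1
>>> file_object.tell() # get the current position
1
>>> file_object.readline() # read the rest of the current line
b'1\r\n'
>>> file_object.tell() # get the current position
4
>>> file_object.seek(0) # return to the start of the file
0
>>> file_object.read(1) # read 1 byte
b'a'
>>> file_object.read(2) # read 2 bytes
b'1\r'
>>> file_object.read(3) # read 3 bytes
b'\nab'
>>> file_object.seek(0) # return to the start of the file
0
>>> file_object.readline(2) # read at most 2 bytes of the current line
b'a1'
>>> file_object.seek(0) # return to the start of the file
0
>>> file_object.tell() # get the current position
0
>>> file_object.readlines(2) # size hint of 2 bytes: whole lines are read until their total size exceeds the hint
[b'a1\r\n']
>>> file_object.tell() # get the current position
4
>>> file_object.seek(0) # return to the start of the file
0
>>> file_object.readlines(3) # size hint of 3 bytes: the first line (4 bytes) already exceeds it
[b'a1\r\n']
>>> file_object.seek(0) # return to the start of the file
0
>>> file_object.tell() # get the current position
0
>>> file_object.readlines(5) # size hint of 5 bytes: the first line is not enough, so two lines are read
[b'a1\r\n', b'ab2\r\n']
>>> file_object.tell() # get the current position
9
>>> file_object.seek(0) # return to the start of the file
0
>>> file_object.readlines(6) # size hint of 6 bytes: again two lines are read
[b'a1\r\n', b'ab2\r\n']
>>> file_object.tell() # get the current position
9
>>> file_object.detach() # separate the underlying raw stream from the buffer and return it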
<_io.FileIO name='sys.txt' mode='rb' closefd=True>
>>> file_object.seek(0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: raw stream has been detached
>>> file_object=open('sys.txt','ab+') # open the file in binary append mode, readable as well
>>> string1=b'\r\nabcde5' # create the bytes object string1
>>> string1
b'\r\nabcde5'
>>> file_object.write(string1) # write string1 to the file object
8
>>> file_object.flush() # flush the buffer so that the data is written to the file
>>> file_object.tell() # get the current position
28
>>> file_object.seek(0) # return to the start of the file
0
>>> file_object.readlines() # read all lines
[b'a1\r\n', b'ab2\r\n', b'abc3\r\n', b'abcd4\r\n', b'abcde5']
>>> list1=[b'abcdef6',b'abcdefg7']
>>> list1
[b'abcdef6', b'abcdefg7']
>>> file_object.writelines(list1) # write the list of bytes objects list1 to the file object
>>> file_object.flush() # flush the buffer so that the data is written to the file
>>> file_object.seek(0) # return to the start of the file
0
>>> file_object.readlines() # read all lines; the items of list1 carry no line breaks, so they were all appended to the last line
[b'a1\r\n', b'ab2\r\n', b'abc3\r\n', b'abcd4\r\n', b'abcde5abcdef6abcdefg7']
>>> file_object.seek(28,0) # return to the position before the append
28
>>> file_object.tell()
28
>>> file_object.read() # check the position: everything after it is the data just appended
b'abcdef6abcdefg7'
>>> file_object.tell()
43
>>> file_object.seek(0,0) # return to the start of the file
0
>>> file_object.seek(28,0) # return to the position before the append
28
>>> file_object.truncate() # truncate the file at the current position
28
>>> file_object.flush() # flush the buffer; the data just appended is now removed from the file
>>> file_object.seek(0) # return to the start of the file
0
>>> file_object.readlines() # read all lines
[b'a1\r\n', b'ab2\r\n', b'abc3\r\n', b'abcd4\r\n', b'abcde5']
>>> list1=[b'\r\nabcdef6',b'\r\nabcdefg7'] # redefine list1, this time with line breaks
>>> list1
[b'\r\nabcdef6', b'\r\nabcdefg7']
>>> file_object.readlines()
[]
>>> file_object.writelines(list1) # write list1 to the file object
>>> file_object.flush() # flush the buffer so that the data is written to the file
>>> file_object.seek(0) # return to the start of the file
0
>>> file_object.readlines() # read all lines
[b'a1\r\n', b'ab2\r\n', b'abc3\r\n', b'abcd4\r\n', b'abcde5\r\n', b'abcdef6\r\n', b'abcdefg7']
>>> file_object.close() # close the file object
>>> file_object.closed # check whether the file object is closed
True
True
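Note: readlines() reads every line into memory at once, which can use too much memory for large files; readline() reads one line at a time. For large files, weigh the two and choose accordingly.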
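A file object is itself an iterator over its lines, which gives the memory behaviour of readline() with simpler code. The sketch below writes a small temporary file first so that it is self-contained; the file contents are an assumption modelled on the sys.txt example.

```python
import os
import tempfile

# Create a small temporary file (contents modelled on the sys.txt example).
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w', encoding='utf-8') as f:
    f.write('a1\nab2\nabc3\nabcd4')

# Iterate line by line; only one line is held in memory at a time.
line_lengths = []
with open(path, 'r', encoding='utf-8') as f:
    for line in f:
        line_lengths.append(len(line.rstrip('\n')))

os.remove(path)  # clean up the temporary file
print(line_lengths)
```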
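File open modes¶
File open modes:
The section "Reading and writing files" covered the read and write operations available once a file is open. This section covers the modes in which a file can be opened.
The open function opens a file and returns a file object.
open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
Open file and return a stream. Raise IOError upon failure.
[In Python 3, IOError is an alias of OSError]
The available modes are:
========= ===============================================================
Character Meaning
--------- ---------------------------------------------------------------
'r' open for reading (default)
[Opens the file read-only with the file pointer at the start of the file; this is the default mode. If the file does not exist, it is not created and FileNotFoundError is raised. Not writable]
'w' open for writing, truncating the file first
[Opens the file write-only; an existing file is emptied as soon as it is opened, and a missing file is created (use with care). Not readable]
'x' create a new file and open it for writing
[Like w, opens the file write-only, but raises FileExistsError if the file already exists. Not readable]
'a' open for writing, appending to the end of the file if it exists
[Opens the file for appending; if the file exists, writes go to the end of the file regardless of the current pointer position. Not readable]
'b' binary mode
[Binary mode; reads return bytes objects]
't' text mode (default)
[Text mode (the default); reads return str objects]
'+' open a disk file for updating (reading and writing)
[Readable and writable at once; cannot be used alone and must be combined with one of r, w, a or x, which decides how an existing or missing file is handled]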
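Modes for opening a file:
r: open the file read-only, with the file pointer at the start of the file; the default mode
rb: open the file read-only in binary format, with the file pointer at the start of the file
r+: open the file for reading and writing, with the file pointer at the start of the file (written data overwrites the existing data at that position)
rb+: open the file for reading and writing in binary format, with the file pointer at the start of the file
w: open the file for writing; an existing file is overwritten, and a missing file is created
wb: open the file for writing in binary format
w+: open the file for reading and writing
wb+: open the file for reading and writing in binary format; an existing file is overwritten, and a missing file is created
a: open the file for appending; if the file exists, the file pointer is placed at its end, and if it does not exist, a new file is created for writing
ab: open the file for appending in binary format
a+: open the file for reading and writing in append mode; if the file exists, the file pointer is placed at its end, and if it does not exist, a new file is created (data is always added at the end, even if the pointer has been moved)
ab+: open the file for appending and reading in binary format.
The file test1.txt contains: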
abc
def
ghi
>>> file1=open('test1.txt')
>>> file1.readlines()
['abc\n', 'def\n', 'ghi']
>>> string1='jkl'
>>> file1.write(string1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
io.UnsupportedOperation: not writable
>>> file1.writable()
False
>>> file1.mode
'r'
>>> file1.close()
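The behavior of the 'x', 'a+', and 'w' modes described above can be checked with a short sketch. The file name demo.txt and the temporary directory are illustrative assumptions.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'demo.txt')

# 'x' creates the file, and fails if the file already exists.
with open(path, 'x') as f:
    f.write('abc\n')
try:
    open(path, 'x')
    x_failed = False
except FileExistsError:
    x_failed = True

# 'a+' always writes at the end, even after seeking back to the start.
with open(path, 'a+') as f:
    f.seek(0)
    f.write('def\n')
    f.seek(0)
    content = f.read()

# 'w' truncates the file as soon as it is opened.
with open(path, 'w') as f:
    pass
size_after_w = os.path.getsize(path)

print(x_failed, repr(content), size_after_w)
os.remove(path)
os.rmdir(workdir)
```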
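The type of file object returned by open() depends on the mode:
TextIOWrapper: text modes return a TextIOWrapper object.
BufferedReader: binary read mode, i.e. rb, returns a BufferedReader object.
BufferedWriter: binary write and append modes, i.e. wb and ab, return a BufferedWriter object.
BufferedRandom: binary read/write modes, i.e. any mode containing both b and +, return a BufferedRandom object.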
>>> file1=open('test1.txt','r')
>>> type(file1)
<class '_io.TextIOWrapper'>
>>> file2=open('test2.txt','w')
>>> type(file2)
<class '_io.TextIOWrapper'>
>>> file3=open('test3.txt','a+')
>>> type(file3)
<class '_io.TextIOWrapper'>
>>> file4=open('test4.txt','rb')
>>> type(file4)
<class '_io.BufferedReader'>
>>> file5=open('test5.txt','wb')
>>> type(file5)
<class '_io.BufferedWriter'>
>>> file6=open('test6.txt','ab')
>>> type(file6)
<class '_io.BufferedWriter'>
>>> file7=open('test7.txt','ab+')
>>> type(file7)
<class '_io.BufferedRandom'>
>>> file8=open('test8.txt','xb+')
>>> type(file8)
<class '_io.BufferedRandom'>
Using the with context manager¶
Open a file with with ... open:
# With with ... open, the file is closed automatically when the block exits
with open('D:\\test1.txt', mode='a+', encoding='utf-8') as file1:
    print(file1)
    print(file1.tell())
    file1.seek(0)
    for line in file1.readlines():
        print(line)
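A minimal sketch of why the with form is convenient: the file is closed when the block exits, even if an exception is raised inside it. The temporary file and the simulated RuntimeError are illustrative assumptions.

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

try:
    with open(path, mode='a+', encoding='utf-8') as file1:
        file1.write('abc\n')
        raise RuntimeError('simulated failure inside the with block')
except RuntimeError:
    pass

# The context manager closed the file although the block raised.
closed_after_error = file1.closed
print(closed_after_error)
os.remove(path)
```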
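For some special file types, a dedicated module can do the reading. For example, the json module reads JSON files, xml.etree.ElementTree reads XML files, the csv module reads CSV files, and the configparser module (named ConfigParser in Python 2) reads configuration files; the logging module writes log files. If you need more than two levels of nesting, consider saving the configuration as a JSON file.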
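As a small sketch of the JSON suggestion above, a nested configuration can be saved and loaded with the json module. The keys database, primary, host, and port are made-up illustrative names.

```python
import json
import os
import tempfile

# A configuration nested three levels deep (illustrative keys).
config = {
    'database': {
        'primary': {'host': 'localhost', 'port': 5432},
    },
}

fd, path = tempfile.mkstemp(suffix='.json')
os.close(fd)

with open(path, 'w', encoding='utf-8') as fout:
    json.dump(config, fout, indent=4)

with open(path, encoding='utf-8') as fin:
    loaded = json.load(fin)

port = loaded['database']['primary']['port']
print(port)
os.remove(path)
```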
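The csv module¶
- The csv module implements classes to read and write tabular data in CSV format.
- By default the csv module reads and writes CSV in the format used by Excel (the 'excel' dialect); it does not read or write Excel workbook files such as .xlsx.
- The csv module's reader and writer objects read and write sequences.
- The csv module's DictReader and DictWriter classes read and write data in dictionary form.
- csvwriter_object.writerows(rows) writes every element of rows to the file, i.e. several lines at once.
- csvwriter_object.writerow(row) writes the elements of row to the file, i.e. a single line.
- dictwriter_object.writeheader() writes the field names given to the DictWriter constructor to the file as the CSV header row.
- csv.reader(csvfile) reads data from a CSV file.
- With the defaults of reader() and writer(), columns are separated by commas and rows by line breaks.
- csv.DictReader(f, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds) returns each row as a dictionary. The optional fieldnames parameter, a sequence, names the fields. restkey is the key under which the remaining data is stored when a row has more columns than fieldnames. restval is the value filled in for the missing fields when a row has fewer columns than fieldnames.
- csv.DictWriter(f, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds) writes dictionaries to a CSV file. The fieldnames sequence is required. restval is the value written when a dictionary lacks a key listed in fieldnames. extrasaction controls what happens when a dictionary contains keys that are not in fieldnames: the default raises ValueError, while extrasaction='ignore' ignores the extra values.
Methods and attributes of the csv module: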
In [1]: import csv
In [2]: csv.
Dialect excel list_dialects() QUOTE_NONNUMERIC Sniffer writer()
DictReader excel_tab QUOTE_ALL re StringIO
DictWriter field_size_limit() QUOTE_MINIMAL reader() unix_dialect
Error get_dialect() QUOTE_NONE register_dialect() unregister_dialect()
Example 1, write list data to a CSV file:
In [1]: import csv
In [2]: CSV_DATA = [
...: ['id', 'username', 'age', 'country'],
...: ['1001', 'Stephen Curry', '30', 'USA'],
...: ['1002', 'Kobe Bryant', '40', 'USA'],
...: ['1003', 'Manu Ginóbili', '41', 'Argentina']
...: ]
In [3]: CSV_DATA
Out[3]:
[['id', 'username', 'age', 'country'],
['1001', 'Stephen Curry', '30', 'USA'],
['1002', 'Kobe Bryant', '40', 'USA'],
['1003', 'Manu Ginóbili', '41', 'Argentina']]
In [4]: with open('file.csv', 'wt') as fout:
...:     csvwriter_object = csv.writer(fout)
...:     csvwriter_object.writerows(CSV_DATA)
...:
In [5]: csvwriter_object
Out[5]: <_csv.writer at 0x7fd479b0b258>
View the contents of file.csv:
[meizhaohui@localhost ~]$ cat file.csv
id,username,age,country
1001,Stephen Curry,30,USA
1002,Kobe Bryant,40,USA
1003,Manu Ginóbili,41,Argentina
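The csv module documentation recommends opening CSV files with newline='' so that the module handles line endings itself; this matters most on Windows and when a field contains an embedded line break. The sketch below is a hedged illustration with a temporary file and made-up rows, not a replacement for the example above.

```python
import csv
import os
import tempfile

# One field contains an embedded line break (illustrative data).
rows = [['id', 'note'], ['1001', 'first line\nsecond line']]

fd, path = tempfile.mkstemp(suffix='.csv')
os.close(fd)

# newline='' leaves line-ending handling to the csv module.
with open(path, 'w', newline='', encoding='utf-8') as fout:
    csv.writer(fout).writerows(rows)

with open(path, 'r', newline='', encoding='utf-8') as fin:
    read_back = list(csv.reader(fin))

print(read_back == rows)
os.remove(path)
```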
Example 2, read data from a CSV file:
In [6]: with open('file.csv', 'rt') as fin:
...:     csvreader_object = csv.reader(fin)
...:     data = [row for row in csvreader_object]
...:
In [7]: csvreader_object
Out[7]: <_csv.reader at 0x7fd479b013c8>
In [8]: data
Out[8]:
[['id', 'username', 'age', 'country'],
['1001', 'Stephen Curry', '30', 'USA'],
['1002', 'Kobe Bryant', '40', 'USA'],
['1003', 'Manu Ginóbili', '41', 'Argentina']]
Example 3, read the CSV data into a list of dictionaries:
In [9]: with open('file.csv', 'rt') as fin:
...:     dictreader_object = csv.DictReader(fin)
...:     data_dict_list = [row for row in dictreader_object]
...:
In [10]: dictreader_object
Out[10]: <csv.DictReader at 0x7fd479ac7208>
In [11]: data_dict_list
Out[11]:
[{'age': '30', 'country': 'USA', 'id': '1001', 'username': 'Stephen Curry'},
{'age': '40', 'country': 'USA', 'id': '1002', 'username': 'Kobe Bryant'},
{'age': '41',
'country': 'Argentina',
'id': '1003',
'username': 'Manu Ginóbili'}]
Note: because fieldnames is not passed to csv.DictReader(fin) in this example, the csv module uses the first row as the field names.
Example 4, specify the fieldnames field names:
In [12]: with open('file.csv', 'rt') as fin:
...:     dictreader_object1 = csv.DictReader(fin, fieldnames=['first','second','third','fouth'])
...:     data_dict_list1 = [row for row in dictreader_object1]
...:
In [13]: dictreader_object1
Out[13]: <csv.DictReader at 0x7fd479c1a358>
In [14]: data_dict_list1
Out[14]:
[{'first': 'id', 'fouth': 'country', 'second': 'username', 'third': 'age'},
{'first': '1001', 'fouth': 'USA', 'second': 'Stephen Curry', 'third': '30'},
{'first': '1002', 'fouth': 'USA', 'second': 'Kobe Bryant', 'third': '40'},
{'first': '1003',
'fouth': 'Argentina',
'second': 'Manu Ginóbili',
'third': '41'}]
Note: because fieldnames is specified, the first row of the CSV file is treated as an ordinary data row rather than as the header.
Example 5, specify fieldnames with fewer fields than the CSV file has columns:
In [15]: with open('file.csv', 'rt') as fin:
...:     dictreader_object2 = csv.DictReader(fin, fieldnames=['first','second'])
...:     data_dict_list2 = [row for row in dictreader_object2]
...:
In [16]: dictreader_object2
Out[16]: <csv.DictReader at 0x7fd47834ea58>
In [17]: data_dict_list2
Out[17]:
[{None: ['age', 'country'], 'first': 'id', 'second': 'username'},
{None: ['30', 'USA'], 'first': '1001', 'second': 'Stephen Curry'},
{None: ['40', 'USA'], 'first': '1002', 'second': 'Kobe Bryant'},
{None: ['41', 'Argentina'], 'first': '1003', 'second': 'Manu Ginóbili'}]
Note: in this case the extra CSV data is collected into a list and stored under the key given by restkey (None by default). If a non-blank row has fewer fields than fieldnames, the missing values are filled with restval (None by default). Since restkey is not specified here, each dictionary has a None key in addition to 'first' and 'second'.
Example 6, specify fieldnames with fewer fields than the CSV file has columns, and also specify restkey:
In [18]: with open('file.csv', 'rt') as fin:
...:     dictreader_object3 = csv.DictReader(fin, fieldnames=['first','second'], restkey='other')
...:     data_dict_list3 = [row for row in dictreader_object3]
...:
In [19]: dictreader_object3
Out[19]: <csv.DictReader at 0x7fd479acae10>
In [20]: data_dict_list3
Out[20]:
[{'first': 'id', 'other': ['age', 'country'], 'second': 'username'},
{'first': '1001', 'other': ['30', 'USA'], 'second': 'Stephen Curry'},
{'first': '1002', 'other': ['40', 'USA'], 'second': 'Kobe Bryant'},
{'first': '1003', 'other': ['41', 'Argentina'], 'second': 'Manu Ginóbili'}]
Note: because restkey is set to 'other', the dictionaries in the output use 'first', 'second', and 'other' as keys.
Example 7, specify fieldnames with more fields than the CSV file has columns:
In [21]: with open('file.csv', 'rt') as fin:
...:     dictreader_object4 = csv.DictReader(fin, fieldnames=['first','second','third','fouth','fifth'])
...:     data_dict_list4 = [row for row in dictreader_object4]
...:
In [22]: data_dict_list4
Out[22]:
[{'fifth': None,
'first': 'id',
'fouth': 'country',
'second': 'username',
'third': 'age'},
{'fifth': None,
'first': '1001',
'fouth': 'USA',
'second': 'Stephen Curry',
'third': '30'},
{'fifth': None,
'first': '1002',
'fouth': 'USA',
'second': 'Kobe Bryant',
'third': '40'},
{'fifth': None,
'first': '1003',
'fouth': 'Argentina',
'second': 'Manu Ginóbili',
'third': '41'}]
Note: five field names are specified but the CSV file has only four columns, so the fifth field, 'fifth', is automatically given the value None.
Example 8, specify fieldnames with more fields than the CSV file has columns, and also specify restval:
In [23]: with open('file.csv', 'rt') as fin:
...:     dictreader_object5 = csv.DictReader(fin, fieldnames=['first','second','third','fouth','fifth'], restval='autoinsert')
...:     data_dict_list5 = [row for row in dictreader_object5]
...:
In [24]: data_dict_list5
Out[24]:
[{'fifth': 'autoinsert',
'first': 'id',
'fouth': 'country',
'second': 'username',
'third': 'age'},
{'fifth': 'autoinsert',
'first': '1001',
'fouth': 'USA',
'second': 'Stephen Curry',
'third': '30'},
{'fifth': 'autoinsert',
'first': '1002',
'fouth': 'USA',
'second': 'Kobe Bryant',
'third': '40'},
{'fifth': 'autoinsert',
'first': '1003',
'fouth': 'Argentina',
'second': 'Manu Ginóbili',
'third': '41'}]
Note: five field names are specified with restval set to 'autoinsert', but the CSV file has only four columns, so the fifth field, 'fifth', is automatically given the value 'autoinsert'.
Example 9, rewrite a CSV file with DictWriter():
In [25]: data_dict_list
Out[25]:
[{'age': '30', 'country': 'USA', 'id': '1001', 'username': 'Stephen Curry'},
{'age': '40', 'country': 'USA', 'id': '1002', 'username': 'Kobe Bryant'},
{'age': '41',
'country': 'Argentina',
'id': '1003',
'username': 'Manu Ginóbili'}]
In [26]: with open('other.csv','wt') as fout:
...:     dictwriter_object = csv.DictWriter(fout, fieldnames=('id','username','age','country'))
...:     dictwriter_object.writerows(data_dict_list)
...:
View the contents of other.csv:
[meizhaohui@localhost ~]$ cat other.csv
1001,Stephen Curry,30,USA
1002,Kobe Bryant,40,USA
1003,Manu Ginóbili,41,Argentina
Note: only the data rows were written; the header row was not.
Example 10, rewrite the CSV file with DictWriter() and write the header row with dictwriter_object.writeheader():
In [27]: data_dict_list
Out[27]:
[{'age': '30', 'country': 'USA', 'id': '1001', 'username': 'Stephen Curry'},
{'age': '40', 'country': 'USA', 'id': '1002', 'username': 'Kobe Bryant'},
{'age': '41',
'country': 'Argentina',
'id': '1003',
'username': 'Manu Ginóbili'}]
In [28]: with open('other.csv','wt') as fout:
...:     dictwriter_object = csv.DictWriter(fout, fieldnames=('id','username','age','country'))
...:     dictwriter_object.writeheader()
...:     dictwriter_object.writerows(data_dict_list)
...:
View the contents of other.csv:
[meizhaohui@localhost ~]$ cat other.csv
id,username,age,country
1001,Stephen Curry,30,USA
1002,Kobe Bryant,40,USA
1003,Manu Ginóbili,41,Argentina
Example 11, rewrite the CSV file with DictWriter() and write the header row with dictwriter_object.writeheader(), but with fieldnames listing only 'id' and 'username'; this raises an exception:
In [29]: data_dict_list
Out[29]:
[{'age': '30', 'country': 'USA', 'id': '1001', 'username': 'Stephen Curry'},
{'age': '40', 'country': 'USA', 'id': '1002', 'username': 'Kobe Bryant'},
{'age': '41',
'country': 'Argentina',
'id': '1003',
'username': 'Manu Ginóbili'}]
In [30]: with open('other.csv','wt') as fout:
...:     dictwriter_object = csv.DictWriter(fout, fieldnames=('id','username'))
...:     dictwriter_object.writeheader()
...:     dictwriter_object.writerows(data_dict_list)
...:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
ValueError: dict contains fields not in fieldnames: 'age', 'country'
Note: extrasaction is not specified, so the default extrasaction='raise' applies. When data_dict_list is passed to dictwriter_object, the keys 'age' and 'country' have no matching field names, so a ValueError is raised. The next example sets extrasaction.
Example 12, rewrite the CSV file with DictWriter() and write the header row with dictwriter_object.writeheader(), with fieldnames listing only 'id' and 'username' and with extrasaction='ignore':
In [31]: with open('other.csv','wt') as fout:
...:     dictwriter_object = csv.DictWriter(fout, fieldnames=('id','username'),extrasaction='ignore')
...:     dictwriter_object.writeheader()
...:     dictwriter_object.writerows(data_dict_list)
...:
In [32]: dictwriter_object
Out[32]: <csv.DictWriter at 0x7fd4798bd668>
View the contents of other.csv:
[meizhaohui@localhost ~]$ cat other.csv
id,username
1001,Stephen Curry
1002,Kobe Bryant
1003,Manu Ginóbili
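Note: with extrasaction='ignore', dictionaries that contain keys not listed in fieldnames can still be written to the CSV file; the extra values are dropped.
Example 13, rewrite the CSV file with DictWriter() and write the header row with dictwriter_object.writeheader(), with fieldnames listing more fields than the dictionaries contain:
In [33]: data_dict_list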
Out[33]:
[{'age': '30', 'country': 'USA', 'id': '1001', 'username': 'Stephen Curry'},
{'age': '40', 'country': 'USA', 'id': '1002', 'username': 'Kobe Bryant'},
{'age': '41',
'country': 'Argentina',
'id': '1003',
'username': 'Manu Ginóbili'}]
In [34]: with open('other.csv','wt') as fout:
...:     dictwriter_object = csv.DictWriter(fout, fieldnames=('id','username','age','country','number'))
...:     dictwriter_object.writeheader()
...:     dictwriter_object.writerows(data_dict_list)
...:
In [35]: dictwriter_object
Out[35]: <csv.DictWriter at 0x7fd479b064a8>
View the contents of other.csv:
[meizhaohui@localhost ~]$ cat other.csv
id,username,age,country,number
1001,Stephen Curry,30,USA,
1002,Kobe Bryant,40,USA,
1003,Manu Ginóbili,41,Argentina,
Note: the file now has an extra 'number' column, but that column holds no data.
Example 14, rewrite the CSV file with DictWriter() and write the header row with dictwriter_object.writeheader(), with fieldnames listing more fields than the dictionaries contain, and also specify restval:
In [36]: data_dict_list
Out[36]:
[{'age': '30', 'country': 'USA', 'id': '1001', 'username': 'Stephen Curry'},
{'age': '40', 'country': 'USA', 'id': '1002', 'username': 'Kobe Bryant'},
{'age': '41',
'country': 'Argentina',
'id': '1003',
'username': 'Manu Ginóbili'}]
In [37]: with open('other.csv','wt') as fout:
...:     dictwriter_object = csv.DictWriter(fout, fieldnames=('id','username','age','country','number'), restval='autoinsert')
...:     dictwriter_object.writeheader()
...:     dictwriter_object.writerows(data_dict_list)
...:
In [38]: dictwriter_object
Out[38]: <csv.DictWriter at 0x7fd479ad9240>
View the contents of other.csv:
[meizhaohui@localhost ~]$ cat other.csv
id,username,age,country,number
1001,Stephen Curry,30,USA,autoinsert
1002,Kobe Bryant,40,USA,autoinsert
1003,Manu Ginóbili,41,Argentina,autoinsert
Note: the file now has an extra 'number' column, and that column is filled with 'autoinsert'.
Formatting CSV output takes some care; see the following examples.
Example 15, set the CSV output format:
In [39]: CSV_DATA
Out[39]:
[['id', 'username', 'age', 'country'],
['1001', 'Stephen Curry', '30', 'USA'],
['1002', 'Kobe Bryant', '40', 'USA'],
['1003', 'Manu Ginóbili', '41', 'Argentina']]
In [40]: with open('format.csv', 'wt') as fout:
...:     writer_object = csv.writer(fout, delimiter=' ',quotechar='|',quoting=csv.QUOTE_MINIMAL)
...:     writer_object.writerows(CSV_DATA)
...:
View the contents of format.csv:
[meizhaohui@localhost ~]$ cat format.csv
id username age country
1001 |Stephen Curry| 30 USA
1002 |Kobe Bryant| 40 USA
1003 |Manu Ginóbili| 41 Argentina
Example 16, set the CSV output format:
In [41]: with open('format.csv', 'wt') as fout:
...:     writer_object = csv.writer(fout, delimiter=' ',quotechar='"',quoting=csv.QUOTE_MINIMAL)
...:     writer_object.writerows(CSV_DATA)
...:
View the contents of format.csv:
[meizhaohui@localhost ~]$ cat format.csv
id username age country
1001 "Stephen Curry" 30 USA
1002 "Kobe Bryant" 40 USA
1003 "Manu Ginóbili" 41 Argentina
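To make it easier to specify the format of input and output records, specific formatting parameters are grouped together into a dialect. When creating reader and writer objects you can pass a dialect parameter, or pass individual formatting parameters whose names match the attributes defined by the Dialect class below.
The Dialect class supports the following attributes:
- Dialect.delimiter: a one-character string used to separate fields. Defaults to ','.
- Dialect.lineterminator: the string the writer uses to terminate each line. Defaults to '\r\n'.
- Dialect.quotechar: a one-character string used to quote fields that contain special characters, such as the delimiter, the quotechar, or line breaks. Defaults to the double quote '"'.
- Dialect.quoting: controls when quotes are used. It can be QUOTE_MINIMAL, QUOTE_NONNUMERIC, QUOTE_NONE, or QUOTE_ALL; the default is QUOTE_MINIMAL.
  - QUOTE_MINIMAL: the writer quotes only fields that contain special characters, such as the delimiter, the quotechar, or any character in lineterminator.
  - QUOTE_NONNUMERIC: the writer quotes all non-numeric fields.
  - QUOTE_NONE: the writer never quotes fields. When the output data contains the delimiter, it is escaped with Dialect.escapechar; if Dialect.escapechar is not set, an Error exception is raised when a character needs escaping.
  - QUOTE_ALL: the writer quotes all fields.
- Dialect.skipinitialspace: if True, whitespace immediately after the delimiter is ignored. Defaults to False.
- Dialect.escapechar: the one-character string the writer uses to escape the delimiter when Dialect.quoting is QUOTE_NONE, and to escape the quotechar when doublequote is False.
- Dialect.doublequote: controls how instances of quotechar inside a field are themselves quoted. If True, the character is doubled. If False, escapechar is used as a prefix to the quotechar. Defaults to True.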
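The attributes above can be bundled under a name with csv.register_dialect(). The sketch below is a hedged illustration: the dialect name 'pipes' and the sample row are made up, and io.StringIO stands in for a real file. It also shows the Error raised by QUOTE_NONE when no escapechar is set.

```python
import csv
import io

# 'pipes' is an illustrative dialect name, not one shipped with the csv module.
csv.register_dialect('pipes', delimiter='|', quoting=csv.QUOTE_NONE,
                     escapechar='\\', lineterminator='\n')

buffer = io.StringIO()
writer_object = csv.writer(buffer, dialect='pipes')
writer_object.writerow(['1001', 'a|b', 'USA'])  # the delimiter inside 'a|b' gets escaped
written = buffer.getvalue()

# Reading with the same dialect restores the original field.
rows = list(csv.reader(io.StringIO(written), dialect='pipes'))
csv.unregister_dialect('pipes')

# Without an escapechar, QUOTE_NONE cannot write a field containing the delimiter.
try:
    csv.writer(io.StringIO(), delimiter='|', quoting=csv.QUOTE_NONE).writerow(['a|b'])
    raised = False
except csv.Error:
    raised = True

print(repr(written), rows, raised)
```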
Example 17, use | as the delimiter and quote all fields with the double quote '"':
In [42]: with open('format.csv', 'wt') as fout:
...:     writer_object = csv.writer(fout, delimiter='|',quotechar='"',quoting=csv.QUOTE_ALL)
...:     writer_object.writerows(CSV_DATA)
...:
View the contents of format.csv:
[meizhaohui@localhost ~]$ cat format.csv
"id"|"username"|"age"|"country"
"1001"|"Stephen Curry"|"30"|"USA"
"1002"|"Kobe Bryant"|"40"|"USA"
"1003"|"Manu Ginóbili"|"41"|"Argentina"
- Use writer_object.writerow(data) to write a single row to the CSV file.
Example 18, use | as the delimiter and quote non-numeric fields with the double quote '"':
In [43]: first_line = ('a','b','c', 1, 2)
In [44]: second_line = [',','"','|','line2']
In [45]: with open('format.csv', 'wt') as fout:
...:     writer_object = csv.writer(fout, delimiter='|',quotechar='"',quoting=csv.QUOTE_NONNUMERIC)
...:     writer_object.writerow(first_line)
...:     writer_object.writerow(second_line)
...:
View the contents of format.csv:
[meizhaohui@localhost ~]$ cat format.csv
"a"|"b"|"c"|1|2
","|""""|"|"|"line2"
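Note: in the second row one field is itself a double quote, the same character as quotechar. Because Dialect.doublequote is True by default, that quotechar is doubled inside the quoted field, which gives """".
For the other parameters, refer to the Dialect description above and experiment with them yourself.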
Reading and writing XML files¶
- XML is a markup format that uses tags to delimit data.
- XML is commonly used for data exchange and messaging.
- An XML element has a tag: <tag>.
- An XML element can have attributes: <tag name="attribute">.
- An XML element can hold data: <tag>data</tag>.
- The simplest way to parse XML in Python is the xml.etree.ElementTree module.
Parsing XML with xml.etree.ElementTree¶
We will use the following XML document (country_data.xml) as the sample data for this section:
<?xml version="1.0"?>
<data>
<country name="Liechtenstein">
<rank>1</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
</country>
<country name="Singapore">
<rank>4</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Malaysia" direction="N"/>
</country>
<country name="Panama">
<rank>68</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
</country>
</data>
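- ElementTree represents the whole XML document as a tree, and Element represents a single node in that tree.
- To read XML data from a file, parse the file with ET.parse('file.xml') to get the tree, then call tree.getroot() to get the root node, which is an Element object.
Read XML data from a file: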
In [1]: import xml.etree.ElementTree as ET
In [2]: tree = ET.parse('country_data.xml')
In [3]: root = tree.getroot()
In [4]: tree
Out[4]: <xml.etree.ElementTree.ElementTree at 0x7f932cc24d30>
In [5]: root
Out[5]: <Element 'data' at 0x7f932e653818>
Read XML data from a string variable:
In [6]: xml_string="""<?xml version="1.0"?>
...: <data>test</data>
...: """
In [7]: test_root = ET.fromstring(xml_string)
In [8]: test_root
Out[8]: <Element 'data' at 0x7f932eb034a8>
- Access an element's tag: tag = element.tag
- Access an element's attributes: attrib = element.attrib
- Access an element's text: value = element.text
Access the root node's tag, attributes, and text:
In [9]: root.tag
Out[9]: 'data'
In [10]: root.attrib
Out[10]: {}
In [11]: root.text
Out[11]: '\n '
Print the tag and attributes of each child of the root node:
In [12]: for child in root:
...:     print(child.tag, child.attrib)
...:
country {'name': 'Liechtenstein'}
country {'name': 'Singapore'}
country {'name': 'Panama'}
Child nodes are nested, so a specific child can be reached by index:
In [13]: root[0]
Out[13]: <Element 'country' at 0x7f932e653868>
In [14]: root[0].tag
Out[14]: 'country'
In [15]: root[0].attrib
Out[15]: {'name': 'Liechtenstein'}
In [16]: root[0][1].tag
Out[16]: 'year'
In [17]: root[0][1].text
Out[17]: '2008'
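- Finding elements: iter(tag=None) iterates over the element and every element beneath it, at any depth, in document order; when tag is given, only elements with that tag are returned.
- Finding elements: findall(match) returns the direct children that match match when match is a tag name.
- Finding elements: find(match) returns the first direct child that matches match when match is a tag name.
Iterate over sub-elements:
In [18]: for neighbor in root.iter('neighbor'):
...:     print(neighbor.attrib)
...: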
{'direction': 'E', 'name': 'Austria'}
{'direction': 'W', 'name': 'Switzerland'}
{'direction': 'N', 'name': 'Malaysia'}
{'direction': 'W', 'name': 'Costa Rica'}
{'direction': 'E', 'name': 'Colombia'}
Find sub-elements with findall or find:
In [19]: for country in root.findall('country'):
...:     rank = country.find('rank').text
...:     name = country.get('name')
...:     print('name:{},rank:{}'.format(name, rank))
...:
name:Liechtenstein,rank:1
name:Singapore,rank:4
name:Panama,rank:68
In [20]: root.findall('country')
Out[20]:
[<Element 'country' at 0x7f932e653868>,
<Element 'country' at 0x7f932cc2bf48>,
<Element 'country' at 0x7f932cc2b818>]
In [21]: root.findall('rank')
Out[21]: []
In [22]: root.findall('neighbor')
Out[22]: []
In [23]: root[0].findall('neighbor')
Out[23]:
[<Element 'neighbor' at 0x7f932cc2bbd8>,
<Element 'neighbor' at 0x7f932cc2b9f8>]
In [24]: root[0].find('neighbor')
Out[24]: <Element 'neighbor' at 0x7f932cc2bbd8>
In [25]: root[0].find('neighbor').get('name')
Out[25]: 'Austria'
# Note: find returns only the first 'neighbor', so it cannot reach the child named 'Switzerland'
In [26]: root[0].findall('neighbor')[0].get('name')
Out[26]: 'Austria'
In [27]: root[0].findall('neighbor')[1].get('name')
Out[27]: 'Switzerland'
- ElementTree.write() writes the updated XML data to a file.
- An Element object can be modified directly to change a node's tag, attributes, and so on.
- element.text = new_value assigns a new value to the node.
- element.set('attribute_name', 'attribute_value') sets a node attribute.
- element.append(subelement) adds a child node.
Modify nodes:
In [39]: for rank in root.iter('rank'):
...:     new_rank = int(rank.text) + 1
...:     rank.text = str(new_rank)
...:     rank.set('updated', 'yes')
...:
In [40]: tree.write('output.xml')
The new output.xml file contains:
<data>
<country name="Liechtenstein">
<rank updated="yes">2</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor direction="E" name="Austria" />
<neighbor direction="W" name="Switzerland" />
</country>
<country name="Singapore">
<rank updated="yes">5</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor direction="N" name="Malaysia" />
</country>
<country name="Panama">
<rank updated="yes">69</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor direction="W" name="Costa Rica" />
<neighbor direction="E" name="Colombia" />
</country>
</data>
All three rank nodes were updated successfully, but the output file lacks the <?xml version="1.0"?> XML declaration.
tree.write('output.xml',encoding='utf-8',xml_declaration=True) declares XML version 1.0 and sets the character encoding of the written data to utf-8.
Add the XML declaration and set the encoding:
In [41]: tree.write('output.xml',encoding='utf-8',xml_declaration=True)
View output.xml again:
<?xml version='1.0' encoding='utf-8'?>
<data>
<country name="Liechtenstein">
<rank updated="yes">2</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor direction="E" name="Austria" />
<neighbor direction="W" name="Switzerland" />
</country>
<country name="Singapore">
<rank updated="yes">5</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor direction="N" name="Malaysia" />
</country>
<country name="Panama">
<rank updated="yes">69</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor direction="W" name="Costa Rica" />
<neighbor direction="E" name="Colombia" />
</country>
</data>
- Use Element.remove(subelement) to remove a child node.
Remove every country whose rank is greater than 50:
In [42]: for country in root.findall('country'):
...:     rank = int(country.find('rank').text)
...:     print('rank:{}'.format(rank))
...:     if rank > 50:
...:         root.remove(country)
...:
rank:2
rank:5
rank:69
In [43]: tree.write('output.xml',encoding='utf-8',xml_declaration=True)
View output.xml again:
<?xml version='1.0' encoding='utf-8'?>
<data>
<country name="Liechtenstein">
<rank updated="yes">2</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor direction="E" name="Austria" />
<neighbor direction="W" name="Switzerland" />
</country>
<country name="Singapore">
<rank updated="yes">5</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor direction="N" name="Malaysia" />
</country>
</data>
Note: the data is written correctly, but the closing </data> tag is not indented properly and does not line up with the opening <data> tag.
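One way to address the indentation problem noted above, assuming Python 3.9 or later, is ET.indent(), which rewrites the whitespace of a tree in place. The sketch below uses a shortened, made-up document rather than the full output.xml, and falls back to the unindented form on older Python versions.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<data><country name="Liechtenstein"><rank>2</rank></country></data>'
)

# ET.indent() was added in Python 3.9; skip it on older versions.
if hasattr(ET, 'indent'):
    ET.indent(root, space='    ')

pretty = ET.tostring(root, encoding='unicode')
print(pretty)
```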
- Use ET.SubElement(parent, tag, attrib={}, **extra) to create a child Element.
- Use ET.dump(element) to print an Element to standard output. This function is intended for debugging only.
Add a new country child node:
In [44]: new_country = ET.SubElement(root, 'country', attrib={'name': 'Panama'}, other='other_attribute')
In [45]: new_country
Out[45]: <Element 'country' at 0x7fecb2e51908>
In [46]: ET.dump(root)
<data>
<country name="Liechtenstein">
<rank updated="yes">2</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor direction="E" name="Austria" />
<neighbor direction="W" name="Switzerland" />
</country>
<country name="Singapore">
<rank updated="yes">5</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor direction="N" name="Malaysia" />
</country>
<country name="Panama" other="other_attribute" /></data>
- element.append(subelement) adds a child node.
Add a rank child to the country node created above and set its 'updated' attribute:
In [47]: country_rank = ET.Element('rank', attrib={'updated': 'yes'})
In [48]: new_country.append(country_rank)
In [49]: ET.dump(root)
<data>
<country name="Liechtenstein">
<rank updated="yes">2</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor direction="E" name="Austria" />
<neighbor direction="W" name="Switzerland" />
</country>
<country name="Singapore">
<rank updated="yes">5</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor direction="N" name="Malaysia" />
</country>
<country name="Panama" other="other_attribute"><rank updated="yes" /></country></data>
In [50]: tree.write('output.xml',encoding='utf-8',xml_declaration=True)
View output.xml again:
<?xml version='1.0' encoding='utf-8'?>
<data>
<country name="Liechtenstein">
<rank updated="yes">2</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor direction="E" name="Austria" />
<neighbor direction="W" name="Switzerland" />
</country>
<country name="Singapore">
<rank updated="yes">5</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor direction="N" name="Malaysia" />
</country>
<country name="Panama" other="other_attribute"><rank updated="yes" /></country></data>
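- Parsing XML files that use namespaces.
  - Namespaces exist to resolve name conflicts: associating a long, globally unique string with a tag avoids naming collisions. A Uniform Resource Identifier (URI) can be used to identify a namespace. The most common kind of URI is the Uniform Resource Locator (URL), which identifies an address on the network.
  - The URL that identifies a namespace is never fetched by the XML parser, which does not need to look anything up there; the URL only gives the namespace a unique name, so the address can be fictitious. Many companies point it at a real web page that describes the namespace in more detail.
  - Defining a default XML namespace means child elements need no prefix. Format: <element xmlns="default_namespace_URI">.
  - For a non-default namespace, a prefix (namespace-prefix) must be given. Prefixed tags and attributes of the form prefix:sometag are expanded to {uri}sometag, with the prefix replaced by the full URI. Format: <element xmlns:namespace-prefix="namespace_URI">.
The following XML file (actors.xml), which stores actors and the characters they played, uses two namespaces: a default namespace and one with the prefix "fictional":
<?xml version="1.0"?>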
<actors xmlns:fictional="http://characters.example.com"
xmlns="http://people.example.com">
<actor>
<name>John Cleese</name>
<fictional:character>Lancelot</fictional:character>
<fictional:character>Archie Leach</fictional:character>
</actor>
<actor>
<name>Eric Idle</name>
<fictional:character>Sir Robin</fictional:character>
<fictional:character>Gunther</fictional:character>
<fictional:character>Commander Clement</fictional:character>
</actor>
</actors>
Parse actors.xml and try to get the actor nodes with findall:
In [50]: import xml.etree.ElementTree as ET
In [51]: tree = ET.parse('actors.xml')
In [52]: actors_root = tree.getroot()
In [53]: actors_root
Out[53]: <Element '{http://people.example.com}actors' at 0x7fafe2880138>
In [54]: actors_root.findall('actor')
Out[54]: []
In [55]: ET.dump(actors_root)
<ns0:actors xmlns:ns0="http://people.example.com" xmlns:ns1="http://characters.example.com">
<ns0:actor>
<ns0:name>John Cleese</ns0:name>
<ns1:character>Lancelot</ns1:character>
<ns1:character>Archie Leach</ns1:character>
</ns0:actor>
<ns0:actor>
<ns0:name>Eric Idle</ns0:name>
<ns1:character>Sir Robin</ns1:character>
<ns1:character>Gunther</ns1:character>
<ns1:character>Commander Clement</ns1:character>
</ns0:actor>
</ns0:actors>
In [56]: actors_root.tag
Out[56]: '{http://people.example.com}actors'
Note: calling findall directly does not return the actor nodes, because every tag has been expanded to the form {uri}tag.
The first approach is to add the {URI} manually to every tag or attribute in the path passed to findall() or find():
In [57]: default_prefix = '{http://people.example.com}'
In [58]: char_prefix = '{http://characters.example.com}'
In [59]: for actor in actors_root.findall('{}actor'.format(default_prefix)):
...:     name = actor.find('{}name'.format(default_prefix))
...:     print(name.text)
...:     for char in actor.findall('{}character'.format(char_prefix)):
...:         print(' |-->', char.text)
...:
John Cleese
|--> Lancelot
|--> Archie Leach
Eric Idle
|--> Sir Robin
|--> Gunther
|--> Commander Clement
The other approach is to create a dictionary that maps your own prefixes to the namespaces and pass it to the search functions:
In [60]: ns = {'real_person': 'http://people.example.com','role': 'http://characters.example.com'}
In [61]: ns
Out[61]:
{'real_person': 'http://people.example.com',
'role': 'http://characters.example.com'}
In [62]: for actor in actors_root.findall('real_person:actor', namespaces=ns):
...:     name = actor.find('real_person:name', ns)
...:     print(name.text)
...:     for char in actor.findall('role:character', ns):
...:         print(' |-->', char.text)
...:
John Cleese
|--> Lancelot
|--> Archie Leach
Eric Idle
|--> Sir Robin
|--> Gunther
|--> Commander Clement
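The ns0 and ns1 prefixes in the dumped output above are generated automatically. As a hedged sketch, ET.register_namespace() lets you choose the prefixes used when serializing; the prefix 'people' is an illustrative choice, the document is shortened, and the registration is global to the process.

```python
import xml.etree.ElementTree as ET

ACTORS_XML = """<actors xmlns:fictional="http://characters.example.com"
        xmlns="http://people.example.com">
    <actor>
        <name>John Cleese</name>
        <fictional:character>Lancelot</fictional:character>
    </actor>
</actors>"""

# Register the prefixes to use on output; this affects all later serialization.
ET.register_namespace('people', 'http://people.example.com')
ET.register_namespace('fictional', 'http://characters.example.com')

actors_root = ET.fromstring(ACTORS_XML)
serialized = ET.tostring(actors_root, encoding='unicode')
print(serialized)
```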
- XPath support: the xml.etree.ElementTree module supports a limited subset of XPath expressions for locating elements in a tree; a full XPath engine is outside the module's scope.
The supported XPath syntax includes:
Syntax | Meaning |
---|---|
tag | Selects all child elements with the given tag |
* | Asterisk; selects all child elements |
. | Dot; selects the current element |
// | Selects all sub-elements at every level beneath the current element |
.. | Double dot; selects the parent element |
[@attrib] | Selects all elements that have the attrib attribute |
[@attrib='value'] | Selects all elements whose attrib attribute has the value value |
XPath usage examples:
In [63]: ET.dump(root)
<data>
<country name="Liechtenstein">
<rank updated="yes">2</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor direction="E" name="Austria" />
<neighbor direction="W" name="Switzerland" />
</country>
<country name="Singapore">
<rank updated="yes">5</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor direction="N" name="Malaysia" />
</country>
<country name="Panama" other="other_attribute"><rank updated="yes" /></country></data>
In [64]: root.findall(".") # XPath: '.' selects the current element
Out[64]: [<Element 'data' at 0x7fecb1819188>]
In [65]: root.findall(".")[0].tag
Out[65]: 'data'
In [66]: root.findall("./country/neighbor") # XPath: '.' followed by tag names
Out[66]:
[<Element 'neighbor' at 0x7fecb1819278>,
<Element 'neighbor' at 0x7fecb1819048>,
<Element 'neighbor' at 0x7fecb1819408>]
In [67]: for neighbor in root.findall("./country/neighbor"):
...:     print(neighbor.get('name'))
...:
Austria
Switzerland
Malaysia
In [68]: root.findall("./*") # XPath: '.' and '*' select every child of root
Out[68]:
[<Element 'country' at 0x7fecb1819368>,
<Element 'country' at 0x7fecb18190e8>,
<Element 'country' at 0x7fecb2e51908>]
In [69]: root.findall("*/year") # XPath: '*' selects the year nodes under every child of root
Out[69]: [<Element 'year' at 0x7fecb1819458>, <Element 'year' at 0x7fecb18193b8>]
In [70]: for year in root.findall("*/year"):
...:     print(year.text)
...:
2008
2011
In [71]: root.findall(".//rank") # XPath: '.' and '//' select every rank node at any depth
Out[71]:
[<Element 'rank' at 0x7fecb1819138>,
<Element 'rank' at 0x7fecb1819228>,
<Element 'rank' at 0x7fecb185e6d8>]
In [72]: for rank in root.findall(".//rank"):
...:     print(rank.text)
...:
2
5
None
In [73]: root.findall("./country/rank/..") # XPath: '..' selects the parent of each rank node
Out[73]:
[<Element 'country' at 0x7fecb1819368>,
<Element 'country' at 0x7fecb18190e8>,
<Element 'country' at 0x7fecb2e51908>]
In [74]: root.findall(".//country[@name]") # XPath: country nodes that have a name attribute
Out[74]:
[<Element 'country' at 0x0000026C6A325CC8>,
<Element 'country' at 0x0000026C6A325B88>,
<Element 'country' at 0x0000026C6A3254F8>]
In [75]: root.findall(".//country[@other]") # XPath: country nodes that have an other attribute
Out[75]: [<Element 'country' at 0x0000026C6A3254F8>]
In [76]: root.findall(".//country/rank[@updated]") # XPath: country/rank nodes that have an updated attribute
Out[76]:
[<Element 'rank' at 0x0000026C6A325B38>,
<Element 'rank' at 0x0000026C6A3254A8>,
<Element 'rank' at 0x0000026C6A3251D8>]
In [77]: root.findall(".//country[@name='Singapore']") # XPath: country nodes whose name attribute is 'Singapore'
Out[77]: [<Element 'country' at 0x0000026C6A325B88>]
In [78]: root.findall(".//country[@other='other_attribute']") # XPath: country nodes whose other attribute is 'other_attribute'
Out[78]: [<Element 'country' at 0x0000026C6A3254F8>]
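Beyond the attribute predicates shown above, the limited XPath subset in xml.etree.ElementTree also accepts child-text and position predicates. The sketch below is a hedged illustration on a shortened, made-up document, not on the session's root object.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<data>'
    '<country name="Liechtenstein"><rank>2</rank><year>2008</year></country>'
    '<country name="Singapore"><rank>5</rank><year>2011</year></country>'
    '</data>'
)

# [tag='text']: countries that have a <year> child whose text is '2011'.
by_year = [c.get('name') for c in root.findall("./country[year='2011']")]

# [position]: 1-based index among the matching siblings; last() is the final one.
first = root.find('./country[1]').get('name')
last = root.find('./country[last()]').get('name')

print(by_year, first, last)
```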
For more on xml.etree.ElementTree, see xml.etree.ElementTree — The ElementTree XML API.
To parse XML with xml.sax, see xml.sax — Support for SAX2 parsers.
To parse XML with xml.dom, see xml.dom — The Document Object Model API.
- XML security: defusedxml guards against denial-of-service and other vulnerabilities in Python's XML libraries; use defusedxml in place of xml.etree.
Unprotected:
import xml.etree.ElementTree as ET
Protected:
import defusedxml.ElementTree as ET
Pretty-printing XML output¶
The file pretty_xml.py contains:
#!/usr/bin/python3
"""
@Time : 2019/4/8 20:29
@Author : Mei Zhaohui
@Email : mzh.whut@gmail.com
@File : pretty_xml.py
@Software: PyCharm
"""
import xml.etree.ElementTree as ET
def prettyxml(element, indent='    ', newline='\n', level=0):
    """
    Pretty-print an XML Element object in place.
    :param element: Element object; when writing to a file, passing root is recommended
    :param indent: indentation string, 4 spaces by default
    :param newline: newline character
    :param level: indentation level
    :return:
    """
    # element is an Element instance; indent is used for indentation, newline for line breaks
    if len(element):  # check whether element has child elements
        if not element.text or element.text.isspace():  # element has no text content
            element.text = newline + indent * (level + 1)
        else:
            element.text = newline + indent * (level + 1) \
                           + element.text.strip() + newline + indent * (level + 1)
    temp = list(element)  # collect the child elements into a list
    for subelement in temp:
        # If this is not the last child, the next line starts a sibling, so keep the same indentation
        if temp.index(subelement) < (len(temp) - 1):
            subelement.tail = newline + indent * (level + 1)
        else:
            subelement.tail = newline + indent * level
        prettyxml(subelement, indent, newline, level=level + 1)  # recurse into the child element
def main():
    """main function"""
    tree = ET.parse('data.xml')
    root = tree.getroot()
    prettyxml(root)
    tree.write('output.xml',
               encoding='utf-8',
               xml_declaration=True,
               method='xml',
               short_empty_elements=False)
if __name__ == '__main__':
    main()
The contents of data.xml are as follows:
<?xml version="1.0"?>
<data>
<country name="Liechtenstein">
<rank updated="yes">2</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor direction="E" name="Austria" />
<neighbor direction="W" name="Switzerland" />
</country>
<country name="Singapore">
<rank updated="yes">5</rank>
<year>2011</year>
<gdppc>59900</gdppc><neighbor direction="N" name="Malaysia" />
</country>
<country name="Panama" other="other_attribute"><rank updated="yes" /></country></data>
The output.xml file generated by running pretty_xml.py is as follows:
<?xml version='1.0' encoding='utf-8'?>
<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor direction="E" name="Austria"></neighbor>
        <neighbor direction="W" name="Switzerland"></neighbor>
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor direction="N" name="Malaysia"></neighbor>
    </country>
    <country name="Panama" other="other_attribute">
        <rank updated="yes"></rank>
    </country>
</data>
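As an alternative to a hand-written recursive helper, Python 3.9 and later ship `xml.etree.ElementTree.indent()`, which adjusts the `text` and `tail` whitespace of a tree in place. The following is a minimal sketch, assuming Python 3.9+ and a small inline document standing in for the data file.

```python
import xml.etree.ElementTree as ET

# A compact, unindented document used only for illustration.
root = ET.fromstring(
    '<data><country name="Panama"><rank updated="yes" /></country></data>'
)

# ET.indent() (Python 3.9+) rewrites whitespace in place, one `space` unit per level.
ET.indent(root, space="    ")

# short_empty_elements=False writes <rank ...></rank> instead of <rank ... />.
pretty = ET.tostring(root, encoding="unicode", short_empty_elements=False)
print(pretty)
```

On older interpreters the built-in helper is not available, so a recursive function that sets `text` and `tail` by hand remains the portable option.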
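Common built-in modules: using the collections module¶
Introduction to the collections module¶
Earlier sections covered Python's built-in data structures, including list, tuple, and dict. Building on these built-in types, the collections module provides several additional data types:
- namedtuple: creates tuple subclasses whose elements can be accessed by name
- deque: a double-ended queue that supports fast appends and pops at both ends
- Counter: a counter, mainly used for counting
- OrderedDict: an ordered dictionary
- defaultdict: a dictionary with default values
The following sections cover deque, Counter, namedtuple, OrderedDict, and defaultdict.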
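Common built-in modules: the double-ended queue deque¶
- The deque in the collections module can be seen as an enhanced version of the built-in list, and it offers more methods than an ordinary queue.
- deque is short for double-ended queue; it supports insertion and removal at both ends.
- deque([iterable[, maxlen]]) --> deque object, where maxlen is the maximum length of the deque
The deque is used as follows: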
>>> from collections import deque
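>>> deque=deque((),5)  # note: this rebinds the name deque to the instance, shadowing the class for the rest of the session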
>>> deque.
deque.append( deque.copy( deque.extendleft( deque.maxlen deque.remove(
deque.appendleft( deque.count( deque.index( deque.pop( deque.reverse(
deque.clear( deque.extend( deque.insert( deque.popleft( deque.rotate(
>>> deque
deque([], maxlen=5)
deque.append(item) # Add an element to the right side (end) of the deque.
deque.appendleft(item) # Add an element to the left side (start) of the deque.
deque.clear() # Remove all elements from the deque.
deque.extend(iterable) # Extend the right side of the deque with elements from the iterable.
deque.extendleft(iterable) # Extend the left side of the deque with elements from the iterable (they end up in reverse order).
deque.copy() # Return a shallow copy of the deque.
deque.count(item) # Return the number of occurrences of item in the deque.
deque.index(value, [start, [stop]]) # Return the first index of value in the deque; raise ValueError if it is not found.
deque.insert(index, object) # Insert object before index.
deque.pop() # Remove and return the rightmost element.
deque.popleft() # Remove and return the leftmost element.
deque.remove(value) # Remove the first occurrence of value.
deque.reverse() # Reverse the elements of the deque in place.
deque.rotate(step) # Rotate the deque step steps to the right; step defaults to 1, and a negative step rotates to the left.
deque.maxlen # Maximum length of the deque (None if unbounded).
>>> deque
deque([], maxlen=5)
>>> deque.maxlen
5
>>> deque.append('first')
>>> deque
deque(['first'], maxlen=5)
>>> deque.append('second')
>>> deque
deque(['first', 'second'], maxlen=5)
>>> deque.append('third')
>>> deque
deque(['first', 'second', 'third'], maxlen=5)
>>> deque.appendleft('four')
>>> deque
deque(['four', 'first', 'second', 'third'], maxlen=5)
>>> deque.extend(['four','five'])
>>> deque
deque(['first', 'second', 'third', 'four', 'five'], maxlen=5)
>>> deque.extendleft(['four','five'])
>>> deque
deque(['five', 'four', 'first', 'second', 'third'], maxlen=5)
>>> deque1=deque.copy()
>>> type(deque1)
<class 'collections.deque'>
>>> deque1
deque(['five', 'four', 'first', 'second', 'third'], maxlen=5)
>>> deque.extend(('fourth','fifth'))
>>> deque
deque(['first', 'second', 'third', 'fourth', 'fifth'], maxlen=5)
>>> deque.count('first')
1
>>> deque.count('second')
1
>>> deque.count('third')
1
>>> deque.index('first')
0
>>> deque.index('second')
1
>>> deque.index('third')
2
>>> deque.index('third',0,2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: 'third' is not in deque
>>> deque.index('third',0,3)
2
>>> deque
deque(['first', 'second', 'third', 'fourth', 'fifth'], maxlen=5)
>>> deque.reverse()
>>> deque
deque(['fifth', 'fourth', 'third', 'second', 'first'], maxlen=5)
>>> deque.reverse()
>>> deque
deque(['first', 'second', 'third', 'fourth', 'fifth'], maxlen=5)
>>> deque.rotate()
>>> deque
deque(['fifth', 'first', 'second', 'third', 'fourth'], maxlen=5)
>>> deque.rotate(-1)
>>> deque
deque(['first', 'second', 'third', 'fourth', 'fifth'], maxlen=5)
>>> deque.rotate(3)
>>> deque
deque(['third', 'fourth', 'fifth', 'first', 'second'], maxlen=5)
>>> deque.rotate(-3)
>>> deque
deque(['first', 'second', 'third', 'fourth', 'fifth'], maxlen=5)
>>> deque.pop()
'fifth'
>>> deque
deque(['first', 'second', 'third', 'fourth'], maxlen=5)
>>> deque.popleft()
'first'
>>> deque
deque(['second', 'third', 'fourth'], maxlen=5)
>>> deque.remove('fourth')
>>> deque
deque(['second', 'third'], maxlen=5)
>>> len(deque)
2
>>> deque.maxlen
5
>>> deque.remove('third')
>>> deque
deque(['second'], maxlen=5)
>>> len(deque)
1
>>> deque.maxlen
5
>>> deque.clear()
>>> deque
deque([], maxlen=5)
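The maxlen behaviour shown in the session (old items silently dropped from the opposite end once the deque is full) is what makes a deque useful as a bounded buffer. The following is a minimal sketch; the variable names `recent` and `queue` are illustrative only.

```python
from collections import deque

# A bounded deque keeps only the most recent `maxlen` items.
recent = deque(maxlen=3)
for value in [1, 2, 3, 4, 5]:
    recent.append(value)  # once full, the leftmost item is discarded

print(recent)

# Used as a FIFO queue: append on the right, pop from the left.
queue = deque(["a", "b"])
queue.append("c")
first = queue.popleft()

print(first, list(queue))
```

Unlike the session above, this sketch binds the instances to their own names, so the `deque` class stays available for creating further objects.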
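Common built-in modules: the counter Counter¶
- The Counter class tracks how many times each value occurs. It is a dict subclass that stores elements as keys and their counts as values.
- Counter() creates an empty Counter object.
- Counter(iterable): creates a Counter object from an iterable (list, tuple, string, and so on); when a dict or other mapping is passed, its values are used as the counts.
- Accessing a key that does not exist returns 0 instead of raising KeyError; otherwise the key's count is returned.
- most_common([num]) returns all elements with their counts in descending order of count; if num is given, only the num most common (element, count) pairs are returned.
- elements() returns an iterator in which each element is repeated as many times as its count. Since Python 3.7, elements are returned in the order they were first encountered.
Example: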
In [1]: list1 = ['a', 'b', 'c', 'd', 'a', 'b', 'a', 'c']
In [2]: list1
Out[2]: ['a', 'b', 'c', 'd', 'a', 'b', 'a', 'c']
In [3]: from collections import Counter as ct
In [4]: ct(list1)
Out[4]: Counter({'a': 3, 'b': 2, 'c': 2, 'd': 1})
In [5]: a = ct(list1)
In [6]: a
Out[6]: Counter({'a': 3, 'b': 2, 'c': 2, 'd': 1})
In [7]: a.most_common()
Out[7]: [('a', 3), ('b', 2), ('c', 2), ('d', 1)]
In [8]: a.most_common(2)
Out[8]: [('a', 3), ('b', 2)]
In [9]: a.most_common(1)
Out[9]: [('a', 3)]
In [10]: a.values()
Out[10]: dict_values([3, 2, 2, 1])
In [11]: a.items()
Out[11]: dict_items([('a', 3), ('b', 2), ('c', 2), ('d', 1)])
In [12]: a.elements()
Out[12]: <itertools.chain at 0x19918ddfeb8>
In [13]: a.elements
Out[13]: <bound method Counter.elements of Counter({'a': 3, 'b': 2, 'c': 2, 'd': 1})>
In [14]: a['a']
Out[14]: 3
In [15]: a['b']
Out[15]: 2
In [16]: a['e']
Out[16]: 0
In [17]: list(a.elements())
Out[17]: ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd']
In [18]: ct.
clear() fromkeys() keys() pop() subtract()
copy() get() most_common() popitem() update()
elements() items() mro() setdefault() values()
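The points above (counting from an iterable, zero for missing keys, `most_common`, and `update`) can be combined in a short script. The following is a minimal sketch using a made-up sentence as input.

```python
from collections import Counter

# Made-up sample text, split into words.
words = "the quick brown fox jumps over the lazy dog the end".split()

counts = Counter(words)
top = counts.most_common(1)   # the single most frequent word with its count
missing = counts["cat"]       # a missing key yields 0 rather than KeyError

# update() adds counts instead of replacing them, unlike dict.update().
counts.update(["fox", "fox"])
fox = counts["fox"]
total = sum(counts.values())

print(top, missing, fox, total)
```

Note that looking up a missing key such as `counts["cat"]` does not add that key to the Counter.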
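Common built-in modules: the named tuple namedtuple¶
Tuple elements are accessed by index, so you have to remember what each index means.
When a tuple has many elements, remembering the meaning of every index becomes quite difficult. Named tuples (namedtuple) address this.
A named tuple type is defined as follows:
collections.namedtuple(typename, field_names, *, verbose=False, rename=False, module=None)
from collections import namedtuple imports namedtuple
typename: the name of the tuple type
field_names: the field names, given as a string separated by whitespace or commas, or as a list, e.g. 'x y z', 'x,y,z', or ['x','y','z']
Do not use reserved words as field names; a field name cannot start with a digit or an underscore.
verbose=False: if verbose is true, the class definition is printed after it is built.
This option is outdated; printing the _source attribute is simpler. (Both verbose and _source were removed in Python 3.7.)
rename=False: whether to rename invalid field names; if rename=True, an invalid field name is automatically replaced with an underscore followed by the field's index, such as _1
Named tuples are used as follows: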
# Import namedtuple
>>> from collections import namedtuple
# The following 5 forms all define a named tuple called student with three fields: name, age, and sex
>>> student=namedtuple('student','name age sex')
>>> student=namedtuple('student','name,age,sex')
>>> student=namedtuple('student','name\tage\tsex')
>>> student=namedtuple('student',['name','age','sex'])
>>> student=namedtuple('student',(['name','age','sex']))
>>> sa=student('Manu',40,'male')
>>> sb=student(name='Danny Green',age=30,sex='male')
>>> sc=student('Tony Parker',36,sex='male')
>>> sa
student(name='Manu', age=40, sex='male')
>>> sb
student(name='Danny Green', age=30, sex='male')
>>> sc
student(name='Tony Parker', age=36, sex='male')
>>> sa.name
'Manu'
>>> sa.age
40
>>> sa.sex
'male'
# Define a named tuple player made up of the player's name, country, and jersey number
>>> player=namedtuple('player','name country number')
>>> player
<class '__main__.player'>
>>> manu=player('Manu Ginóbili','阿根廷',20)
>>> manu.name
'Manu Ginóbili'
>>> manu.cou
manu.count( manu.country
>>> manu.country
'阿根廷'
>>> manu.number
20
>>> Parker=player('Tony Parker','法国',9)
>>> Parker
player(name='Tony Parker', country='法国', number=9)
>>> Parker.name
'Tony Parker'
>>> Parker.count
Parker.count( Parker.country
>>> Parker.country
'法国'
>>> Parker.number
9
>>> type(Parker)
<class '__main__.player'>
# Using rename
# By default rename=False, so invalid field names are not renamed
# Without rename, using reserved words such as def and return as field names raises an error:
>>> with_def_return=namedtuple('player','name def country return number')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\ProgramFiles\Python3.6.2\lib\collections\__init__.py", line 406, in namedtuple
'keyword: %r' % name)
ValueError: Type names and field names cannot be a keyword: 'def'
>>> with_two_name=namedtuple('player','name country name number')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\ProgramFiles\Python3.6.2\lib\collections\__init__.py", line 413, in namedtuple
raise ValueError('Encountered duplicate field name: %r' % name)
ValueError: Encountered duplicate field name: 'name'
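# With rename=True, reserved words such as def and return (and duplicate names) do not raise an error; they are replaced with an underscore followed by the field's index: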
>>> with_def_return=namedtuple('player','name def country return number',rename=True)
>>> with_def_return
<class '__main__.player'>
>>> with_def_return._fields
('name', '_1', 'country', '_3', 'number')
>>> with_two_name=namedtuple('player','name country name number',rename=True)
>>> with_two_name
<class '__main__.player'>
>>> with_two_name._fields
('name', 'country', '_2', 'number')
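# Some methods and attributes of named tuples
somenamedtuple._fields Tuple of strings listing the field names.
somenamedtuple._make(iterable) Class method that creates a new instance from an existing sequence or iterable.
somenamedtuple._asdict() Returns a new mapping from field names to their values (an OrderedDict in Python 3.1 to 3.7, a regular dict since Python 3.8)
somenamedtuple._replace(**kwargs) Returns a new named tuple with the given fields replaced by new values
somenamedtuple._source String with the Python source code of the class (removed in Python 3.7)
# Use _make to convert a list into a named tuple instance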
>>> list1=['Kawhi Leonard','美国',2]
>>> kawhi=player._make(list1)
>>> kawhi
player(name='Kawhi Leonard', country='美国', number=2)
>>> kawhi.name
'Kawhi Leonard'
>>> kawhi.country
'美国'
>>> kawhi.number
2
>>> kawhi._fields
('name', 'country', 'number')
>>> kawhi._asdict()
OrderedDict([('name', 'Kawhi Leonard'), ('country', '美国'), ('number', 2)])
# Use _make to convert a tuple into a named tuple instance
>>> tuple1=('Danny Green','美国',14)
>>> green=player._make(tuple1)
>>> green
player(name='Danny Green', country='美国', number=14)
>>> green.name
'Danny Green'
>>> green.country
'美国'
>>> green.number
14
>>> green._fields
('name', 'country', 'number')
>>> green._asdict()
OrderedDict([('name', 'Danny Green'), ('country', '美国'), ('number', 14)])
# _make cannot convert a dict into a named tuple instance (iterating a dict yields its keys); use the double-star operator ** instead:
>>> p1={'name':'Tim Duncan','country':'USA','number':11}
>>> tim=player._make(p1)
>>> tim # the result is not what we want: the fields hold the dict's keys
player(name='name', country='country', number='number')
>>> tim=player(**p1)
>>> tim
player(name='Tim Duncan', country='USA', number=11)
# Use _replace to replace field values; it returns a new named tuple and leaves the original unchanged
>>> green
player(name='Danny Green', country='美国', number=14)
>>> green._replace(number=4)
player(name='Danny Green', country='美国', number=4)
>>> green.number
14
>>> new_green=green._replace(number=4)
>>> new_green
player(name='Danny Green', country='美国', number=4)
>>> new_green.number
4
# Use _fields to build a new named tuple type from existing ones
>>> location=namedtuple('location','row column')
>>> location
<class '__main__.location'>
>>> location._fields
('row', 'column')
>>> color=namedtuple('color','red green blue')
>>> color._fields
('red', 'green', 'blue')
>>> pixel=namedtuple('pixel',location._fields+color._fields)
>>> pixel._fields
('row', 'column', 'red', 'green', 'blue')
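The `_make`, `_replace`, and `_asdict` methods shown above are typically used together when turning raw rows into records. The following is a minimal sketch; the `rows` data is made up for illustration, and the class is named `Player` here simply to follow the usual capitalised class-name convention.

```python
from collections import namedtuple

Player = namedtuple("Player", "name country number")

# Made-up rows, e.g. as they might come from a CSV file or a database query.
rows = [("Manu Ginóbili", "Argentina", 20), ("Tony Parker", "France", 9)]
players = [Player._make(row) for row in rows]

parker = players[1]
name, country, number = parker            # still unpacks like a plain tuple
renumbered = parker._replace(number=10)   # returns a new instance
as_dict = dict(parker._asdict())          # plain dict regardless of Python version

print(parker, renumbered, as_dict)
```

Wrapping `_asdict()` in `dict()` gives the same result type on every Python 3 version, since the method's return type changed over time.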
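Common built-in modules: the ordered dictionary OrderedDict¶
Before Python 3.7, the built-in dict did not guarantee any ordering, because it is stored as a hash table; since Python 3.7, dict preserves insertion order as part of the language.
OrderedDict in the collections module remembers the order in which keys were inserted and adds order-related methods such as move_to_end() and popitem(last=...). Because it remembers insertion order, it can be combined with sorting to create a sorted dictionary.
OrderedDict is used as follows: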
>>> from collections import OrderedDict as od
>>> od.
od.clear( od.fromkeys( od.items( od.move_to_end( od.pop( od.setdefault( od.values(
od.copy( od.get( od.keys( od.popitem( od.update(
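od.fromkeys(iterable) # Create an ordered dictionary whose keys come from the iterable (values default to None)
od.items() # Return all (key, value) pairs of the ordered dictionary
od.get(key) # Return the value for key
od.values() # Return all values of the ordered dictionary
od.keys() # Return all keys of the ordered dictionary
od.pop(key) # Remove key from the ordered dictionary and return its value
od.popitem(last=True) # Remove and return a (key, value) pair; popitem does not take a key argument
# With last=True (the default), pairs are returned in LIFO (last-in, first-out) order
# With last=False, pairs are returned in FIFO (first-in, first-out) order
od.copy() # Return a shallow copy of the ordered dictionary
od.setdefault(key,value) # Return the value for key
# If key does not exist, insert it with the given value
# If key does not exist and value is not given, the value is None
od.update(key_value) # Update the ordered dictionary from a mapping or an iterable of (key, value) pairs
od.clear() # Remove all items from the ordered dictionary
od.move_to_end(key,last=True) # Move the pair for key to the end of the ordered dictionary
# With last=False (the default is True), move it to the beginning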
# An ordinary dict
>>> dict1 = {'banana': 3, 'apple': 4, 'pear': 1, 'orange': 2}
>>> dict1
{'banana': 3, 'apple': 4, 'pear': 1, 'orange': 2}
# Sort by key
>>> dict2=od(sorted(dict1.items(),key=lambda t:t[0]))
>>> dict2
OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])
# Sort by value, ascending
>>> dict3=od(sorted(dict1.items(),key=lambda t:t[1]))
>>> dict3
OrderedDict([('pear', 1), ('orange', 2), ('banana', 3), ('apple', 4)])
# Sort by value, descending
>>> dict3=od(sorted(dict1.items(),key=lambda t:t[1],reverse=True))
>>> dict3
OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])
# Sort by the length of the key string, ascending
>>> dict4=od(sorted(dict1.items(),key=lambda t:len(t[0])))
>>> dict4
OrderedDict([('pear', 1), ('apple', 4), ('banana', 3), ('orange', 2)])
# Sort by the length of the key string, descending
>>> dict5=od(sorted(dict1.items(),key=lambda t:len(t[0]),reverse=True))
>>> dict5
OrderedDict([('banana', 3), ('orange', 2), ('apple', 4), ('pear', 1)])
>>> od1 = od([('name','meichaohui'),('lang','python')])
>>> od1
OrderedDict([('name', 'meichaohui'), ('lang', 'python')])
>>> od1['age']=28
>>> od1
OrderedDict([('name', 'meichaohui'), ('lang', 'python'), ('age', 28)])
>>> od2=od.fromkeys('abcdefg')
>>> od2
OrderedDict([('a', None), ('b', None), ('c', None), ('d', None), ('e', None), ('f', None), ('g', None)])
>>> od3=od.fromkeys(['a','b','c','d'])
>>> od3
OrderedDict([('a', None), ('b', None), ('c', None), ('d', None)])
>>> od4=od.fromkeys({"a":1})
>>> od4
OrderedDict([('a', None)])
>>> od3.items()
odict_items([('a', None), ('b', None), ('c', None), ('d', None)])
>>> od4.items()
odict_items([('a', None)])
>>> od1
OrderedDict([('name', 'meichaohui'), ('lang', 'python'), ('age', 28)])
>>> od1.get('name')
'meichaohui'
>>> od1.get('age')
28
>>> od1.get('lang')
'python'
>>> od1.values()
odict_values(['meichaohui', 'python', 28])
>>> od2.values()
odict_values([None, None, None, None, None, None, None])
>>> od2.keys()
odict_keys(['a', 'b', 'c', 'd', 'e', 'f', 'g'])
>>> od1.keys()
odict_keys(['name', 'lang', 'age'])
>>> dict1=od([('a',1),('b',2),('c',3)])
>>> dict1
OrderedDict([('a', 1), ('b', 2), ('c', 3)])
>>> dict1.pop()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Required argument 'key' (pos 1) not found
>>> dict1.pop('b')
2
>>> dict1
OrderedDict([('a', 1), ('c', 3)])
>>> dict1.popitem()
('c', 3)
>>> dict1
OrderedDict([('a', 1)])
>>> dict1.setdefault('b',2)
2
>>> dict1
OrderedDict([('a', 1), ('b', 2)])
>>> dict1.popitem('b')  # 'b' is taken as the last flag (a truthy value), not as a key
('b', 2)
>>> dict1
OrderedDict([('a', 1)])
>>> dict1.setdefault('b')
>>> dict1
OrderedDict([('a', 1), ('b', None)])
>>> dict1.update('b')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: need more than 1 value to unpack
>>> dict1.update('b',1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: update() takes at most 1 positional argument (2 given)
>>> dict1.update(('b',1))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: need more than 1 value to unpack
>>> dict1.update([('b',1)])
>>> dict1
OrderedDict([('a', 1), ('b', 1)])
>>> dict1.update([('b',2)])
>>> dict1
OrderedDict([('a', 1), ('b', 2)])
>>> dict1.update({'b':3})
>>> dict1
OrderedDict([('a', 1), ('b', 3)])
>>> dict2=dict1.copy()
>>> dict2
OrderedDict([('a', 1), ('b', 3)])
>>> dict2.clear()
>>> dict2
OrderedDict()
>>> dict1
OrderedDict([('a', 1), ('b', 3)])
>>> dict1['c']=2
>>> dict1
OrderedDict([('a', 1), ('b', 3), ('c', 2)])
>>> dict1['d']=4
>>> dict1
OrderedDict([('a', 1), ('b', 3), ('c', 2), ('d', 4)])
>>> dict1.move_to_end('b')
>>> dict1
OrderedDict([('a', 1), ('c', 2), ('d', 4), ('b', 3)])
>>> dict1.move_to_end('d')
>>> dict1
OrderedDict([('a', 1), ('c', 2), ('b', 3), ('d', 4)])
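The combination of `move_to_end()` and `popitem(last=False)` shown above is enough to build a small least-recently-used cache. The following is a minimal sketch; the `LRUCache` class and its method names are hypothetical and written only for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical illustration: keep at most `capacity` items, evicting the least recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)       # mark key as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)       # new or updated keys become most recent
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # FIFO pop removes the oldest entry

    def keys(self):
        return list(self._data)


cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # "a" is now the most recently used key
cache.put("c", 3)   # capacity exceeded, so "b" is evicted

print(cache.keys())
```

For caching function results in real code, the standard library's `functools.lru_cache` decorator is usually the simpler choice.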
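Common built-in modules: default values for missing keys with defaultdict¶
In Python, accessing a key that does not exist in a dict raises a KeyError exception.
Example: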
In [1]: dict1={'a':1,'b':2}
In [2]: dict1['a']
Out[2]: 1
In [3]: dict1['b']
Out[3]: 2
In [4]: dict1['c']
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-4-6bf0c4d0a790> in <module>
----> 1 dict1['c']
KeyError: 'c'
Accessing dict1['c'] reports that the key 'c' does not exist.
Suppose we need to count how many times each word occurs in the following text:
This module implements specialized container datatypes providing
alternatives to Python's general purpose built-in containers, dict,
list, set, and tuple.
* namedtuple factory function for creating tuple subclasses with named fields
* deque list-like container with fast appends and pops on either end
* ChainMap dict-like class for creating a single view of multiple mappings
* Counter dict subclass for counting hashable objects
* OrderedDict dict subclass that remembers the order entries were added
* defaultdict dict subclass that calls a factory function to supply missing values
* UserDict wrapper around dictionary objects for easier dict subclassing
* UserList wrapper around list objects for easier list subclassing
* UserString wrapper around string objects for easier string subclassing
- Without defaultdict, using an ordinary dict: the first time a word is seen, store the initial value 1 under its key in counts. This requires an extra membership check during processing.
The code is as follows:
# Filename: defaultdict_count_word.py
# Author: meizhaohui
def count_words(article):
    # replace \n with a space, then split into a list
    article_list = article.replace('\n',' ').split()
    counts = {}
    for word in article_list:
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] += 1
    print(counts)
if __name__ == '__main__':
    article='''This module implements specialized container datatypes providing
alternatives to Python's general purpose built-in containers, dict,
list, set, and tuple.
* namedtuple factory function for creating tuple subclasses with named fields
* deque list-like container with fast appends and pops on either end
* ChainMap dict-like class for creating a single view of multiple mappings
* Counter dict subclass for counting hashable objects
* OrderedDict dict subclass that remembers the order entries were added
* defaultdict dict subclass that calls a factory function to supply missing values
* UserDict wrapper around dictionary objects for easier dict subclassing
* UserList wrapper around list objects for easier list subclassing
* UserString wrapper around string objects for easier string subclassing
'''
    count_words(article)
Run:
$ python defaultdict_count_word.py
{'This': 1, 'module': 1, 'implements': 1, 'specialized': 1, 'container': 2, 'datatypes': 1, 'providing': 1, 'alternatives': 1, 'to': 2, "Python's": 1, 'general': 1, 'purpose': 1, 'built-in': 1, 'containers,': 1, 'dict,': 1, 'list,': 1, 'set,': 1, 'and': 2, 'tuple.': 1, '*': 9, 'namedtuple': 1, 'factory': 2, 'function': 2, 'for': 6, 'creating': 2, 'tuple': 1, 'subclasses': 1, 'with': 2, 'named': 1, 'fields': 1, 'deque': 1, 'list-like': 1, 'fast': 1, 'appends': 1, 'pops': 1, 'on': 1, 'either': 1, 'end': 1, 'ChainMap': 1, 'dict-like': 1, 'class': 1, 'a': 2, 'single': 1, 'view': 1, 'of': 1, 'multiple': 1, 'mappings': 1, 'Counter': 1, 'dict': 4, 'subclass': 3, 'counting': 1, 'hashable': 1, 'objects': 4, 'OrderedDict': 1, 'that': 2, 'remembers': 1, 'the': 1, 'order': 1, 'entries': 1, 'were': 1, 'added': 1, 'defaultdict': 1, 'calls': 1, 'supply': 1, 'missing': 1, 'values': 1, 'UserDict': 1, 'wrapper': 3, 'around': 3, 'dictionary': 1, 'easier': 3, 'subclassing': 3, 'UserList': 1, 'list': 2, 'UserString': 1, 'string': 2}
- With defaultdict, no membership check is needed; the count can be incremented directly.
The code is as follows:
# Filename: defaultdict_count_word.py
# Author: meizhaohui
def count_words(article):
    from collections import defaultdict as dt
    # replace \n with a space, then split into a list
    article_list = article.replace('\n',' ').split()
    # counts = {}
    counts = dt(int)
    for word in article_list:
        # if word not in counts:
        #     counts[word] = 1
        # else:
        #     counts[word] += 1
        counts[word] += 1
    print(counts)
if __name__ == '__main__':
    article='''This module implements specialized container datatypes providing
alternatives to Python's general purpose built-in containers, dict,
list, set, and tuple.
* namedtuple factory function for creating tuple subclasses with named fields
* deque list-like container with fast appends and pops on either end
* ChainMap dict-like class for creating a single view of multiple mappings
* Counter dict subclass for counting hashable objects
* OrderedDict dict subclass that remembers the order entries were added
* defaultdict dict subclass that calls a factory function to supply missing values
* UserDict wrapper around dictionary objects for easier dict subclassing
* UserList wrapper around list objects for easier list subclassing
* UserString wrapper around string objects for easier string subclassing
'''
    count_words(article)
Run:
$ python defaultdict_count_word.py
defaultdict(<class 'int'>, {'This': 1, 'module': 1, 'implements': 1, 'specialized': 1, 'container': 2, 'datatypes': 1, 'providing': 1, 'alternatives': 1, 'to': 2, "Python's": 1, 'general': 1, 'purpose': 1, 'built-in': 1, 'containers,': 1, 'dict,': 1, 'list,': 1, 'set,': 1, 'and': 2, 'tuple.': 1, '*': 9, 'namedtuple': 1, 'factory': 2, 'function': 2, 'for': 6, 'creating': 2, 'tuple': 1, 'subclasses': 1, 'with': 2, 'named': 1, 'fields': 1, 'deque': 1, 'list-like': 1, 'fast': 1, 'appends': 1, 'pops': 1, 'on': 1, 'either': 1, 'end': 1, 'ChainMap': 1, 'dict-like': 1, 'class': 1, 'a': 2, 'single': 1, 'view': 1, 'of': 1, 'multiple': 1, 'mappings': 1, 'Counter': 1, 'dict': 4, 'subclass': 3, 'counting': 1, 'hashable': 1, 'objects': 4, 'OrderedDict': 1, 'that': 2, 'remembers': 1, 'the': 1, 'order': 1, 'entries': 1, 'were': 1, 'added': 1, 'defaultdict': 1, 'calls': 1, 'supply': 1, 'missing': 1, 'values': 1, 'UserDict': 1, 'wrapper': 3, 'around': 3, 'dictionary': 1, 'easier': 3, 'subclassing': 3, 'UserList': 1, 'list': 2, 'UserString': 1, 'string': 2})
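In the example above, defaultdict uses int to give missing keys the default value of the int type, 0. For a new word, counts[word] += 1 effectively first sets counts[word] to 0 and then adds 1; each repeated word adds 1 again. With this approach no membership check is needed.
Note: the example above does not handle punctuation any further; it only gives a rough word count.
defaultdict can use the default values of int, list, dict, and so on as the default value for missing keys.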
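Besides int, a common factory is list, which makes defaultdict convenient for grouping. The following is a minimal sketch that groups a made-up word list by first letter.

```python
from collections import defaultdict

# Made-up sample data.
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

by_letter = defaultdict(list)
for word in words:
    # A missing key is created with list() == [] before append runs.
    by_letter[word[0]].append(word)

print(dict(by_letter))

# Caveat: merely reading a missing key also inserts it.
empty = by_letter["z"]
print(empty, "z" in by_letter)
```

Because lookups insert missing keys, use `key in by_letter` or `by_letter.get(key)` when you only want to test for presence.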
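Common built-in modules: the regular expression module re¶
Introduction to Python's re module¶
Regular expression is often abbreviated in code as regex, regexp, or RE.
In Python, re is the regular expression module. It is used as follows:
import re # import the re module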
>>> re.
re.A re.L re.Scanner( re.X re.findall( re.search( re.template(
re.ASCII re.LOCALE re.T re.compile( re.finditer( re.split(
re.DEBUG re.M re.TEMPLATE re.copyreg re.fullmatch( re.sre_compile
re.DOTALL re.MULTILINE re.U re.enum re.functools re.sre_parse
re.I re.RegexFlag( re.UNICODE re.error( re.match( re.sub(
re.IGNORECASE re.S re.VERBOSE re.escape( re.purge( re.subn(
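re.match(pattern, string, flags=0) # Match the pattern at the start of string; flags selects the matching mode.
# Returns a match object, or None if there is no match.
# It only matches at the beginning of the string, not at other positions, which limits its use.
flags can be any of the following:
re.A or re.ASCII # Make \w, \W, \b, \B, \d, \D, \s and \S perform ASCII-only matching
re.I or re.IGNORECASE # Ignore case when matching, so a and A are equivalent
re.M or re.MULTILINE # Multi-line matching, i.e. not only at the first line:
# ^ and $ match at the start and end of every line,
# that is, right after and right before each newline in the string
re.X or re.VERBOSE # Allow comments inside the regular expression to make it more readable. Whitespace in the pattern is ignored, and everything after a # is treated as a comment.
re.L or re.LOCALE # Locale-dependent matching; only valid with bytes patterns, and its use is discouraged.
re.DEBUG # Display debug information about the compiled expression
re.S or re.DOTALL # Make the . match any character, including a newline
re.U or re.UNICODE # Unicode matching; this is the default for str patterns in Python 3, so it does not need to be specified.
re.compile(pattern, flags=0) # Create a regular expression pattern object;
# pattern is the regular expression, flags is the matching mode.
# When a regular expression is used many times or for more complex matching,
# compiling it once with compile avoids recompiling the pattern on each call,
# and matching is then done on the compiled pattern object.
re.fullmatch(pattern, string, flags=0) # If the pattern matches the whole string, return a match object; otherwise return None
re.search(pattern, string, flags=0) # Scan through string looking for a match; return a match object on success, otherwise None.
# If there are several matches, only the first one is returned
re.split(pattern, string, maxsplit=0, flags=0) # Split string by the regular expression: each match splits the string once;
# if maxsplit is not given, the string is split at every match;
# if maxsplit is smaller than the number of matches, at most maxsplit splits occur. Returns a list.
re.findall(pattern, string, flags=0) # Search the whole string and return all matches as a list.
re.finditer(pattern, string, flags=0) # Search the whole string and return an iterator (a callable_iterator object) that yields a match object for each match.
re.sub(pattern, repl, string, count=0, flags=0) # Replace every substring of string that matches pattern with repl;
# if count is smaller than the number of matches, only the first count matches are replaced and the rest is unchanged.
# re.sub returns the string after replacement.
re.subn(pattern, repl, string, count=0, flags=0) # Replace every substring of string that matches pattern with repl;
# if count is smaller than the number of matches, only the first count matches are replaced and the rest is unchanged.
# re.subn returns a tuple: (string after replacement, number of replacements).
re.escape(string) # Escape special characters in string by prefixing them with a backslash; before Python 3.7, every character other than ASCII letters, digits, and underscore was escaped.
re.purge() # Clear the regular expression cache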
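Steps for using the re module¶
It is worth giving a brief overview of the classes in the re module and how they work together, since this will help in understanding what follows.
Steps for matching with the re module:
- Write a Python string str that expresses the regular expression;
- Compile that string with re.compile() to obtain a regular expression object (Pattern Object) p;
- Call p.match() or p.fullmatch() on the pattern object to obtain the result, a match object (Match Object) m;
- Check whether m is None to see whether the match succeeded; the methods of m give access to the matched content.
Steps for searching, replacing, and splitting with the re module:
- Write a Python string str that expresses the regular expression;
- Compile that string with re.compile() to obtain a regular expression object (Pattern Object) p;
- Call p.search(), p.findall(), p.finditer(), p.sub(), p.subn(), or p.split() on the pattern object to perform the search, replacement, or split and obtain the result;
Summary:
- As described above, compiling a Python string that expresses a regular expression into a pattern object is the first step in this workflow. (The module-level functions such as re.search() also accept a pattern string directly and compile it internally.)
- The function in the re module that compiles regular expressions is re.compile().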
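The steps above can be sketched end to end as follows. The date pattern and the sample strings are assumptions chosen for illustration.

```python
import re

# Step 1: write the pattern as a (raw) Python string.
pattern = r"(\d{4})-(\d{2})-(\d{2})"

# Step 2: compile it into a pattern object.
p = re.compile(pattern)

# Step 3: match at the start of the string; the result is a match object or None.
m = p.match("2019-04-08 20:29")

# Step 4: test for None, then read the captured groups.
if m is not None:
    year, month, day = m.groups()

# fullmatch() requires the whole string to match, so trailing text makes it fail.
full = p.fullmatch("2019-04-08 20:29")

# Searching and replacing use the same compiled object.
found = p.findall("from 2019-04-08 to 2019-05-01")
replaced = p.sub(r"\3/\2/\1", "2019-04-08")

print(year, month, day, full, found, replaced)
```

When the pattern contains groups, `findall()` returns a list of tuples of the group contents rather than the full matched strings.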
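Special characters in regular expressions¶
Special characters in regular expressions:
. Dot. In the default mode, matches any character except a newline. If the DOTALL flag is specified, matches any character including a newline.
^ Caret. In the default mode, matches the start of the string; in MULTILINE mode it also matches immediately after each newline.
$ Matches the end of the string or just before the newline at the end of the string; in MULTILINE mode it also matches before each newline.
* Matches 0 or more repetitions of the preceding RE, as many as possible (greedy).
+ Matches 1 or more repetitions of the preceding RE, as many as possible (greedy).
? Matches 0 or 1 repetitions of the preceding RE.
*?,+?,?? The '*', '+', and '?' qualifiers are greedy; they match as much text as possible. Adding '?' after the qualifier makes it match in a non-greedy or minimal fashion.
{m} Matches exactly m copies of the preceding RE; fewer matches cause the entire RE not to match.
{m,n} Matches from m to n repetitions of the preceding RE, attempting to match as many as possible (greedy).
{m,} Matches at least m repetitions of the preceding RE, attempting to match as many as possible (greedy).
{,n} Matches at most n repetitions of the preceding RE, attempting to match as many as possible (greedy).
{m,n}? Matches from m to n repetitions of the preceding RE, attempting to match as few as possible (non-greedy).
\ Escapes special characters.
[] Indicates a set of characters.
Characters can be listed individually, e.g. [abcd] matches 'a', 'b', 'c', or 'd'.
A range of characters is given by two characters separated by '-', e.g. [a-z] matches lowercase letters, [A-Z] matches uppercase letters, and [0-9] matches the digits 0-9.
Special characters lose their special meaning inside sets, e.g. [(+*)] matches '(', '+', '*', or ')'.
Character classes such as \s, \S, and \w are accepted inside a set.
[^...] matches the complement of the set; ^ must be the first character in the set, e.g. [^a-z] matches any character except lowercase letters.
| a|b matches either a or b. Alternatives are tried from left to right; once a matches, b is not tried.
(...) The expression inside the parentheses forms a group that is treated as a unit; it can be followed by a quantifier, and a | inside it applies only within the group.
For example, ([a-z]|[A-Z]){2,3} matches 2 to 3 letters.
(?aiLmsux) Sets the corresponding flags for the entire regular expression: re.A (ASCII-only matching), re.I (ignore case), re.L (locale dependent);
re.M (multi-line), re.S (dot matches all), re.U (Unicode matching), re.X (verbose)
(?:...) # Non-capturing group: use (?:RE) to wrap part of the expression when it should be treated as a unit without capturing it.
(?P<name>...) # Wraps the RE as a named group.
(?P=name) # Backreference to a named group: matches the same text that the earlier group called name matched.
(?#...) # A comment; the contents are ignored.
(?=...) # Lookahead: succeeds only if ... matches right after the current position. s='Isaac Asimov' m=re.findall("Isaac (?=Asimov)",s)
(?!...) # Negative lookahead: succeeds only if ... does not match right after the current position. s='Isaac Asimov' m=re.findall("Isaac (?!Asimov)",s)
(?<=...) # Lookbehind: succeeds only if ... matches right before the current position. s='Isaac Asimov' m=re.findall("(?<=Isaac )Asimov",s)
(?<!...) # Negative lookbehind: succeeds only if ... does not match right before the current position. s='Isaac Asimov' m=re.findall("(?<!Isaac )Asimov",s)
(?(id/name)yes|no) # Conditional matching. (?(id/name)yes-pattern|no-pattern) works as follows:
if the group with the given id or name matched, try to match yes-pattern;
if that group did not match, try to match no-pattern; no-pattern is optional and can be omitted;
name or id refers to a group defined earlier in the pattern, before the conditional;
for a named group, name is the name given to that group;
for an unnamed group, the group is identified by its number,
which is the id used here.
*** Predefined character classes
\\ Matches a backslash
\A Matches only at the start of the string (same as ^ when MULTILINE is not set)
\Z Matches only at the end of the string (unlike $, it does not match before a trailing newline)
\number Matches the contents of the group with the same number
\b Matches the empty string, but only at the beginning or end of a word
\B Matches the empty string, but only when it is not at the beginning or end of a word; the opposite of \b
\d Matches a digit; equivalent to [0-9] in ASCII mode (for str patterns it also matches other Unicode decimal digits)
\D Matches a non-digit; the complement of \d, i.e. [^\d]
\s Matches a whitespace character; equivalent to [ \t\n\r\f\v] in ASCII mode
\S Matches a non-whitespace character; the complement of \s, i.e. [^\s]
\w Matches letters, digits, and underscore; equivalent to [a-zA-Z0-9_] in ASCII mode
\W The complement of \w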
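Named groups, backreferences, and lookaround assertions from the table above can be tried with a few one-liners. The following is a minimal sketch; the sample strings are assumptions chosen for illustration.

```python
import re

s = "Isaac Asimov"

# Named groups: access captured text by name instead of by number.
m = re.search(r"(?P<first>\w+) (?P<last>\w+)", s)
first, last = m.group("first"), m.group("last")

# Backreference to a named group: finds a word repeated twice in a row.
doubled = re.search(r"(?P<word>\w+) (?P=word)", "it is is fine").group()

# Lookahead and lookbehind assert context without including it in the match.
ahead = re.findall(r"Isaac (?=Asimov)", s)      # 'Isaac ' followed by 'Asimov'
not_ahead = re.findall(r"Isaac (?!Asimov)", s)  # 'Isaac ' NOT followed by 'Asimov'
behind = re.findall(r"(?<=Isaac )Asimov", s)    # 'Asimov' preceded by 'Isaac '

print(first, last, doubled, ahead, not_ahead, behind)
```

The lookaround results show that the asserted text is checked but not consumed: `ahead` contains only 'Isaac ' even though 'Asimov' had to follow it.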
Regular expression examples follow.
Using re.match to match the first character "I"¶
Using re.match to match the first character "I":
>>> string='i love to learn python. I am a Pythonista!'
>>> string
'i love to learn python. I am a Pythonista!'
>>> pattern=r"I"
>>> pattern
'I'
>>> p=re.compile(pattern)
>>> p
re.compile('I')
>>> m=re.match(p,string)
>>> m # no match, because the first character is a lowercase "i"
# Add the ignore-case flag re.IGNORECASE
>>> p=re.compile(pattern,re.IGNORECASE)
>>> p
re.compile('I', re.IGNORECASE)
>>> m=re.match(p,string)
>>> m
<_sre.SRE_Match object; span=(0, 1), match='i'> # successfully matched the lowercase letter "i"
# Change the regular expression to try to match the uppercase "I"
>>> pattern=r".*I"
>>> pattern
'.*I'
>>> p=re.compile(pattern)
>>> p
re.compile('.*I')
>>> re.match(p,string)
<_sre.SRE_Match object; span=(0, 25), match='i love to learn python. I'>
Using fullmatch for a complete match¶
Using fullmatch for a complete match:
>>> re.fullmatch(r".*I.*!",string) # any characters, then "I", then any characters, then "!"
<_sre.SRE_Match object; span=(0, 42), match='i love to learn python. I am a Pythonista!'>
Searching with re.search¶
Searching with re.search:
>>> pattern=r"I"
>>> p=re.compile(pattern,re.I) # with the ignore-case flag, only the first match, the lowercase "i", is returned
>>> re.search(p,string)
<_sre.SRE_Match object; span=(0, 1), match='i'>
>>> p=re.compile(pattern) # without the ignore-case flag, the uppercase "I" is matched
>>> re.search(p,string)
<_sre.SRE_Match object; span=(24, 25), match='I'>
Searching with re.findall¶
Searching with re.findall:
>>> string
'i love to learn python. I am a Pythonista!'
>>> pattern
'I'
>>> re.findall(pattern,string) # without the ignore-case flag, the uppercase "I" is matched
['I']
>>> re.findall(pattern,string,re.I) # with the ignore-case flag, every lowercase "i" or uppercase "I" is returned
['i', 'I', 'i']
Searching with re.finditer¶
Searching with re.finditer, which returns a callable_iterator object:
>>> m=re.finditer(pattern,string,re.I)
>>> m
<callable_iterator object at 0x000002DB1506D470>
>>> for i in m:
...     print(i)
...
<_sre.SRE_Match object; span=(0, 1), match='i'>
<_sre.SRE_Match object; span=(24, 25), match='I'>
<_sre.SRE_Match object; span=(37, 38), match='i'>
Splitting strings¶
Splitting a string with re.split(pattern, string, maxsplit=0, flags=0):
>>> string
'i love to learn python. I am a Pythonista!'
>>> pattern
'I'
>>> re.split(pattern,string) # plain split: the string is split once
['i love to learn python. ', ' am a Pythonista!']
>>> re.split(pattern,string,flags=re.I) # with the ignore-case flag: the string is split 3 times
['', ' love to learn python. ', ' am a Python', 'sta!']
>>> re.split(pattern,string,2,re.I) # ignore case and at most 2 splits, passed positionally (the keyword form below is clearer): split 2 times
['', ' love to learn python. ', ' am a Pythonista!']
>>> re.split(pattern,string,maxsplit=2,flags=re.I) # ignore case and at most 2 splits, passed as keywords: split 2 times
['', ' love to learn python. ', ' am a Pythonista!']
Replacing strings¶
Replacing within a string with re.sub(pattern, repl, string, count=0, flags=0):
>>> string
'i love to learn python. I am a Pythonista!'
>>> pattern
'I'
>>> re.sub(pattern,'i',string) # replace the uppercase "I" with a lowercase "i"
'i love to learn python. i am a Pythonista!'
>>> re.sub(pattern,'MEI',string,flags=re.I) # ignore case and replace each match with "MEI": 3 replacements
'MEI love to learn python. MEI am a PythonMEIsta!'
>>> re.sub(pattern,'MEI',string,count=2,flags=re.I) # ignore case, replace with "MEI", at most 2 times: 2 replacements
'MEI love to learn python. MEI am a Pythonista!'
# Show the number of replacements
>>> re.subn(pattern,'MEI',string,count=2,flags=re.I)
('MEI love to learn python. MEI am a Pythonista!', 2)
>>> re.subn(pattern,'MEI',string,count=1,flags=re.I)
('MEI love to learn python. I am a Pythonista!', 1)
>>> re.subn(pattern,'MEI',string,flags=re.I)
('MEI love to learn python. MEI am a PythonMEIsta!', 3)
字符转义¶
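Use re.escape to escape the characters in a string so that it can be used as a literal pattern. On Python 3.6, as shown below, every character other than ASCII letters, digits and the underscore is escaped; newer versions escape only characters that can have a special meaning in a regular expression: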
>>> re.escape(string)
'i\\ love\\ to\\ learn\\ python\\.\\ I\\ am\\ a\\ Pythonista\\!'
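The typical use of re.escape is to search for a literal string that happens to contain regex metacharacters. The sketch below uses made-up sample text (text and literal are illustrative names) to compare the unescaped and the escaped pattern.

```python
import re

text = 'Total: $3.50 (approx.) vs $3x50'
literal = '$3.50'

# Unescaped: "$" is an end-of-string anchor and "." matches any character,
# so the pattern cannot match the literal text at all.
unescaped = re.findall(literal, text)

# Escaped: the pattern matches exactly the characters of the literal.
escaped = re.findall(re.escape(literal), text)

print(unescaped)
print(escaped)
```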
匹配行首行尾¶
匹配行首行尾:
>>> re.search('^i.*!$',string)
<_sre.SRE_Match object; span=(0, 42), match='i love to learn python. I am a Pythonista!'>
>>> re.findall('^i.*!$',string)
['i love to learn python. I am a Pythonista!']
重复匹配¶
重复匹配:
# 贪婪型匹配:
>>> str='abbcccddddeeeee'
>>> re.search('ab+',str)
<_sre.SRE_Match object; span=(0, 3), match='abb'>
>>> re.search('ab*',str)
<_sre.SRE_Match object; span=(0, 3), match='abb'>
>>> re.search('ab?',str)
<_sre.SRE_Match object; span=(0, 2), match='ab'>
# 非贪婪型匹配:
>>> re.search('ab??',str)
<_sre.SRE_Match object; span=(0, 1), match='a'>
>>> re.search('ab*?',str)
<_sre.SRE_Match object; span=(0, 1), match='a'>
>>> re.search('ab+?',str)
<_sre.SRE_Match object; span=(0, 2), match='ab'>
# 匹配1次,匹配2次,或匹配m至n次,贪婪型匹配:
>>> re.search('ab{1}',str)
<_sre.SRE_Match object; span=(0, 2), match='ab'>
>>> re.search('ab{2}',str)
<_sre.SRE_Match object; span=(0, 3), match='abb'>
>>> re.search('ab{3}',str)
>>> re.search('ab{1,3}',str)
<_sre.SRE_Match object; span=(0, 3), match='abb'>
>>> re.search('ab{1,2}',str)
<_sre.SRE_Match object; span=(0, 3), match='abb'>
>>> re.search('ab{0,2}',str)
<_sre.SRE_Match object; span=(0, 3), match='abb'>
# 匹配1次,匹配2次,或匹配m至n次,非贪婪型匹配:
>>> re.search('ab{0,2}?',str)
<_sre.SRE_Match object; span=(0, 1), match='a'>
>>> re.search('ab{1,2}?',str)
<_sre.SRE_Match object; span=(0, 2), match='ab'>
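The difference between greedy and non-greedy quantifiers matters most when a delimiter appears several times in the text. A small sketch, using an invented HTML fragment as the sample input:

```python
import re

html = '<b>bold</b> and <i>italic</i>'

# Greedy: ".*" runs to the last ">" in the string, giving one long match.
greedy = re.findall(r'<.*>', html)

# Non-greedy: ".*?" stops at the first ">" it can, giving one match per tag.
lazy = re.findall(r'<.*?>', html)

print(greedy)
print(lazy)
```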
系列匹配¶
使用[]匹配系列:
>>> string="123abc456def789ABC021"
# 查找数字
>>> re.findall('[0-9]',string)
['1', '2', '3', '4', '5', '6', '7', '8', '9', '0', '2', '1']
# 查找小写字母
>>> re.findall('[a-z]',string)
['a', 'b', 'c', 'd', 'e', 'f']
# 查找大写字母
>>> re.findall('[A-Z]',string)
['A', 'B', 'C']
# 查找大小写字母
>>> re.findall('[a-zA-Z]',string)
['a', 'b', 'c', 'd', 'e', 'f', 'A', 'B', 'C']
# 匹配非字母
>>> re.findall('[^a-zA-Z]',string)
['1', '2', '3', '4', '5', '6', '7', '8', '9', '0', '2', '1']
Use | for alternation (OR matching):
>>> re.findall('[123]|[abc]',string)
['1', '2', '3', 'a', 'b', 'c', '2', '1']
分组匹配¶
分组匹配:
>>> string
'123abc456def789ABC021'
>>> re.findall('([0-9])([a-z])',string)
[('3', 'a'), ('6', 'd')]
>>> re.findall('([0-9])([a-zA-Z])',string)
[('3', 'a'), ('6', 'd'), ('9', 'A')]
>>> re.search('([0-9])([a-zA-Z])',string)
<_sre.SRE_Match object; span=(2, 4), match='3a'>
>>> re.findall('([0-9][a-zA-Z])',string)
['3a', '6d', '9A']
正则标记¶
使用(?aiLmsux)设置正则标记:
>>> re.findall('(?i)(ab)',string)
['ab', 'AB']
>>> re.findall('(?x)[ab]',string)
['a', 'b']
>>> re.findall('(?i)[ab]',string)
['a', 'b', 'A', 'B']
命名组匹配¶
Use (?P<name>...) to define a named group and (?P=name) to refer back to the text it matched:
>>> string='123abc123def456ghi456ki'
>>> string
'123abc123def456ghi456ki'
# 使用(?P=num)匹配num组匹配到的字符串
>>> re.findall("(?P<num>\d+)[a-z]*(?P=num)",string)
['123', '456']
>>> string=u'标签:<a href="/tag/情侣电话粥/">情侣电话粥</a>'
>>> re.search("<a href=\"/tag/(?P<name>.*)/\">(?P=name)</a>",string)
<_sre.SRE_Match object; span=(3, 34), match='<a href="/tag/情侣电话粥/">情侣电话粥</a>'>
>>> re.findall("<a href=\"/tag/(?P<name>.*)/\">(?P=name)</a>",string)
['情侣电话粥']
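Named groups can also be read back by name from the match object, which is often easier to follow than positional indexes. A minimal sketch, assuming a made-up log line format (the group names date, level and msg are illustrative):

```python
import re

log_line = '2019-03-31 ERROR disk full'

# Each (?P<name>...) group becomes a key in groupdict().
m = re.match(r'(?P<date>\d{4}-\d{2}-\d{2}) (?P<level>[A-Z]+) (?P<msg>.*)', log_line)
fields = m.groupdict()

print(fields)
print(m.group('level'))
```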
添加备注信息¶
添加备注信息:
>>> import re
>>> string='abdDeFgh'
# 匹配小写字母
>>> re.findall('[a-z]*(?#lower)',string)
['abd', '', 'e', '', 'gh', '']
>>> re.search('[a-z]*(?#lower)',string)
<_sre.SRE_Match object; span=(0, 3), match='abd'>
# 匹配大写字母
>>> re.search('[A-Z]*(?#upper)',string)
<_sre.SRE_Match object; span=(0, 0), match=''>
>>> re.findall('[A-Z]*(?#upper)',string)
['', '', '', 'D', '', 'F', '', '', '']
限定字符匹配¶
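Lookaround assertions require certain text before or after the match without making it part of the match:
(?=...) # positive lookahead: succeeds only if ... comes right after the matched text. s='Isaac Asimov' m=re.findall("Isaac (?=Asimov)",s)
(?!...) # negative lookahead: succeeds only if ... does not come right after the matched text. s='Isaac Asimov' m=re.findall("Isaac (?!Asimov)",s)
(?<=...) # positive lookbehind: succeeds only if ... comes right before the matched text. s='Isaac Asimov' m=re.findall("(?<=Isaac )Asimov",s)
(?<!...) # negative lookbehind: succeeds only if ... does not come right before the matched text. s='Isaac Asimov' m=re.findall("(?<!Isaac )Asimov",s)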
>>> s='Isaac Asimov'
>>> s
'Isaac Asimov'
# 在'Isaac '之后有'Asimov'字符,匹配到
>>> re.findall("Isaac (?=Asimov)",s)
['Isaac ']
# 在'Isaac '之后没有'Asimov'字符,未匹配到
>>> re.findall("Isaac (?!Asimov)",s)
[]
# 在'Isaac '之后没有'Asimoev'字符,匹配到
>>> re.findall("Isaac (?!Asimoev)",s)
['Isaac ']
# 在'Asimov'之前有'Isaac '字符,匹配到
>>> re.findall("(?<=Isaac )Asimov",s)
['Asimov']
# 在'Asimov'之前没有'Isaacd'字符,未匹配到
>>> re.findall("(?<=Isaacd)Asimov",s)
[]
# 在'Asimov'之前不能包含'Isaac '字符,未匹配到
>>> re.findall("(?<!Isaac )Asimov",s)
[]
# 在'Asimov'之前不包含'Isaacd'字符,匹配到
>>> re.findall("(?<!Isaacd)Asimov",s)
['Asimov']
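Lookarounds are handy for pulling out a value based on its neighbors while leaving those neighbors out of the result. The following sketch uses an invented price list; the variable names are illustrative.

```python
import re

prices = 'apple $12, pear 30 yuan, plum $7'

# Positive lookbehind: digits that directly follow "$"; the "$" is not returned.
dollars = re.findall(r'(?<=\$)\d+', prices)

# Positive lookahead: digits that are directly followed by " yuan".
yuan = re.findall(r'\d+(?= yuan)', prices)

# Negative lookbehind: digit runs not preceded by "$" or by another digit.
not_dollars = re.findall(r'(?<![$\d])\d+', prices)

print(dollars, yuan, not_dollars)
```

Note that Python's re module requires a lookbehind to have a fixed width, which is why the negative lookbehind above checks a single character.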
选择性匹配¶
选择性匹配:
(?(id/name)yes|no) #选择性匹配
a、匹配邮箱
s='<user1@mail1> user2@mail2 <user3@mail3> <user4@mail4 user5@mail5> < user6@mail6 user7@mail7>'
# 多个邮箱地址有的被<>或空格包裹起来,要取出所有的邮箱地址。
# 通过分析可知:
# 1、如果邮箱前面有<,则需要在其后可能是>,如<user1@mail1> 或 <user3@mail3>
# 2、如果邮箱前面有<,则需要在其后可能是空格 ,如<user4@mail4 user5@mail5>中的user4@mail4
# 3、如果邮箱前面没有<,则需要在其后可能是>,如<user4@mail4 user5@mail5>中的user5@mail5
# 4、如果邮箱前面没有<,则需要在其后可能是空格 ,如user2@mail2
即在邮箱前面或后面有0个或多个空格字符
匹配邮箱:\w+@\w+ \w表示匹配字母,数字,下划线,等同于[a-zA-Z0-9_]
再匹配邮箱前后可能产生的空格字符,\s*(\w+@\w+)\s*
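If < was matched (group 1 took part in the match), the address must be followed by > or a space.
If < was not matched, the address must likewise be followed by > or a space. Both branches of the conditional are identical here, so (?(1)[> ]|[> ]) behaves the same as a plain [> ]; the conditional form is used only to demonstrate the syntax.
The regular expression is therefore: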
>>> re.findall(r'(<)?\s*(\w+@\w+)\s*(?(1)[> ]|[> ])',s)
[('<', 'user1@mail1'), ('', 'user2@mail2'), ('<', 'user3@mail3'), ('<', 'user4@mail4'), ('', 'user5@mail5'), ('<', 'user6@mail6'), ('', 'user7@mail7')]
b、匹配标准数字
以数字为例,标准数字格式如下,即:
# 所有位都是数字,如0-9
# 可以有小数点,如果有小数点的话,小数点后面有一至两个小数,如12.34合法,12.3合法,12.345不合法,12.不合法
# 不能包含字母,以及除.小数点号以外其他的特殊字符
# 最高位不能是0,0123不合法,123合法
匹配步骤:
# 1、匹配整数部分,[1-9]\d*,即起始位是1-9中的数字,后面可跟多位[0-9]间的数字
# 2、匹配小数点,\. 使用转义符\进行转义
# 3、匹配小数点后面的小数部分,\d{1,2},即匹配数字1至2次
# 4、如果匹配小数点,则要匹配小数后面的数字
使用选择性匹配的正则如下:
[1-9]\d*(\.)?(?(1)\d{1,2})
如果注意到数字前后再不能用其他字符,则在最前面和最后面分别加上^,$作一下限定:
^[1-9]\d*(\.)?(?(1)\d{1,2})$
如果将整数部分、小数点、小数部分进行分组。如下:
foundValidNumStr = re.search("^(?P<integerPart>[1-9]\d*)(?P<foundPoint>\.)?(?P<decimalPart>(?(foundPoint)\d{1,2}))$", eachNumStr)
详细可参考re_id_name.py
re_id_name.py代码如下:
#!/usr/bin/python
# -*- coding: utf-8 -*-
"""
【教程】详解Python正则表达式之: (?(id/name)yes-pattern|no-pattern) 条件性匹配
https://www.crifan.com/detailed_explanation_about_python_regular_express_yes_or_no_conditional_match
Version: 2012-11-17
Author: Crifan
"""
import re
#需求:
#类似于检测(最多两位小数的)数字的合法性:
#所有的字符都是数字
#如果有小数点,那么小数点后面最多2位数字
testNumStrList = {
#合法的数字
'12.34',
'123.4',
'1234',
#非法的数字
'1.234',
'12.',
'12.ab',
'12.3a',
'123abc',
'123abc456',
'01234',
}
for eachNumStr in testNumStrList:
# eachNumStr='1.234'
#下面这个是不严谨的,会导致:
#1.234 -> 只会去判断234,所以检测出整数部分是234,无小数
#123.4 -> 只会去判断4,所以检测出整数部分是4,无小数
#123abc456 -> 只会去判断456,所以检测出整数部分是456,无小数
#foundValidNumStr = re.search("(?P<integerPart>\d+)(?P<foundPoint>\.)?(?P<decimalPart>(?(foundPoint)\d{1,2}))$", eachNumStr)
#下面这个也是不严谨的,会导致:
#1.234 -> 只去判断1.23,所以检测出整数是1,小数是23
#12. -> 只会去判断12,所以检测出整数是12,无小数
#123abc456 -> 只会去判断123,所以检测出整数是123,无小数
#12.ab -> 只会去判断12,所以检测出整数是12,无小数
#123abc -> 只会去判断123,所以检测出整数是123,无小数
#12.3a -> 只会去判断12.3,所以检测出整数是12,小数是3
#foundValidNumStr = re.search("^(?P<integerPart>\d+)(?P<foundPoint>\.)?(?P<decimalPart>(?(foundPoint)\d{1,2}))", eachNumStr)
#下面这个,更不严谨,会导致中间只要有数字,那么基本上都会去匹配到,和实际的期望,差距最大
#foundValidNumStr = re.search("(?P<integerPart>\d+)(?P<foundPoint>\.)?(?P<decimalPart>(?(foundPoint)\d{1,2}))", eachNumStr)
#下面这个才是正确的
foundValidNumStr = re.search("^(?P<integerPart>[1-9]\d*)(?P<foundPoint>\.)?(?P<decimalPart>(?(foundPoint)\d{1,2}))$", eachNumStr)
#An equivalent form that refers to the decimal point by group number (2) instead of by name:
#foundValidNumStr = re.search("^(?P<integerPart>[1-9]\d*)(\.)?(?P<decimalPart>(?(2)\d{1,2}))$", eachNumStr)
#print "foundValidNumStr=",foundValidNumStr;
if(foundValidNumStr):
integerPart = foundValidNumStr.group("integerPart")
decimalPart = foundValidNumStr.group("decimalPart")
print("eachNumStr=%s\tis valid number ^_^, integerPart=%s, decimalPart=%s"%(eachNumStr, integerPart, decimalPart))
else:
print("eachNumStr=%s\tis invalid number !!!"%(eachNumStr))
Running re_id_name.py gives the following output (testNumStrList is a set, so the order of the lines can differ between runs):
eachNumStr=123abc456 is invalid number !!!
eachNumStr=12.3a is invalid number !!!
eachNumStr=12.ab is invalid number !!!
eachNumStr=123abc is invalid number !!!
eachNumStr=01234 is invalid number !!!
eachNumStr=1234 is valid number ^_^, integerPart=1234, decimalPart=
eachNumStr=12. is invalid number !!!
eachNumStr=123.4 is valid number ^_^, integerPart=123, decimalPart=4
eachNumStr=1.234 is invalid number !!!
eachNumStr=12.34 is valid number ^_^, integerPart=12, decimalPart=34
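The same validation rules can also be written without a conditional group, by making the decimal point and the decimals one optional unit. This is an alternative sketch rather than the approach used in re_id_name.py; the helper name is_valid_number is hypothetical.

```python
import re

# Integer part without a leading zero, optionally followed by "." and 1-2 digits.
NUMBER = re.compile(r'[1-9]\d*(?:\.\d{1,2})?')


def is_valid_number(text):
    """Return True if the whole string is a number in the format described above."""
    return NUMBER.fullmatch(text) is not None


samples = ['12.34', '123.4', '1234', '1.234', '12.', '12.ab',
           '12.3a', '123abc', '123abc456', '01234']
results = {s: is_valid_number(s) for s in samples}
for sample in samples:
    print(sample, results[sample])
```

re.fullmatch anchors the pattern at both ends, so ^ and $ are not needed. The conditional form remains useful when the two branches really differ.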
使用特殊字符进行查找¶
使用特殊字符进行查找:
>>> import string
>>> s=string.printable
>>> s
'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
# 匹配开头字符0123
>>> re.findall('\A0123',s)
['0123']
# 匹配结尾字符\x0c
>>> re.findall('\x0c\Z',s)
['\x0c']
# 匹配数字
>>> re.findall('\d',s)
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
# 匹配whitespace字符串
>>> re.findall('\s',s)
[' ', '\t', '\n', '\r', '\x0b', '\x0c']
# 匹配非whitespace字符串
>>> re.findall('\S',s)
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', ':', ';', '<', '=', '>', '?', '@', '[', '\\', ']', '^', '_', '`', '{', '|', '}', '~']
# 匹配字母,数字,下划线,等同于[a-zA-Z0-9_]
>>> re.findall('\w',s)
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '_']
# 匹配\w的补集
>>> re.findall('\W',s)
['!', '"', '#', '$', '%', '&', "'", '(', ')', '*', '+', ',', '-', '.', '/', ':', ';', '<', '=', '>', '?', '@', '[', '\\', ']', '^', '`', '{', '|', '}', '~', ' ', '\t', '\n', '\r', '\x0b', '\x0c']
参考文献:
- Python 正则表达式入门(初级篇) https://www.cnblogs.com/chuxiuhong/p/5885073.html
- Python 正则表达式入门(中级篇) http://www.cnblogs.com/chuxiuhong/p/5907484.html
- python中的正则表达式(re模块) https://www.cnblogs.com/tina-python/p/5508402.html
- python官网指导 https://docs.python.org/3.6/library/re.html
- Python::re模块–在Python中使用正则表达式 https://www.cnblogs.com/now-fighting/p/4495841.html
- Python之正则表达式(re模块) https://www.cnblogs.com/yyds/p/6953348.html
- python 正则表达式 RE模块汇总记录 https://www.cnblogs.com/congyinew/p/6491268.html
- python re模块 https://www.cnblogs.com/MrFiona/p/5954084.html
- 详解Python正则表达式之: (?(id/name)yes-pattern|no-pattern) 条件性匹配 https://www.crifan.com/detailed_explanation_about_python_regular_express_yes_or_no_conditional_match/
- 以Python中的re模块为例,手把手教你,如何从无到有,写出相对复杂的正则表达式 https://www.crifan.com/how_to_write_your_own_complex_regular_expression_in_python_re/
面向对象编程¶
面向对象编程基础¶
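Two paradigms:
- Instruction-centered: code is organized around "what is happening". This is procedural programming: the program is a series of linear steps, and the main idea is that code acts on data.
- Data-centered: code is organized around "who is affected". This is object-oriented programming (OOP): the program is organized around data and strictly defined interfaces to that data, and the data controls access to the code.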
基本概念¶
- 类创建一个新类型,而对象则是类的实例;类和对象是面向对象编程的两个主要方面。
- 对象或类的变量称为域;类的函数称为类的方法;域和方法合称为类的属性。
- 域有两种类型:实例变量,类变量。
- 类使用class关键字创建,如class ClassName:
- 类名使用大写字母开头的单词,如CapWords,ClassName。
- The first parameter of an instance method receives the object itself; by convention it is named self.
- 方法用关键字def定义,如def method_name(self[,keyword]):
- __init__方法在类的实例创建时,马上运行,可以对对象进行初始化。
类的示例:
#!/usr/bin/python3
"""
@Time : 2019/3/31
@Author : Mei Zhaohui
@Email : mzh.whut@gmail.com
@Filename: person.py
@Software: PyCharm
@Desc : basic class
"""
class Person:
    """say hello & count the num"""  # class docstring
    num = 0  # class variable

    # initialize a new instance
    def __init__(self, name):
        """initializes the person's data"""  # method docstring
        self.name = name  # instance variable
        print('(Initializing %s)' % self.name)
        Person.num += 1  # update the class variable

    def say_hi(self):
        """Say Hello to someone"""
        print('Hello,%s,how are you?' % self.name)

    def print_all(self):
        """Count the sum"""
        print('Sum is:%s' % Person.num)


def main():
    """main function"""
    print(Person.__doc__)
    print(Person.say_hi.__doc__)
    person1 = Person('Hanmeimei')
    person1.say_hi()
    person1.print_all()
    person2 = Person('Lilei')
    person2.say_hi()
    person2.print_all()


if __name__ == '__main__':
    main()
The output is as follows:
say hello & count the num
Say Hello to someone
(Initializing Hanmeimei)
Hello,Hanmeimei,how are you?
Sum is:1
(Initializing Lilei)
Hello,Lilei,how are you?
Sum is:2
面向对象编程的原则¶
面向对象的模型机制有3个原则:封装、继承和多态
- 封装(Encapsulation)
- 隐藏实现方案细节;
- 将代码及其处理的数据绑定在一起的一种编程机制,用于保证程序和数据不受外部干扰且不会被误用。
- 继承(Inheritance)
- 一个对象获得另一个对象属性的过程;用于实现按层分类的概念
- 一个深度继承的子类继承了类层次中它的每个祖先的所有属性
- 如果某些类具有相同的属性,可以将这些属性提取出来,构建一个父类,然后使用子类继承父类
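- A subclass automatically inherits all methods of its parent class.
- A subclass can override the parent's methods and can add methods that the parent does not have.
- Inside a subclass, super() gives access to the parent class's definitions.
- When calling a method through super(), do not pass self; pass only the other arguments, such as name.
- When a subclass overrides __init__, the parent's constructor is not called automatically; call it explicitly, for example with super().__init__(name).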
- 多态(Polymorphism)
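- A subtype can be used wherever its parent type is expected: the object can be treated as an instance of the parent class while still running its own overriding methods. This is called polymorphism.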
示例:
#!/usr/bin/python3
"""
@Time : 2019/3/31
@Author : Mei Zhaohui
@Email : mzh.whut@gmail.com
@Filename: class_inheritance.py
@Software: PyCharm
@Desc : Class Inheritance
使用一个程序来记录学校的教师和学生情况
教师和学生有一些共同属性,如姓名、年龄;
教师有专有属性,如薪水、课程;
学生有专有属性,如班级、学费。
创建一个共同的类SchoolMember,称为父类或超类,然后让教师和学生的类继承这个公共的类;
教师使用Teacher类,称为子类,继承SchoolMember类;
学生使用Student类,称为子类,继承SchoolMember类;
"""
class SchoolMember:
    """Parent (base) class SchoolMember"""

    def __init__(self, name, age):
        """Parent-class constructor"""
        self._name = name  # internal attribute
        self._age = age  # internal attribute
        print("(Initialized SchoolMember: %s)" % self._name)

    def tell(self):
        """Print the details"""
        print("Name is:%s \nAge is:%s" % (self._name, self._age))


class Teacher(SchoolMember):
    """Subclass Teacher, inherits from SchoolMember"""

    def __init__(self, name, age, salary):
        """Override the parent constructor and add a salary parameter"""
        # Call the parent constructor explicitly through super(); self is passed
        # automatically, so this is equivalent to SchoolMember.__init__(self, name, age)
        super().__init__(name, age)
        self._salary = salary
        print("(Initialized Teacher: %s)" % self._name)

    def tell(self):
        """Override the parent method"""
        super().tell()  # call the parent's tell method
        print("Salary is:%s" % self._salary)


class Student(SchoolMember):
    """Subclass Student, inherits from SchoolMember"""

    def __init__(self, name, age, fee):
        """Override the parent constructor and add a fee parameter"""
        SchoolMember.__init__(self, name, age)
        self._fee = fee
        print("(Initialized Student: %s)" % self._name)

    def tell(self):
        """Override the parent method"""
        SchoolMember.tell(self)  # call the parent's tell method, passing the Student instance as self
        print("Fee is:%s" % self._fee)


def main():
    """main function"""
    teacher1 = Teacher('John', 24, 10000)
    teacher1.tell()
    student1 = Student('Tim', 18, 7500)
    student1.tell()


if __name__ == '__main__':
    main()
The output is as follows:
(Initialized SchoolMember: John)
(Initialized Teacher: John)
Name is:John 
Age is:24
Salary is:10000
(Initialized SchoolMember: Tim)
(Initialized Student: Tim)
Name is:Tim 
Age is:18
Fee is:7500
说明: 示例中使用两种方法调用父类的方法,如方式1: super().__init__(name, age) ,方式2:SchoolMember.__init__(self, name, age),推荐使用方式1进行调用,这样就算修改父类的名称,子类的方法代码也不需要修改。
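Polymorphism, the third principle listed above, can be illustrated separately. The sketch below uses invented classes (Animal, Dog, Cat) and a hypothetical helper describe(); the helper is written against the parent type only, yet each object runs its own version of speak().

```python
class Animal:
    """Parent class with a default behavior."""

    def __init__(self, name):
        self.name = name

    def speak(self):
        return '%s makes a sound' % self.name


class Dog(Animal):
    def speak(self):  # overrides Animal.speak
        return '%s says woof' % self.name


class Cat(Animal):
    def speak(self):  # overrides Animal.speak
        return '%s says meow' % self.name


def describe(animal):
    """Works for Animal and for any subclass of Animal."""
    return animal.speak()


lines = [describe(a) for a in (Animal('Generic'), Dog('Rex'), Cat('Tom'))]
for line in lines:
    print(line)
```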
Python构造器__init__()方法¶
- 创建实例时,Python会自动调用类中的__init__方法,以隐性地为实例提供属性。
- __init__方法被称为构造器或构造方法。
- 如果类中没有定义__init__方法,实例创建时仅是一个简单的名称空间。
- 创建实例时,实例接收的参数会自动传送到构造器中。
如:
>>> class LoveLanguage:
...     def __init__(self,name,lang):
...         self.name=name
...         self.lang=lang
...     def tell(self):
...         print("Your name is {} and you love to learn {}".format(self.name,self.lang))
...
>>> c1=LoveLanguage('mei','python')
>>> c1.tell()
Your name is mei and you love to learn python
命名空间¶
- python可以使用locals()和globals()获取局部或全局命名空间的字典。
- locals() # 返回局部命名空间内容的字典;
- globals() # 返回全局命名空间内容的字典。
如:
>>> def test(*args):
...     data='test locals()'
...     print(locals())
...     print('args',args)
...
>>> test('a','b')
{'data': 'test locals()', 'args': ('a', 'b')}
args ('a', 'b')
>>> globals()
{'__name__': '__main__', '__doc__': None, '__package__': None, '__loader__': <class '_frozen_importlib.BuiltinImporter'>
, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, 'test': <function test at 0x0000000002A4D620>}
使用装饰器(decorator)定义属性的访问和设置¶
下面的例子中定义两个不同的方法,它们都叫name(),但包含不同的修饰符:
- @property,用于指示getter方法;
- @name.setter,用于指示setter方法。
- 使用__定义变量可以将名称重整,以保护私有特性,如__name。实际上名称被重整为_ClassName__name这样的。
print_name.py代码如下:
#!/usr/bin/python3
"""
@Time : 2019/3/31
@Author : Mei Zhaohui
@Email : mzh.whut@gmail.com
@Filename: print_name.py
@Software: PyCharm
@Desc : class property
"""
class PrintName:
"""print user name"""
def __init__(self, input_name):
"""构造方法"""
# 为了隐藏内部特性,可以使用两个下划线开头去定义内部隐藏变量,如(__name)
self.__name = input_name
@property # @property 用于指示getter方法
def name(self):
"""get the name attribute"""
print("inside the getter!")
return self.__name
@name.setter # @name.setter用于指示setter方法
def name(self, input_name):
"""set the name attribute"""
print("inside the setter!")
self.__name = input_name
def print_name(self):
"""print name"""
print("Your name is :", self.__name)
def main():
"""main function"""
pn_object1 = PrintName('mei')
print("获取名称:")
print(pn_object1.name)
print("重新设置名称:")
pn_object1.name = 'meichaohui'
print("重新获取名称:")
print(pn_object1.name)
print("使用print_name方法打印名称:")
pn_object1.print_name()
if __name__ == '__main__':
main()
运行print_name.py结果如下:
获取名称:
inside the getter!
mei
重新设置名称:
inside the setter!
重新获取名称:
inside the getter!
meichaohui
使用print_name方法打印名称:
Your name is : meichaohui
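A common reason to use a setter instead of a plain attribute is to validate the new value. The following is a minimal sketch with an invented Temperature class; the attribute names and the lower bound check are assumptions made for the example.

```python
class Temperature:
    """Stores a temperature in Celsius and rejects impossible values."""

    def __init__(self, celsius):
        self.celsius = celsius  # goes through the setter below

    @property
    def celsius(self):
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        if value < -273.15:
            raise ValueError('temperature below absolute zero')
        self._celsius = value

    @property
    def fahrenheit(self):
        # Read-only property: no setter is defined for it.
        return self._celsius * 9 / 5 + 32


t = Temperature(100)
print(t.celsius, t.fahrenheit)
```

Callers still write t.celsius = 25 as if it were an ordinary attribute, so validation can be added later without changing the calling code.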
类方法(class method)与静态方法(static method)¶
- 在类的定义中,以self作为第一个参数的方法都是实例方法(instance method)。
- 实例方法在首个参数是self,当它被调用时,python会把调用该方法的对象作为self参数传入。
- 类方法(class method)作用于整个类,对类作出的任何改变会对它的所有实例对象产生影响。
- 在类定义内部,用前缀修饰符@classmethod指定的方法都是类方法。
- 与实例方法类似,类方法的第一个参数是类本身。在python中,这个参数常被写作cls,因为全称class是保留字。
- 静态方法,既不影响类也不影响类的对象。出现在类的定义中仅仅是为了方便。
- A static method is marked with the @staticmethod decorator and takes neither a self parameter nor a cls parameter.
- 下面代码中的welcome方法是静态方法,sum方法是类方法。
class_static_method.py代码如下:
#!/usr/bin/python3
"""
@Time : 2019/3/31
@Author : Mei Zhaohui
@Email : mzh.whut@gmail.com
@Filename: class_static_method.py
@Software: PyCharm
@Desc : class method and static method
"""
class PrintName:
"""display the class method and static method"""
count = 0
def __init__(self, input_name):
PrintName.count += 1
# 为了隐藏内部特性,可以使用两个下划线开头去定义内部隐藏变量,如(__name)
self.__name = input_name
print("使用静态方法打印欢迎词:")
PrintName.welcome()
@property
# @property 用于指示getter方法
def name(self):
print("inside the getter!")
return self.__name
@name.setter
# @name.setter用于指示setter方法
def name(self, input_name):
print("inside the setter!")
self.__name = input_name
def print_name(self):
print("Your name is :", self.__name)
@classmethod
# @classmethod类方法,作用于整个类
def sum(cls):
print("The sum is", cls.count)
@staticmethod
def welcome():
print("Welcome to join us")
def main():
one_object = PrintName('mei')
print("获取名称:")
print(one_object.name)
print("重新设置名称:")
one_object.name = 'meizhaohui'
print("重新获取名称:")
print(one_object.name)
print("使用print_name方法打印名称:")
one_object.print_name()
print("使用类方法打印总人数:")
PrintName.sum()
print("=" * 50)
two_object = PrintName('kawaii')
print("获取名称:")
print(two_object.name)
print("使用类方法打印总人数:")
PrintName.sum()
print("=" * 50)
three_object = PrintName('Manu Ginóbili')
print("获取名称:")
print(three_object.name)
print("使用类方法打印总人数:")
PrintName.sum()
if __name__ == '__main__':
main()
运行class_static_method.py结果如下:
使用静态方法打印欢迎词:
Welcome to join us
获取名称:
inside the getter!
mei
重新设置名称:
inside the setter!
重新获取名称:
inside the getter!
meizhaohui
使用print_name方法打印名称:
Your name is : meizhaohui
使用类方法打印总人数:
The sum is 1
==================================================
使用静态方法打印欢迎词:
Welcome to join us
获取名称:
inside the getter!
kawaii
使用类方法打印总人数:
The sum is 2
==================================================
使用静态方法打印欢迎词:
Welcome to join us
获取名称:
inside the getter!
Manu Ginóbili
使用类方法打印总人数:
The sum is 3
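Besides counters such as sum above, class methods are frequently used as alternative constructors, and static methods as helper functions that belong with the class logically. A short sketch with an invented Date class (from_string and is_leap_year are hypothetical names):

```python
class Date:
    def __init__(self, year, month, day):
        self.year = year
        self.month = month
        self.day = day

    @classmethod
    def from_string(cls, text):
        """Alternative constructor: build an instance from 'YYYY-MM-DD'."""
        year, month, day = (int(part) for part in text.split('-'))
        return cls(year, month, day)  # cls is the class the method was called on

    @staticmethod
    def is_leap_year(year):
        """Helper that needs neither the instance nor the class."""
        return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


d = Date.from_string('2019-03-31')
print(d.year, d.month, d.day)
print(Date.is_leap_year(2020))
```

Because from_string calls cls(...) rather than Date(...), a subclass that inherits it gets instances of the subclass.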
何时使用类和对象而不是模块¶
当你需要许多具有相似行为(方法)但不同状态(特性)的实例时,使用对象是最好的选择。
类支持继承,但模块不支持。
如果你想要保证实例的唯一性,使用模块是最好的选择。不管模块在程序中被引用多少次,始终只有一个实例被加载。
如果你有一系列包含多个值的变量,并且它们能作为参数传入不同的函数,那么最好将它们封装到类里面:
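For example, you might represent a color image with a dictionary keyed by size and color, create one such dictionary per image, and pass the dictionaries to functions such as scale() or transform(). Adding further keys or functions later then becomes cumbersome. For consistency, define an Image class with size and color as attributes and scale() and transform() as methods, so that all the data and operations for an image live in one place.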
用最简单的方式解决问题。使用字典、列表和元组往往比使用模块更加简单、简洁且快速。而使用类则更为复杂。
Python创始人Guido的建议:
- 不要过度构建数据结构。尽量使用元组(以及命名元组)而不是对象。
- 尽量使用简单的属性域而不是getter/setter函数…,内置数据类型是你最好的朋友。
- 尽可能多地使用数字、字符串、元组、列表、集合以及字典。
- 多看看容器库提供的类型,尤其是双端队列(from collections import deque)。
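The two container types mentioned in this advice can be tried in a few lines. This is a small sketch with made-up data; Point and recent are illustrative names.

```python
from collections import deque, namedtuple

# A named tuple gives field names to a plain tuple without writing a class.
Point = namedtuple('Point', ['x', 'y'])
p = Point(3, 4)
print(p.x, p.y, p == (3, 4))

# A deque with maxlen keeps only the most recent items.
recent = deque(maxlen=3)
for n in range(5):
    recent.append(n)
print(list(recent))
```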
魔法方法magic method¶
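- In Python, every method whose name begins and ends with a double underscore (__) is a magic method, for example the constructor __init__.
- Used well, magic methods make for very elegant code.
- Defining a magic method customizes how a built-in operation or function (such as +, len() or str()) behaves for instances of the class.
- __init__ is the constructor; it must not return any value other than None.
- __new__ creates and returns a new instance of the class; it is rarely overridden.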
- __str__实现类到字符串的转化,相当于str()方法,可读性更强,让人更好理解。
- __repr__实现类到字符串的转化,相当于repr()方法,便于调试,让机器更容易理解。
- __del__析构方法,在对象的生命周期结束时调用。
- __len__定义当len(class_instance)被调用时的行为。
- __eq__(self, other) defines the behavior of the equality operator, self == other.
- __ne__(self, other) 定义不等号的行为,self != other。
- __lt__(self, other) 定义小于号的行为,self < other。
- __le__(self, other) 定义小于等于号的行为,self <= other。
- __gt__(self, other) 定义大于号的行为,self > other。
- __ge__(self, other) 定义大于等于号的行为,self >= other。
- __add__(self, other) 定义加法的行为,self + other。
- __sub__(self, other) 定义减法的行为,self - other。
- __mul__(self, other) 定义乘法的行为,self * other。
- __truediv__(self, other) 定义真除法的行为,self / other。
- __floordiv__(self, other) 定义整数除法的行为,self // other。
- __mod__(self, other) 定义取模算法的行为,self % other。
- __pow__(self, other) 定义幂指数pow()或**运算时的行为,self ** other。
- __call__(self, *args, **kwargs) 实现__call__后,可以将类实例当做函数一样的去使用,称为仿函数或函数对象,实例对象()就是调用__call__方法。
示例:
#!/usr/bin/python3
"""
@Time : 2019/3/31
@Author : Mei Zhaohui
@Email : mzh.whut@gmail.com
@Filename: magic_methods.py
@Software: PyCharm
@Desc : Magic method
"""
class Word:
"""class word"""
def __new__(cls, *args, **kwargs):
"""
创建类,并返回类的实例,在创建类的对象时__new__方法首先被调用,然后再调用__init__方法
在创建一个类的对象实例对象时,__new__必定会被调用,而__init__则不一定(pickle.load方式反序列化一个实例时不会调用)
__new__方法需要返回该类的一个实例
"""
print('Call __new__ method')
return object.__new__(cls)
def __init__(self, text):
"""
可以理解__new__与__init__方法共同构成了构造函数
__init__不能返回除None外的任何值
__init__不需要指定return语句,直接隐式return None即可
"""
print('Call __init__ method')
self.__text = text
def __del__(self):
"""
析构函数
在对象的生命周期结束时,__del__会被调用,可以将__del__理解为析构函数
__del__定义的是当一个对象进行垃圾回收时候的行为
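x.__del__() is not the implementation of del x: del x only removes one reference, and x.__del__() runs when the object's reference count drops to zero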
"""
print('Call __del__ method, {} will be deleted.'.format(self))
def __str__(self):
"""
实现类到字符串的转化,将一个类的实例变成字符串
如果不定义__str__,则Python会去调用__repr__方法
If __repr__ is not defined either, the class name and the object's memory address are returned
如: Word: <__main__.Word object at 0x7efe3ad5fe48>
__str__的返回结果可读性更强
"""
print('Call __str__ method')
# self.__class__.__name__ 代表着类的名称
return '({}:{})'.format(self.__class__.__name__, self.__text)
def __repr__(self):
"""
实现类到字符串的转化,将一个类的实例变成字符串
推荐每一个类至少添加__repr__方法,这样可以保证类到字符串的转化时始终有一个有效的转化方式
"""
print('Call __repr__ method')
return '({}:{})'.format(self.__class__.__name__, self.__text)
def __len__(self):
"""
定义当len(class_instance)被调用时的行为
"""
print('Call __len__ method')
return len(self.__text.replace(',', '').replace(' ', ''))
def __add__(self, other):
"""
two class instance add
:param other: other class instance
"""
print('Call __add__ method')
return self.__text + ' and ' + other.__text
def __eq__(self, other):
"""
two class instance equal
:param other: other class instance
"""
print('Call __eq__ method')
return self.__text.lower() == other.__text.lower()
def __call__(self, text):
"""
override () , class instance function
replace the instance self to text
"""
print('Call __call__ method')
self.__text = text
return self.__text
def main():
"""main function"""
print('创建对象实例,将会调用__new__和__init__方法:')
word1 = Word('I love Python')
print('打印对象实例,将会调用__str__方法:')
print('Word:', word1)
print('=' * 30)
print('调用__repr__方法:')
print(repr(word1))
print('调用__len__方法:')
print(len(word1))
print('创建对象实例,将会调用__new__和__init__方法:')
word2 = Word('I love Go')
print('调用__add__方法:')
print(word1 + word2)
print('创建对象实例,将会调用__new__和__init__方法:')
word3 = Word('I LOVE PYTHON')
print('调用__eq__方法:')
print(word1 == word3)
print('创建对象实例,将会调用__new__和__init__方法:')
word4 = Word('I am the __call__ before')
print('调用__call__方法:')
word4('I am the __call__ after')
print('=' * 30)
print('调用__del__方法,类对象并没有被删除:')
word1.__del__()
print('打印对象实例,将会调用__str__方法:')
print('Word:', word1)
print('使用del删除对象时,会调用__del__方法,类对象并没有被删除:')
del word1
print('程序运行完成后,会自动删除对象,结束对象的生命周期!')
if __name__ == '__main__':
main()
运行结果:
创建对象实例,将会调用__new__和__init__方法:
Call __new__ method
Call __init__ method
打印对象实例,将会调用__str__方法:
Word: Call __str__ method
(Word:I love Python)
==============================
调用__repr__方法:
Call __repr__ method
(Word:I love Python)
调用__len__方法:
Call __len__ method
11
创建对象实例,将会调用__new__和__init__方法:
Call __new__ method
Call __init__ method
调用__add__方法:
Call __add__ method
I love Python and I love Go
创建对象实例,将会调用__new__和__init__方法:
Call __new__ method
Call __init__ method
调用__eq__方法:
Call __eq__ method
True
创建对象实例,将会调用__new__和__init__方法:
Call __new__ method
Call __init__ method
调用__call__方法:
Call __call__ method
==============================
调用__del__方法,类对象并没有被删除:
Call __str__ method
Call __del__ method, (Word:I love Python) will be deleted.
打印对象实例,将会调用__str__方法:
Word: Call __str__ method
(Word:I love Python)
使用del删除对象时,会调用__del__方法,类对象并没有被删除:
Call __str__ method
Call __del__ method, (Word:I love Python) will be deleted.
程序运行完成后,会自动删除对象,结束对象的生命周期!
Call __str__ method
Call __del__ method, (Word:I love Go) will be deleted.
Call __str__ method
Call __del__ method, (Word:I LOVE PYTHON) will be deleted.
Call __str__ method
Call __del__ method, (Word:I am the __call__ after) will be deleted.
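The comparison methods listed earlier do not all have to be written by hand. As a sketch, assuming an invented Version class, the standard library decorator functools.total_ordering derives the remaining ordering methods from __eq__ and __lt__:

```python
from functools import total_ordering


@total_ordering
class Version:
    """Compares by (major, minor); only __eq__ and __lt__ are written out."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __eq__(self, other):
        return (self.major, self.minor) == (other.major, other.minor)

    def __lt__(self, other):
        return (self.major, self.minor) < (other.major, other.minor)

    def __repr__(self):
        return 'Version(%d, %d)' % (self.major, self.minor)


versions = [Version(1, 10), Version(2, 0), Version(1, 2)]
ordered = sorted(versions)  # sorted() relies on __lt__
print(ordered)
print(Version(2, 0) >= Version(1, 9))  # __ge__ is supplied by total_ordering
```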
模块-json模块¶
json模块基本介绍¶
- Storing a data structure in a file is called serialization (Serialize); parsing data from a file back into a data structure is called deserialization (Deserialize).
- JSON (JavaScript Object Notation) is a lightweight data interchange format.
- json.dumps converts a Python data type to a JSON string.
Syntax:
json.dumps(obj, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)
- json.dump converts a Python data type to a JSON string and stores it in a file (serialization).
Syntax:
json.dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw)
- json.loads converts a JSON string to a Python data type.
Syntax:
json.loads(s, *, encoding=None, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)
- json.load converts the JSON string in a file to a Python data type (deserialization).
Syntax:
json.load(fp, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)
引用json模块及帮助信息:
In [1]: import json
In [2]: json?
Type: module
String form: <module 'json' from 'd:\\programfiles\\python362\\lib\\json\\__init__.py'>
File: d:\programfiles\python362\lib\json\__init__.py
Docstring:
JSON (JavaScript Object Notation) <http://json.org> is a subset of
JavaScript syntax (ECMA-262 3rd edition) used as a lightweight data
interchange format.
:mod:`json` exposes an API familiar to users of the standard library
:mod:`marshal` and :mod:`pickle` modules. It is derived from a
version of the externally maintained simplejson library.
Encoding basic Python object hierarchies::
>>> import json
>>> json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}]) # 将列表转成JSON字符串
'["foo", {"bar": ["baz", null, 1.0, 2]}]'
>>> print(json.dumps("\"foo\bar"))
"\"foo\bar"
>>> print(json.dumps('\u1234'))
"\u1234"
>>> print(json.dumps('\\'))
"\\"
>>> print(json.dumps({"c": 0, "b": 0, "a": 0}, sort_keys=True))
{"a": 0, "b": 0, "c": 0}
>>> from io import StringIO
>>> io = StringIO() # 创建一个IO StringIO缓存对象
>>> json.dump(['streaming API'], io) # 将字符串列表写入到IO对象中
>>> io.getvalue() # 获取对象中的所有数据
'["streaming API"]'
Compact encoding::
>>> import json
>>> from collections import OrderedDict
>>> mydict = OrderedDict([('4', 5), ('6', 7)])
>>> json.dumps([1,2,3,mydict], separators=(',', ':')) # separators is an (item_separator, key_separator) tuple; the default is (', ', ': ')
'[1,2,3,{"4":5,"6":7}]'
Pretty printing::
>>> import json
>>> print(json.dumps({'4': 5, '6': 7}, sort_keys=True, indent=4)) # 使用sort_keys表示对键进行排序,indent表示缩进4个空格
{
"4": 5,
"6": 7
}
Decoding JSON::
>>> import json
>>> obj = ['foo', {'bar': ['baz', None, 1.0, 2]}]
>>> json.loads('["foo", {"bar":["baz", null, 1.0, 2]}]') == obj # 将JSON字符串转换成列表对象
True
>>> json.loads('"\\"foo\\bar"') == '"foo\x08ar'
True
>>> from io import StringIO
>>> io = StringIO('["streaming API"]')
>>> json.load(io)[0] == 'streaming API'
True
Specializing JSON object decoding::
>>> import json
>>> def as_complex(dct):
...     if '__complex__' in dct:
...         return complex(dct['real'], dct['imag']) # build a complex number
...     return dct
...
>>> json.loads('{"__complex__": true, "real": 1, "imag": 2}',
...     object_hook=as_complex) # use a custom decoding function
(1+2j)
>>> from decimal import Decimal
>>> json.loads('1.1', parse_float=Decimal) == Decimal('1.1')
True
Specializing JSON object encoding::
>>> import json
>>> def encode_complex(obj):
...     if isinstance(obj, complex):
...         return [obj.real, obj.imag]
...     raise TypeError(repr(obj) + " is not JSON serializable")
...
>>> json.dumps(2 + 1j, default=encode_complex)
'[2.0, 1.0]'
>>> json.JSONEncoder(default=encode_complex).encode(2 + 1j)
'[2.0, 1.0]'
Using json.tool from the shell to validate and pretty-print::
$ echo '{"json":"obj"}' | python -m json.tool
{
"json": "obj"
}
$ echo '{ 1.2:3.4}' | python -m json.tool
Expecting property name enclosed in double quotes: line 1 column 3 (char 2)
JSON字符串转Python数据类型对应关系表:
JSON | Python |
---|---|
object | dict |
array | list |
string | str |
number (int) | int |
number (real) | float |
true | True |
false | False |
null | None |
Python数据类型转JSON字符串对应关系表:
Python | JSON |
---|---|
dict | object |
list, tuple | array |
str | string |
int, float, int- & float-derived Enums | number |
True | true |
False | false |
None | null |
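One consequence of the two tables is that a round trip through JSON does not always give back the original Python object: tuples come back as lists, and non-string dictionary keys come back as strings. A minimal sketch with made-up data:

```python
import json

original = {'point': (1, 2), 1: 'int key', 'flag': True, 'nothing': None}

encoded = json.dumps(original)   # Python -> JSON string
decoded = json.loads(encoded)    # JSON string -> Python

print(encoded)
print(decoded)
print(decoded == original)       # False: the tuple and the int key have changed type
```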
json模块的操作如下:
#!/usr/bin/python3
# -*- coding: utf-8 -*-
# ----------------------------------------------------------
# @Time : At 下午8:11 九月 06, 2018
# @Author : 梅朝辉(meizhaohui)
# @Email : mzh.whut@gmail.com
# @Filename : ReadJson.py
# @Description : 处理json数据
# @Software : PyCharm
# @Python Version: python3.6.2
# ----------------------------------------------------------
import json
JSON_STRING = '{"username":"meizhaohui","password":"passwd"}'
DICT_DATA = {"username": "meizhaohui", "ID":1, "password": "passwd"}
class jsonAPI:
def json_to_dict(self, json_string=JSON_STRING):
"""
将JSON字符串转换成dict字典
:param json_string: JSON字符串
:return: dict
"""
return json.loads(json_string)
def dict_to_json(self, dict_data):
"""
将dict字典转换成JSON字符串
:param dict_data: dict字典
:return: str
"""
return json.dumps(dict_data)
def json_file_to_dict(self, filename):
"""
读取json文件到dict字典中
:param filename: json文件
:return: dict
"""
with open(filename) as file:
return json.load(file)
def write_json_to_file(self, filename, dict_data):
"""
将数据转为json字符串并写入文件
:param filename: 文件名
:param dict_data: 字典数据
:return: NoneType
"""
with open(filename, 'w') as file:
return json.dump(dict_data, file)
def write_pretty_json_to_file(self, filename, dict_data):
"""
将数据转为json字符串并写入文件
:param filename: 文件名
:param dict_data: 字典数据
:return: NoneType
"""
with open(filename, 'w') as file:
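# sort_keys: whether to sort the output by key; defaults to False
# indent: indentation width; 4 (or a string of four spaces) is suggested
# separators: an (item_separator, key_separator) tuple; the default is (', ', ': ') when indent is None
# the first element separates the key/value pairs from each other, the second separates each key from its value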
return json.dump(dict_data, file, sort_keys=True, indent=4, separators=(',', ': '))
if __name__ == "__main__":
JAPI = jsonAPI()
print(JAPI.json_to_dict(JSON_STRING))
print(type(JAPI.json_to_dict(JSON_STRING)))
print(JAPI.dict_to_json(DICT_DATA))
print(type(JAPI.dict_to_json(DICT_DATA)))
FILENAME='json_file.json'
print(JAPI.json_file_to_dict(FILENAME))
print(type(JAPI.json_file_to_dict(FILENAME)))
NEW_JSON_FILE='new_json.json'
print(type(JAPI.write_json_to_file(NEW_JSON_FILE, DICT_DATA)))
PRETTY_JSON_FILE = 'pretty_json.json'
print(type(JAPI.write_pretty_json_to_file(PRETTY_JSON_FILE, DICT_DATA)))
"""
output as follow:
{'username': 'meizhaohui', 'password': 'passwd'}
<class 'dict'>
{"username": "meizhaohui", "ID": 1, "password": "passwd"}
<class 'str'>
{'user_id': 1, 'username': 'meizhaohui', 'password': 'passwd'}
<class 'dict'>
<class 'NoneType'>
<class 'NoneType'>
json_file.json content:
{
"user_id":1,
"username":"meizhaohui",
"password":"passwd"
}
new_json.json content:
{"username": "meizhaohui", "ID": 1, "password": "passwd"}
pretty_json.json
{
"ID": 1,
"password": "passwd",
"username": "meizhaohui"
}
"""
数据库处理¶
sqlite3处理SQLite数据库¶
安装SQLite,请参照 SQLite 安装 。
安装完成后将SQLite安装路径加入到环境变量Path中。
在命令行打开sqlite3,并查看帮助信息:
$ sqlite3
SQLite version 3.27.2 2019-02-25 16:06:06
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite> .help
.archive ... Manage SQL archives
.auth ON|OFF Show authorizer callbacks
.backup ?DB? FILE Backup DB (default "main") to FILE
.bail on|off Stop after hitting an error. Default OFF
.binary on|off Turn binary output on or off. Default OFF
.cd DIRECTORY Change the working directory to DIRECTORY
.changes on|off Show number of rows changed by SQL
.check GLOB Fail if output since .testcase does not match
.clone NEWDB Clone data into NEWDB from the existing database
.databases List names and files of attached databases
.dbconfig ?op? ?val? List or change sqlite3_db_config() options
.dbinfo ?DB? Show status information about the database
.dump ?TABLE? ... Render all database content as SQL
.echo on|off Turn command echo on or off
.eqp on|off|full|... Enable or disable automatic EXPLAIN QUERY PLAN
.excel Display the output of next command in a spreadsheet
.exit ?CODE? Exit this program with return-code CODE
.expert EXPERIMENTAL. Suggest indexes for specified queries
.fullschema ?--indent? Show schema and the content of sqlite_stat tables
.headers on|off Turn display of headers on or off
.help ?-all? ?PATTERN? Show help text for PATTERN
.import FILE TABLE Import data from FILE into TABLE
.imposter INDEX TABLE Create imposter table TABLE on index INDEX
.indexes ?TABLE? Show names of indexes
.limit ?LIMIT? ?VAL? Display or change the value of an SQLITE_LIMIT
.lint OPTIONS Report potential schema issues.
.load FILE ?ENTRY? Load an extension library
.log FILE|off Turn logging on or off. FILE can be stderr/stdout
.mode MODE ?TABLE? Set output mode
.nullvalue STRING Use STRING in place of NULL values
.once (-e|-x|FILE) Output for the next SQL command only to FILE
.open ?OPTIONS? ?FILE? Close existing database and reopen FILE
.output ?FILE? Send output to FILE or stdout if FILE is omitted
.print STRING... Print literal STRING
.progress N Invoke progress handler after every N opcodes
.prompt MAIN CONTINUE Replace the standard prompts
.quit Exit this program
.read FILE Read input from FILE
.restore ?DB? FILE Restore content of DB (default "main") from FILE
.save FILE Write in-memory database into FILE
.scanstats on|off Turn sqlite3_stmt_scanstatus() metrics on or off
.schema ?PATTERN? Show the CREATE statements matching PATTERN
.selftest ?OPTIONS? Run tests defined in the SELFTEST table
.separator COL ?ROW? Change the column and row separators
.sha3sum ... Compute a SHA3 hash of database content
.shell CMD ARGS... Run CMD ARGS... in a system shell
.show Show the current values for various settings
.stats ?on|off? Show stats or turn stats on or off
.system CMD ARGS... Run CMD ARGS... in a system shell
.tables ?TABLE? List names of tables matching LIKE pattern TABLE
.testcase NAME Begin redirecting output to 'testcase-out.txt'
.timeout MS Try opening locked tables for MS milliseconds
.timer on|off Turn SQL timer on or off
.trace ?OPTIONS? Output each SQL statement as it is run
.vfsinfo ?AUX? Information about the top-level VFS
.vfslist List all available VFSes
.vfsname ?AUX? Print the name of the VFS stack
.width NUM1 NUM2 ... Set column widths for "column" mode
sqlite>
sqlite3.connect(database)
Connect to a database.
Connect to the database file database and return a Connection object; if the file does not exist, it is created:
In [1]: import sqlite3
In [2]: conn = sqlite3.connect('data.db')
In [3]: conn
Out[3]: <sqlite3.Connection at 0x230e4801e30>
The file data.db is created at the same time.
A database can also be created in memory:
In [4]: conn_mem = sqlite3.connect(':memory:')
In [5]: conn_mem
Out[5]: <sqlite3.Connection at 0x230e4a84e30>
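Each connection opened with ':memory:' gets its own private database, which is discarded when that connection is closed. The sketch below illustrates this; the table name t and the helper table_names are illustrative assumptions, not part of the original text.

```python
import sqlite3

# Two separate in-memory connections do not share any data.
first = sqlite3.connect(":memory:")
second = sqlite3.connect(":memory:")

first.execute("CREATE TABLE t (x INTEGER)")


def table_names(conn):
    """Return the names of all tables visible on this connection."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    return [name for (name,) in rows]


first_tables = table_names(first)
second_tables = table_names(second)
print(first_tables)   # the table exists only on the first connection
print(second_tables)

first.close()
second.close()
```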
sqlite3.Connection.cursor()
Create a cursor object.
Once a Connection has been established, a Cursor object can be created from it:
In [6]: curs = conn.cursor()
In [7]: curs
Out[7]: <sqlite3.Cursor at 0x230e4b39340>
sqlite3.Cursor.execute(sql[, parameters])
Execute an SQL statement.
Call the execute() method of the Cursor object to run SQL commands:
# Create the table stocks
In [8]: curs.execute('''CREATE TABLE stocks (date text, trans text, symbol text, qty real, price real)''')
Out[8]: <sqlite3.Cursor at 0x230e4b39340>
# Insert one row into the table stocks
In [9]: curs.execute("INSERT INTO stocks VALUES ('2006-01-05','BUY','RHAT',100,35.14)")
Out[9]: <sqlite3.Cursor at 0x230e4b39340>
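sqlite3.Connection.commit()
Commit the current transaction.
Commit the transaction that created the table stocks and inserted the row:
In [10]: conn.commit()
The database now contains the new table stocks with one row. Query it from the sqlite3 shell: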
$ sqlite3 data.db
SQLite version 3.27.2 2019-02-25 16:06:06
Enter ".help" for usage hints.
sqlite> .header on
sqlite> .mode column
sqlite> .tables
stocks
sqlite> select * from stocks;
date trans symbol qty price
---------- ---------- ---------- ---------- ----------
2006-01-05 BUY RHAT 100.0 35.14
sqlite>
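Why the commit matters can be seen with two connections to the same file: the second connection only sees a row after the first one has committed it. This is a minimal sketch under assumed names (a temporary data.db, a simplified two-column stocks table, and the helper count_rows), not the exact session shown above.

```python
import os
import sqlite3
import tempfile


def count_rows(conn):
    """Count the rows of stocks as seen by this connection."""
    return conn.execute("SELECT COUNT(*) FROM stocks").fetchall()[0][0]


with tempfile.TemporaryDirectory() as tmp_dir:
    db_path = os.path.join(tmp_dir, "data.db")

    writer = sqlite3.connect(db_path)
    writer.execute("CREATE TABLE stocks (symbol TEXT, qty REAL)")
    writer.commit()

    # The INSERT opens a transaction implicitly; nothing is committed yet.
    writer.execute("INSERT INTO stocks VALUES (?, ?)", ("RHAT", 100))

    reader = sqlite3.connect(db_path)
    before = count_rows(reader)   # the uncommitted row is not visible

    writer.commit()
    after = count_rows(reader)    # visible once committed

    print(before, after)

    reader.close()
    writer.close()
```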
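sqlite3.Connection.close()
Close the database connection. Make sure every transaction has been committed with commit() before closing; close() does not call commit() automatically, so uncommitted changes are lost.
After the connection has been closed, calling execute() to query the database raises a ProgrammingError exception: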
In [11]: conn.close()
In [12]: conn
Out[12]: <sqlite3.Connection at 0x230e4801e30>
In [13]: curs
Out[13]: <sqlite3.Cursor at 0x230e4b39340>
In [14]: curs.execute("SELECT * FROM stocks")
---------------------------------------------------------------------------
ProgrammingError Traceback (most recent call last)
<ipython-input-14-9a842a1f84e1> in <module>
----> 1 curs.execute("SELECT * FROM stocks")
ProgrammingError: Cannot operate on a closed database.
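Both points can be reproduced in a few lines: close() discards an uncommitted INSERT, and a closed connection rejects further statements. The sketch assumes a throwaway data.db in a temporary directory and a one-column stocks table; these names are illustrative only.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    db_path = os.path.join(tmp_dir, "data.db")

    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE stocks (symbol TEXT)")
    conn.commit()
    conn.execute("INSERT INTO stocks VALUES ('RHAT')")  # never committed
    conn.close()                                        # the pending INSERT is discarded

    # Using the closed connection raises ProgrammingError.
    try:
        conn.execute("SELECT * FROM stocks")
        error_message = None
    except sqlite3.ProgrammingError as exc:
        error_message = str(exc)

    # A fresh connection sees the table but not the uncommitted row.
    conn = sqlite3.connect(db_path)
    rows = conn.execute("SELECT * FROM stocks").fetchall()
    conn.close()

print(error_message)
print(rows)
```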
Reconnect to the database:
In [15]: conn = sqlite3.connect('data.db')
In [16]: curs = conn.cursor()
To prevent SQL injection attacks, do not build SQL statements with Python string operations:
# Never do this -- insecure!
In [17]: symbol = 'RHAT'
# SELECT statement built by string formatting
In [18]: curs.execute("SELECT * FROM stocks WHERE symbol = '%s'" % symbol)
Out[18]: <sqlite3.Cursor at 0x230e4b392d0>
sqlite3.Cursor.fetchone()
Fetch the next row of the query result set; returns None when no more rows are available.
Fetch one row:
In [19]: print(curs.fetchone())
('2006-01-05', 'BUY', 'RHAT', 100.0, 35.14)
Instead, use the ? question mark as a placeholder and pass a tuple of values as the second argument:
# Do this instead: a tuple of values with ? placeholders
In [20]: t = ('RHAT',)
In [21]: curs.execute('SELECT * FROM stocks WHERE symbol=?', t)
Out[21]: <sqlite3.Cursor at 0x230e4b392d0>
In [22]: print(curs.fetchone())
('2006-01-05', 'BUY', 'RHAT', 100.0, 35.14)
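The difference between the two styles only shows up with hostile input. The sketch below is an assumed, simplified setup (a two-column stocks table and a made-up malicious string) that contrasts string formatting with a ? placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stocks (symbol TEXT, price REAL)")
conn.executemany(
    "INSERT INTO stocks VALUES (?, ?)",
    [("RHAT", 35.14), ("IBM", 45.0)],
)

# Hypothetical hostile input that tries to change the WHERE clause.
symbol = "nothing' OR '1'='1"

# Unsafe: the input becomes part of the SQL text and matches every row.
unsafe_rows = conn.execute(
    "SELECT * FROM stocks WHERE symbol = '%s'" % symbol
).fetchall()

# Safe: the input is bound as a value, so it is compared literally.
safe_rows = conn.execute(
    "SELECT * FROM stocks WHERE symbol = ?", (symbol,)
).fetchall()

print(len(unsafe_rows))
print(len(safe_rows))
conn.close()
```

With the placeholder, no row has that literal symbol, so the query returns nothing instead of leaking the whole table.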
# Larger example that inserts many records at a time
In [23]: purchases = [('2006-03-28', 'BUY', 'IBM', 1000, 45.00), ('2006-04-05', 'BUY', 'MSFT', 1000, 72.00), ('2006-04-06', 'SELL', 'IBM', 500, 53.00),]
sqlite3.Cursor.executemany(sql, seq_of_parameters)
Execute the SQL command once for each set of parameters in seq_of_parameters.
Bind each row in purchases to the INSERT statement:
In [24]: curs.executemany('INSERT INTO stocks VALUES (?,?,?,?,?)', purchases)
Out[24]: <sqlite3.Cursor at 0x230e4b392d0>
In [25]: curs.execute('SELECT * FROM stocks')
Out[25]: <sqlite3.Cursor at 0x230e4b392d0>
sqlite3.Cursor.fetchone()
Fetch the next row of the query result set; returns None when no more rows are available.
Fetch the first row:
In [26]: print(curs.fetchone())
('2006-01-05', 'BUY', 'RHAT', 100.0, 35.14)
sqlite3.Cursor.fetchall()
Fetch all (remaining) rows of the query result set as a list; an empty list is returned when no rows are available.
Fetch the remaining rows:
In [27]: print(curs.fetchall())
[('2006-03-28', 'BUY', 'IBM', 1000.0, 45.0), ('2006-04-05', 'BUY', 'MSFT', 1000.0, 72.0), ('2006-04-06', 'SELL', 'IBM', 500.0, 53.0)]
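- To retrieve data after executing a SELECT statement, you can treat the cursor as an iterator, call the cursor's fetchone() method to retrieve a single matching row, or call fetchall() to get a list of all matching rows.
The following uses the cursor as an iterator:
In [28]: for row in curs.execute('SELECT * FROM stocks ORDER BY price'):
    ...:     print(row)
    ...: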
('2006-01-05', 'BUY', 'RHAT', 100.0, 35.14)
('2006-03-28', 'BUY', 'IBM', 1000.0, 45.0)
('2006-04-06', 'SELL', 'IBM', 500.0, 53.0)
('2006-04-05', 'BUY', 'MSFT', 1000.0, 72.0)
Commit the transaction to save the three newly inserted rows to the database:
In [29]: conn.commit()
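sqlite3.Connection.total_changes
The total number of database rows that have been modified, inserted, or deleted since the database connection was opened.
Check the number of rows inserted: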
In [30]: conn.total_changes
Out[30]: 3
Query the data in the sqlite3 shell:
sqlite> select * from stocks order by price;
date trans symbol qty price
---------- ---------- ---------- ---------- ----------
2006-01-05 BUY RHAT 100.0 35.14
2006-03-28 BUY IBM 1000.0 45.0
2006-04-06 SELL IBM 500.0 53.0
2006-04-05 BUY MSFT 1000.0 72.0
sqlite>
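The calls covered so far (executemany(), iterating over the cursor, fetchone(), fetchall(), and total_changes) fit together as in the following sketch. It uses an in-memory database and a simplified two-column stocks table as assumptions, so the values differ from the session above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
curs = conn.cursor()
curs.execute("CREATE TABLE stocks (symbol TEXT, price REAL)")
changes_before = conn.total_changes

purchases = [("IBM", 45.0), ("MSFT", 72.0), ("RHAT", 35.14)]
curs.executemany("INSERT INTO stocks VALUES (?, ?)", purchases)
conn.commit()

# Iterating over the cursor yields one tuple per row.
symbols = [symbol for symbol, price in curs.execute(
    "SELECT symbol, price FROM stocks ORDER BY price")]

# The result set is now exhausted: fetchone() gives None, fetchall() gives [].
exhausted_one = curs.fetchone()
exhausted_all = curs.fetchall()

# Rows changed by the three INSERTs since the table was created.
inserted = conn.total_changes - changes_before

print(symbols)
print(exhausted_one, exhausted_all)
print(inserted)
conn.close()
```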
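sqlite3.Cursor.executescript(sql_script)
Execute several SQL statements at once as a script. It first issues a COMMIT statement for any pending transaction, then executes the SQL script passed as the argument.
The following script creates the tables person and book and inserts one row into book: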
In [31]: curs.executescript("""
...: create table person(
...: firstname,
...: lastname,
...: age
...: );
...:
...: create table book(
...: title,
...: author,
...: published
...: );
...:
...: insert into book(title, author, published)
...: values (
...: 'Dirk Gently''s Holistic Detective Agency',
...: 'Douglas Adams',
...: 1987
...: );
...: """)
Out[31]: <sqlite3.Cursor at 0x230e4b392d0>
Query the data in the sqlite3 shell:
sqlite> .tables
book person stocks
sqlite> select * from book;
title author published
--------------------------------------- ------------- ----------
Dirk Gently's Holistic Detective Agency Douglas Adams 1987
sqlite>
This shows that statements run through curs.executescript(sql_script) do not need a separate manual commit.
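The commit behaviour of executescript() can be observed through Connection.in_transaction. This is a small sketch with assumed table contents; it relies on the default transaction handling of the sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (firstname TEXT)")

# An INSERT through execute() leaves a transaction open.
conn.execute("INSERT INTO person VALUES ('Joe')")
open_before = conn.in_transaction

# executescript() commits that pending transaction, then runs the script.
conn.executescript("""
    CREATE TABLE book (title TEXT);
    INSERT INTO book VALUES ('Dirk Gently''s Holistic Detective Agency');
""")
open_after = conn.in_transaction

# Nothing is left to undo: both rows survive a rollback.
conn.rollback()
people = conn.execute("SELECT firstname FROM person").fetchall()
books = conn.execute("SELECT title FROM book").fetchall()
conn.close()

print(open_before, open_after)
print(people, books)
```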
- Connection objects can be used as with context managers that automatically commit or roll back transactions. If an exception occurs, the transaction is rolled back; otherwise, the transaction is committed.
Use the with context manager to commit the transaction automatically:
In [1]: import sqlite3
In [2]: auto_conn = sqlite3.connect(":memory:")
# Declare firstname as unique so that it cannot be repeated
In [3]: auto_conn.execute("create table person (id integer primary key, firstname varchar unique)")
Out[3]: <sqlite3.Cursor at 0x1ea33f65650>
# The first with block inserts the row and commits the transaction automatically
In [4]: with auto_conn:
   ...:     auto_conn.execute("insert into person(firstname) values (?)", ("Joe",))
   ...:
In [5]: curs = auto_conn.cursor()
In [6]: curs.execute('select * from person')
Out[6]: <sqlite3.Cursor at 0x1ea33f65c00>
# Check whether the with block above inserted the row
In [7]: curs.fetchone()
Out[7]: (1, 'Joe')
# Inserting the same name again inside a with block raises sqlite3.IntegrityError; catch it with try/except
In [8]: try:
   ...:     with auto_conn:
   ...:         auto_conn.execute("insert into person(firstname) values (?)", ("Joe",))
   ...: except sqlite3.IntegrityError:
   ...:     print("couldn't add Joe twice")
   ...:
couldn't add Joe twice
# Close the connection
In [9]: auto_conn.close()
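The rollback covers everything done inside the with block, not just the failing statement. The following sketch, which assumes the same person table and an extra made-up name "Ann", shows that an earlier successful insert in the same block is undone as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, firstname TEXT UNIQUE)")

with conn:  # committed on success
    conn.execute("INSERT INTO person(firstname) VALUES (?)", ("Joe",))

try:
    with conn:  # rolled back as a whole on error
        conn.execute("INSERT INTO person(firstname) VALUES (?)", ("Ann",))
        conn.execute("INSERT INTO person(firstname) VALUES (?)", ("Joe",))
except sqlite3.IntegrityError:
    pass

names = [name for (name,) in conn.execute("SELECT firstname FROM person")]
print(names)

# The with block only ends the transaction; the connection must still be closed.
conn.close()
```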
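Working with MySQL databases using pymysql¶
- Install pymysql:
pip install PyMySQL==0.7.5
- Install MariaDB. Download link: https://downloads.mariadb.org/. For installation, see MariaDB Installation and Usage.
- Prepare the database and table.
Create the database data and the table users: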
$ mysql -uroot -proot
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 9
Server version: 10.3.14-MariaDB mariadb.org binary distribution
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| test |
+--------------------+
4 rows in set (0.001 sec)
MariaDB [(none)]> create database data;
Query OK, 1 row affected (0.001 sec)
MariaDB [(none)]> show databases;
+--------------------+
| Database |
+--------------------+
| data |
| information_schema |
| mysql |
| performance_schema |
| test |
+--------------------+
5 rows in set (0.001 sec)
MariaDB [(none)]> use data;
Database changed
MariaDB [data]> show tables;
Empty set (0.001 sec)
MariaDB [data]> CREATE TABLE `users` (
-> `id` int(11) NOT NULL AUTO_INCREMENT,
-> `email` varchar(255) COLLATE utf8_bin NOT NULL,
-> `password` varchar(255) COLLATE utf8_bin NOT NULL,
-> PRIMARY KEY (`id`)
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin
-> AUTO_INCREMENT=1 ;
Query OK, 0 rows affected (0.059 sec)
MariaDB [data]> show tables;
+----------------+
| Tables_in_data |
+----------------+
| users |
+----------------+
1 row in set (0.000 sec)
MariaDB [data]> select * from users;
Empty set (0.000 sec)
pymysql.connect
Connect to a database.
Syntax:
pymysql.connections.Connection(host=None, user=None, password='', database=None, port=0, unix_socket=None, charset='', sql_mode=None, read_default_file=None, conv=None, use_unicode=None, client_flag=0, cursorclass=<class 'pymysql.cursors.Cursor'>, init_command=None, connect_timeout=10, ssl=None, read_default_group=None, compress=None, named_pipe=None, autocommit=False, db=None, passwd=None, local_infile=False, max_allowed_packet=16777216, defer_connect=False, auth_plugin_map=None, read_timeout=None, write_timeout=None, bind_address=None, binary_prefix=False, program_name=None, server_public_key=None)
Parameters:
host – Host where the database server is located
user – Username to log in as
password – Password to use.
database – Database to use, None to not use a particular one.
port – MySQL port to use, default is usually OK. (default: 3306)
bind_address – When the client has multiple network interfaces, specify the interface from which to connect to the host. Argument can be a hostname or an IP address.
unix_socket – Optionally, you can use a unix socket rather than TCP/IP.
read_timeout – The timeout for reading from the connection in seconds (default: None - no timeout)
write_timeout – The timeout for writing to the connection in seconds (default: None - no timeout)
charset – Charset you want to use.
sql_mode – Default SQL_MODE to use.
read_default_file – Specifies my.cnf file to read these parameters from under the [client] section.
conv – Conversion dictionary to use instead of the default one. This is used to provide custom marshalling and unmarshalling of types. See converters.
use_unicode – Whether or not to default to unicode strings. This option defaults to true for Py3k.
client_flag – Custom flags to send to MySQL. Find potential values in constants.CLIENT.
cursorclass – Custom cursor class to use.
init_command – Initial SQL statement to run when connection is established.
connect_timeout – Timeout before throwing an exception when connecting. (default: 10, min: 1, max: 31536000)
ssl – A dict of arguments similar to mysql_ssl_set()’s parameters.
read_default_group – Group to read from in the configuration file.
compress – Not supported
named_pipe – Not supported
autocommit – Autocommit mode. None means use server default. (default: False)
local_infile – Boolean to enable the use of LOAD DATA LOCAL command. (default: False)
max_allowed_packet – Max size of packet sent to server in bytes. (default: 16MB) Only used to limit size of “LOAD LOCAL INFILE” data packet smaller than default (16KB).
defer_connect – Don’t explicitly connect on construction - wait for connect call. (default: False)
auth_plugin_map – A dict of plugin names to a class that processes that plugin. The class will take the Connection object as the argument to the constructor. The class needs an authenticate method taking an authentication packet as an argument. For the dialog plugin, a prompt(echo, prompt) method can be used (if no authenticate method) for returning a string from the user. (experimental)
server_public_key – SHA256 authentication plugin public key value. (default: None)
db – Alias for database. (for compatibility to MySQLdb)
passwd – Alias for password. (for compatibility to MySQLdb)
binary_prefix – Add _binary prefix on bytes and bytearray. (default: False)
Connect to the MariaDB server and use the data database:
In [1]: import pymysql
In [2]: connection = pymysql.connect(host='localhost',
...: user='root',
...: password='root',
...: db='data',
...: charset='utf8',
...: cursorclass=pymysql.cursors.DictCursor)
In [3]: connection
Out[3]: <pymysql.connections.Connection at 0x15759136518>
- connection.cursor(cursor=None): create a cursor object.
- connection.commit(): commit the transaction.
- connection.close(): close the connection.
Create a cursor and execute an SQL statement:
In [4]: try:
   ...:     with connection.cursor() as cursor:  # create a cursor
   ...:         sql = "INSERT INTO `users` (`email`, `password`) VALUES (%s, %s)"  # build the SQL INSERT statement
   ...:         cursor.execute(sql, ('webmaster@python.org', 'very-secret'))  # execute the SQL statement
   ...:
   ...:     connection.commit()  # commit the transaction
   ...: finally:
   ...:     connection.close()  # close the connection
   ...:
Query the data in MariaDB:
MariaDB [data]> select * from users;
+----+----------------------+-------------+
| id | email | password |
+----+----------------------+-------------+
| 1 | webmaster@python.org | very-secret |
+----+----------------------+-------------+
1 row in set (0.000 sec)
MariaDB [data]>
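pymysql.cursors.Cursor.fetchone()
Fetch one row.
Query the row just inserted. The connection was closed in the previous step, so reconnect with pymysql.connect() as shown earlier before running this:
In [5]: with connection.cursor() as cursor:
   ...:     sql = "SELECT id, password FROM users WHERE email= %s "
   ...:     cursor.execute(sql, ('webmaster@python.org',))
   ...:     print(cursor.fetchone())
   ...: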
{'id': 1, 'password': 'very-secret'}
- connection.select_db(db): switch the database currently in use.
- pymysql.cursors.Cursor.fetchall(): fetch all remaining rows.
Switch the current database to mysql and list its tables:
In [6]: connection
Out[6]: <pymysql.connections.Connection at 0x157594142e8>
In [7]: connection.select_db('mysql')
In [8]: cursor = connection.cursor()
In [9]: cursor.execute('show tables')
Out[9]: 31
In [10]: cursor.fetchone()
Out[10]: ('column_stats',)
In [11]: cursor.fetchall()
Out[11]:
(('columns_priv',),
('db',),
('event',),
('func',),
('general_log',),
('gtid_slave_pos',),
('help_category',),
('help_keyword',),
('help_relation',),
('help_topic',),
('host',),
('index_stats',),
('innodb_index_stats',),
('innodb_table_stats',),
('plugin',),
('proc',),
('procs_priv',),
('proxies_priv',),
('roles_mapping',),
('servers',),
('slow_log',),
('table_stats',),
('tables_priv',),
('time_zone',),
('time_zone_leap_second',),
('time_zone_name',),
('time_zone_transition',),
('time_zone_transition_type',),
('transaction_registry',),
('user',))
Query the data in MariaDB:
MariaDB [data]> use mysql;
Database changed
MariaDB [mysql]> show tables;
+---------------------------+
| Tables_in_mysql |
+---------------------------+
| column_stats |
| columns_priv |
| db |
| event |
| func |
| general_log |
| gtid_slave_pos |
| help_category |
| help_keyword |
| help_relation |
| help_topic |
| host |
| index_stats |
| innodb_index_stats |
| innodb_table_stats |
| plugin |
| proc |
| procs_priv |
| proxies_priv |
| roles_mapping |
| servers |
| slow_log |
| table_stats |
| tables_priv |
| time_zone |
| time_zone_leap_second |
| time_zone_name |
| time_zone_transition |
| time_zone_transition_type |
| transaction_registry |
| user |
+---------------------------+
31 rows in set (0.001 sec)
MariaDB [mysql]>
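Working with databases using the SQLAlchemy ORM¶
Object Relational Mapper
An ORM (object relational mapper) establishes a correspondence between the tables of a database and the classes of an object-oriented language. A table, or a single record in it, can then be manipulated directly through a class or a class instance.
- Check the SQLAlchemy version.
Use sqlalchemy.__version__ to check the SQLAlchemy version:
In [1]: import sqlalchemy
In [2]: sqlalchemy.__version__
Out[2]: '1.3.2'
- Use create_engine() to connect to a database. The echo=True argument turns on SQLAlchemy logging, which prints every generated SQL statement.
- The return value of create_engine() is an instance of Engine, the core interface to the database. The resulting Engine differs depending on the dialect, that is, on which database and DBAPI module is used.
- When create_engine() is first called, the engine has not actually tried to connect to the database yet (lazy connecting). It connects only when it is first asked to perform a task against the database.
- The first time a method such as Engine.execute() or Engine.connect() is called, the Engine establishes a real DBAPI connection to the database, which is then used to emit SQL.
- The Engine is usually not used directly; it is used indirectly through the ORM.
Use create_engine() to connect to a database. The following connects to an in-memory SQLite database: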
In [3]: from sqlalchemy import create_engine
In [4]: engine = create_engine('sqlite:///:memory:', echo=True)
In [5]: engine
Out[5]: Engine(sqlite:///:memory:)
Methods and attributes of the Engine:
engine.
begin() dialect drop execution_options logging_name run_callable transaction
connect dispatch echo get_execution_options name scalar update_execution_options
contextual_connect dispose engine has_table pool schema_for_object url
create driver execute logger raw_connection table_names
Inspect some attributes of the engine:
In [6]: engine.url
Out[6]: sqlite:///:memory:
In [7]: engine.driver
Out[7]: 'pysqlite'
In [8]: engine.engine
Out[8]: Engine(sqlite:///:memory:)
In [9]: engine.logger
Out[9]: <sqlalchemy.log.InstanceLogger at 0x225a2ac98d0>
In [10]: engine.name
Out[10]: 'sqlite'
In [11]: engine.logging_name
In [12]: engine.echo
Out[12]: True
In [13]: engine.pool
Out[13]: <sqlalchemy.pool.impl.SingletonThreadPool at 0x225a2ac3eb8>
In [14]: engine.dialect
Out[14]: <sqlalchemy.dialects.sqlite.pysqlite.SQLiteDialect_pysqlite at 0x225a27b1f60>
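- The Engine is the starting point for any SQLAlchemy application. It is the home base for the actual database and its DBAPI, delivered to the SQLAlchemy application through a connection Pool and a Dialect, which describes how to talk to a specific kind of database/DBAPI combination.
The architecture of the SQLAlchemy Engine is as follows:
[Figure: sqla_engine_arch (SQLAlchemy Engine architecture)]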
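- The SQLAlchemy create_engine() function creates an Engine object based on a database URL. The URL usually contains the username, password, hostname, and database name, plus optional keyword arguments for additional configuration.
The typical form of a database URL is:
dialect+driver://username:password@host:port/database
- dialect is the identifying name of the SQLAlchemy dialect, such as sqlite, mysql, postgresql, oracle, or mssql.
- driver is the name of the DBAPI used to connect to the database, written in all lowercase letters.
- Special characters in the URL must be URL-encoded.
The urllib module can be used to URL-encode such characters:
In [1]: import urllib.parse
In [2]: urllib.parse.quote_plus('kx%jj5/g')
Out[2]: 'kx%25jj5%2Fg'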
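Putting the URL format and the encoding step together, a database URL can be assembled and taken apart again with the standard library alone. This is only an illustrative sketch: the credentials, host, and port are made up, and urlsplit() is used here just to show the parts; it is not how SQLAlchemy itself parses URLs.

```python
from urllib.parse import quote_plus, unquote_plus, urlsplit

# Hypothetical credentials; the password contains characters that are
# special inside a URL ('%' and '/').
username = "scott"
password = "kx%jj5/g"

url = "mysql+pymysql://{}:{}@localhost:3306/foo".format(
    quote_plus(username), quote_plus(password))
print(url)

# Splitting the URL again shows the dialect+driver://user:pass@host:port/db parts.
parts = urlsplit(url)
print(parts.scheme)
print(parts.username, unquote_plus(parts.password))
print(parts.hostname, parts.port, parts.path.lstrip("/"))
```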
MySQL dialect examples:
# default
engine = create_engine('mysql://scott:tiger@localhost/foo')
# mysqlclient (a maintained fork of MySQL-Python)
engine = create_engine('mysql+mysqldb://scott:tiger@localhost/foo')
# PyMySQL
engine = create_engine('mysql+pymysql://scott:tiger@localhost/foo')
SQLite dialect examples:
# Relative path
# sqlite://<nohostname>/<path>
# where <path> is relative:
engine = create_engine('sqlite:///foo.db')
# Absolute path
# Unix/Mac - 4 initial slashes in total
engine = create_engine('sqlite:////absolute/path/to/foo.db')
# Windows
engine = create_engine('sqlite:///C:\\path\\to\\foo.db')
# Windows alternative using raw string
engine = create_engine(r'sqlite:///C:\path\to\foo.db')
# Create the database in memory
engine = create_engine('sqlite://')
engine = create_engine('sqlite:///:memory:')
For other databases such as PostgreSQL, Oracle, and Microsoft SQL Server, see Database Urls.
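- Declare a mapping. When using the ORM, the configuration process starts by describing the database tables we will be dealing with, and then by defining our own classes, which will be mapped to those tables. In modern SQLAlchemy, these two tasks are usually performed together using a system known as Declarative, which lets us create classes that include directives describing the actual database table they will be mapped to.
- Use the declarative_base() function to create the base class.
Create the base class: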
>>> from sqlalchemy.ext.declarative import declarative_base
>>> Base = declarative_base()
>>> Base
sqlalchemy.ext.declarative.api.Base
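- Any number of mapped classes can be defined on top of the Base class.
- A mapped class definition must specify the table name, the column names, and the column datatypes.
- The class must define a __tablename__ attribute that gives the table name.
- The class needs at least one Column that is part of the primary key. SQLAlchemy does not work out on its own which column is the primary key; mark it with primary_key=True.
- The __repr__() method is optional; it only makes printed instances easier to read.
- The information about the table that the Declarative system builds from the mapped class is called table metadata.
- The mapped class is associated with a Table object, which can be inspected through the __table__ attribute.
Define a User class and map it to the users table:
>>> from sqlalchemy import Column, Integer, String
>>> class User(Base):
...     __tablename__ = 'users'
...
...     id = Column(Integer, primary_key=True)
...     name = Column(String)
...     fullname = Column(String)
...     nickname = Column(String)
...
...     def __repr__(self):
...         return "<User(name='%s', fullname='%s', nickname='%s')>" % (
...             self.name, self.fullname, self.nickname)
...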
>>> User
__main__.User
>>> User.__table__
Table('users', MetaData(bind=None), Column('id', Integer(), table=<users>, primary_key=True, nullable=False), Column('name', String(), table=<users>), Column('fullname', String(), table=<users>), Column('nickname', String(), table=<users>), schema=None)
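- The Table object is a member of a larger collection known as MetaData. When using Declarative, this object is available through the .metadata attribute of the declarative base class.
- Call the MetaData.create_all() method to create the tables.
Use the MetaData.create_all() method to create the tables: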
>>> Base.metadata
MetaData(bind=None)
>>> Base.metadata.create_all(engine)
2019-04-16 22:20:12,488 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
2019-04-16 22:20:12,489 INFO sqlalchemy.engine.base.Engine ()
2019-04-16 22:20:12,490 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
2019-04-16 22:20:12,490 INFO sqlalchemy.engine.base.Engine ()
2019-04-16 22:20:12,491 INFO sqlalchemy.engine.base.Engine PRAGMA table_info("users")
2019-04-16 22:20:12,492 INFO sqlalchemy.engine.base.Engine ()
2019-04-16 22:20:12,493 INFO sqlalchemy.engine.base.Engine
CREATE TABLE users (
id INTEGER NOT NULL,
name VARCHAR,
fullname VARCHAR,
nickname VARCHAR,
PRIMARY KEY (id)
)
2019-04-16 22:20:12,494 INFO sqlalchemy.engine.base.Engine ()
2019-04-16 22:20:12,495 INFO sqlalchemy.engine.base.Engine COMMIT
>>>
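Because echo=True was enabled when the engine was defined, the generated log output is shown while the table is created.
- Instantiating the mapped class creates an object that represents one row of the table.
Create a User instance: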
>>> ed_user = User(name='ed', fullname='Ed Jones', nickname='edsnickname')
>>> ed_user
<User(name='ed', fullname='Ed Jones', nickname='edsnickname')>
>>> ed_user.name
'ed'
>>> ed_user.fullname
'Ed Jones'
>>> ed_user.nickname
'edsnickname'
>>> str(ed_user.id)
'None'
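Although no value was given for id in the constructor, the id attribute still produces None when accessed. SQLAlchemy's instrumentation normally produces this default value for column-mapped attributes when they are first accessed.
- Create a Session and use it to work with the database.
- Use sessionmaker to create the Session.
- If an Engine object engine has already been created, it can be specified when the Session is created.
Create a Session: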
>>> from sqlalchemy.orm import sessionmaker
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
>>> Session
sessionmaker(class_='Session', bind=Engine(sqlite:///:memory:), autoflush=True, autocommit=False, expire_on_commit=True)
>>> session
<sqlalchemy.orm.session.Session at 0x12ede8477b8>
- If no Engine object has been defined yet, the Session can be set up in steps.
Set up the Session in steps:
>>> Session = sessionmaker()
>>> Session.configure(bind=engine) # once engine is available
>>> session = Session()
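- Add instance data to the Session. At this point the instance is in the pending state: no SQL has been issued yet, and the object is not yet represented by a row in the database.
- Data is not committed to the database until session.commit() is called.
- Use session.add(instance) to add a single object.
- Use session.add_all(instances) to add several objects.
Add one object to the Session:
>>> session.add(ed_user)
One object has been added above.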
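- Use a Query object to query data.
Query data. Before the SELECT runs, the Session flushes its pending changes, which is why INSERT statements appear in the log first. The log below appears to have been captured in a run where the three additional users and the nickname change shown further down were already pending: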
>>> our_user = session.query(User).filter_by(name='ed').first()
2019-04-16 22:55:04,858 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2019-04-16 22:55:04,861 INFO sqlalchemy.engine.base.Engine INSERT INTO users (name, fullname, nickname) VALUES (?, ?, ?)
2019-04-16 22:55:04,862 INFO sqlalchemy.engine.base.Engine ('ed', 'Ed Jones', 'eddie')
2019-04-16 22:55:04,863 INFO sqlalchemy.engine.base.Engine INSERT INTO users (name, fullname, nickname) VALUES (?, ?, ?)
2019-04-16 22:55:04,864 INFO sqlalchemy.engine.base.Engine ('wendy', 'Wendy Williams', 'windy')
2019-04-16 22:55:04,866 INFO sqlalchemy.engine.base.Engine INSERT INTO users (name, fullname, nickname) VALUES (?, ?, ?)
2019-04-16 22:55:04,867 INFO sqlalchemy.engine.base.Engine ('mary', 'Mary Contrary', 'mary')
2019-04-16 22:55:04,868 INFO sqlalchemy.engine.base.Engine INSERT INTO users (name, fullname, nickname) VALUES (?, ?, ?)
2019-04-16 22:55:04,870 INFO sqlalchemy.engine.base.Engine ('fred', 'Fred Flintstone', 'freddy')
2019-04-16 22:55:04,872 INFO sqlalchemy.engine.base.Engine SELECT users.id AS users_id, users.name AS users_name, users.fullname AS users_fullname, users.nickname AS users_nickname
FROM users
WHERE users.name = ?
LIMIT ? OFFSET ?
2019-04-16 22:55:04,872 INFO sqlalchemy.engine.base.Engine ('ed', 1, 0)
>>> our_user
<User(name='ed', fullname='Ed Jones', nickname='eddie')>
>>> ed_user is our_user
True
- Use session.new to get the pending objects.
- Use session.dirty to get the modified (dirty) objects.
Get the pending or dirty objects:
>>> session.dirty
IdentitySet([])
>>> session.new
IdentitySet([])
Add several more objects:
>>> session.add_all([
... User(name='wendy', fullname='Wendy Williams', nickname='windy'),
... User(name='mary', fullname='Mary Contrary', nickname='mary'),
... User(name='fred', fullname='Fred Flintstone', nickname='freddy')])
Three objects have been added above.
Get the pending or dirty objects again:
>>> session.dirty
IdentitySet([])
>>> session.new
IdentitySet([<User(name='wendy', fullname='Wendy Williams', nickname='windy')>, <User(name='mary', fullname='Mary Contrary', nickname='mary')>, <User(name='fred', fullname='Fred Flintstone', nickname='freddy')>])
Change Ed's nickname:
>>> ed_user.nickname = 'eddie'
Get the pending or dirty objects again:
>>> session.dirty
IdentitySet([<User(name='ed', fullname='Ed Jones', nickname='eddie')>])
>>> session.new
IdentitySet([<User(name='wendy', fullname='Wendy Williams', nickname='windy')>, <User(name='mary', fullname='Mary Contrary', nickname='mary')>, <User(name='fred', fullname='Fred Flintstone', nickname='freddy')>])
- Use session.commit() to commit the data to the database.
Commit the data, then query it:
>>> session.commit()
2019-04-17 20:04:58,364 INFO sqlalchemy.engine.base.Engine UPDATE users SET nickname=? WHERE users.id = ?
2019-04-17 20:04:58,365 INFO sqlalchemy.engine.base.Engine ('eddie', 1)
2019-04-17 20:04:58,365 INFO sqlalchemy.engine.base.Engine INSERT INTO users (name, fullname, nickname) VALUES (?, ?, ?)
2019-04-17 20:04:58,365 INFO sqlalchemy.engine.base.Engine ('wendy', 'Wendy Williams', 'windy')
2019-04-17 20:04:58,365 INFO sqlalchemy.engine.base.Engine INSERT INTO users (name, fullname, nickname) VALUES (?, ?, ?)
2019-04-17 20:04:58,365 INFO sqlalchemy.engine.base.Engine ('mary', 'Mary Contrary', 'mary')
2019-04-17 20:04:58,366 INFO sqlalchemy.engine.base.Engine INSERT INTO users (name, fullname, nickname) VALUES (?, ?, ?)
2019-04-17 20:04:58,367 INFO sqlalchemy.engine.base.Engine ('fred', 'Fred Flintstone', 'freddy')
2019-04-17 20:04:58,367 INFO sqlalchemy.engine.base.Engine COMMIT
>>> ed_user.id
2019-04-16 22:58:59,226 INFO sqlalchemy.engine.base.Engine BEGIN (implicit)
2019-04-16 22:58:59,227 INFO sqlalchemy.engine.base.Engine SELECT users.id AS users_id, users.name AS users_name, users.fullname AS users_fullname, users.nickname AS users_nickname
FROM users
WHERE users.id = ?
2019-04-16 22:58:59,227 INFO sqlalchemy.engine.base.Engine (1,)
1
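- Query information from the database.
- Create a Query object through the query() method of the Session.
- Common methods of the Query object are shown in the examples that follow; for details, see the Query API on the official site.
Query the name and fullname columns of the users table: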
>>> users = session.query(User.name, User.fullname)
>>> users
<sqlalchemy.orm.query.Query at 0x17a37ee4048>
>>> users.column_descriptions # metadata about the columns this Query will return
[{'name': 'name',
'type': String(),
'aliased': False,
'expr': <sqlalchemy.orm.attributes.InstrumentedAttribute at 0x17a37ddb570>,
'entity': __main__.User},
{'name': 'fullname',
'type': String(),
'aliased': False,
'expr': <sqlalchemy.orm.attributes.InstrumentedAttribute at 0x17a37ddb620>,
'entity': __main__.User}]
>>> users.count() # number of rows this Query would return
2019-04-18 20:55:52,252 INFO sqlalchemy.engine.base.Engine SELECT count(*) AS count_1
FROM (SELECT users.name AS users_name, users.fullname AS users_fullname
FROM users) AS anon_1
2019-04-18 20:55:52,252 INFO sqlalchemy.engine.base.Engine ()
4
>>> users.all() # fetch all rows
2019-04-18 20:56:30,732 INFO sqlalchemy.engine.base.Engine SELECT users.name AS users_name, users.fullname AS users_fullname
FROM users
2019-04-18 20:56:30,733 INFO sqlalchemy.engine.base.Engine ()
[('ed', 'Ed Jones'),
('wendy', 'Wendy Williams'),
('mary', 'Mary Contrary'),
('fred', 'Fred Flintstone')]
>>> users.first() # return the first result
2019-04-18 21:00:58,964 INFO sqlalchemy.engine.base.Engine SELECT users.name AS users_name, users.fullname AS users_fullname
FROM users
LIMIT ? OFFSET ?
2019-04-18 21:00:58,967 INFO sqlalchemy.engine.base.Engine (1, 0)
('ed', 'Ed Jones')
>>> users.limit(2) # limit the number of results
<sqlalchemy.orm.query.Query at 0x17a39d407b8>
>>> users.limit(2).all()
2019-04-18 21:03:01,424 INFO sqlalchemy.engine.base.Engine SELECT users.name AS users_name, users.fullname AS users_fullname
FROM users
LIMIT ? OFFSET ?
2019-04-18 21:03:01,425 INFO sqlalchemy.engine.base.Engine (2, 0)
[('ed', 'Ed Jones'), ('wendy', 'Wendy Williams')]
>>> users.order_by(User.name) # order by User.name
<sqlalchemy.orm.query.Query at 0x17a37e10470>
>>> users.order_by(User.name).all()
2019-04-18 21:06:00,393 INFO sqlalchemy.engine.base.Engine SELECT users.name AS users_name, users.fullname AS users_fullname
FROM users ORDER BY users.name
2019-04-18 21:06:00,394 INFO sqlalchemy.engine.base.Engine ()
[('ed', 'Ed Jones'),
('fred', 'Fred Flintstone'),
('mary', 'Mary Contrary'),
('wendy', 'Wendy Williams')]
>>> users.filter(User.name == 'mary') # filter rows with an expression
<sqlalchemy.orm.query.Query at 0x17a37e04898>
>>> users.filter(User.name == 'mary').first()
2019-04-18 21:24:54,028 INFO sqlalchemy.engine.base.Engine SELECT users.name AS users_name, users.fullname AS users_fullname
FROM users
WHERE users.name = ?
LIMIT ? OFFSET ?
2019-04-18 21:24:54,029 INFO sqlalchemy.engine.base.Engine ('mary', 1, 0)
('mary', 'Mary Contrary')
>>> users.filter_by(name='mary') # filter rows by keyword arguments
<sqlalchemy.orm.query.Query at 0x17a3a0567f0>
>>> users.filter_by(name='mary').first()
2019-04-18 21:25:55,339 INFO sqlalchemy.engine.base.Engine SELECT users.name AS users_name, users.fullname AS users_fullname
FROM users
WHERE users.name = ?
LIMIT ? OFFSET ?
2019-04-18 21:25:55,340 INFO sqlalchemy.engine.base.Engine ('mary', 1, 0)
('mary', 'Mary Contrary')
>>> first_user = session.query(User).get(1) # return the instance identified by the primary key
>>> first_user
<User(name='ed', fullname='Ed Jones', nickname='edsnickname')>
>>> for name, fullname in session.query(User.name, User.fullname):
...     print(name, fullname)
...
2019-04-18 21:40:18,566 INFO sqlalchemy.engine.base.Engine SELECT users.name AS users_name, users.fullname AS users_fullname
FROM users
2019-04-18 21:40:18,567 INFO sqlalchemy.engine.base.Engine ()
ed Ed Jones
wendy Wendy Williams
mary Mary Contrary
fred Fred Flintstone
>>> for row in session.query(User, User.name).all():
...     print(row.User, row.name) # the returned objects can be treated like ordinary Python objects
...
2019-04-18 21:42:28,394 INFO sqlalchemy.engine.base.Engine SELECT users.id AS users_id, users.name AS users_name, users.fullname AS users_fullname, users.nickname AS users_nickname
FROM users
2019-04-18 21:42:28,395 INFO sqlalchemy.engine.base.Engine ()
<User(name='ed', fullname='Ed Jones', nickname='edsnickname')> ed
<User(name='wendy', fullname='Wendy Williams', nickname='windy')> wendy
<User(name='mary', fullname='Mary Contrary', nickname='mary')> mary
<User(name='fred', fullname='Fred Flintstone', nickname='freddy')> fred
>>> for row in session.query(User.name.label('name_label')).all():  # a queried column can be given a label
...     print(row.name_label)  # access the value through the label
...
2019-04-18 21:43:22,465 INFO sqlalchemy.engine.base.Engine SELECT users.name AS name_label
FROM users
2019-04-18 21:43:22,466 INFO sqlalchemy.engine.base.Engine ()
ed
wendy
mary
fred
>>> from sqlalchemy.orm import aliased
>>> user_alias = aliased(User, name='aliasuser')  # define an alias named aliasuser for the User class
>>> user_alias
<AliasedClass at 0x17a37e04c88; User>
>>> for row in session.query(user_alias, user_alias.name).all():
...     print(row.aliasuser)
...
2019-04-18 21:50:09,776 INFO sqlalchemy.engine.base.Engine SELECT aliasuser.id AS aliasuser_id, aliasuser.name AS aliasuser_name, aliasuser.fullname AS aliasuser_fullname, aliasuser.nickname AS aliasuser_nickname
FROM users AS aliasuser
2019-04-18 21:50:09,776 INFO sqlalchemy.engine.base.Engine ()
<User(name='ed', fullname='Ed Jones', nickname='edsnickname')>
<User(name='wendy', fullname='Wendy Williams', nickname='windy')>
<User(name='mary', fullname='Mary Contrary', nickname='mary')>
<User(name='fred', fullname='Fred Flintstone', nickname='freddy')>
>>> for u in session.query(User).order_by(User.id)[1:3]:  # slicing is rendered as LIMIT and OFFSET
...     print(u)
...
2019-04-18 21:52:48,402 INFO sqlalchemy.engine.base.Engine SELECT users.id AS users_id, users.name AS users_name, users.fullname AS users_fullname, users.nickname AS users_nickname
FROM users ORDER BY users.id
LIMIT ? OFFSET ?
2019-04-18 21:52:48,403 INFO sqlalchemy.engine.base.Engine (2, 1)
<User(name='wendy', fullname='Wendy Williams', nickname='windy')>
<User(name='mary', fullname='Mary Contrary', nickname='mary')>
>>> for user in session.query(User).filter(User.name=='ed').filter(User.fullname=='Ed Jones'):  # chained filter() calls are joined with AND
...     print(user)
...
2019-04-18 21:55:14,653 INFO sqlalchemy.engine.base.Engine SELECT users.id AS users_id, users.name AS users_name, users.fullname AS users_fullname, users.nickname AS users_nickname
FROM users
WHERE users.name = ? AND users.fullname = ?
2019-04-18 21:55:14,654 INFO sqlalchemy.engine.base.Engine ('ed', 'Ed Jones')
<User(name='ed', fullname='Ed Jones', nickname='edsnickname')>
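The slice `[1:3]` in the log above was rendered as `LIMIT 2 OFFSET 1`. As a rough illustration of that mapping, the same two rows can be fetched with the standard-library `sqlite3` module. The helper name `slice_to_limit_offset` and the trimmed-down `users` table are assumptions made for this sketch; they are not part of SQLAlchemy.

```python
import sqlite3


def slice_to_limit_offset(start, stop):
    """Translate a Python slice [start:stop] into a (LIMIT, OFFSET) pair."""
    return stop - start, start


# In-memory stand-in for the tutorial's users table (only two columns kept).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)",
    [("ed",), ("wendy",), ("mary",), ("fred",)],
)

limit, offset = slice_to_limit_offset(1, 3)
rows = conn.execute(
    "SELECT name FROM users ORDER BY id LIMIT ? OFFSET ?", (limit, offset)
).fetchall()
names = [name for (name,) in rows]
print(limit, offset, names)
```

The `ORDER BY` matters here: without a defined order, the database may return any rows for a given LIMIT and OFFSET.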
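- Common filter operators
- equals: ==
- not equals: !=
- LIKE: like() (case sensitivity depends on the backend; in SQLite, LIKE ignores case for ASCII characters by default)
- ILIKE: ilike() (case-insensitive match)
- IN: in_()
- NOT IN: ~ applied to in_()
- IS NULL: == None
- IS NOT NULL: != None
- AND: chain several filter() calls, pass several criteria to one filter(), or use and_()
- OR: use or_()
- MATCH: match(), which uses a database-specific MATCH or CONTAINS function; its behavior varies by backend, and it is not available on some backends such as SQLite.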
Filter operator examples:
>>> myquery = session.query(User)
>>> myquery
<sqlalchemy.orm.query.Query at 0x17a39b57908>
>>> myquery.filter(User.name == 'ed')
<sqlalchemy.orm.query.Query at 0x17a39d59dd8>
>>> myquery.filter(User.name == 'ed').all()  # equals
2019-04-18 22:05:45,169 INFO sqlalchemy.engine.base.Engine SELECT users.id AS users_id, users.name AS users_name, users.fullname AS users_fullname, users.nickname AS users_nickname
FROM users
WHERE users.name = ?
2019-04-18 22:05:45,172 INFO sqlalchemy.engine.base.Engine ('ed',)
[<User(name='ed', fullname='Ed Jones', nickname='edsnickname')>]
>>> myquery.filter(User.name != 'ed').all()  # not equals
2019-04-18 22:06:37,084 INFO sqlalchemy.engine.base.Engine SELECT users.id AS users_id, users.name AS users_name, users.fullname AS users_fullname, users.nickname AS users_nickname
FROM users
WHERE users.name != ?
2019-04-18 22:06:37,085 INFO sqlalchemy.engine.base.Engine ('ed',)
[<User(name='wendy', fullname='Wendy Williams', nickname='windy')>,
<User(name='mary', fullname='Mary Contrary', nickname='mary')>,
<User(name='fred', fullname='Fred Flintstone', nickname='freddy')>]
>>> myquery.filter(User.name.like('%ed%')).all()  # LIKE (case sensitivity depends on the backend)
2019-04-18 22:07:11,593 INFO sqlalchemy.engine.base.Engine SELECT users.id AS users_id, users.name AS users_name, users.fullname AS users_fullname, users.nickname AS users_nickname
FROM users
WHERE users.name LIKE ?
2019-04-18 22:07:11,594 INFO sqlalchemy.engine.base.Engine ('%ed%',)
[<User(name='ed', fullname='Ed Jones', nickname='edsnickname')>,
<User(name='fred', fullname='Fred Flintstone', nickname='freddy')>]
>>> myquery.filter(User.name.ilike('%ed%')).all()  # ILIKE (case-insensitive)
2019-04-18 22:07:49,114 INFO sqlalchemy.engine.base.Engine SELECT users.id AS users_id, users.name AS users_name, users.fullname AS users_fullname, users.nickname AS users_nickname
FROM users
WHERE lower(users.name) LIKE lower(?)
2019-04-18 22:07:49,115 INFO sqlalchemy.engine.base.Engine ('%ed%',)
[<User(name='ed', fullname='Ed Jones', nickname='edsnickname')>,
<User(name='fred', fullname='Fred Flintstone', nickname='freddy')>]
>>> myquery.filter(User.name.in_(['ed', 'wendy', 'jack'])).all()  # IN
2019-04-18 22:09:00,462 INFO sqlalchemy.engine.base.Engine SELECT users.id AS users_id, users.name AS users_name, users.fullname AS users_fullname, users.nickname AS users_nickname
FROM users
WHERE users.name IN (?, ?, ?)
2019-04-18 22:09:00,463 INFO sqlalchemy.engine.base.Engine ('ed', 'wendy', 'jack')
[<User(name='ed', fullname='Ed Jones', nickname='edsnickname')>,
<User(name='wendy', fullname='Wendy Williams', nickname='windy')>]
>>> myquery.filter(~User.name.in_(['ed', 'wendy', 'jack'])).all()  # NOT IN
2019-04-18 22:10:06,110 INFO sqlalchemy.engine.base.Engine SELECT users.id AS users_id, users.name AS users_name, users.fullname AS users_fullname, users.nickname AS users_nickname
FROM users
WHERE users.name NOT IN (?, ?, ?)
2019-04-18 22:10:06,111 INFO sqlalchemy.engine.base.Engine ('ed', 'wendy', 'jack')
[<User(name='mary', fullname='Mary Contrary', nickname='mary')>,
<User(name='fred', fullname='Fred Flintstone', nickname='freddy')>]
>>> myquery.filter(User.name == None).all()  # IS NULL
2019-04-18 22:11:13,807 INFO sqlalchemy.engine.base.Engine SELECT users.id AS users_id, users.name AS users_name, users.fullname AS users_fullname, users.nickname AS users_nickname
FROM users
WHERE users.name IS NULL
2019-04-18 22:11:13,808 INFO sqlalchemy.engine.base.Engine ()
[]
>>> myquery.filter(User.name != None).all()  # IS NOT NULL
2019-04-18 22:11:19,570 INFO sqlalchemy.engine.base.Engine SELECT users.id AS users_id, users.name AS users_name, users.fullname AS users_fullname, users.nickname AS users_nickname
FROM users
WHERE users.name IS NOT NULL
2019-04-18 22:11:19,571 INFO sqlalchemy.engine.base.Engine ()
[<User(name='ed', fullname='Ed Jones', nickname='edsnickname')>,
<User(name='wendy', fullname='Wendy Williams', nickname='windy')>,
<User(name='mary', fullname='Mary Contrary', nickname='mary')>,
<User(name='fred', fullname='Fred Flintstone', nickname='freddy')>]
>>> from sqlalchemy import and_
>>> myquery.filter(and_(User.name == 'ed', User.fullname == 'Ed Jones'))
<sqlalchemy.orm.query.Query at 0x17a39d54f98>
>>> myquery.filter(and_(User.name == 'ed', User.fullname == 'Ed Jones')).all()  # AND
2019-04-18 22:12:24,261 INFO sqlalchemy.engine.base.Engine SELECT users.id AS users_id, users.name AS users_name, users.fullname AS users_fullname, users.nickname AS users_nickname
FROM users
WHERE users.name = ? AND users.fullname = ?
2019-04-18 22:12:24,261 INFO sqlalchemy.engine.base.Engine ('ed', 'Ed Jones')
[<User(name='ed', fullname='Ed Jones', nickname='edsnickname')>]
>>> myquery.filter(User.name == 'ed', User.fullname == 'Ed Jones').all()
2019-04-18 22:13:35,250 INFO sqlalchemy.engine.base.Engine SELECT users.id AS users_id, users.name AS users_name, users.fullname AS users_fullname, users.nickname AS users_nickname
FROM users
WHERE users.name = ? AND users.fullname = ?
2019-04-18 22:13:35,251 INFO sqlalchemy.engine.base.Engine ('ed', 'Ed Jones')
[<User(name='ed', fullname='Ed Jones', nickname='edsnickname')>]
>>> from sqlalchemy import or_
>>> myquery.filter(or_(User.name == 'ed', User.name == 'wendy'))
<sqlalchemy.orm.query.Query at 0x17a39d4ac88>
>>> myquery.filter(or_(User.name == 'ed', User.name == 'wendy')).all()  # OR
2019-04-18 22:14:16,643 INFO sqlalchemy.engine.base.Engine SELECT users.id AS users_id, users.name AS users_name, users.fullname AS users_fullname, users.nickname AS users_nickname
FROM users
WHERE users.name = ? OR users.name = ?
2019-04-18 22:14:16,645 INFO sqlalchemy.engine.base.Engine ('ed', 'wendy')
[<User(name='ed', fullname='Ed Jones', nickname='edsnickname')>,
<User(name='wendy', fullname='Wendy Williams', nickname='windy')>]
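The operators above end up as ordinary SQL predicates, so their edge cases are those of the database. The sketch below runs a few of the rendered predicates directly through the standard-library `sqlite3` module against a simplified `users` table (an assumption for illustration, with one extra row whose name is NULL). It shows two things that are easy to miss: SQLite's LIKE ignores case for ASCII text by default, and a row whose value is NULL is matched by neither IN nor NOT IN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)",
    [("ed",), ("wendy",), ("mary",), ("fred",), (None,)],
)


def names(sql, params=()):
    """Run a SELECT that returns one column and collect it in id order."""
    return [row[0] for row in conn.execute(sql + " ORDER BY id", params)]


candidates = ("ed", "wendy", "jack")
in_rows = names("SELECT name FROM users WHERE name IN (?, ?, ?)", candidates)
not_in_rows = names("SELECT name FROM users WHERE name NOT IN (?, ?, ?)", candidates)
null_rows = names("SELECT name FROM users WHERE name IS NULL")
# Upper-case pattern on purpose: SQLite still matches the lower-case names.
like_rows = names("SELECT name FROM users WHERE name LIKE ?", ("%ED%",))

print(in_rows, not_in_rows, null_rows, like_rows)
```

On a backend where LIKE is case-sensitive, the last query would return no rows, which is the situation `ilike()` is meant for.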
- Using textual SQL
- Use text() to build a textual SQL fragment.
Using textual SQL:
>>> from sqlalchemy import text
>>> myquery.filter(text("id<3")).order_by(text('id')).all()
2019-04-18 22:22:06,749 INFO sqlalchemy.engine.base.Engine SELECT users.id AS users_id, users.name AS users_name, users.fullname AS users_fullname, users.nickname AS users_nickname
FROM users
WHERE id<3 ORDER BY id
2019-04-18 22:22:06,750 INFO sqlalchemy.engine.base.Engine ()
[<User(name='ed', fullname='Ed Jones', nickname='edsnickname')>,
<User(name='wendy', fullname='Wendy Williams', nickname='windy')>]
>>> for user in myquery.filter(text("id<3")).order_by(text('id')).all():
...     print(user.id, user.name)
...
2019-04-18 22:22:54,586 INFO sqlalchemy.engine.base.Engine SELECT users.id AS users_id, users.name AS users_name, users.fullname AS users_fullname, users.nickname AS users_nickname
FROM users
WHERE id<3 ORDER BY id
2019-04-18 22:22:54,587 INFO sqlalchemy.engine.base.Engine ()
1 ed
2 wendy
- Bind parameters can be specified in string-based SQL with a colon; supply their values with the params() method.
Binding parameters with a colon:
>>> myquery.filter(text("id<:value and name=:name")).params(value=224, name='fred').order_by(User.id).one()
2019-04-18 22:25:20,752 INFO sqlalchemy.engine.base.Engine SELECT users.id AS users_id, users.name AS users_name, users.fullname AS users_fullname, users.nickname AS users_nickname
FROM users
WHERE id<? and name=? ORDER BY users.id
2019-04-18 22:25:20,752 INFO sqlalchemy.engine.base.Engine (224, 'fred')
<User(name='fred', fullname='Fred Flintstone', nickname='freddy')>
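The point of `:value` and `:name` is that the values travel separately from the SQL text, as the `(224, 'fred')` tuple in the log shows. The standard-library `sqlite3` module accepts the same colon-prefixed named style, so the idea can be tried without SQLAlchemy. The table layout and the `hostile` string below are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)",
    [("ed",), ("wendy",), ("mary",), ("fred",)],
)

# Named placeholders: the driver binds the values, the SQL text never changes.
sql = "SELECT id, name FROM users WHERE id < :value AND name = :name ORDER BY id"
row = conn.execute(sql, {"value": 224, "name": "fred"}).fetchone()

# A value that would alter the query if it were pasted into the string
# is treated purely as data when it is bound.
hostile = "fred' OR '1'='1"
no_match = conn.execute(sql, {"value": 224, "name": hostile}).fetchall()

print(row, no_match)
```

Building the same statement with string formatting would let the second value rewrite the WHERE clause; binding avoids that.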
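- To use an entirely string-based statement, pass a text() construct representing the complete statement to from_statement().
- Without additional specifiers, the columns in the string SQL are matched to the model columns by name.
For example, below we simply use an asterisk to load all columns: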
>>> myquery.from_statement(text("SELECT * FROM users where name=:name")).params(name='ed').all()
2019-04-18 22:30:43,455 INFO sqlalchemy.engine.base.Engine SELECT * FROM users where name=?
2019-04-18 22:30:43,455 INFO sqlalchemy.engine.base.Engine ('ed',)
[<User(name='ed', fullname='Ed Jones', nickname='edsnickname')>]
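- Matching columns by name works for simple cases, but it becomes unwieldy with complex statements that contain duplicate column names, or with anonymized ORM constructs that do not map easily to specific names. In those cases, pass the mapped columns positionally to the columns() method of the text() construct.
Querying explicitly listed columns: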
>>> stmt = text("SELECT name, id, fullname, nickname FROM users where name=:name")
>>> stmt = stmt.columns(User.name, User.id, User.fullname, User.nickname)
>>> myquery.from_statement(stmt).params(name='ed').all()
2019-04-18 22:34:44,974 INFO sqlalchemy.engine.base.Engine SELECT name, id, fullname, nickname FROM users where name=?
2019-04-18 22:34:44,975 INFO sqlalchemy.engine.base.Engine ('ed',)
[<User(name='ed', fullname='Ed Jones', nickname='edsnickname')>]
Create the database again, this time persisting the SQLite data to the local file sqlalchemy.db:
>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite:///sqlalchemy.db')
>>> from sqlalchemy.ext.declarative import declarative_base
>>> Base = declarative_base()
>>> from sqlalchemy import Column, Integer, String
>>> class User(Base):
...     __tablename__ = 'users'
...
...     id = Column(Integer, primary_key=True)
...     name = Column(String)
...     fullname = Column(String)
...     nickname = Column(String)
...
...     def __repr__(self):
...         return "<User(name='%s', fullname='%s', nickname='%s')>" % (
...             self.name, self.fullname, self.nickname)
...
>>> User
__main__.User
>>> Base.metadata.create_all(engine)
>>> ed_user = User(name='ed', fullname='Ed Jones', nickname='edsnickname')
>>> from sqlalchemy.orm import sessionmaker
>>> Session = sessionmaker(bind=engine)
>>> session = Session()
>>> session.add(ed_user)
>>> session.add_all([
...     User(name='wendy', fullname='Wendy Williams', nickname='windy'),
...     User(name='mary', fullname='Mary Contrary', nickname='mary'),
...     User(name='fred', fullname='Fred Flintstone', nickname='freddy')])
>>> session.commit()
>>> users = session.query(User.name, User.fullname)
>>> users.all()
[('ed', 'Ed Jones'),
('wendy', 'Wendy Williams'),
('mary', 'Mary Contrary'),
('fred', 'Fred Flintstone')]
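- Counting
- Use the count() method of the Query object.
- Use the count() function of the func construct from sqlalchemy to state explicitly what is counted; this gives more control than Query.count(), which wraps the query in a subquery.
Counting the rows returned by a query: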
>>> session.query(User).filter(User.name.like('%ed')).count()
2
>>> from sqlalchemy import func
>>> session.query(func.count(User.name), User.name).group_by(User.name).all()
[(1, 'ed'), (1, 'fred'), (1, 'mary'), (1, 'wendy')]
>>> session.query(func.count('*')).select_from(User).scalar()  # count with select_from(); equivalent to running "SELECT count(*) FROM users" in the database
4
>>> session.query(func.count(User.id)).scalar()  # when counting the User primary key directly, select_from() can be omitted
4
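The three counting styles above differ mainly in the SQL they produce. As a sketch of those shapes using only the standard-library `sqlite3` module: the first query imitates the subquery wrapping that `Query.count()` typically emits, the second is the plain `SELECT count(*)` form, and the third is the grouped count. The simplified `users` table and the alias `anon_1` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)",
    [("ed",), ("wendy",), ("mary",), ("fred",)],
)

# Shape 1: count the rows of a filtered query by wrapping it in a subquery.
wrapped = conn.execute(
    "SELECT count(*) FROM (SELECT id, name FROM users WHERE name LIKE ?) AS anon_1",
    ("%ed",),
).fetchone()[0]

# Shape 2: count directly against the table.
direct = conn.execute("SELECT count(*) FROM users").fetchone()[0]

# Shape 3: one count per distinct name.
grouped = conn.execute(
    "SELECT count(name), name FROM users GROUP BY name ORDER BY name"
).fetchall()

print(wrapped, direct, grouped)
```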
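- Building a relationship.
- Bidirectional relationships: in the relationship() directive, the relationship.back_populates parameter names the complementary attribute on the other class. With it, each relationship() knows about the same link expressed in reverse, so the pair forms a bidirectional relationship between the two classes.
- With a bidirectional relationship, an element added in one direction automatically becomes visible in the other.
Consider adding a second table, addresses, to store users' email addresses. Define an Address class to build a one-to-many relationship model: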
>>> from sqlalchemy import ForeignKey
>>> from sqlalchemy.orm import relationship
>>> class Address(Base):
...     __tablename__ = 'addresses'
...     id = Column(Integer, primary_key=True)  # id is the primary key
...     email_address = Column(String, nullable=False)  # String column, NOT NULL
...     user_id = Column(Integer, ForeignKey('users.id'))  # foreign key referencing users.id
...
...     user = relationship("User", back_populates="addresses")  # links Address to User; Address.user returns the User that owns the address
...
...     def __repr__(self):
...         return "<Address(email_address='%s')>" % self.email_address
...
>>> User.addresses = relationship("Address", order_by=Address.id, back_populates="user")  # User.addresses is the list of the user's Address objects, ordered by Address.id
>>> Address
__main__.Address
>>> User
__main__.User
>>> Base.metadata.create_all(engine)
After the tables are created, SQLite3 shows that the addresses table now exists:
sqlite>
sqlite> .table
addresses users
sqlite>
Working with related objects: create a new User instance and add email addresses to it:
>>> jack = User(name='jack', fullname='Jack Bean', nickname='gjffdd')
>>> jack.addresses
[]
>>> jack.addresses = [Address(email_address='jack@google.com'), Address(email_address='j25@yahoo.com')]
>>> jack.addresses[0]
<Address(email_address='jack@google.com')>
>>> jack.addresses[1]
<Address(email_address='j25@yahoo.com')>
>>> jack.addresses[0].user
<User(name='jack', fullname='Jack Bean', nickname='gjffdd')>
>>> jack.addresses[1].user
<User(name='jack', fullname='Jack Bean', nickname='gjffdd')>
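- When an object is added to the session, cascading adds its related objects to the session as well, so they are written to the database together.
Add user jack to the database; because of the cascade, the related Address rows are added automatically: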
>>> session.add(jack)
>>> session.commit()
View the users and addresses tables in SQLite3:
sqlite> select * from addresses;
1|jack@google.com|5
2|j25@yahoo.com|5
sqlite> select * from users;
1|ed|Ed Jones|edsnickname
2|wendy|Wendy Williams|windy
3|mary|Mary Contrary|mary
4|fred|Fred Flintstone|freddy
5|jack|Jack Bean|gjffdd
sqlite>
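- Using join to query across tables.
- The Query.join() method is the easiest way to produce actual SQL JOIN syntax.
First use Query.filter() to build a simple implicit join between User and Address, then use Query.join() for an explicit join: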
>>> for u, a in session.query(User, Address).\
...         filter(User.id==Address.user_id).\
...         filter(Address.email_address=='jack@google.com').\
...         all():
...     print(u)
...     print(a)
...
<User(name='jack', fullname='Jack Bean', nickname='gjffdd')>
<Address(email_address='jack@google.com')>
>>> session.query(User).join(Address).\
...     filter(Address.email_address=='jack@google.com').\
...     all()
[<User(name='jack', fullname='Jack Bean', nickname='gjffdd')>]
Query.join() knows how to join User and Address because there is only one foreign key between them.
If there is no foreign key, or more than one, use one of the following forms:
query.join(Address, User.id==Address.user_id)  # explicit condition
query.join(User.addresses)                     # specify relationship from left to right
query.join(Address, User.addresses)            # same, with explicit target
query.join('addresses')                        # same, using a string
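- Use aliased to give a table an alias, so that the same table can be referenced more than once in a single query.
Aliasing the addresses table twice:
>>> from sqlalchemy.orm import aliased
>>> adalias1 = aliased(Address)
>>> adalias2 = aliased(Address)
>>> for username, email1, email2 in \
...         session.query(User.name, adalias1.email_address, adalias2.email_address). \
...         join(adalias1, User.addresses).join(adalias2, User.addresses). \
...         filter(adalias1.email_address=='jack@google.com'). \
...         filter(adalias2.email_address=='j25@yahoo.com'):
...     print(username, email1, email2)
...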
jack jack@google.com j25@yahoo.com
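Aliasing is needed because the query joins the same table twice and each copy must be filtered independently. A minimal sketch of the equivalent SQL, run through the standard-library `sqlite3` module, makes the two copies visible as `a1` and `a2`. The alias names and the reduced table definitions are assumptions for illustration and are not the exact SQL that SQLAlchemy generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE addresses (
        id INTEGER PRIMARY KEY,
        email_address TEXT NOT NULL,
        user_id INTEGER REFERENCES users(id)
    );
    INSERT INTO users (id, name) VALUES (5, 'jack');
    INSERT INTO addresses (email_address, user_id) VALUES ('jack@google.com', 5);
    INSERT INTO addresses (email_address, user_id) VALUES ('j25@yahoo.com', 5);
    """
)

# The addresses table appears twice, once per alias, so one user row can be
# matched against two different address rows at the same time.
rows = conn.execute(
    """
    SELECT users.name, a1.email_address, a2.email_address
    FROM users
    JOIN addresses AS a1 ON users.id = a1.user_id
    JOIN addresses AS a2 ON users.id = a2.user_id
    WHERE a1.email_address = ? AND a2.email_address = ?
    """,
    ("jack@google.com", "j25@yahoo.com"),
).fetchall()

print(rows)
```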
- Use session.delete(instance) to delete the instance.
- SQLAlchemy doesn't assume that deletes cascade; cascading deletes must be configured explicitly.
- For details on cascades, see the official SQLAlchemy 1.3 Documentation: Cascades.
Deleting user jack:
>>> jack
<User(name='jack', fullname='Jack Bean', nickname='gjffdd')>
>>> session.delete(jack)
>>> session.query(User).filter_by(name='jack').count()
0
>>> session.query(Address).filter(Address.email_address.in_(['jack@google.com', 'j25@yahoo.com'])).count()
2
View the users and addresses tables in SQLite3:
sqlite> select * from addresses;
1|jack@google.com|5
2|j25@yahoo.com|5
sqlite> select * from users;
1|ed|Ed Jones|edsnickname
2|wendy|Wendy Williams|windy
3|mary|Mary Contrary|mary
4|fred|Fred Flintstone|freddy
5|jack|Jack Bean|gjffdd
sqlite>
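This shows that jack has not yet been removed from the database file: the deletion has only happened inside the session's open transaction, which is why count() above already returns 0.
Commit the transaction with session.commit():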
>>> session.commit()
View the users and addresses tables in SQLite3 again:
sqlite> select * from addresses;
1|jack@google.com|5
2|j25@yahoo.com|5
sqlite> select * from users;
1|ed|Ed Jones|edsnickname
2|wendy|Wendy Williams|windy
3|mary|Mary Contrary|mary
4|fred|Fred Flintstone|freddy
sqlite>
jack has now been deleted from the database, but his email addresses are not deleted automatically.
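To have the addresses removed together with the user, the relationship on the parent side has to opt in to cascading deletes. The sketch below assumes SQLAlchemy is installed and uses `cascade="all, delete-orphan"` on `User.addresses`. The models are trimmed versions of the ones above, and an in-memory database is used so the sketch does not touch sqlalchemy.db. The `try`/`except` import is only there so the snippet runs on both 1.3 and later releases.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import relationship, sessionmaker

try:  # SQLAlchemy 1.4 and later
    from sqlalchemy.orm import declarative_base
except ImportError:  # SQLAlchemy 1.3
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    # Deleting a User now also deletes the Address objects it owns.
    addresses = relationship(
        "Address", back_populates="user", cascade="all, delete-orphan"
    )


class Address(Base):
    __tablename__ = "addresses"
    id = Column(Integer, primary_key=True)
    email_address = Column(String, nullable=False)
    user_id = Column(Integer, ForeignKey("users.id"))
    user = relationship("User", back_populates="addresses")


engine = create_engine("sqlite://")  # in-memory database for the sketch
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

jack = User(
    name="jack",
    addresses=[
        Address(email_address="jack@google.com"),
        Address(email_address="j25@yahoo.com"),
    ],
)
session.add(jack)
session.commit()

session.delete(jack)
session.commit()
remaining_addresses = session.query(Address).count()
print(remaining_addresses)
```

With this configuration no address rows should remain after the commit, in contrast to the orphaned rows shown above.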
dataset: databases for lazy people¶
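In Python, a database is not the simplest way to store large amounts of structured data. dataset provides a simple abstraction layer that removes most direct SQL statements without requiring a full ORM model; in essence, a database can be used like a JSON file or a NoSQL store.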
- Installing dataset
Install with pip:
$ pip install dataset
Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
Collecting dataset
Downloading http://mirrors.aliyun.com/pypi/packages/d5/02/a4c77a15d004f1307a579e577974fa9292a63e93abff3e40ad993cf597c7/dataset-1.1.2-py2.py3-none-any.whl
Collecting alembic>=0.6.2 (from dataset)
Downloading http://mirrors.aliyun.com/pypi/packages/fc/42/8729e2491fa9b8eae160d1cbb429f61712bfc2d779816488c25cfdabf7b8/alembic-1.0.9.tar.gz (1.0MB)
100% |████████████████████████████████| 1.0MB 3.9MB/s
Requirement already satisfied: six>=1.11.0 in d:\programfiles\python362\lib\site-packages (from dataset) (1.12.0)
Requirement already satisfied: sqlalchemy>=1.1.2 in d:\programfiles\python362\lib\site-packages (from dataset) (1.3.2)
Collecting Mako (from alembic>=0.6.2->dataset)
Downloading http://mirrors.aliyun.com/pypi/packages/a1/bb/f4e5c056e883915c37bb5fb6fab7f00a923c395674f83bfb45c9ecf836b6/Mako-1.0.9.tar.gz (459kB)
100% |████████████████████████████████| 460kB 10.3MB/s
Collecting python-editor>=0.3 (from alembic>=0.6.2->dataset)
Downloading http://mirrors.aliyun.com/pypi/packages/c6/d3/201fc3abe391bbae6606e6f1d598c15d367033332bd54352b12f35513717/python_editor-1.0.4-py3-none-any.whl
Requirement already satisfied: python-dateutil in d:\programfiles\python362\lib\site-packages (from alembic>=0.6.2->dataset) (2.8.0)
Requirement already satisfied: MarkupSafe>=0.9.2 in d:\programfiles\python362\lib\site-packages (from Mako->alembic>=0.6.2->dataset) (1.1.1)
Installing collected packages: Mako, python-editor, alembic, dataset
Running setup.py install for Mako ... done
Running setup.py install for alembic ... done
Successfully installed Mako-1.0.9 alembic-1.0.9 dataset-1.1.2 python-editor-1.0.4
- Using dataset.
Import the dataset package:
>>> import dataset
- Use dataset.connect to create a database connection.
- The __init__ file of dataset defines only one function, connect.
Contents of the __init__ file:
import os
import warnings
from dataset.database import Database
from dataset.table import Table
from dataset.util import row_type

# shut up useless SA warning:
warnings.filterwarnings(
    'ignore', 'Unicode type received non-unicode bind param value.')
warnings.filterwarnings(
    'ignore', 'Skipping unsupported ALTER for creation of implicit constraint')

__all__ = ['Database', 'Table', 'freeze', 'connect']
__version__ = '1.1.2'


def connect(url=None, schema=None, reflect_metadata=True, engine_kwargs=None,
            reflect_views=True, ensure_schema=True, row_type=row_type):
    """ Opens a new connection to a database.

    *url* can be any valid `SQLAlchemy engine URL`_. If *url* is not defined
    it will try to use *DATABASE_URL* from environment variable. Returns an
    instance of :py:class:`Database <dataset.Database>`. Set *reflect_metadata*
    to False if you don't want the entire database schema to be pre-loaded.
    This significantly speeds up connecting to large databases with lots of
    tables. *reflect_views* can be set to False if you don't want views to be
    loaded. Additionally, *engine_kwargs* will be directly passed to
    SQLAlchemy, e.g. set *engine_kwargs={'pool_recycle': 3600}* will avoid `DB
    connection timeout`_. Set *row_type* to an alternate dict-like class to
    change the type of container rows are stored in.::

        db = dataset.connect('sqlite:///factbook.db')

    .. _SQLAlchemy Engine URL: http://docs.sqlalchemy.org/en/latest/core/engines.html#sqlalchemy.create_engine
    .. _DB connection timeout: http://docs.sqlalchemy.org/en/latest/core/pooling.html#setting-pool-recycle
    """
    if url is None:
        url = os.environ.get('DATABASE_URL', 'sqlite://')

    return Database(url, schema=schema, reflect_metadata=reflect_metadata,
                    engine_kwargs=engine_kwargs, reflect_views=reflect_views,
                    ensure_schema=ensure_schema, row_type=row_type)
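- The url passed to dataset.connect must be written as an SQLAlchemy engine URL.
- The url can also be supplied through the DATABASE_URL environment variable; if neither is given, connect falls back to 'sqlite://', an in-memory SQLite database.
The typical form of a database URL is: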
dialect+driver://username:password@host:port/database
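- dialect is the identifying name of the SQLAlchemy dialect, such as sqlite, mysql, postgresql, oracle, or mssql.
- driver is the name of the DBAPI used to connect to the database, written in all lowercase letters.
- Special characters in the URL must be URL-encoded.
- Use dataset.connect(url) to connect to the database engine.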
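The URL-encoding rule matters most for passwords, which often contain characters such as `@`, `/` or `:` that would otherwise be read as URL delimiters. A minimal sketch using the standard library's `urllib.parse.quote_plus` follows; the user name, password, host and database name are made-up values for illustration.

```python
from urllib.parse import quote_plus

# Hypothetical credentials; the password contains URL delimiter characters.
user = "scott"
password = "p@ss/w:rd"

# Encode the password before placing it into the URL.
url = "postgresql://{}:{}@localhost:5432/mydb".format(user, quote_plus(password))
print(url)
```

The resulting string can then be passed to `dataset.connect(url)` or exported as `DATABASE_URL`.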
Here we use SQLite3 and save the database to the file dataset.db:
>>> db = dataset.connect('sqlite:///dataset.db')
>>> db
<Database(sqlite:///dataset.db)>
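- Use get_table(table_name, primary_id=None, primary_type=None) or create_table(table_name, primary_id=None, primary_type=None) to load a table; if the table does not exist, it is created.
- db[table_name] also loads or creates the table.
Tables can be addressed with dictionary-like syntax; a table that does not exist is created by default: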
>>> table = db.get_table('user')
>>> table
<Table(user)>
>>> table1 = db['user']
>>> table1
<Table(user)>
>>> id(table) == id(table1)
True
>>> db['population']
<Table(population)>
>>> table2 = db['population']
>>> table2
<Table(population)>
View the user and population tables in SQLite3:
sqlite> .table
population user
sqlite> .schema user
CREATE TABLE user (
id INTEGER NOT NULL,
PRIMARY KEY (id)
);
sqlite> .schema population
CREATE TABLE population (
id INTEGER NOT NULL,
PRIMARY KEY (id)
);
sqlite>
Specifying the primary key and its type when creating a table:
>>> table_population2 = db.create_table('population2', 'age')  # age is the primary key
>>> table_population2
<Table(population2)>
>>> table_population3 = db.create_table('population3', primary_id='city', primary_type=db.types.text)  # city is the primary key, of type text
>>> table_population3
<Table(population3)>
>>> table_population4 = db.create_table('population4', primary_id='city', primary_type=db.types.string(25))  # city is the primary key, of type string (rendered as VARCHAR(25))
>>> table_population4
<Table(population4)>
View the tables in SQLite3 again:
sqlite> .table
population population2 population3 population4 user
sqlite> .schema population2
CREATE TABLE population2 (
age INTEGER NOT NULL,
PRIMARY KEY (age)
);
sqlite> .schema population3
CREATE TABLE population3 (
city TEXT NOT NULL,
PRIMARY KEY (city)
);
sqlite> .schema population4
CREATE TABLE population4 (
city VARCHAR(25) NOT NULL,
PRIMARY KEY (city)
);
sqlite>
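- Use insert(row, ensure=None, types=None) on a Table object to insert data; row is a dict, and the primary key of the inserted row is returned.
- If a key in row has no matching column in the table, the column is created automatically.
Inserting a row: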
>>> table.insert(dict(name='John Doe', age=46, country='China'))
1
View the user table in SQLite3 again, using .headers on to show the header row and .mode column to switch to column mode:
sqlite> .headers on
sqlite> .mode column
sqlite> select * from user;
id name age country
---------- ---------- ---------- ----------
1 John Doe 46 China
sqlite> .schema user
CREATE TABLE user (
id INTEGER NOT NULL, name TEXT, age INTEGER, country TEXT,
PRIMARY KEY (id)
);
sqlite>
The columns name, age and country have been added to the table automatically.
Insert another row:
>>> table.insert(dict(name='Edmond Dantes', age=37, country='France', gender='male'))
2
View the user table in SQLite3 again:
sqlite> .schema user
CREATE TABLE user (
id INTEGER NOT NULL, name TEXT, age INTEGER, country TEXT, gender TEXT,
PRIMARY KEY (id)
);
sqlite> select * from user; -- by default each column is at least 10 characters wide and wider values are truncated; use the ".width" command to adjust column widths
id name age country gender
---------- ---------- ---------- ---------- ----------
1 John Doe 46 China
2 Edmond Dan 37 France male
sqlite> .width 12 20 -- set the first column to 12 characters wide and the second to 20
sqlite> select * from user;
id name age country gender
------------ -------------------- ---------- ---------- ----------
1 John Doe 46 China
2 Edmond Dantes 37 France male
sqlite> select * from user where name="Edmond Dantes";
id name age country gender
------------ -------------------- ---------- ---------- ----------
2 Edmond Dantes 37 France male
The new gender column has been added to the table automatically.
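The automatic column creation seen above can be pictured as "compare the dict keys with the table's columns, add what is missing, then insert". The following is a simplified sketch of that idea using only the standard-library `sqlite3` module. It is not dataset's actual implementation, and the function name `insert_with_auto_columns` and the type mapping `TYPE_MAP` are assumptions made for illustration.

```python
import sqlite3

# Assumed mapping from Python value types to SQLite column types.
TYPE_MAP = {int: "INTEGER", float: "REAL", str: "TEXT"}


def insert_with_auto_columns(conn, table, row):
    """Insert `row` (a dict), first adding any columns the table lacks."""
    existing = {info[1] for info in conn.execute('PRAGMA table_info("{}")'.format(table))}
    for key, value in row.items():
        if key not in existing:
            col_type = TYPE_MAP.get(type(value), "TEXT")
            conn.execute('ALTER TABLE "{}" ADD COLUMN "{}" {}'.format(table, key, col_type))
    columns = ", ".join('"{}"'.format(key) for key in row)
    placeholders = ", ".join("?" for _ in row)
    cursor = conn.execute(
        'INSERT INTO "{}" ({}) VALUES ({})'.format(table, columns, placeholders),
        tuple(row.values()),
    )
    return cursor.lastrowid  # primary key of the new row


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "user" (id INTEGER NOT NULL, PRIMARY KEY (id))')

first_id = insert_with_auto_columns(
    conn, "user", {"name": "John Doe", "age": 46, "country": "China"}
)
second_id = insert_with_auto_columns(
    conn, "user", {"name": "Edmond Dantes", "age": 37, "country": "France", "gender": "male"}
)
column_names = [info[1] for info in conn.execute('PRAGMA table_info("user")')]
print(first_id, second_id, column_names)
```

Table and column names are interpolated into the SQL here, so a sketch like this is only appropriate when those names come from trusted code rather than user input.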
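- Use update(row, keys, ensure=None, types=None, return_count=False) on a Table object to update data; row is a dict, keys lists the columns of row that are used as filter criteria, and the number of updated rows is returned.
- If a key in row has no matching column in the table, the column is created automatically.
Update John's age to 47: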
>>> table.update(dict(name='John Doe', age=47), ['name'])
1
View the user table in SQLite3 again:
sqlite> select * from user;
id name age country gender
------------ -------------------- ---------- ---------- ----------
1 John Doe 47 China
2 Edmond Dantes 37 France male
sqlite>
John Doe's age has changed from 46 to 47.
John Doe's gender is not set yet, so update it:
>>> table.update(dict(name='John Doe', gender='famale'), ['name'])
1
View the user table in SQLite3 again:
sqlite> select * from user;
id name age country gender
------------ -------------------- ---------- ---------- ----------
1 John Doe 47 China famale
2 Edmond Dantes 37 France male
sqlite>
With the gender filled in, an email column can be added as well:
>>> table.update(dict(id=1, email='john@python.org'),['id'])
1
>>> table.update(dict(id=2, email='edmond@python.org'),['id'])
1
View the user table in SQLite3 again:
sqlite> select * from user;
id name age country gender email
------------ -------------------- ---------- ---------- ---------- ---------------
1 John Doe 47 China famale john@python.org
2 Edmond Dantes 37 France male edmond@python.o
sqlite>
This shows that update also adds a column automatically when it does not exist.
Updating without identifying a specific row:
>>> table.update(dict(age=30),['id'])
2
View the user table in SQLite3 again:
sqlite> select * from user;
id name age country gender email
------------ -------------------- ---------- ---------- ---------- ---------------
1 John Doe 30 China famale john@python.org
2 Edmond Dantes 30 France male edmond@python.o
sqlite>
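Because the row contains no value for the key column id, no filter is applied and every row is updated: age is set to 30 for all rows.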
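The behaviour of `keys` can be pictured as splitting the dict in two: entries named in `keys` become the WHERE clause, and everything else becomes the SET clause. When none of the key columns is present in the dict, the WHERE clause is empty and every row is touched. The sketch below reproduces that behaviour with the standard-library `sqlite3` module. It is a simplified illustration rather than dataset's actual code, and the name `update_by_keys` is hypothetical.

```python
import sqlite3


def update_by_keys(conn, table, row, keys):
    """Update `table`: entries of `row` named in `keys` filter, the rest are set."""
    row = dict(row)  # work on a copy so the caller's dict is untouched
    criteria = {key: row.pop(key) for key in keys if key in row}
    if not row:
        return 0  # nothing left to set
    sql = 'UPDATE "{}" SET {}'.format(
        table, ", ".join('"{}" = ?'.format(key) for key in row)
    )
    params = list(row.values())
    if criteria:
        sql += " WHERE " + " AND ".join('"{}" = ?'.format(key) for key in criteria)
        params += list(criteria.values())
    return conn.execute(sql, params).rowcount


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "user" (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)')
conn.executemany(
    'INSERT INTO "user" (name, age) VALUES (?, ?)',
    [("John Doe", 46), ("Edmond Dantes", 37)],
)

# 'name' is present in the row, so it becomes the filter: one row changes.
one = update_by_keys(conn, "user", {"name": "John Doe", "age": 47}, ["name"])
# 'id' is not present in the row, so there is no filter: all rows change.
everyone = update_by_keys(conn, "user", {"age": 30}, ["id"])
ages = [row[0] for row in conn.execute('SELECT age FROM "user" ORDER BY id')]
print(one, everyone, ages)
```

Checking that every key column actually appears in the dict before calling update is a cheap guard against unintended table-wide updates.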
- Using transactions through a context manager.
Using the with context manager:
>>> with db:
...     db['user'].insert(dict(name='John Doe', age=46, country='China'))
...
View the user table in SQLite3 again:
sqlite> select * from user;
id name age country gender email
---------- ---------- ---------- ---------- ---------- ---------------
1 John Doe 32 China famale john@python.org
2 Edmond Dan 32 France male edmond@python.o
3 John Doe 46 China
- Transactions can also be controlled explicitly by calling begin(), commit() and rollback(), with try..except catching the exception.
Using try..except to catch the exception:
>>> db = dataset.connect('sqlite:///dataset.db')
>>> db.begin()
>>> try:
...     db['user'].update(dict(id=3, name='John King', gender='male', email='king@python.org'), ['id'])
...     db.commit()
... except Exception:
...     db.rollback()
...
View the user table in SQLite3 again:
sqlite> select * from user;
id name age country gender email
---------- ---------- ---------- ---------- ---------- ---------------
1 John Doe 32 China famale john@python.org
2 Edmond Dan 32 France male edmond@python.o
3 John King 46 China male king@python.org
sqlite>
The third row has been updated.
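Both styles above follow the same rule: commit if the block succeeds, roll back if it raises. The standard-library `sqlite3` connection offers the same commit-or-rollback context manager, which makes the rollback path easy to observe in isolation. The table and the simulated failure below are assumptions for illustration; this is an analogy to dataset's `with db:` block, not its implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT)")
conn.commit()

# The block finishes normally, so the insert is committed.
with conn:
    conn.execute("INSERT INTO user (name) VALUES (?)", ("John Doe",))

# The block raises, so the insert made inside it is rolled back.
try:
    with conn:
        conn.execute("INSERT INTO user (name) VALUES (?)", ("Ghost",))
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

names = [row[0] for row in conn.execute("SELECT name FROM user ORDER BY id")]
print(names)
```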
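- Inspecting databases and tables.
- db.tables lists all tables in the database.
- db[table_name].columns lists all columns of a table.
- len(db[table_name]) returns the number of rows in the table.
Viewing table and column information: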
>>> db.tables
['population', 'population2', 'population3', 'population4', 'user']
>>> db['user'].columns
['id', 'name', 'age', 'country', 'gender', 'email']
>>> db['population'].columns
['id']
>>> len(db['user'])
3
>>> len(db['population'])
0
- Table.all() returns all rows.
- To simply iterate over all rows of a table, all() can be omitted.
Getting all rows of a table:
>>> table
<Table(user)>
>>> table.all()
<dataset.util.ResultIter at 0x251a25e9d30>
>>> users = table.all()
>>> users
<dataset.util.ResultIter at 0x251a2643c88>
>>> for user in users:
...     print(user)
...
OrderedDict([('id', 1), ('name', 'John Doe'), ('age', 32), ('country', 'China'), ('gender', 'famale'), ('email', 'john@python.org')])
OrderedDict([('id', 2), ('name', 'Edmond Dantes'), ('age', 32), ('country', 'France'), ('gender', 'male'), ('email', 'edmond@python.org')])
OrderedDict([('id', 3), ('name', 'John King'), ('age', 46), ('country', 'China'), ('gender', 'male'), ('email', 'king@python.org')])
>>> for user in table:
...     print(user['name'], user['age'], user['country'])
...
John Doe 32 China
Edmond Dantes 32 France
John King 46 China
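- Table.find() returns all rows matching the given criteria.
- Table.find_one() applies the same criteria but returns only the first matching row.
- The _limit keyword argument limits the number of rows returned.
- The order_by keyword argument sorts the results.
Getting data with find or find_one: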
>>> chinese_users = table.find(country='China')
>>> chinese_users
<dataset.util.ResultIter at 0x251a2bd97b8>
>>> for user in chinese_users:
...     print(user)
...
OrderedDict([('id', 1), ('name', 'John Doe'), ('age', 32), ('country', 'China'), ('gender', 'famale'), ('email', 'john@python.org')])
OrderedDict([('id', 3), ('name', 'John King'), ('age', 46), ('country', 'China'), ('gender', 'male'), ('email', 'king@python.org')])
>>> table.find_one(country='China')
OrderedDict([('id', 1),
('name', 'John Doe'),
('age', 32),
('country', 'China'),
('gender', 'famale'),
('email', 'john@python.org')])
>>> for user in table.find(country='China', _limit=1):  # return at most 1 row
...     print(user)
...
OrderedDict([('id', 1), ('name', 'John Doe'), ('age', 32), ('country', 'China'), ('gender', 'famale'), ('email', 'john@python.org')])
>>> for user in table.find(country='China', _limit=2):  # return at most 2 rows
...     print(user)
...
OrderedDict([('id', 1), ('name', 'John Doe'), ('age', 32), ('country', 'China'), ('gender', 'famale'), ('email', 'john@python.org')])
OrderedDict([('id', 3), ('name', 'John King'), ('age', 46), ('country', 'China'), ('gender', 'male'), ('email', 'king@python.org')])
>>> for user in table.find(country='China', order_by='age'):  # sort by age, ascending
...     print(user)
...
OrderedDict([('id', 1), ('name', 'John Doe'), ('age', 32), ('country', 'China'), ('gender', 'famale'), ('email', 'john@python.org')])
OrderedDict([('id', 3), ('name', 'John King'), ('age', 46), ('country', 'China'), ('gender', 'male'), ('email', 'king@python.org')])
>>> for user in table.find(country='China', order_by='-age'):  # sort by age, descending
...     print(user)
...
OrderedDict([('id', 3), ('name', 'John King'), ('age', 46), ('country', 'China'), ('gender', 'male'), ('email', 'king@python.org')])
OrderedDict([('id', 1), ('name', 'John Doe'), ('age', 32), ('country', 'China'), ('gender', 'famale'), ('email', 'john@python.org')])
>>> for user in table.find(country='France', age=32):
...     print(user)
...
OrderedDict([('id', 2), ('name', 'Edmond Dantes'), ('age', 32), ('country', 'France'), ('gender', 'male'), ('email', 'edmond@python.org')])
>>> table.find(id=[1, 3])
<dataset.util.ResultIter at 0x251a2bf82b0>
>>> for user in table.find(id=[1, 3]):
...     print(user)
...
OrderedDict([('id', 1), ('name', 'John Doe'), ('age', 32), ('country', 'China'), ('gender', 'famale'), ('email', 'john@python.org')])
OrderedDict([('id', 3), ('name', 'John King'), ('age', 46), ('country', 'China'), ('gender', 'male'), ('email', 'king@python.org')])
- Using comparison operators in find or find_one.
Available operators:
gt, >
lt, <
gte, >=
lte, <=
!=, <>, not
between, ..
Using comparison operators:
>>> for user in table.find(age={'>=': 40}):
...     print(user)
...
OrderedDict([('id', 3), ('name', 'John King'), ('age', 46), ('country', 'China'), ('gender', 'male'), ('email', 'king@python.org')])
>>> for user in table.find(age={'gt': 40}):
...     print(user)
...
OrderedDict([('id', 3), ('name', 'John King'), ('age', 46), ('country', 'China'), ('gender', 'male'), ('email', 'king@python.org')])
>>> for user in table.find(age={'lt': 40}):
...     print(user)
...
OrderedDict([('id', 1), ('name', 'John Doe'), ('age', 32), ('country', 'China'), ('gender', 'famale'), ('email', 'john@python.org')])
OrderedDict([('id', 2), ('name', 'Edmond Dantes'), ('age', 32), ('country', 'France'), ('gender', 'male'), ('email', 'edmond@python.org')])
>>> for user in table.find(age={'<': 40}):
...     print(user)
...
OrderedDict([('id', 1), ('name', 'John Doe'), ('age', 32), ('country', 'China'), ('gender', 'famale'), ('email', 'john@python.org')])
OrderedDict([('id', 2), ('name', 'Edmond Dantes'), ('age', 32), ('country', 'France'), ('gender', 'male'), ('email', 'edmond@python.org')])
>>> for user in table.find(age={'between':[30,40]}):
... print(user)
...
OrderedDict([('id', 1), ('name', 'John Doe'), ('age', 32), ('country', 'China'), ('gender', 'famale'), ('email', 'john@python.org')])
OrderedDict([('id', 2), ('name', 'Edmond Dantes'), ('age', 32), ('country', 'France'), ('gender', 'male'), ('email', 'edmond@python.org')])
>>> for user in table.find(age={'..':[30,40]}):
... print(user)
...
OrderedDict([('id', 1), ('name', 'John Doe'), ('age', 32), ('country', 'China'), ('gender', 'famale'), ('email', 'john@python.org')])
OrderedDict([('id', 2), ('name', 'Edmond Dantes'), ('age', 32), ('country', 'France'), ('gender', 'male'), ('email', 'edmond@python.org')])
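To make the filters above more concrete, the following sketch expresses three of them as plain SQL using the standard sqlite3 module. The in-memory table and the helper names() are assumptions made for illustration, and the SQL shown is only roughly what dataset is expected to generate, not its actual output.

```python
import sqlite3

# Hypothetical in-memory copy of the user table from the examples above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO user (id, name, age, country) VALUES (?, ?, ?, ?)",
    [(1, "John Doe", 32, "China"), (2, "Edmond Dantes", 32, "France"), (3, "John King", 46, "China")],
)


def names(sql, params=()):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql, params)]


# Roughly table.find(age={'>=': 40})
older = names("SELECT name FROM user WHERE age >= ? ORDER BY id", (40,))
# Roughly table.find(age={'between': [30, 40]})
middle = names("SELECT name FROM user WHERE age BETWEEN ? AND ? ORDER BY id", (30, 40))
# Roughly table.find(id=[1, 3])
picked = names("SELECT name FROM user WHERE id IN (?, ?) ORDER BY id", (1, 3))
print(older, middle, picked)
```

Passing the values as bound parameters rather than formatting them into the SQL string keeps the queries safe when the values come from user input.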
Table.distinct()
Returns the unique rows for one or more columns.
For example, to list all countries:
>>> table.distinct('country')
<dataset.util.ResultIter at 0x251a2df57f0>
>>> for country in table.distinct('country'):
... print(country)
...
OrderedDict([('country', 'China')])
OrderedDict([('country', 'France')])
>>> for age in table.distinct('age'):
... print(age)
...
OrderedDict([('age', 32)])
OrderedDict([('age', 46)])
>>> for age_country in table.distinct('age','country'):
... print(age_country)
...
OrderedDict([('age', 32), ('country', 'China')])
OrderedDict([('age', 32), ('country', 'France')])
OrderedDict([('age', 46), ('country', 'China')])
>>> for age in table.distinct('age',country='China'):
... print(age)
...
OrderedDict([('age', 32)])
OrderedDict([('age', 46)])
- Use db.query(SQL_STRING) to run a custom SQL string SQL_STRING.
Count the users in each country:
>>> result = db.query('SELECT country, COUNT(*) c FROM user GROUP BY country')
... for row in result:
... print(row['country'], row['c'])
...
China 2
France 1
>>> result = db.query('SELECT country, COUNT(*) AS count FROM user GROUP BY country')
... for row in result:
... print(row['country'], row['count'])
...
China 2
France 1
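For comparison, here is a minimal sketch of the same DISTINCT and GROUP BY queries run directly through the standard sqlite3 module. The in-memory table is a hypothetical stand-in for the user table, and the age threshold is an extra assumption added only to show how a bound parameter is passed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (name TEXT, age INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO user VALUES (?, ?, ?)",
    [("John Doe", 32, "China"), ("Edmond Dantes", 32, "France"), ("John King", 46, "China")],
)

# Roughly what table.distinct('country') asks the database for.
countries = [row[0] for row in conn.execute("SELECT DISTINCT country FROM user ORDER BY country")]

# Same aggregate as the db.query example, with a threshold passed as a bound parameter.
counts = conn.execute(
    "SELECT country, COUNT(*) AS c FROM user WHERE age >= ? GROUP BY country ORDER BY country",
    (30,),
).fetchall()
print(countries, counts)
```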
Table.delete(*clauses, **filters)
Deletes rows from the table.
- Warning: if no arguments are given, all records are deleted.
Deleting rows from the table:
>>> user_king = table.find_one(name='John King')
>>> user_king
OrderedDict([('id', 3),
('name', 'John King'),
('age', 46),
('country', 'China'),
('gender', 'male'),
('email', 'king@python.org')])
>>> table.delete(name='John King')
True
>>> for user in table.all():
... print(user)
...
OrderedDict([('id', 1), ('name', 'John Doe'), ('age', 32), ('country', 'China'), ('gender', 'famale'), ('email', 'john@python.org')])
OrderedDict([('id', 2), ('name', 'Edmond Dantes'), ('age', 32), ('country', 'France'), ('gender', 'male'), ('email', 'edmond@python.org')])
Then check the user table in SQLite3:
sqlite> select * from user;
id name age country gender email
---------- ---------- ---------- ---------- ---------- ---------------
1 John Doe 32 China famale john@python.org
2 Edmond Dan 32 France male edmond@python.o
sqlite>
Calling delete without arguments:
>>> table.delete()
True
>>> for user in table.all():
... print(user)
...
Then check the user table in SQLite3 again:
sqlite> select * from user;
sqlite>
No rows are returned, which shows that the user table has been emptied.
Table.drop_column(name)
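Drops the given column from the table.
- With an SQLite backend, the dataset version used here refuses to drop columns.
Trying to drop a column: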
>>> table.columns
['id', 'name', 'age', 'country', 'gender', 'email']
>>> table.drop_column('email')
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-79-1932daeb597f> in <module>
----> 1 table.drop_column('email')
RuntimeError: SQLite does not support dropping columns.
A RuntimeError exception is raised.
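One possible workaround, sketched here with the standard sqlite3 module rather than dataset, is to rebuild the table without the unwanted column. The helper drop_column_by_rebuild is hypothetical, and note that CREATE TABLE ... AS SELECT does not carry over constraints such as the primary key, so this is an illustration rather than a production migration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO user (name, email) VALUES ('John Doe', 'john@python.org')")


def drop_column_by_rebuild(conn, table, keep_columns):
    """Rebuild `table` with only `keep_columns` (identifiers must be trusted)."""
    cols = ", ".join(keep_columns)
    with conn:  # commit on success, roll back on error
        conn.execute(f"CREATE TABLE {table}_new AS SELECT {cols} FROM {table}")
        conn.execute(f"DROP TABLE {table}")
        conn.execute(f"ALTER TABLE {table}_new RENAME TO {table}")


drop_column_by_rebuild(conn, "user", ["id", "name"])
# PRAGMA table_info returns one row per column; index 1 is the column name.
remaining = [row[1] for row in conn.execute("PRAGMA table_info(user)")]
rows = conn.execute("SELECT id, name FROM user").fetchall()
print(remaining, rows)
```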
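Using memcached¶
- Memcached is a free, open-source, high-performance, distributed in-memory object caching system.
- Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects), such as the results of database calls, API calls, or page rendering.
For installing Memcached on Linux, see https://www.runoob.com/memcached/window-install-memcached.html .
Install the dependencies: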
[root@localhost ~]# yum install libevent libevent-devel -y
Install Memcached:
[root@localhost ~]# yum install memcached -y
Show the memcached help:
[root@localhost ~]# memcached -h
memcached 1.4.15
-p <num> TCP port number to listen on (default: 11211)
-U <num> UDP port number to listen on (default: 11211, 0 is off)
-s <file> UNIX socket path to listen on (disables network support)
-a <mask> access mask for UNIX socket, in octal (default: 0700)
-l <addr> interface to listen on (default: INADDR_ANY, all addresses)
<addr> may be specified as host:port. If you don't specify
a port number, the value you specified with -p or -U is
used. You may specify multiple addresses separated by comma
or by using -l multiple times
-d run as a daemon
-r maximize core file limit
-u <username> assume identity of <username> (only when run as root)
-m <num> max memory to use for items in megabytes (default: 64 MB)
-M return error on memory exhausted (rather than removing items)
-c <num> max simultaneous connections (default: 1024)
-k lock down all paged memory. Note that there is a
limit on how much memory you may lock. Trying to
allocate more than that would fail, so be sure you
set the limit correctly for the user you started
the daemon with (not for -u <username> user;
under sh this is done with 'ulimit -S -l NUM_KB').
-v verbose (print errors/warnings while in event loop)
-vv very verbose (also print client commands/reponses)
-vvv extremely verbose (also print internal state transitions)
-h print this help and exit
-i print memcached and libevent license
-P <file> save PID in <file>, only used with -d option
-f <factor> chunk size growth factor (default: 1.25)
-n <bytes> minimum space allocated for key+value+flags (default: 48)
-L Try to use large memory pages (if available). Increasing
the memory page size could reduce the number of TLB misses
and improve the performance. In order to get large pages
from the OS, memcached will allocate the total item-cache
in one large chunk.
-D <char> Use <char> as the delimiter between key prefixes and IDs.
This is used for per-prefix stats reporting. The default is
":" (colon). If this option is specified, stats collection
is turned on automatically; if not, then it may be turned on
by sending the "stats detail on" command to the server.
-t <num> number of threads to use (default: 4)
-R Maximum number of requests per event, limits the number of
requests process for a given connection to prevent
starvation (default: 20)
-C Disable use of CAS
-b <num> Set the backlog queue limit (default: 1024)
-B Binding protocol - one of ascii, binary, or auto (default)
-I Override the size of each slab page. Adjusts max item size
(default: 1mb, min: 1k, max: 128m)
-S Turn on Sasl authentication
-o Comma separated list of extended or experimental options
- (EXPERIMENTAL) maxconns_fast: immediately close new
connections if over maxconns limit
- hashpower: An integer multiplier for how large the hash
table should be. Can be grown at runtime if not big enough.
Set this based on "STAT hash_power_level" before a
restart.
[root@localhost ~]#
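Start memcached:
[root@localhost ~]# memcached -u root -p 11211 -m 64m -d
Startup options:
-u root runs memcached as the root user (without this option, memcached reports "can't run as root without the -u switch")
-p 11211 sets the port memcached listens on to 11211
-m 64m sets the amount of memory allocated to memcached; the unit is megabytes, so this is 64 MB
-d starts memcached as a daemon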
Install the telnet tools:
[root@localhost ~]# yum install telnet-server telnet -y
Use telnet HOST PORT to connect to the memcached service, where HOST and PORT are the host and port memcached is running on.
Connect to the memcached service:
[root@localhost ~]# telnet 127.0.0.1 11211
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
This shows that the connection to the memcached service has been established.
localhost also works as HOST:
[root@localhost ~]# telnet localhost 11211
Trying ::1...
Connected to localhost.
Escape character is '^]'.
Once connected to the memcached service, memcached commands can be executed.
memcached storage commands: set, add, replace, append, prepend, cas¶
The set command stores value under key; if key already exists, its value is updated.
Syntax:
set key flags exptime bytes [noreply]
value
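Parameters:
key: the key of the key-value pair, used to look up the cached value.
flags: an integer stored alongside the value; clients use it to keep extra information about the key-value pair.
exptime: how long the key-value pair is kept in the cache (in seconds; 0 means it never expires).
bytes: the length of the value in bytes.
noreply: optional; tells the server not to send a reply.
value: the value of the key-value pair; always sent on the second line.
Set a key-value pair: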
set firstkey 0 900 15
hello,memcached
STORED
get firstkey
VALUE firstkey 0 15
hello,memcached
END
set firstkey 0 900 16 <-- Note: this updates the value of firstkey
hello,memcached!
STORED
get firstkey
VALUE firstkey 0 16
hello,memcached!
END
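In the first set command above:
key is firstkey
flags is 0
exptime (expiry time) is 900s
bytes (length of the value) is 15
value (the stored value) is hello,memcached
Set an expiry time:
set secondkey 0 30 6 <-- Note: the key expires after 30 seconds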
hello!
STORED
get secondkey <-- Note: within 30s the value of secondkey can still be retrieved
VALUE secondkey 0 6
hello!
END
get secondkey <-- Note: within 30s the value of secondkey can still be retrieved
VALUE secondkey 0 6
hello!
END
get secondkey <-- Note: within 30s the value of secondkey can still be retrieved
VALUE secondkey 0 6
hello!
END
get secondkey <-- Note: within 30s the value of secondkey can still be retrieved
VALUE secondkey 0 6
hello!
END
get secondkey <-- Note: after 30s no value is returned, so secondkey has expired
END
Suppress the reply with noreply:
set noreplykey 0 900 6 noreply
123456 <-- Note: the value is stored, but no STORED reply is returned
get noreplykey
VALUE noreplykey 0 6
123456
END
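- On success, the output is STORED.
- On failure, the output is ERROR.
Output when the value does not match the declared length:
set test 0 900 6
1234567890 <-- Note: the value is 10 bytes, but the declared length is only 6 bytes
CLIENT_ERROR bad data chunk
ERROR
set test 0 900 6
1234
56 <-- Note: the declared length is 6 bytes, but the value was split across two lines, which also fails
CLIENT_ERROR bad data chunk
ERROR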
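Because the bytes field must match the value exactly, it is safer to compute it than to count by hand. The sketch below builds the request for a storage command in the text protocol; the function name build_storage_command is hypothetical and the code only builds the bytes, it does not talk to a server.

```python
def build_storage_command(command, key, value, flags=0, exptime=0, noreply=False):
    """Return the bytes for a memcached text-protocol storage command."""
    data = value.encode("utf-8") if isinstance(value, str) else value
    # The declared length is the number of bytes, not the number of characters.
    header = f"{command} {key} {flags} {exptime} {len(data)}"
    if noreply:
        header += " noreply"
    # Header line and data block are each terminated by CRLF.
    return header.encode("ascii") + b"\r\n" + data + b"\r\n"


request = build_storage_command("set", "firstkey", "hello,memcached", exptime=900)
quiet = build_storage_command("set", "noreplykey", "123456", exptime=900, noreply=True)
print(request)
print(quiet)
```

The first request reproduces the "set firstkey 0 900 15" example above, with the length 15 derived from the value.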
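The add command stores value under the given key.
- If key exists and has not expired, the data is not updated and NOT_STORED is returned.
- If key exists but has expired, the data is stored.
- If key does not exist, the data is added, just as with set.
Syntax: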
add key flags exptime bytes [noreply]
value
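Parameters:
key: the key of the key-value pair, used to look up the cached value.
flags: an integer stored alongside the value; clients use it to keep extra information about the key-value pair.
exptime: how long the key-value pair is kept in the cache (in seconds; 0 means it never expires).
bytes: the length of the value in bytes.
noreply: optional; tells the server not to send a reply.
value: the value of the key-value pair; always sent on the second line.
Add key-value pairs: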
get firstkey <-- Note: the value of firstkey can be retrieved
VALUE firstkey 0 16
hello,memcached!
END
get secondkey <-- Note: the value of secondkey cannot be retrieved, because secondkey has expired
END
add firstkey 0 900 5 <-- Note: add on the existing key firstkey does not succeed
hello
NOT_STORED <-- Note: NOT_STORED is returned
add secondkey 0 900 5 <-- Note: add on the expired key secondkey succeeds
hello
STORED <-- Note: STORED is returned
add thirdkey 0 900 5 <-- Note: add on thirdkey succeeds; thirdkey does not exist, so this is equivalent to setting the key-value pair
hello
STORED <-- Note: STORED is returned
get firstkey <-- Note: the value of firstkey is returned unchanged
VALUE firstkey 0 16
hello,memcached!
END
get secondkey <-- Note: secondkey now holds the new value
VALUE secondkey 0 5
hello
END
get thirdkey <-- Note: thirdkey now holds the newly added value
VALUE thirdkey 0 5
hello
END
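The replace command replaces the value of an existing key with value.
- If key exists and has not expired, the data is replaced and STORED is returned.
- If key exists but has expired, the replace fails and NOT_STORED is returned.
- If key does not exist, the replace fails and NOT_STORED is returned.
Syntax: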
replace key flags exptime bytes [noreply]
value
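Parameters:
key: the key of the key-value pair, used to look up the cached value.
flags: an integer stored alongside the value; clients use it to keep extra information about the key-value pair.
exptime: how long the key-value pair is kept in the cache (in seconds; 0 means it never expires).
bytes: the length of the value in bytes.
noreply: optional; tells the server not to send a reply.
value: the value of the key-value pair; always sent on the second line.
Replace the value of a key: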
get firstkey
VALUE firstkey 0 16
hello,memcached!
END
get secondkey
VALUE secondkey 0 5
hello
END
get thirdkey
VALUE thirdkey 0 5
hello
END
set fourthkey 0 30 5
hello
STORED
get fourthkey
VALUE fourthkey 0 5
hello
END
get fourthkey
VALUE fourthkey 0 5
hello
END
get fourthkey
VALUE fourthkey 0 5
hello
END
get fourthkey <-- Note: no value is returned, so fourthkey has expired
END
replace secondkey 0 900 6 <-- Note: replace the value of secondkey; STORED is returned on success
hello!
STORED
get secondkey <-- Note: the value of secondkey has changed from "hello" to "hello!"
VALUE secondkey 0 6
hello!
END
replace thirdkey 0 900 3 <-- Note: replace the value of thirdkey; STORED is returned on success
hi!
STORED
get thirdkey <-- Note: the value of thirdkey has changed from "hello" to "hi!"
VALUE thirdkey 0 3
hi!
END
replace fourthkey 0 900 16 <-- Note: fourthkey has expired, so the replace fails and NOT_STORED is returned
hello,memcached!
NOT_STORED
replace notexist 0 900 5 <-- Note: the key notexist does not exist, so nothing is replaced
hello
NOT_STORED
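The append command appends data after the value of an existing key.
- If key exists and has not expired, the data is appended and STORED is returned.
- If key exists but has expired, the append fails and NOT_STORED is returned.
- If key does not exist, the append fails and NOT_STORED is returned.
Syntax: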
append key flags exptime bytes [noreply]
value
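Parameters:
key: the key of the key-value pair, used to look up the cached value.
flags: an integer stored alongside the value; clients use it to keep extra information about the key-value pair.
exptime: how long the key-value pair is kept in the cache (in seconds; 0 means it never expires).
bytes: the number of bytes to append.
noreply: optional; tells the server not to send a reply.
value: the data to append; always sent on the second line.
Append data after the value of a key: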
get firstkey
END
get secondkey
END
get thirdkey
END
get fourthkey
END
set firstkey 0 12 5 <-- Note: set firstkey with an expiry of 12 seconds
first
STORED
set secondkey 0 900 6
second
STORED
get firstkey <-- Note: firstkey has already expired, so no data is returned
END
append firstkey 0 12 1 <-- Note: appending 1 byte to the expired firstkey fails and NOT_STORED is returned
!
NOT_STORED
append secondkey 0 900 4 <-- Note: appending 4 bytes to secondkey succeeds and STORED is returned
line
STORED
get secondkey <-- Note: the value of secondkey is now the appended data "secondline"
VALUE secondkey 0 10
secondline
END
append thirdkey 0 900 5 <-- Note: appending to the non-existent thirdkey fails and NOT_STORED is returned
hello
NOT_STORED
get thirdkey
END
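The prepend command inserts data before the value of an existing key.
- If key exists and has not expired, the data is prepended and STORED is returned.
- If key exists but has expired, the prepend fails and NOT_STORED is returned.
- If key does not exist, the prepend fails and NOT_STORED is returned.
Syntax: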
prepend key flags exptime bytes [noreply]
value
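Parameters:
key: the key of the key-value pair, used to look up the cached value.
flags: an integer stored alongside the value; clients use it to keep extra information about the key-value pair.
exptime: how long the key-value pair is kept in the cache (in seconds; 0 means it never expires).
bytes: the number of bytes to prepend.
noreply: optional; tells the server not to send a reply.
value: the data to prepend; always sent on the second line.
Prepend data before the value of a key: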
set firstkey 0 12 5
first
STORED
get firstkey
VALUE firstkey 0 5
first
END
get firstkey <-- Note: firstkey has expired
END
set secondkey 0 900 5
hello
STORED
prepend firstkey 0 900 5 <-- Note: prepending 5 bytes to the expired firstkey fails and NOT_STORED is returned
befor
NOT_STORED
prepend secondkey 0 900 6 <-- Note: prepending 6 bytes to secondkey succeeds and STORED is returned
before
STORED
prepend thirdkey 0 900 6 <-- Note: prepending to the non-existent thirdkey fails and NOT_STORED is returned
before
NOT_STORED
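The reply rules for set, add, replace, append and prepend can be summarised in a small in-memory model. This is only a sketch of the behaviour described above, not memcached's implementation: the class ToyCache and its fake clock are hypothetical, and flags, byte counts and noreply are left out.

```python
import time


class ToyCache:
    """Tiny in-memory model of memcached's storage command replies."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}  # key -> (value, expires_at or None)

    def _live(self, key):
        item = self._items.get(key)
        if item is None:
            return None
        if item[1] is not None and self._clock() >= item[1]:
            del self._items[key]  # expired keys behave like missing keys
            return None
        return item

    def get(self, key):
        item = self._live(key)
        return None if item is None else item[0]

    def set(self, key, value, exptime=0):
        expires_at = self._clock() + exptime if exptime else None
        self._items[key] = (value, expires_at)
        return "STORED"

    def add(self, key, value, exptime=0):
        if self._live(key) is not None:
            return "NOT_STORED"
        return self.set(key, value, exptime)

    def replace(self, key, value, exptime=0):
        if self._live(key) is None:
            return "NOT_STORED"
        return self.set(key, value, exptime)

    def append(self, key, suffix):
        item = self._live(key)
        if item is None:
            return "NOT_STORED"
        self._items[key] = (item[0] + suffix, item[1])  # expiry is kept
        return "STORED"

    def prepend(self, key, prefix):
        item = self._live(key)
        if item is None:
            return "NOT_STORED"
        self._items[key] = (prefix + item[0], item[1])
        return "STORED"


now = [0.0]  # fake clock so the example does not need to sleep
cache = ToyCache(clock=lambda: now[0])
cache.set("firstkey", "first", exptime=12)
cache.set("secondkey", "hello", exptime=900)
now[0] = 13.0  # firstkey has expired by now
replies = [
    cache.prepend("firstkey", "befor"),
    cache.prepend("secondkey", "before"),
    cache.append("secondkey", "!"),
    cache.add("secondkey", "x"),
    cache.replace("thirdkey", "x"),
]
print(replies, cache.get("secondkey"))
```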
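- Memcached added the CAS (Check and Set) protocol in version 1.2.4 to handle concurrent changes to the same item by multiple threads.
- In Memcached, each key is associated with a unique 64-bit value that acts as the version number of the key's value. This number is generated by the Memcached server, starts at 1, and is never reused by the same server. The version is incremented in two cases: (1) a new key-value pair is added; (2) the value of an existing key is updated successfully. Deleting an item does not decrease the version.
The cas command stores value under key only if the key has not been changed since it was last fetched:
- If key exists and has not been updated by another user, the value is updated and "STORED" is returned.
- If key exists but has been updated by another user, the value is not updated and "EXISTS" is returned.
- If key does not exist, nothing is stored and "NOT_FOUND" is returned.
- If the cas command is syntactically invalid, "ERROR" is returned.
Syntax: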
cas key flags exptime bytes unique_cas_token [noreply]
value
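Parameters:
key: the key of the key-value pair, used to look up the cached value.
flags: an integer stored alongside the value; clients use it to keep extra information about the key-value pair.
exptime: how long the key-value pair is kept in the cache (in seconds; 0 means it never expires).
bytes: the length of the value in bytes.
unique_cas_token: a unique 64-bit CAS value obtained with the gets command.
noreply: optional; tells the server not to send a reply.
value: the value of the key-value pair; always sent on the second line.
The following three servers are used to demonstrate concurrent operations:
server 192.168.56.11
node1 192.168.56.12
node2 192.168.56.13
First, open port 11211 in the firewall on server: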
[root@server ~]# firewall-cmd --list-all
public (active)
target: default
icmp-block-inversion: no
interfaces: enp0s3 enp0s8
sources:
services: ssh dhcpv6-client
ports: 8140/tcp 53/tcp
protocols:
masquerade: no
forward-ports:
source-ports:
icmp-blocks:
rich rules:
[root@server ~]# firewall-cmd --permanent --add-port=11211/tcp
success
[root@server ~]# firewall-cmd --reload
success
[root@server ~]# firewall-cmd --list-all
public (active)
target: default
icmp-block-inversion: no
interfaces: enp0s3 enp0s8
sources:
services: ssh dhcpv6-client
ports: 8140/tcp 53/tcp 11211/tcp
protocols:
masquerade: no
forward-ports:
source-ports:
icmp-blocks:
rich rules:
On server, connect to memcached with telnet and set the key firstkey to "hello,memcached":
[root@server ~]# telnet localhost 11211
Trying ::1...
Connected to localhost.
Escape character is '^]'.
set firstkey 0 900 15
hello,memcached
STORED
get firstkey
VALUE firstkey 0 15
hello,memcached
END
On node1, connect to the memcached server with telnet and get the value of firstkey:
[root@node1 ~]# telnet 192.168.56.11 11211
Trying 192.168.56.11...
Connected to 192.168.56.11.
Escape character is '^]'.
get firstkey
VALUE firstkey 0 15
hello,memcached
END
On node2, connect to the memcached server with telnet and get the value of firstkey:
[root@node2 ~]# telnet 192.168.56.11 11211
Trying 192.168.56.11...
Connected to 192.168.56.11.
Escape character is '^]'.
get firstkey
VALUE firstkey 0 15
hello,memcached
END
Both node1 and node2 can retrieve the value of firstkey.
Suppose node1 wants to append the character "!" to firstkey, expecting the result "hello,memcached!", while node2 wants to append the 12 characters "How are you?" to firstkey, expecting the result "hello,memcachedHow are you?".
First append on node1:
append firstkey 0 900 1
!
STORED
get firstkey
VALUE firstkey 0 16
hello,memcached!
END
Then append on node2:
append firstkey 0 900 12
How are you?
STORED
get firstkey
VALUE firstkey 0 28
hello,memcached!How are you?
END
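The value of firstkey seen on node2 is "hello,memcached!How are you?" rather than "hello,memcachedHow are you?". node1 had already updated firstkey and added "!" before node2 made its change, and node2 was unaware of this, so node2 did not get the result it expected.
Plain updates therefore cannot guarantee that several nodes modify the same initial value of firstkey, nor can they prevent firstkey from being modified by several nodes at once.
- Since version 1.2.4, Memcached provides the CAS protocol to solve this concurrent-modification problem: each key is associated with a CAS value that represents the version of the key's value.
- Use gets key to query the CAS value of key.
On server, connect to memcached with telnet, set firstkey to "Hello" and secondkey to "hi":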
[root@server ~]# telnet localhost 11211
Trying ::1...
Connected to localhost.
Escape character is '^]'.
set firstkey 0 3600 5
Hello
STORED
gets firstkey
VALUE firstkey 0 5 22
Hello
END
set secondkey 0 3600 2
hi
STORED
gets secondkey
VALUE secondkey 0 2 23
hi
END
On node1, connect to the memcached server with telnet and get the CAS values of firstkey and secondkey:
[root@node1 ~]# telnet 192.168.56.11 11211
Trying 192.168.56.11...
Connected to 192.168.56.11.
Escape character is '^]'.
gets firstkey secondkey
VALUE firstkey 0 5 22
Hello
VALUE secondkey 0 2 23
hi
END
On node2, connect to the memcached server with telnet and get the CAS values of firstkey and secondkey:
[root@node2 ~]# telnet 192.168.56.11 11211
Trying 192.168.56.11...
Connected to 192.168.56.11.
Escape character is '^]'.
gets firstkey secondkey
VALUE firstkey 0 5 22
Hello
VALUE secondkey 0 2 23
hi
END
On node1, update the value of firstkey by adding "!", then get the CAS value of firstkey:
set firstkey 0 3600 6
Hello!
STORED
gets firstkey
VALUE firstkey 0 6 24
Hello!
END
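The CAS value of firstkey is now 24 and its value is "Hello!".
Suppose node2 now wants to update firstkey by adding ". Memcached!", expecting the new value "Hello. Memcached!". It uses the cas command with the CAS value it fetched earlier, which checks whether another user has updated the data since that fetch: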
cas firstkey 0 3600 17 22
hello. Memcached!
EXISTS
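The value is not stored and "EXISTS" is returned, which shows that another user updated the data after node2 last fetched it.
Still on node2, fetch the latest CAS value and update firstkey again: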
gets firstkey
VALUE firstkey 0 6 24
Hello!
END
cas firstkey 0 3600 17 24 <-- Note: modify firstkey at CAS value (version) 24
Hello! Memcached!
STORED
gets firstkey
VALUE firstkey 0 17 25 <-- Note: after the update the CAS version increased by 1 and is now 25
Hello! Memcached!
END
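This time firstkey is updated successfully, because no other user modified the key after its CAS value became 24. This is how cas prevents several users from overwriting each other's changes to the same key.
Still on node2, cases that return ERROR or NOT_FOUND:
cas thirdkey 0 3600 5 <-- Note: syntax error, no CAS version given, so "ERROR" is returned
ERROR
cas thirdkey 0 3600 5 25 <-- Note: thirdkey does not exist, so no thirdkey with CAS version 25 is found and "NOT_FOUND" is returned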
hello
NOT_FOUND
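In client code, gets and cas are usually combined into a read-modify-write loop that retries when EXISTS is returned. The sketch below models that loop against a hypothetical in-memory store (ToyCasStore); it mirrors the replies described above but is not how memcached itself is implemented.

```python
class ToyCasStore:
    """In-memory model of gets/cas: every successful write gets a new version."""

    def __init__(self):
        self._items = {}      # key -> (value, cas_token)
        self._next_token = 1  # server-wide counter, never reused

    def set(self, key, value):
        self._items[key] = (value, self._next_token)
        self._next_token += 1
        return "STORED"

    def gets(self, key):
        """Return (value, cas_token), or None when the key is missing."""
        return self._items.get(key)

    def cas(self, key, value, token):
        item = self._items.get(key)
        if item is None:
            return "NOT_FOUND"
        if item[1] != token:
            return "EXISTS"  # someone else wrote since our gets
        return self.set(key, value)


def append_with_cas(store, key, suffix, max_attempts=5):
    """Read-modify-write loop: retry whenever cas reports EXISTS."""
    for _ in range(max_attempts):
        current = store.gets(key)
        if current is None:
            return "NOT_FOUND"
        value, token = current
        reply = store.cas(key, value + suffix, token)
        if reply != "EXISTS":
            return reply
    return "EXISTS"


store = ToyCasStore()
store.set("firstkey", "Hello")
_, stale_token = store.gets("firstkey")  # node2 reads the key
store.set("firstkey", "Hello!")          # node1 writes in between
stale_reply = store.cas("firstkey", "Hello. Memcached!", stale_token)
retry_reply = append_with_cas(store, "firstkey", " Memcached!")
print(stale_reply, retry_reply, store.gets("firstkey"))
```

Re-reading the value inside the loop is the important part: the update is always computed from the version that the cas call then checks.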
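memcached lookup commands: get, gets, delete, incr, decr, flush_all¶
The get command retrieves the value stored under key; if key does not exist, nothing is returned.
Syntax:
get key <-- Note: get the value of a single key
get key1 key2 key3 <-- Note: get the values of several keys
Parameters:
key: the key of the key-value pair, used to look up the cached value.
Get the value of a key: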
set firstkey 0 12 5 <-- Note: set firstkey to expire after 12 seconds
first
STORED
get firstkey <-- Note: firstkey has not expired yet, so the value "first" is returned
VALUE firstkey 0 5
first
END
get firstkey <-- Note: firstkey has expired, so no value is returned
END
get firstkey
END
set secondkey 0 900 6 <-- Note: set secondkey to expire after 900 seconds
second
STORED
set thirdkey 0 900 5 <-- Note: set thirdkey to expire after 900 seconds
third
STORED
get firstkey secondkey thirdkey <-- Note: get the values of several keys at once
VALUE secondkey 0 6
second
VALUE thirdkey 0 5
third
END
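The gets command retrieves the value stored under key together with its CAS token; if key does not exist, nothing is returned.
Syntax:
gets key <-- Note: get the value and CAS token of a single key
gets key1 key2 key3 <-- Note: get the values and CAS tokens of several keys
Parameters:
key: the key of the key-value pair, used to look up the cached value.
Continuing the previous example, get the values together with their CAS tokens: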
gets firstkey
END
gets secondkey
VALUE secondkey 0 6 16 <-- Note: the CAS token of secondkey is 16
second
END
gets thirdkey
VALUE thirdkey 0 5 17 <-- Note: the CAS token of thirdkey is 17
third
END
set fourthkey 0 900 6
fourth
STORED
get fourthkey
VALUE fourthkey 0 6
fourth
END
gets fourthkey
VALUE fourthkey 0 6 18 <-- Note: the CAS token of fourthkey is 18
fourth
END
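The delete command removes an existing key.
Syntax:
delete key [noreply] <-- Note: delete the key
Parameters:
key: the key of the key-value pair, used to look up the cached value.
noreply: tells the server not to send a reply.
Output:
DELETED: the key was deleted.
ERROR: the delete failed or the syntax is invalid.
NOT_FOUND: the key does not exist.
Delete a key from memcached: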
[root@server ~]# telnet localhost 11211
Trying ::1...
Connected to localhost.
Escape character is '^]'.
set willdeletedkey 0 3600 5 <-- Note: set the key; it is stored successfully
hello
STORED
get willdeletedkey
VALUE willdeletedkey 0 5
hello
END
delete willdeletedkey <-- Note: the key is deleted and "DELETED" is returned
DELETED
get willdeletedkey
END
delete willdeletedkey <-- Note: the delete fails with "NOT_FOUND", because the key has already been deleted and can no longer be found
NOT_FOUND
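The incr command increments the value of an existing key; the stored value must be the decimal representation of a 64-bit unsigned integer.
The decr command decrements the value of an existing key; the stored value must be the decimal representation of a 64-bit unsigned integer.
- On success, the new value after the increment or decrement is returned.
Syntax:
incr key increment_value <-- Note: increments the value of key, i.e. value = value + increment_value
decr key decrement_value <-- Note: decrements the value of key, i.e. value = value - decrement_value
Parameters:
key: the key of the key-value pair, used to look up the cached value.
increment_value or decrement_value: the amount to add or subtract; must be an unsigned integer.
Output:
CLIENT_ERROR: the stored value or the given amount is not a valid number.
ERROR: the increment or decrement failed or the syntax is invalid.
NOT_FOUND: the key does not exist.
Increment and decrement a key: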
set num 0 3600 2 <-- Note: set the value of num to 55
55
STORED
gets num <-- Note: get the value of num
VALUE num 0 2 27
55
END
incr num 50 <-- Note: increment num by 50, giving 105
105
get num
VALUE num 0 3
105
END
decr num 100 <-- Note: decrement num by 100, giving 5
5
get num
VALUE num 0 1
5
END
Error output:
incr num_notexist 1
NOT_FOUND
incr num abc
CLIENT_ERROR invalid numeric delta argument
incr num -7
CLIENT_ERROR invalid numeric delta argument
incr num 5.5
CLIENT_ERROR invalid numeric delta argument
incr num2abc
ERROR
incr num
ERROR
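The arithmetic behind incr and decr can be sketched in a few lines. The functions below are hypothetical helpers that model the documented memcached behaviour on the stored decimal string: incr wraps around at 2**64 and decr never goes below 0.

```python
UINT64_MODULUS = 2 ** 64


def incr(stored, delta):
    """Return the new stored string after incr; wraps around at 2**64."""
    return str((int(stored) + delta) % UINT64_MODULUS)


def decr(stored, delta):
    """Return the new stored string after decr; the result never goes below 0."""
    return str(max(int(stored) - delta, 0))


after_incr = incr("55", 50)         # same steps as the num example above
after_decr = decr(after_incr, 100)
floored = decr("5", 10)             # would be negative, so it is clamped to 0
wrapped = incr(str(UINT64_MODULUS - 1), 1)
print(after_incr, after_decr, floored, wrapped)
```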
The flush_all command removes all key-value pairs from the cache.
Syntax:
flush_all [time] [noreply]
Parameters:
time: a delay in seconds after which the cache is flushed.
noreply: tells the server not to send a reply.
Remove all key-value pairs:
[root@server ~]# telnet localhost 11211
Trying ::1...
Connected to localhost.
Escape character is '^]'.
stats sizes
END
stats items
END
set firstkey 0 3600 5 <-- Note: set the value of firstkey to "hello"
hello
STORED
set secondkey 0 3600 6 <-- Note: set the value of secondkey to "hello!"
hello!
STORED
gets firstkey secondkey <-- Note: get the data for firstkey and secondkey
VALUE firstkey 0 5 1
hello
VALUE secondkey 0 6 2
hello!
END
flush_all <-- Note: remove all key-value pairs from the cache; "OK" is returned on success
OK
gets firstkey secondkey <-- Note: get firstkey and secondkey again; nothing is returned any more
END
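memcached statistics commands: stats, stats items, stats slabs, stats sizes¶
The stats command returns statistics such as the PID, version, number of connections, and bytes used for storage.
Syntax:
stats
Get the statistics: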
[root@server ~]# telnet localhost 11211
Trying ::1...
Connected to localhost.
Escape character is '^]'.
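stats <-- Note: get the statistics
STAT pid 13486 <-- Note: process ID of the memcached server
STAT uptime 35516 <-- Note: number of seconds the server has been running
STAT time 1559380900 <-- Note: current Unix timestamp on the server
STAT version 1.4.15 <-- Note: memcached version
STAT libevent 2.0.21-stable <-- Note: libevent version
STAT pointer_size 64 <-- Note: pointer size of the operating system, in bits
STAT rusage_user 0.775030 <-- Note: accumulated user CPU time of the process
STAT rusage_system 1.469022 <-- Note: accumulated system CPU time of the process
STAT curr_connections 11 <-- Note: number of currently open connections
STAT total_connections 26 <-- Note: total number of connections since memcached started
STAT connection_structures 13 <-- Note: number of connection structures allocated by memcached
STAT reserved_fds 20 <-- Note: number of file descriptors reserved for internal use
STAT cmd_get 79 <-- Note: number of get requests
STAT cmd_set 43 <-- Note: number of set requests
STAT cmd_flush 0 <-- Note: number of flush requests
STAT cmd_touch 0 <-- Note: number of touch requests
STAT get_hits 50 <-- Note: number of get hits
STAT get_misses 29 <-- Note: number of get misses
STAT delete_misses 1 <-- Note: number of delete misses
STAT delete_hits 1 <-- Note: number of delete hits
STAT incr_misses 1 <-- Note: number of incr misses
STAT incr_hits 3 <-- Note: number of incr hits
STAT decr_misses 0 <-- Note: number of decr misses
STAT decr_hits 1 <-- Note: number of decr hits
STAT cas_misses 1 <-- Note: number of cas requests for missing keys
STAT cas_hits 1 <-- Note: number of successful cas requests
STAT cas_badval 2 <-- Note: number of cas requests where the key was found but the CAS value did not match
STAT touch_hits 0 <-- Note: number of touch hits
STAT touch_misses 0 <-- Note: number of touch misses
STAT auth_cmds 0 <-- Note: number of authentication commands handled
STAT auth_errors 0 <-- Note: number of failed authentications
STAT bytes_read 4426 <-- Note: total bytes read from the network
STAT bytes_written 9572 <-- Note: total bytes written to the network
STAT limit_maxbytes 67108864 <-- Note: maximum number of bytes the server may use for storage
STAT accepting_conns 1 <-- Note: whether the server is currently accepting new connections
STAT listen_disabled_num 0 <-- Note: number of times the server stopped accepting connections because the connection limit was reached
STAT threads 4 <-- Note: number of worker threads
STAT conn_yields 0 <-- Note: number of times a connection yielded because it hit the per-event request limit (-R)
STAT hash_power_level 16 <-- Note: current hashpower level; can be set at startup
STAT hash_bytes 524288 <-- Note: bytes currently used by the hash table
STAT hash_is_expanding 0 <-- Note: whether the hash table is currently being expanded
STAT bytes 161 <-- Note: bytes currently used to store items
STAT curr_items 2 <-- Note: number of items currently stored
STAT total_items 31 <-- Note: total number of items stored since startup
STAT expired_unfetched 2 <-- Note: items that expired without ever being fetched after they were stored
STAT evicted_unfetched 0 <-- Note: items that were evicted without ever being fetched after they were stored
STAT evictions 0 <-- Note: number of items evicted from the LRU to free memory
STAT reclaimed 6 <-- Note: number of times a new item was stored using memory from an expired item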
END
Other statistics commands:
stats items
STAT items:1:number 2
STAT items:1:age 3182
STAT items:1:evicted 0
STAT items:1:evicted_nonzero 0
STAT items:1:evicted_time 0
STAT items:1:outofmemory 0
STAT items:1:tailrepairs 0
STAT items:1:reclaimed 6
STAT items:1:expired_unfetched 2
STAT items:1:evicted_unfetched 0
END
stats slabs
STAT 1:chunk_size 96
STAT 1:chunks_per_page 10922
STAT 1:total_pages 1
STAT 1:total_chunks 10922
STAT 1:used_chunks 2
STAT 1:free_chunks 10920
STAT 1:free_chunks_end 0
STAT 1:mem_requested 161
STAT 1:get_hits 49
STAT 1:cmd_set 43
STAT 1:delete_hits 1
STAT 1:incr_hits 3
STAT 1:decr_hits 1
STAT 1:cas_hits 1
STAT 1:cas_badval 2
STAT 1:touch_hits 0
STAT 2:chunk_size 120
STAT 2:chunks_per_page 8738
STAT 2:total_pages 1
STAT 2:total_chunks 8738
STAT 2:used_chunks 0
STAT 2:free_chunks 8738
STAT 2:free_chunks_end 0
STAT 2:mem_requested 0
STAT 2:get_hits 1
STAT 2:cmd_set 0
STAT 2:delete_hits 0
STAT 2:incr_hits 0
STAT 2:decr_hits 0
STAT 2:cas_hits 0
STAT 2:cas_badval 0
STAT 2:touch_hits 0
STAT active_slabs 2
STAT total_malloced 2097072
END
stats sizes
STAT 96 2
END
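The STAT name value lines are easy to process in a script. The sketch below parses a few lines copied from the stats output above and derives the get hit ratio; parse_stats is a hypothetical helper, and the sample list stands in for what would be read from a real connection.

```python
def parse_stats(lines):
    """Turn 'STAT name value' lines into a dict, stopping at END."""
    stats = {}
    for line in lines:
        line = line.strip()
        if line == "END":
            break
        if line.startswith("STAT "):
            _, name, value = line.split(" ", 2)
            stats[name] = value
    return stats


sample = [
    "STAT cmd_get 79",
    "STAT get_hits 50",
    "STAT get_misses 29",
    "STAT version 1.4.15",
    "END",
]
stats = parse_stats(sample)
# Hit ratio = hits / total get requests.
hit_ratio = int(stats["get_hits"]) / int(stats["cmd_get"])
print(stats["version"], round(hit_ratio, 3))
```

A persistently low hit ratio usually suggests that keys expire too early or that the cache is too small for the working set.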
Working with the NoSQL store memcached using python-memcached¶
Install the python-memcached package:
[root@server ~]# pip install python-memcached
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Collecting python-memcached
Downloading https://mirrors.aliyun.com/pypi/packages/f5/90/19d3908048f70c120ec66a39e61b92c253e834e6e895cd104ce5e46cbe53/python_memcached-1.59-py2.py3-none-any.whl
Requirement already satisfied: six>=1.4.0 in /usr/lib/python3.6/site-packages (from python-memcached) (1.12.0)
Installing collected packages: python-memcached
Successfully installed python-memcached-1.59
Start ipython, import the memcache module, and view its help:
[root@server ~]# ipython
Python 3.6.7 (default, Dec 5 2018, 15:02:05)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.5.0 -- An enhanced Interactive Python. Type '?' for help.
>>> import memcache
>>> memcache?
Type: module
String form: <module 'memcache' from '/usr/lib/python3.6/site-packages/memcache.py'>
File: /usr/lib/python3.6/site-packages/memcache.py
Docstring:
client module for memcached (memory cache daemon)
Overview
========
See U{the MemCached homepage<http://www.danga.com/memcached>} for more
about memcached.
Usage summary
=============
This should give you a feel for how this module operates::
import memcache
mc = memcache.Client(['127.0.0.1:11211'], debug=0)
mc.set("some_key", "Some value")
value = mc.get("some_key")
mc.set("another_key", 3)
mc.delete("another_key")
mc.set("key", "1") # note that the key used for incr/decr must be
# a string.
mc.incr("key")
mc.decr("key")
The standard way to use memcache with a database is like this:
key = derive_key(obj)
obj = mc.get(key)
if not obj:
obj = backend_api.get(...)
mc.set(key, obj)
# we now have obj, and future passes through this code
# will use the object from the cache.
Detailed Documentation
======================
More detailed documentation is available in the L{Client} class.
Basic operations:
#!/usr/bin/python3
"""
@Time : 2019/6/2 19:58
@Author : Mei Zhaohui
@Email : mzh.whut@gmail.com
@File : usememcached.py
@Software: PyCharm
"""
import memcache
import time


def main():
    """main function"""
    # Single memcached node. debug=True prints error messages when something
    # goes wrong; remove this argument before going to production.
    mc = memcache.Client(['192.168.56.11:11211'], debug=True)
    print('Memcached client:', mc)
    # set creates the key if it does not exist and overwrites it otherwise
    result = mc.set('first_key', 'hello,memcached!')
    print('result: ', result)  # True on success
    print("mc.get('first_key') = ", mc.get('first_key'))
    print("mc.gets('first_key') = ", mc.gets('first_key'))
    mc.set_multi({'second_key': 'hi', 'third_key': 'hi!'})  # set several keys at once
    print("mc.get_multi(['second_key', 'third_key']) = ", mc.get_multi(['second_key', 'third_key']))
    mc.replace('second_key', 'Hi!')  # replace the value
    mc.append('third_key', 'Memcached!')  # append to the end
    mc.prepend('third_key', 'I am Python!')  # insert at the beginning
    print("mc.get_multi(['second_key', 'third_key']) = ", mc.get_multi(['second_key', 'third_key']))
    # add fails and returns False if the key already exists
    # third_result = mc.add('third_key', 'add again')
    mc.delete('fourth_key')  # delete the key
    # add succeeds and returns True if the key does not exist; expire after 2 seconds
    print(mc.add('fourth_key', 'for you', time=2))
    time.sleep(1)  # wait 1 second
    print("mc.get('fourth_key') = ", mc.get('fourth_key'))
    time.sleep(1)  # wait 1 more second
    print("mc.get('fourth_key') = ", mc.get('fourth_key'))
    mc.set('num', 55)
    print("mc.get('num') = ", mc.get('num'))
    mc.incr('num', 50)  # increment
    print("mc.get('num') = ", mc.get('num'))
    mc.decr('num', 100)  # decrement
    print("mc.get('num') = ", mc.get('num'))


if __name__ == '__main__':
    main()
Running usememcached.py produces the following output:
Memcached client: <memcache.Client object at 0x000001D76186AB88>
result: True
mc.get('first_key') = hello,memcached!
mc.gets('first_key') = hello,memcached!
mc.get_multi(['second_key', 'third_key']) = {'second_key': 'hi', 'third_key': 'hi!'}
mc.get_multi(['second_key', 'third_key']) = {'second_key': 'Hi!', 'third_key': 'I am Python!hi!Memcached!'}
True
mc.get('fourth_key') = for you
mc.get('fourth_key') = None
mc.get('num') = 55
mc.get('num') = 105
mc.get('num') = 5
Process finished with exit code 0
Using a memcached cluster¶
Use a memcached cluster without weights:
#!/usr/bin/python3
"""
@Time : 2019/6/2 21:15
@Author : Mei Zhaohui
@Email : mzh.whut@gmail.com
@File : use_memcached_cluster.py
@Software: PyCharm
"""
import memcache
import time


def main():
    """Use a memcached cluster without weights"""
    mc = memcache.Client(
        [
            '192.168.56.11:11211',
            '192.168.56.12:11211',
            '192.168.56.13:11211',
        ],
        debug=True)
    # Store the squares of 0..29 under the keys num0..num29
    for num in range(30):
        mc.set('num' + str(num), pow(num, 2))
    for num in range(30):
        print("mc.get('num' + str(num)) = ", mc.get('num' + str(num)))


if __name__ == '__main__':
    main()
Running use_memcached_cluster.py produces the following output:
mc.get('num0') = 0
mc.get('num1') = 1
mc.get('num2') = 4
mc.get('num3') = 9
mc.get('num4') = 16
mc.get('num5') = 25
mc.get('num6') = 36
mc.get('num7') = 49
mc.get('num8') = 64
mc.get('num9') = 81
mc.get('num10') = 100
mc.get('num11') = 121
mc.get('num12') = 144
mc.get('num13') = 169
mc.get('num14') = 196
mc.get('num15') = 225
mc.get('num16') = 256
mc.get('num17') = 289
mc.get('num18') = 324
mc.get('num19') = 361
mc.get('num20') = 400
mc.get('num21') = 441
mc.get('num22') = 484
mc.get('num23') = 529
mc.get('num24') = 576
mc.get('num25') = 625
mc.get('num26') = 676
mc.get('num27') = 729
mc.get('num28') = 784
mc.get('num29') = 841
Process finished with exit code 0
Check the key-value pairs on each of the three nodes:
node1 '192.168.56.11:11211':
gets num0 num1 num2 num3 num4 num5 num6 num7 num8 num9 num10 num11 num12 num13 num14 num15 num16 num17 num18 num19 num20 num21 num22 num23 num24 num25 num26 num27 num28 num29
VALUE num1 2 1 283
1
VALUE num2 2 1 284
4
VALUE num3 2 1 285
9
VALUE num6 2 2 286
36
VALUE num8 2 2 287
64
VALUE num10 2 3 288
100
VALUE num13 2 3 289
169
VALUE num14 2 3 290
196
VALUE num17 2 3 291
289
VALUE num23 2 3 292
529
VALUE num25 2 3 293
625
END
node1 stores 11 key-value pairs.
node2 '192.168.56.12:11211':
gets num0 num1 num2 num3 num4 num5 num6 num7 num8 num9 num10 num11 num12 num13 num14 num15 num16 num17 num18 num19 num20 num21 num22 num23 num24 num25 num26 num27 num28 num29
VALUE num7 2 2 130
49
VALUE num11 2 3 131
121
VALUE num18 2 3 132
324
VALUE num21 2 3 133
441
VALUE num24 2 3 134
576
VALUE num29 2 3 135
841
END
node2 stores 6 key-value pairs.
node3 '192.168.56.13:11211'
gets num0 num1 num2 num3 num4 num5 num6 num7 num8 num9 num10 num11 num12 num13 num14 num15 num16 num17 num18 num19 num20 num21 num22 num23 num24 num25 num26 num27 num28 num29
VALUE num0 2 1 142
0
VALUE num4 2 2 143
16
VALUE num5 2 2 144
25
VALUE num9 2 2 145
81
VALUE num12 2 3 146
144
VALUE num15 2 3 147
225
VALUE num16 2 3 148
256
VALUE num19 2 3 149
361
VALUE num20 2 3 150
400
VALUE num22 2 3 151
484
VALUE num26 2 3 152
676
VALUE num27 2 3 153
729
VALUE num28 2 3 154
784
END
node3 stores 13 key-value pairs.
The key-value pairs are split across the three nodes in the ratio 11:6:13, so every node holds a share of the keys, although the split is not perfectly even.
Set weights for the memcached cluster:
#!/usr/bin/python3
"""
@Time : 2019/6/2 21:15
@Author : Mei Zhaohui
@Email : mzh.whut@gmail.com
@File : use_memcached_cluster.py
@Software: PyCharm
"""
import memcache
import time


def main():
    """Use a memcached cluster with weights; store data with weights 7:2:1"""
    mc = memcache.Client(
        [
            ('192.168.56.11:11211', 7),
            ('192.168.56.12:11211', 2),
            ('192.168.56.13:11211', 1)
        ],
        debug=True)
    mc.flush_all()  # clear the keys left over from the previous run
    for num in range(30):
        mc.set('num' + str(num), pow(num, 2))
    for num in range(30):
        print("mc.get('num{}') = {}".format(num, mc.get('num' + str(num))))


if __name__ == '__main__':
    main()
Check the key-value pairs on each of the three nodes:
node1 '192.168.56.11:11211':
gets num0 num1 num2 num3 num4 num5 num6 num7 num8 num9 num10 num11 num12 num13 num14 num15 num16 num17 num18 num19 num20 num21 num22 num23 num24 num25 num26 num27 num28 num29
VALUE num1 2 1 305
1
VALUE num2 2 1 306
4
VALUE num3 2 1 307
9
VALUE num4 2 2 308
16
VALUE num6 2 2 309
36
VALUE num8 2 2 310
64
VALUE num9 2 2 311
81
VALUE num10 2 3 312
100
VALUE num12 2 3 313
144
VALUE num14 2 3 314
196
VALUE num15 2 3 315
225
VALUE num16 2 3 316
256
VALUE num18 2 3 317
324
VALUE num19 2 3 318
361
VALUE num20 2 3 319
400
VALUE num21 2 3 320
441
VALUE num23 2 3 321
529
VALUE num28 2 3 322
784
VALUE num29 2 3 323
841
END
node1 stores 19 key-value pairs.
node2 '192.168.56.12:11211':
gets num0 num1 num2 num3 num4 num5 num6 num7 num8 num9 num10 num11 num12 num13 num14 num15 num16 num17 num18 num19 num20 num21 num22 num23 num24 num25 num26 num27 num28 num29
VALUE num0 2 1 142
0
VALUE num13 2 3 143
169
VALUE num17 2 3 144
289
VALUE num24 2 3 145
576
VALUE num26 2 3 146
676
END
node2 stores 5 key-value pairs.
node3 '192.168.56.13:11211':
gets num0 num1 num2 num3 num4 num5 num6 num7 num8 num9 num10 num11 num12 num13 num14 num15 num16 num17 num18 num19 num20 num21 num22 num23 num24 num25 num26 num27 num28 num29
VALUE num5 2 2 168
25
VALUE num7 2 2 169
49
VALUE num11 2 3 170
121
VALUE num22 2 3 171
484
VALUE num25 2 3 172
625
VALUE num27 2 3 173
729
END
node3 stores 6 key-value pairs.
The three nodes now hold key-value pairs in the ratio 19:5:6, so node1, which has the largest weight, clearly stores more data.
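The skew toward node1 follows from how weights are applied. Below is a minimal sketch, assuming a simplified scheme in which each server is repeated `weight` times in a bucket list and a key is mapped to a bucket by a CRC32 hash. python-memcached works along similar lines, but the helpers `build_buckets` and `pick_server` are hypothetical and the hash is an assumption, so the per-node counts will not necessarily match the 19:5:6 observed above.

```python
import binascii
from collections import Counter


def build_buckets(servers):
    """Repeat each server address `weight` times so heavier servers own more slots."""
    buckets = []
    for address, weight in servers:
        buckets.extend([address] * weight)
    return buckets


def pick_server(buckets, key):
    """Map a key to a bucket with CRC32 (an assumed, simplified hash)."""
    digest = binascii.crc32(key.encode("utf-8")) & 0xFFFFFFFF
    return buckets[digest % len(buckets)]


servers = [
    ("192.168.56.11:11211", 7),
    ("192.168.56.12:11211", 2),
    ("192.168.56.13:11211", 1),
]
buckets = build_buckets(servers)
counts = Counter(pick_server(buckets, "num" + str(num)) for num in range(30))
for address, _weight in servers:
    print(address, counts[address])
```

With 7 of the 10 slots owned by node1, about 70% of a large key set would be expected to land there; with only 30 keys the observed split can deviate noticeably.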
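Using the redis module with the NoSQL database Redis¶
- MySQL is a relational database with persistent storage, so queries involve disk I/O. Caching improves performance: Redis and memcached are both cache databases and can greatly speed up web access under heavy data volumes.
- memcached supports only a simple key-value structure, while Redis supports more data types, such as String, Hash, List, Set and Sorted Set.
- Web applications commonly combine MySQL and Redis: the application queries Redis first and only goes to MySQL when the data is not found there.
- Redis is very fast: commonly cited benchmark figures are about 110,000 reads/s and 81,000 writes/s, although actual throughput depends on hardware and workload.
- Redis supports persistence: data in memory can be saved to disk and loaded again after a restart.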
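The "Redis first, then MySQL" access pattern described in the list above can be sketched as follows. This is only an illustration: `DictCache` is an in-memory stand-in for a Redis client and `query_database` is a placeholder for a real MySQL query; both names are hypothetical.

```python
class DictCache:
    """In-memory stand-in for a Redis client (hypothetical, for illustration only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value
        return True


def query_database(user_id):
    """Placeholder for a slow MySQL query (hypothetical)."""
    return "user-{}".format(user_id)


def get_user(cache, user_id):
    """Read from the cache first; fall back to the database on a miss."""
    key = "user:{}".format(user_id)
    value = cache.get(key)
    if value is None:
        value = query_database(user_id)
        cache.set(key, value)  # fill the cache for the next request
    return value


cache = DictCache()
print(get_user(cache, 1))  # miss: loaded from the "database", then cached
print(get_user(cache, 1))  # hit: served from the cache
```

A real deployment would also set an expiry on cached entries and invalidate them when the underlying row changes.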
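Installing Redis¶
According to the Redis Quick Start, the officially recommended way to install Redis is to build it from source, because Redis has no dependencies other than GCC and libc.
Download: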
[root@server ~]# wget http://download.redis.io/redis-stable.tar.gz
--2019-06-18 22:20:19-- http://download.redis.io/redis-stable.tar.gz
Resolving download.redis.io (download.redis.io)... 109.74.203.151
Connecting to download.redis.io (download.redis.io)|109.74.203.151|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2014657 (1.9M) [application/x-gzip]
Saving to: ‘redis-stable.tar.gz.1’
15% [=============> ] 320,584 25.8KB/s eta 63s
[root@server ~]# ls -lah redis-stable.tar.gz
-rw-r--r--. 1 root root 2.0M May 16 00:26 redis-stable.tar.gz
Extract:
[root@server ~]# tar -zxvf redis-stable.tar.gz
Change directory:
[root@server ~]# cd redis-stable
Compile:
[root@server redis-stable]# make
[root@server redis-stable]# echo $?
0
Install:
[root@server redis-stable]# make install
[root@server redis-stable]# echo $?
0
The exit status 0 shows that the installation succeeded.
Check the redis commands:
[root@server redis-stable]# redis-    # press Tab twice
redis-benchmark redis-check-aof redis-check-rdb redis-cli redis-sentinel redis-server
[root@server redis-stable]# whereis redis-server
redis-server: /usr/local/bin/redis-server
[root@server redis-stable]# whereis redis-*
redis-*:
[root@server redis-stable]# ls -lah /usr/local/bin/redis*
-rwxr-xr-x. 1 root root 4.2M Jun 18 22:28 /usr/local/bin/redis-benchmark
-rwxr-xr-x. 1 root root 7.8M Jun 18 22:28 /usr/local/bin/redis-check-aof
-rwxr-xr-x. 1 root root 7.8M Jun 18 22:28 /usr/local/bin/redis-check-rdb
-rwxr-xr-x. 1 root root 4.6M Jun 18 22:28 /usr/local/bin/redis-cli
lrwxrwxrwx. 1 root root 12 Jun 18 22:28 /usr/local/bin/redis-sentinel -> redis-server
-rwxr-xr-x. 1 root root 7.8M Jun 18 22:28 /usr/local/bin/redis-server
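- redis-server is the Redis server itself.
- redis-sentinel is the Redis Sentinel executable (monitoring and failover).
- redis-cli is the command-line utility for interacting with Redis.
- redis-benchmark is used to check Redis performance.
- redis-check-aof and redis-check-rdb are useful in the rare case of corrupted data files.
Starting Redis¶
The simplest way to start Redis is to run the redis-server command directly: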
[root@server ~]# redis-server
17608:C 18 Jun 2019 22:38:37.232 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
17608:C 18 Jun 2019 22:38:37.232 # Redis version=5.0.5, bits=64, commit=00000000, modified=0, pid=17608, just started
17608:C 18 Jun 2019 22:38:37.232 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
17608:M 18 Jun 2019 22:38:37.233 * Increased maximum number of open files to 10032 (it was originally set to 1024).
_._
_.-``__ ''-._
_.-`` `. `_. ''-._ Redis 5.0.5 (00000000/0) 64 bit
.-`` .-```. ```\/ _.,_ ''-._
( ' , .-` | `, ) Running in standalone mode
|`-._`-...-` __...-.``-._|'` _.-'| Port: 6379
| `-._ `._ / _.-' | PID: 17608
`-._ `-._ `-./ _.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' | http://redis.io
`-._ `-._`-.__.-'_.-' _.-'
|`-._`-._ `-.__.-' _.-'_.-'|
| `-._`-._ _.-'_.-' |
`-._ `-._`-.__.-'_.-' _.-'
`-._ `-.__.-' _.-'
`-._ _.-'
`-.__.-'
17608:M 18 Jun 2019 22:38:37.233 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
17608:M 18 Jun 2019 22:38:37.233 # Server initialized
17608:M 18 Jun 2019 22:38:37.233 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
17608:M 18 Jun 2019 22:38:37.233 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
17608:M 18 Jun 2019 22:38:37.233 * Ready to accept connections
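If you see the output above, Redis was installed successfully.
Checking that Redis works¶
Open a second terminal session on the Redis server (for example, by cloning the session in SecureCRT) and talk to Redis with redis-cli. A simple ping shows whether the server responds: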
[root@server ~]# redis-cli ping
PONG
[root@server ~]#
[root@server ~]# redis-cli
127.0.0.1:6379> ping
PONG
127.0.0.1:6379> ping
PONG
127.0.0.1:6379>
View the redis-cli help:
127.0.0.1:6379> help
redis-cli 5.0.5
To get help about Redis commands type:
"help @<group>" to get a list of commands in <group>
"help <command>" for help on <command>
"help <tab>" to get a list of possible help topics
"quit" to exit
To set redis-cli preferences:
":set hints" enable online hints
":set nohints" disable online hints
Set your preferences in ~/.redisclirc
Shut down the Redis server from redis-cli:
127.0.0.1:6379> help shutdown
SHUTDOWN [NOSAVE|SAVE]
summary: Synchronously save the dataset to disk and then shut down the server
since: 1.0.0
group: server
127.0.0.1:6379> shutdown SAVE
not connected> quit
[root@server ~]#
The Redis server prints the following messages in its foreground terminal:
17608:M 18 Jun 2019 22:51:39.238 # User requested shutdown...
17608:M 18 Jun 2019 22:51:39.238 * Saving the final RDB snapshot before exiting.
17608:M 18 Jun 2019 22:51:39.576 * DB saved on disk
17608:M 18 Jun 2019 22:51:39.576 # Redis is now ready to exit, bye bye...
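This shows that communication works and that redis-cli can control redis-server.
Configuring Redis¶
In the examples above, Redis was started without an explicit configuration file, so every parameter uses its built-in default. That is fine for trying Redis out or for development, but a production environment should use a configuration file.
Here we use a Redis configuration file.
Copy redis.conf from the source tree to /etc: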
[root@server ~]# cp ~/redis-stable/redis.conf /etc/redis.conf
[root@server ~]# ls -lah /etc/redis.conf
-rw-r--r--. 1 root root 61K Jun 18 22:57 /etc/redis.conf
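So far redis-server has run in the foreground, so a second terminal was needed to run the client. That is inconvenient, so configure Redis to run in the background.
In the configuration file, change daemonize no to daemonize yes: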
[root@server ~]# sed -i '136s/daemonize no/daemonize yes/g' /etc/redis.conf
[root@server ~]# cat -n /etc/redis.conf|sed -n '134,136p'
134 # By default Redis does not run as a daemon. Use 'yes' if you need it.
135 # Note that Redis will write a pid file in /var/run/redis.pid when daemonized.
136 daemonize yes
[root@server ~]#
Start Redis with the configuration file:
[root@server ~]# redis-server /etc/redis.conf
17750:C 18 Jun 2019 23:03:37.223 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
17750:C 18 Jun 2019 23:03:37.223 # Redis version=5.0.5, bits=64, commit=00000000, modified=0, pid=17750, just started
17750:C 18 Jun 2019 23:03:37.223 # Configuration loaded
[root@server ~]# ps -ef|grep redis
root 17751 1 0 23:03 ? 00:00:00 redis-server 127.0.0.1:6379
root 17756 13228 0 23:03 pts/0 00:00:00 grep --color=auto redis
[root@server ~]#
[root@server ~]# redis-cli ping
PONG
[root@server ~]# redis-cli
127.0.0.1:6379> ping
PONG
127.0.0.1:6379> shutdown SAVE
not connected>
not connected>
not connected> quit
[root@server ~]# ps -ef|grep redis
root 17766 13228 0 23:07 pts/0 00:00:00 grep --color=auto redis
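Redis now runs in the background, so redis-cli can be used without opening another window.
Running Redis as a system service¶
Configuring Redis as a system service comes down to creating a redis.service file in the /usr/lib/systemd/system/ directory.
The content of /usr/lib/systemd/system/redis.service is as follows: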
[root@server ~]# cat /usr/lib/systemd/system/redis.service
[Unit]
Description=Redis Server Manager
After=network.target
[Service]
Type=forking
PIDFile=/var/run/redis_6379.pid
ExecStartPre=/usr/local/bin/redis-server -v
ExecStartPost=/usr/bin/echo "Done!!!"
ExecStart=/usr/local/bin/redis-server /etc/redis.conf
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/usr/local/bin/redis-cli shutdown
KillSignal=SIGQUIT
TimeoutStopSec=5
KillMode=process
PrivateTmp=true
[Install]
WantedBy=multi-user.target
[root@server ~]#
Edit /etc/redis.conf to set the logfile path, and create the directory /var/log/redis/:
[root@server ~]# cat -n /etc/redis.conf|sed -n '168,172p'
168 # Specify the log file name. Also the empty string can be used to force
169 # Redis to log on the standard output. Note that if you use standard
170 # output for logging but daemonize, logs will be sent to /dev/null
171 logfile "/var/log/redis/redis.log"
172
[root@server ~]# mkdir -p /var/log/redis/
Reload the systemd unit files:
[root@server ~]# systemctl daemon-reload
Test starting Redis, checking its status, stopping it and restarting it:
[root@server ~]# systemctl daemon-reload
# Note: start the Redis service
[root@server ~]# systemctl start redis
[root@server ~]# systemctl status redis
● redis.service - Redis Server Manager
Loaded: loaded (/usr/lib/systemd/system/redis.service; disabled; vendor preset: disabled)
Active: active (running) since Wed 2019-06-19 22:11:42 CST; 7s ago
Process: 13906 ExecStartPost=/usr/bin/echo Done!!! (code=exited, status=0/SUCCESS)
Process: 13903 ExecStart=/usr/local/bin/redis-server /etc/redis.conf (code=exited, status=0/SUCCESS)
Process: 13901 ExecStartPre=/usr/local/bin/redis-server -v (code=exited, status=0/SUCCESS)
Main PID: 13905 (redis-server)
Tasks: 4
Memory: 6.3M
CGroup: /system.slice/redis.service
└─13905 /usr/local/bin/redis-server 127.0.0.1:6379
Jun 19 22:11:42 server.hopewait systemd[1]: Starting Redis Server Manager...
Jun 19 22:11:42 server.hopewait redis-server[13901]: Redis server v=5.0.5 sha=00000000:0 malloc=jemalloc-5.1.0 bits=64 build=a...9cfcb5
Jun 19 22:11:42 server.hopewait systemd[1]: Started Redis Server Manager.
Jun 19 22:11:42 server.hopewait echo[13906]: Done!!!
Hint: Some lines were ellipsized, use -l to show in full.
[root@server ~]# ps -ef|grep redis
root 13905 1 0 22:11 ? 00:00:00 /usr/local/bin/redis-server 127.0.0.1:6379
root 13914 13280 0 22:12 pts/0 00:00:00 grep --color=auto redis
[root@server ~]# netstat -tunlp|grep redis
tcp 0 0 127.0.0.1:6379 0.0.0.0:* LISTEN 13905/redis-server
# Note: restart the Redis service
[root@server ~]# systemctl restart redis
[root@server ~]# systemctl status redis
● redis.service - Redis Server Manager
Loaded: loaded (/usr/lib/systemd/system/redis.service; disabled; vendor preset: disabled)
Active: active (running) since Wed 2019-06-19 22:13:45 CST; 7s ago
Process: 13928 ExecStop=/usr/local/bin/redis-cli shutdown (code=exited, status=0/SUCCESS)
Process: 13936 ExecStartPost=/usr/bin/echo Done!!! (code=exited, status=0/SUCCESS)
Process: 13933 ExecStart=/usr/local/bin/redis-server /etc/redis.conf (code=exited, status=0/SUCCESS)
Process: 13932 ExecStartPre=/usr/local/bin/redis-server -v (code=exited, status=0/SUCCESS)
Main PID: 13935 (redis-server)
Tasks: 4
Memory: 6.3M
CGroup: /system.slice/redis.service
└─13935 /usr/local/bin/redis-server 127.0.0.1:6379
Jun 19 22:13:45 server.hopewait systemd[1]: Stopped Redis Server Manager.
Jun 19 22:13:45 server.hopewait systemd[1]: Starting Redis Server Manager...
Jun 19 22:13:45 server.hopewait redis-server[13932]: Redis server v=5.0.5 sha=00000000:0 malloc=jemalloc-5.1.0 bits=64 build=a...9cfcb5
Jun 19 22:13:45 server.hopewait systemd[1]: Started Redis Server Manager.
Jun 19 22:13:45 server.hopewait echo[13936]: Done!!!
Hint: Some lines were ellipsized, use -l to show in full.
# Note: stop the Redis service
[root@server ~]# systemctl stop redis
[root@server ~]# systemctl status redis
● redis.service - Redis Server Manager
Loaded: loaded (/usr/lib/systemd/system/redis.service; disabled; vendor preset: disabled)
Active: inactive (dead)
Jun 19 22:11:42 server.hopewait systemd[1]: Started Redis Server Manager.
Jun 19 22:11:42 server.hopewait echo[13906]: Done!!!
Jun 19 22:13:44 server.hopewait systemd[1]: Stopping Redis Server Manager...
Jun 19 22:13:45 server.hopewait systemd[1]: Stopped Redis Server Manager.
Jun 19 22:13:45 server.hopewait systemd[1]: Starting Redis Server Manager...
Jun 19 22:13:45 server.hopewait redis-server[13932]: Redis server v=5.0.5 sha=00000000:0 malloc=jemalloc-5.1.0 bits=64 build=a...9cfcb5
Jun 19 22:13:45 server.hopewait systemd[1]: Started Redis Server Manager.
Jun 19 22:13:45 server.hopewait echo[13936]: Done!!!
Jun 19 22:15:54 server.hopewait systemd[1]: Stopping Redis Server Manager...
Jun 19 22:15:54 server.hopewait systemd[1]: Stopped Redis Server Manager.
Hint: Some lines were ellipsized, use -l to show in full.
[root@server ~]# ps -ef|grep redis
root 13956 13280 0 22:16 pts/0 00:00:00 grep --color=auto redis
Enable the Redis service at boot:
[root@server ~]# systemctl enable redis
Created symlink from /etc/systemd/system/multi-user.target.wants/redis.service to /usr/lib/systemd/system/redis.service.
Check whether it is enabled at boot:
[root@server ~]# systemctl is-enabled redis
enabled
Modifying the Redis configuration¶
Create the Redis data directory /var/redis-data:
[root@server ~]# mkdir -p /var/redis-data
[root@server ~]# ls -lah /var/redis-data/
The most important settings are described below:
# Client idle timeout in seconds: the connection is closed if the client sends no command within this period
# Close the connection after a client is idle for N seconds (0 to disable)
timeout 300
# Run Redis in the background
# By default Redis does not run as a daemon. Use 'yes' if you need it.
# Note that Redis will write a pid file in /var/run/redis.pid when daemonized.
daemonize yes
# Set the PID file
# If a pid file is specified, Redis writes it where specified at startup
# and removes it at exit.
#
# When the server runs non daemonized, no pid file is created if none is
# specified in the configuration. When the server is daemonized, the pid file
# is used even if not specified, defaulting to "/var/run/redis.pid".
#
# Creating a pid file is best effort: if Redis is not able to create it
# nothing bad happens, the server will start and run normally.
pidfile /var/run/redis_6379.pid
# Set the log level
# Specify the server verbosity level.
# This can be one of:
# debug (a lot of information, useful for development/testing)
# verbose (many rarely useful info, but not a mess like the debug level)
# notice (moderately verbose, what you want in production probably)
# warning (only very important / critical messages are logged)
loglevel notice
# Set the log file. The default is standard output (stdout); in daemon mode, standard output goes to /dev/null
# Specify the log file name. Also the empty string can be used to force
# Redis to log on the standard output. Note that if you use standard
# output for logging but daemonize, logs will be sent to /dev/null
logfile "/var/log/redis/redis.log"
# Number of available databases
# Set the number of databases. The default database is DB 0, you can select
# a different one on a per-connection basis using SELECT <dbid> where
# dbid is a number between 0 and 'databases'-1
databases 16
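# Save to disk: sync the dataset to the RDB data file when a given number of updates occurs within a given time
# The default configuration file sets three conditions:
# save 900 1 at least 1 key changed within 900 seconds
# save 300 10 at least 10 keys changed within 300 seconds
# save 60 10000 at least 10000 keys changed within 60 seconds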
# Save the DB on disk:
#
# save <seconds> <changes>
#
# Will save the DB if both the given number of seconds and the given
# number of write operations against the DB occurred.
#
# In the example below the behaviour will be to save:
# after 900 sec (15 min) if at least 1 key changed
# after 300 sec (5 min) if at least 10 keys changed
# after 60 sec if at least 10000 keys changed
#
# Note: you can disable saving completely by commenting out all "save" lines.
#
# It is also possible to remove all the previously configured save
# points by adding a save directive with a single empty string argument
# like in the following example:
#
# save ""
save 900 1
save 300 10
save 60 10000
# File name of the local persistent database; the default is dump.rdb
# The filename where to dump the DB
dbfilename dump.rdb
# Data directory; the default is the current directory ./
# The working directory.
#
# The DB will be written inside this directory, with the filename specified
# above using the 'dbfilename' configuration directive.
#
# The Append Only File will also be created inside this directory.
#
# Note that you must specify a directory here, not a file name.
dir /var/redis-data
# Maximum number of client connections; the default is 10000 (lowered to 128 here)
# Set the max number of connected clients at the same time. By default
# this limit is set to 10000 clients, however if the Redis server is not
# able to configure the process file limit to allow for the specified limit
# the max number of allowed clients is set to the current file limit
# minus 32 (as Redis reserves a few file descriptors for internal uses).
#
# Once the limit is reached Redis will close all the new connections sending
# an error 'max number of clients reached'.
#
maxclients 128
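# Client password. The official recommendation is to block outside connections to Redis with a firewall; since access here is local only, no password needs to be set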
################################## SECURITY ###################################
# Require clients to issue AUTH <PASSWORD> before processing any other
# commands. This might be useful in environments in which you do not trust
# others with access to the host running redis-server.
#
# This should stay commented out for backward compatibility and because most
# people do not need auth (e.g. they run their own servers).
#
# Warning: since Redis is pretty fast an outside user can try up to
# 150k passwords per second against a good box. This means that you should
# use a very strong password otherwise it will be very easy to break.
#
# requirepass foobared
Note: make sure the /var/redis-data/ directory above was created successfully; otherwise Redis will fail to start when restarted below.
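The three `save` lines in the configuration are easy to misread, so here is a small sketch of the rule they express: a snapshot is due when, for any one save point, both the elapsed time and the number of changed keys reach their thresholds. The functions `parse_save_points` and `snapshot_due` are hypothetical helpers written for this illustration; they are not part of Redis or redis-py.

```python
def parse_save_points(config_text):
    """Collect (seconds, changes) pairs from uncommented `save` directives."""
    points = []
    for line in config_text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "save":
            points.append((int(parts[1]), int(parts[2])))
    return points


def snapshot_due(points, elapsed_seconds, changed_keys):
    """Return True if any save point has both of its thresholds met."""
    return any(
        elapsed_seconds >= seconds and changed_keys >= changes
        for seconds, changes in points
    )


sample = """
# save ""
save 900 1
save 300 10
save 60 10000
"""
points = parse_save_points(sample)
print(points)
print(snapshot_due(points, 301, 10))  # 300 s passed and 10 keys changed
print(snapshot_due(points, 120, 5))   # no save point is satisfied yet
```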
To test whether Redis saves data to disk, restart the Redis service and then write some data:
[root@server ~]# systemctl restart redis
[root@server ~]# redis-cli
127.0.0.1:6379> ping
PONG
127.0.0.1:6379> set firstkey Hello,Redis
OK
127.0.0.1:6379> get firstkey
"Hello,Redis"
127.0.0.1:6379>
127.0.0.1:6379> SAVE
OK
127.0.0.1:6379> quit
The /var/redis-data/ directory now contains the saved data:
[root@server ~]# ls -lah /var/redis-data/
total 8.0K
drwxr-xr-x. 2 root root 22 Jun 19 23:27 .
drwxr-xr-x. 21 root root 4.0K Jun 19 22:36 ..
-rw-r--r--. 1 root root 119 Jun 19 23:27 dump.rdb
To let remote hosts reach the Redis server, open port 6379 in the firewall:
[root@server ~]# firewall-cmd --permanent --add-port=6379/tcp
success
[root@server ~]# firewall-cmd --reload
success
[root@server ~]# firewall-cmd --list-all
public (active)
target: default
icmp-block-inversion: no
interfaces: enp0s3 enp0s8
sources:
services: ssh dhcpv6-client ftp
ports: 21/tcp 6379/tcp
protocols:
masquerade: no
forward-ports:
source-ports:
icmp-blocks:
rich rules:
Working with Redis from Python¶
- redis-py, the Python driver for Redis, hosts its code and test cases on GitHub.
- Online documentation: https://redis-py.readthedocs.io/en/latest/genindex.html
- Installation:
pip install redis
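Redis strings¶
- A key that holds a single value is called a Redis string. Simple Python data types are converted to Redis strings automatically.
Connect to the Redis server on a given port of a remote host: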
$ ipython
Python 3.6.2 (v3.6.2:5fd33b5, Jul 8 2017, 04:57:36) [MSC v.1900 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.4.0 -- An enhanced Interactive Python. Type '?' for help.
>>> import redis
>>> conn = redis.Redis('192.168.56.103', 6379)
>>> conn
Redis<ConnectionPool<Connection<host=192.168.56.103,port=6379,db=0>>>
List all keys (there are none yet):
>>> conn.keys('*')
... 省略
ConnectionError: Error 10061 connecting to 192.168.56.103:6379. 由于目标计算机积极拒绝,无法连接。.
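The connection to the remote Redis server is refused because Redis does not accept remote access by default.
Redis has a protected-mode safeguard and also restricts the listening address with bind, which defaults to 127.0.0.1. The relevant parts of /etc/redis.conf are: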
66 # IF YOU ARE SURE YOU WANT YOUR INSTANCE TO LISTEN TO ALL THE INTERFACES
67 # JUST COMMENT THE FOLLOWING LINE.
68 # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
69 bind 127.0.0.1
84 # By default protected mode is enabled. You should disable it only if
85 # you are sure you want clients from other hosts to connect to Redis
86 # even if no authentication is configured, nor a specific set of interfaces
87 # are explicitly listed using the "bind" directive.
88 protected-mode yes
503 # Warning: since Redis is pretty fast an outside user can try up to
504 # 150k passwords per second against a good box. This means that you should
505 # use a very strong password otherwise it will be very easy to break.
506 #
507 # requirepass foobared
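To keep the Redis server safe, set a password for remote access with requirepass, and comment out the bind 127.0.0.1 line so that Redis listens on all interfaces. Use a very strong password in practice; this test uses 123456: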
[root@hellolinux ~]# cp /etc/redis.conf /etc/redis.conf.bak
[root@hellolinux ~]# sed -i 's/^# requirepass foobared/requirepass 123456/g' /etc/redis.conf
[root@hellolinux ~]# sed -i 's/^bind 127.0.0.1/#bind 127.0.0.1/g' /etc/redis.conf
[root@hellolinux ~]# diff /etc/redis.conf /etc/redis.conf.bak
69c69
< #bind 127.0.0.1
---
> bind 127.0.0.1
507c507
< requirepass 123456
---
> # requirepass foobared
[root@hellolinux ~]# cat -n /etc/redis.conf|sed -n '69p;507p'
69 #bind 127.0.0.1
507 requirepass 123456
Restart the Redis service:
[root@hellolinux ~]# systemctl restart redis
[root@hellolinux ~]# systemctl status redis
● redis.service - Redis Server Manager
Loaded: loaded (/usr/lib/systemd/system/redis.service; enabled; vendor preset: disabled)
Active: active (running) since Sat 2019-08-24 23:28:39 CST; 7s ago
Process: 13839 ExecStop=/usr/local/bin/redis-cli shutdown (code=exited, status=0/SUCCESS)
Process: 13847 ExecStartPost=/usr/bin/echo Done!!! (code=exited, status=0/SUCCESS)
Process: 13843 ExecStart=/usr/local/bin/redis-server /etc/redis.conf (code=exited, status=0/SUCCESS)
Process: 13842 ExecStartPre=/usr/local/bin/redis-server -v (code=exited, status=0/SUCCESS)
Main PID: 13846 (redis-server)
CGroup: /system.slice/redis.service
└─13846 /usr/local/bin/redis-server 127.0.0.1:6379
Aug 24 23:28:39 hellolinux.com systemd[1]: Stopped Redis Server Manager.
Aug 24 23:28:39 hellolinux.com systemd[1]: Starting Redis Server Manager...
Aug 24 23:28:39 hellolinux.com redis-server[13842]: Redis server v=5.0.5 sha=00000000:0 malloc=jemalloc-5.1.0 bits=64 build=f7b5a960d4c13390
Aug 24 23:28:39 hellolinux.com echo[13847]: Done!!!
Aug 24 23:28:39 hellolinux.com systemd[1]: Started Redis Server Manager.
Connect to the remote Redis server again, this time with the password:
$ ipython
Python 3.6.2 (v3.6.2:5fd33b5, Jul 8 2017, 04:57:36) [MSC v.1900 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.4.0 -- An enhanced Interactive Python. Type '?' for help.
>>> import redis
>>> conn = redis.Redis(host='192.168.56.103', port=6379,password='123456')
>>> conn
Redis<ConnectionPool<Connection<host=192.168.56.103,port=6379,db=0>>>
List all keys (there are none yet):
>>> conn.keys?
Signature: conn.keys(pattern='*')
Docstring: Returns a list of keys matching ``pattern``
File: d:\programfiles\python362\lib\site-packages\redis\client.py
Type: method
>>> conn.keys('*')
[]
This shows that data can now be read from the remote Redis server.
Set and get data:
>>> conn.set?
Signature: conn.set(name, value, ex=None, px=None, nx=False, xx=False)
Docstring:
Set the value at key ``name`` to ``value``
``ex`` sets an expire flag on key ``name`` for ``ex`` seconds.
``px`` sets an expire flag on key ``name`` for ``px`` milliseconds.
``nx`` if set to True, set the value at key ``name`` to ``value`` if it
does not already exist.
``xx`` if set to True, set the value at key ``name`` to ``value`` if it
already exists.
File: d:\programfiles\python362\lib\site-packages\redis\client.py
Type: method
>>> conn.set('secret','nil')
True
>>> conn.get?
Signature: conn.get(name)
Docstring: Return the value at key ``name``, or None if the key doesn't exist
File: d:\programfiles\python362\lib\site-packages\redis\client.py
Type: method
>>> conn.get('secret')
b'nil'
>>> conn.keys('*')
[b'secret']
Delete a key:
>>> conn.delete?
Signature: conn.delete(*names)
Docstring: Delete one or more keys specified by ``names``
File: d:\programfiles\python362\lib\site-packages\redis\client.py
Type: method
>>> conn.delete('secret')
1
>>> conn.keys('*')
[]
Set several values one after another and read them back:
>>> conn.set('first', 'hello')
True
>>> conn.set('second', 2)
True
>>> conn.set('third', 3.14)
True
>>> conn.get('first')
b'hello'
>>> conn.get('second')
b'2'
>>> conn.get('third')
b'3.14'
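As the output above shows, every value comes back as `bytes`, whatever Python type was stored. A minimal sketch of converting such replies back to Python types follows; the `replies` dictionary simply reuses the byte strings printed above, and `to_text` is a hypothetical helper. redis-py also offers a `decode_responses=True` option on the client that returns `str` instead of `bytes`.

```python
def to_text(raw, encoding="utf-8"):
    """Decode a bytes reply to str; pass None through for missing keys."""
    return None if raw is None else raw.decode(encoding)


# Byte strings as returned by conn.get() in the session above
replies = {"first": b"hello", "second": b"2", "third": b"3.14"}

first = to_text(replies["first"])
second = int(replies["second"])    # int() accepts ASCII digit bytes
third = float(replies["third"])    # float() accepts bytes as well
print(first, second, third)
```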
The setnx() method sets a value only if the key does not already exist:
>>> conn.setnx?
Signature: conn.setnx(name, value)
Docstring: Set the value of key ``name`` to ``value`` if key doesn't exist
File: d:\programfiles\python362\lib\site-packages\redis\client.py
Type: method
>>> conn.setnx('notexist', 'no')
True
>>> conn.setnx('first', 'hello,redis')
False
>>> conn.get('notexist')
b'no'
>>> conn.get('first')
b'hello'
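Setting 'notexist' succeeds because the key did not exist, while setting 'first' fails because the key 'first' was already set.
The getset() method sets a new value for a key and returns the old value: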
>>> conn.getset?
Signature: conn.getset(name, value)
Docstring:
Sets the value at key ``name`` to ``value``
and returns the old value at key ``name`` atomically.
File: d:\programfiles\python362\lib\site-packages\redis\client.py
Type: method
>>> conn.getset('first', 'hello,redis')
b'hello'
>>> conn.get('first')
b'hello,redis'
The getrange() method returns a substring of the value; both start and end are inclusive:
>>> conn.getrange?
Signature: conn.getrange(key, start, end)
Docstring:
Returns the substring of the string value stored at ``key``,
determined by the offsets ``start`` and ``end`` (both are inclusive)
File: d:\programfiles\python362\lib\site-packages\redis\client.py
Type: method
>>> conn.getrange('first', 7, -1)
b'edis'
>>> conn.getrange('first', 6, -1)
b'redis'
>>> conn.getrange('first', 0, 4)
b'hello'
>>> conn.getrange('first', 0,-1)
b'hello,redis'
For the offsets, 0 is the first byte and -1 is the last byte.
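Because both offsets are inclusive, getrange() differs from Python slicing, where the end index is exclusive. The following sketch emulates the behaviour on a plain `bytes` value; the function `getrange` here is a local illustration written for this document, not the redis-py method.

```python
def getrange(value, start, end):
    """Emulate GETRANGE on a bytes value; both offsets are inclusive."""
    length = len(value)
    if start < 0:
        start = max(length + start, 0)
    if end < 0:
        end = max(length + end, 0)
    end = min(end, length - 1)
    if length == 0 or start > end:
        return b""
    return value[start:end + 1]  # +1 because Python slices exclude the end


value = b"hello,redis"
print(getrange(value, 7, -1))
print(getrange(value, 6, -1))
print(getrange(value, 0, 4))
print(getrange(value, 0, -1))
```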
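The setrange() method overwrites part of the value and returns the length of the resulting value. If the offset is beyond the end of the original value, the gap is padded with null bytes (\x00):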
>>> conn.setrange?
Signature: conn.setrange(name, offset, value)
Docstring:
Overwrite bytes in the value of ``name`` starting at ``offset`` with
``value``. If ``offset`` plus the length of ``value`` exceeds the
length of the original value, the new value will be larger than before.
If ``offset`` exceeds the length of the original value, null bytes
will be used to pad between the end of the previous value and the start
of what's being injected.
Returns the length of the new string.
File: d:\programfiles\python362\lib\site-packages\redis\client.py
Type: method
>>> conn.setrange('first',6,'Redis')
11
>>> conn.get('first')
b'hello,Redis'
>>> conn.setrange('first', 12, '!!!')
15
>>> conn.get('first')
b'hello,Redis\x00!!!'
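The padding behaviour is easier to see in a pure-Python emulation. This is a sketch on a local `bytes` value, assuming the semantics described in the docstring above; the function `setrange` here is an illustration, not the redis-py method.

```python
def setrange(value, offset, new):
    """Emulate SETRANGE on bytes: pad with null bytes, then overwrite."""
    buf = bytearray(value)
    end = offset + len(new)
    if len(buf) < end:
        # Grow the buffer; any gap before `offset` stays filled with \x00
        buf.extend(b"\x00" * (end - len(buf)))
    buf[offset:end] = new
    return bytes(buf)


step1 = setrange(b"hello,redis", 6, b"Redis")
step2 = setrange(step1, 12, b"!!!")
print(step1, len(step1))
print(step2, len(step2))
```

The original value is 11 bytes long (indexes 0 to 10), so writing at offset 12 leaves index 11 as a single null byte.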
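mset() sets several key-value pairs in one call. In the redis-py version shown here, the pairs can be passed as a dictionary or as keyword arguments (redis-py 3.0 and later accept only a dictionary):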
>>> conn.mset?
Signature: conn.mset(*args, **kwargs)
Docstring:
Sets key/values based on a mapping. Mapping can be supplied as a single
dictionary argument or as kwargs.
File: d:\programfiles\python362\lib\site-packages\redis\client.py
Type: method
>>> conn.keys()
[b'notexist', b'first', b'second', b'third']
# Create several key-value pairs from a dictionary
>>> conn.mset({'four': '4', 'five': 5})
True
>>> conn.keys()
[b'five', b'first', b'third', b'four', b'notexist', b'second']
# Create several key-value pairs from keyword arguments
>>> conn.mset(name="Redis",version="5.0.5")
True
>>> conn.keys()
[b'five',
b'first',
b'name',
b'third',
b'four',
b'notexist',
b'version',
b'second']
>>> conn.get('four')
b'4'
>>> conn.get('five')
b'5'
>>> conn.get('name')
b'Redis'
>>> conn.get('version')
b'5.0.5'
mget() gets the values of several keys in one call:
>>> conn.mget?
Signature: conn.mget(keys, *args)
Docstring: Returns a list of values ordered identically to ``keys``
File: d:\programfiles\python362\lib\site-packages\redis\client.py
Type: method
>>> conn.mget(['four', 'five', 'name', 'version'])
[b'4', b'5', b'Redis', b'5.0.5']
>>> conn.mget('four', 'five', 'name', 'version')
[b'4', b'5', b'Redis', b'5.0.5']
Use incr() or incrbyfloat() to increase a value and decr() to decrease it. There is no decrbyfloat() function; pass a negative amount to incrbyfloat() instead:
>>> conn.incr?
Signature: conn.incr(name, amount=1)
Docstring:
Increments the value of ``key`` by ``amount``. If no key exists,
the value will be initialized as ``amount``
File: d:\programfiles\python362\lib\site-packages\redis\client.py
Type: method
>>> conn.incrbyfloat?
Signature: conn.incrbyfloat(name, amount=1.0)
Docstring:
Increments the value at key ``name`` by floating ``amount``.
If no key exists, the value will be initialized as ``amount``
File: d:\programfiles\python362\lib\site-packages\redis\client.py
Type: method
>>> conn.decr?
Signature: conn.decr(name, amount=1)
Docstring:
Decrements the value of ``key`` by ``amount``. If no key exists,
the value will be initialized as 0 - ``amount``
File: d:\programfiles\python362\lib\site-packages\redis\client.py
Type: method
>>> conn.incr('four')
5
>>> conn.get('four')
b'5'
>>> conn.incr('four', 2)
7
>>> conn.get('four')
b'7'
>>> conn.incrbyfloat('third')
4.14
>>> conn.get('third')
b'4.14'
>>> conn.incrbyfloat('third', '3')
7.14
>>> conn.get('third')
b'7.14'
>>> conn.decr('four')
6
>>> conn.decr('four', '2')
4
>>> conn.get('four')
b'4'
>>> conn.incrbyfloat('third', '-3')
4.14
>>> conn.get('third')
b'4.14'
>>> conn.incr('four', -2)
2
>>> conn.get('four')
b'2'
>>> conn.incr('four', -2)
0
>>> conn.get('four')
b'0'
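Since there is no decrbyfloat(), a small wrapper can supply one. The sketch below assumes only that the client object has an `incrbyfloat(name, amount)` method, as redis-py does; `FakeConn` is a hypothetical in-memory stand-in so the example runs without a server, and `decrbyfloat` is a helper defined here, not a redis-py function.

```python
class FakeConn:
    """Tiny in-memory stand-in for a redis-py client (hypothetical)."""

    def __init__(self):
        self.store = {}

    def incrbyfloat(self, name, amount=1.0):
        self.store[name] = float(self.store.get(name, 0)) + float(amount)
        return self.store[name]


def decrbyfloat(conn, name, amount=1.0):
    """Decrease a float value by calling incrbyfloat with a negated amount."""
    return conn.incrbyfloat(name, -float(amount))


fake = FakeConn()
fake.incrbyfloat("third", 7.5)
print(decrbyfloat(fake, "third", 3))
```

With a real connection the same helper would be called as `decrbyfloat(conn, 'third', 3)`.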
Redis lists¶
- A Redis list can contain only strings.
lpush() pushes data onto the head of a list.
Push data onto the head (left) of the list 'rlist':
>>> conn.lpush?
Signature: conn.lpush(name, *values)
Docstring: Push ``values`` onto the head of the list ``name``
File: d:\programfiles\python362\lib\site-packages\redis\client.py
Type: method
>>> conn.lpush('rlist', 'one')
1
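lrange() returns all values of a list within the given offsets. start and end are required; the offsets 0 to -1 return the whole list.
Get the data just pushed onto the list 'rlist':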
>>> conn.lrange?
Signature: conn.lrange(name, start, end)
Docstring:
Return a slice of the list ``name`` between
position ``start`` and ``end``
``start`` and ``end`` can be negative numbers just like
Python slicing notation
File: d:\programfiles\python362\lib\site-packages\redis\client.py
Type: method
>>> conn.lrange('rlist', 0, -1)
[b'one']
Push several values at once onto the head (left) of the list 'rlist':
>>> conn.lpush('rlist', 'two', 'three')
3
>>> conn.lrange('rlist', 0, -1)
[b'three', b'two', b'one']
The values are pushed onto the head in argument order: 'two' is pushed first and 'three' after it, so 'three' ends up at the head of the list.
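The resulting order can be reproduced with a `collections.deque`, whose `appendleft` plays the role of a push onto the head. This is only an emulation for illustration; the function `lpush` below is a local helper, not the redis-py method.

```python
from collections import deque


def lpush(items, *values):
    """Emulate LPUSH: push each value onto the head, left to right."""
    for value in values:
        items.appendleft(value)
    return len(items)


rlist = deque()
lpush(rlist, "one")
length = lpush(rlist, "two", "three")
print(length, list(rlist))
```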
linsert() inserts data before or after a given value.
Insert data before and after the value 'two' in the list 'rlist':
>>> conn.linsert?
Signature: conn.linsert(name, where, refvalue, value)
Docstring:
Insert ``value`` in list ``name`` either immediately before or after
[``where``] ``refvalue``
Returns the new length of the list on success or -1 if ``refvalue``
is not in the list.
File: d:\programfiles\python362\lib\site-packages\redis\client.py
Type: method
>>> conn.linsert('rlist', 'before', 'two', 'before_2')
4
>>> conn.linsert('rlist', 'after', 'two', 'after_2')
5
>>> conn.lrange('rlist', 0, -1)
[b'three', b'before_2', b'two', b'after_2', b'one']
'before' and 'after' are relative to refvalue in list order. The same list data can be seen on the Redis server:
[root@hellolinux ~]# redis-cli -a 123456 2>/dev/null
127.0.0.1:6379> ping
PONG
127.0.0.1:6379> LRANGE rlist 0 -1
1) "three"
2) "before_2"
3) "two"
4) "after_2"
5) "one"
127.0.0.1:6379>
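The meaning of 'before' and 'after' can be checked with a plain Python list. The sketch below emulates the behaviour described in the docstring above, including the -1 return value when refvalue is missing; `linsert` here is a local helper for illustration, not the redis-py method.

```python
def linsert(items, where, refvalue, value):
    """Emulate LINSERT on a Python list; return the new length, or -1."""
    if refvalue not in items:
        return -1
    index = items.index(refvalue)  # first occurrence of the reference value
    if where.lower() == "after":
        index += 1
    items.insert(index, value)
    return len(items)


rlist = ["three", "two", "one"]
print(linsert(rlist, "before", "two", "before_2"))
print(linsert(rlist, "after", "two", "after_2"))
print(rlist)
```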
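lset() replaces the value at a given index of a list. The index must exist; otherwise a "ResponseError: index out of range" exception is raised.
Replace the data at given indexes of the list 'rlist':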
>>> conn.lset?
Signature: conn.lset(name, index, value)
Docstring: Set ``position`` of list ``name`` to ``value``
File: d:\programfiles\python362\lib\site-packages\redis\client.py
Type: method
# First replace "before_2" at index 1 with "lset"
>>> conn.lset('rlist', 1, 'lset')
True
>>> conn.lrange('rlist', 0, -1)
[b'three', b'lset', b'two', b'after_2', b'one']
# Then replace the value at index -1, the tail of the list, with "tail"
>>> conn.lset('rlist', -1, 'tail')
True
>>> conn.lrange('rlist', 0, -1)
[b'three', b'lset', b'two', b'after_2', b'tail']
The list data can also be seen on the Redis server with LRANGE:
127.0.0.1:6379> LRANGE rlist 0 -1
1) "three"
2) "lset"
3) "two"
4) "after_2"
5) "tail"
lindex() returns the value at a given index of a list, or None if the index does not exist.
Get the data at given indexes of the list 'rlist':
>>> conn.lindex?
Signature: conn.lindex(name, index)
Docstring:
Return the item from list ``name`` at position ``index``
Negative indexes are supported and will return an item at the
end of the list
File: d:\programfiles\python362\lib\site-packages\redis\client.py
Type: method
# Get the value at index 1
>>> conn.lindex('rlist', 1)
b'lset'
# Get the value at index 0, the head of the list
>>> conn.lindex('rlist', 0)
b'three'
# Get the value at index -1, the tail of the list
>>> conn.lindex('rlist', -1)
b'tail'
>>> conn.lindex('rlist', -2)
b'after_2'
The same values can be seen on the Redis server with LINDEX:
127.0.0.1:6379> LINDEX rlist 1
"lset"
127.0.0.1:6379> LINDEX rlist 0
"three"
127.0.0.1:6379> LINDEX rlist -1
"tail"
127.0.0.1:6379> LINDEX rlist -2
"after_2"
llen() returns the length of a list.
Get the length of the list 'rlist':
>>> conn.llen?
Signature: conn.llen(name)
Docstring: Return the length of the list ``name``
File: d:\programfiles\python362\lib\site-packages\redis\client.py
Type: method
>>> conn.llen('rlist')
5
The length can also be read on the Redis server with LLEN:
127.0.0.1:6379> LLEN rlist
(integer) 5
lpop()
Pops the first element from the head of a list.
Pop the first element of the list 'rlist':
>>> conn.lpop?
Signature: conn.lpop(name)
Docstring: Remove and return the first item of the list ``name``
File: d:\programfiles\python362\lib\site-packages\redis\client.py
Type: method
>>> conn.lpop('rlist')
b'three'
>>> conn.lrange('rlist', 0, -1)
[b'lset', b'two', b'after_2', b'tail']
On the Redis server, the first element "three" has been popped from the head, that is, removed:
127.0.0.1:6379> LRANGE rlist 0 -1
1) "lset"
2) "two"
3) "after_2"
4) "tail"
lpushx()
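Inserts data at the head of a list, but only if the list already exists.
Push data onto a list that does not exist and onto one that does: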
>>> conn.lpushx?
Signature: conn.lpushx(name, value)
Docstring: Push ``value`` onto the head of the list ``name`` if ``name`` exists
File: d:\programfiles\python362\lib\site-packages\redis\client.py
Type: method
# The list "a" does not exist, so nothing is inserted
>>> conn.lpushx('a', 'b')
0
# The list is still empty
>>> conn.lrange('a', 0, -1)
[]
# "rlist" already exists, so the insert succeeds
>>> conn.lpushx('rlist', 'lpushx')
5
>>> conn.lrange('rlist', 0, -1)
[b'lpushx', b'lset', b'two', b'after_2', b'tail']
lastsave()
Returns the time at which the Redis database was last saved to disk.
Check when the data was last saved to disk:
>>> conn.lastsave?
Signature: conn.lastsave()
Docstring:
Return a Python datetime object representing the last time the
Redis database was saved to disk
File: d:\programfiles\python362\lib\site-packages\redis\client.py
Type: method
>>> conn.lastsave
<bound method StrictRedis.lastsave of Redis<ConnectionPool<Connection<host=192.168.56.103,port=6379,db=0>>>>
>>> conn.lastsave()
datetime.datetime(2019, 8, 25, 19, 59, 16)
rpush()
Pushes data onto the tail of a list.
Push several values at once onto the tail, and then onto the head, of the list:
>>> conn.rpush?
Signature: conn.rpush(name, *values)
Docstring: Push ``values`` onto the tail of the list ``name``
File: d:\programfiles\python362\lib\site-packages\redis\client.py
Type: method
>>> conn.lrange('rlist', 0, -1)
[b'lpushx', b'lset', b'two', b'after_2', b'tail']
>>> conn.rpush('rlist', 'push', 'push','push','push')
9
>>> conn.lrange('rlist', 0, -1)
[b'lpushx',
b'lset',
b'two',
b'after_2',
b'tail',
b'push',
b'push',
b'push',
b'push']
>>> conn.lpush('rlist', 'push', 'push','push','push')
13
>>> conn.lrange('rlist', 0, -1)
[b'push',
b'push',
b'push',
b'push',
b'lpushx',
b'lset',
b'two',
b'after_2',
b'tail',
b'push',
b'push',
b'push',
b'push']
lrem()
Removes a given number of occurrences of a value from a list.
Remove a given value from the head and from the tail of the list:
>>> conn.lrem?
Signature: conn.lrem(name, value, num=0)
Docstring:
Remove the first ``num`` occurrences of elements equal to ``value``
from the list stored at ``name``.
The ``num`` argument influences the operation in the following ways:
num > 0: Remove elements equal to value moving from head to tail.
num < 0: Remove elements equal to value moving from tail to head.
num = 0: Remove all elements equal to value.
File: d:\programfiles\python362\lib\site-packages\redis\client.py
Type: method
# Remove 1 "push" from the head; 3 "push" remain at the head
>>> conn.lrem('rlist', 'push', 1)
1
>>> conn.lrange('rlist', 0, -1)
[b'push',
b'push',
b'push',
b'lpushx',
b'lset',
b'two',
b'after_2',
b'tail',
b'push',
b'push',
b'push',
b'push']
# Remove 2 "push" from the head; 1 "push" remains at the head
>>> conn.lrem('rlist', 'push', 2)
2
>>> conn.lrange('rlist', 0, -1)
[b'push',
b'lpushx',
b'lset',
b'two',
b'after_2',
b'tail',
b'push',
b'push',
b'push',
b'push']
# Remove 3 "push" from the tail; 1 "push" remains at the tail
>>> conn.lrem('rlist', 'push', -3)
3
>>> conn.lrange('rlist', 0, -1)
[b'push', b'lpushx', b'lset', b'two', b'after_2', b'tail', b'push']
# Remove all remaining "push"; none are left
>>> conn.lrem('rlist', 'push', 0)
2
>>> conn.lrange('rlist', 0, -1)
[b'lpushx', b'lset', b'two', b'after_2', b'tail']
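The three cases of the num argument can be reproduced with a small plain-Python model. This is a sketch of the rules quoted in the docstring above, not the Redis or redis-py implementation; model_lrem is a hypothetical name and it returns a new list instead of modifying a server-side key.

```python
def model_lrem(items, value, num=0):
    """Return (new_list, removed_count) following the LREM rules for ``num``."""
    # num == 0 means "remove every occurrence".
    limit = abs(num) if num else len(items)
    # Scan from the tail when num is negative, otherwise from the head.
    ordered = list(reversed(items)) if num < 0 else list(items)
    kept, removed = [], 0
    for item in ordered:
        if item == value and removed < limit:
            removed += 1
        else:
            kept.append(item)
    if num < 0:
        kept.reverse()  # restore head-to-tail order
    return kept, removed


data = ["push"] * 4 + ["lpushx", "lset", "two", "after_2", "tail"] + ["push"] * 4
data, count_head = model_lrem(data, "push", 1)    # one from the head
data, count_tail = model_lrem(data, "push", -3)   # three from the tail
data, count_rest = model_lrem(data, "push", 0)    # everything that is left
print(count_head, count_tail, count_rest)
print(data)
```

The model walks the list once per call, which is enough to show why a positive num touches the head first and a negative num touches the tail first.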
ltrim()
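Keeps only the values within the given range of a list; the range is inclusive at both ends.
Keep only the values in a given range: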
>>> conn.ltrim?
Signature: conn.ltrim(name, start, end)
Docstring:
Trim the list ``name``, removing all values not within the slice
between ``start`` and ``end``
``start`` and ``end`` can be negative numbers just like
Python slicing notation
File: d:\programfiles\python362\lib\site-packages\redis\client.py
Type: method
# Keep only the values at indexes 1 and 2; 'lpushx', 'after_2' and 'tail' are removed
>>> conn.ltrim('rlist', 1, 2)
True
>>> conn.lrange('rlist', 0, -1)
[b'lset', b'two']
>>> conn.ltrim('rlist', 0, 0)
True
>>> conn.lrange('rlist', 0, -1)
[b'lset']
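One detail worth spelling out is that the end index of LTRIM is inclusive, whereas the end of a Python slice is exclusive. The following plain-Python model is a sketch of that rule under the assumption that out-of-range values are clamped the way Redis documents them; model_ltrim is a hypothetical name and is not part of redis-py.

```python
def model_ltrim(items, start, end):
    """Keep items[start..end] with an inclusive end, as LTRIM does."""
    n = len(items)
    # Convert negative offsets to positions counted from the head.
    if start < 0:
        start = max(n + start, 0)
    if end < 0:
        end = n + end
    # An empty or inverted range leaves nothing in the list.
    if start > end or start >= n:
        return []
    # An inclusive end maps to an exclusive slice bound of end + 1.
    return items[start:end + 1]


rlist = ["lpushx", "lset", "two", "after_2", "tail"]
print(model_ltrim(rlist, 1, 2))
print(model_ltrim(rlist, 0, -1))
```

With (1, 2) the model keeps two elements, matching the session above, while the Python slice rlist[1:2] would keep only one.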
rpushx()
Pushes data onto the tail of a list, but only if the list already exists.
Push one value onto the tail of the list:
>>> conn.rpushx?
Signature: conn.rpushx(name, value)
Docstring: Push ``value`` onto the tail of the list ``name`` if ``name`` exists
File: d:\programfiles\python362\lib\site-packages\redis\client.py
Type: method
>>> conn.rpushx('rlist', 'push')
2
>>> conn.lrange('rlist', 0, -1)
[b'lset', b'push']
rpop()
Pops data from the tail of a list, that is, removes the last element.
Pop data from the tail of the list:
>>> conn.rpop?
Signature: conn.rpop(name)
Docstring: Remove and return the last item of the list ``name``
File: d:\programfiles\python362\lib\site-packages\redis\client.py
Type: method
>>> conn.rpop('rlist')
b'push'
>>> conn.rpop('rlist')
b'lset'
>>> conn.lrange('rlist', 0, -1)
[]
References:
- sqlite3 — DB-API 2.0 interface for SQLite databases
- Welcome to PyMySQL’s documentation!
- SQLAlchemy 1.3 Documentation: Object Relational Tutorial
- SQLAlchemy 1.3 Documentation: Working with Engines and Connections
- SQLAlchemy 1.3 Documentation: Engine Configuration
- SQLAlchemy 1.3 Documentation: Database Urls
- SQLAlchemy 1.3 Documentation: Query API
- dataset: databases for lazy people
- dataset: databases for lazy people: API documentation
- dataset: databases for lazy people: Quickstart
- Python的”懒人”包DataSet解析
- Redis Quick Start
- Redis安装与卸载
- redis常用命令、常见错误、配置技巧等分享
- centos下部署redis服务环境及其配置说明
- Redis 命令参考
Installing and Using Selenium¶
Downloading Python 3.6.2¶
Download page: https://www.python.org/downloads
Run python -V to check the Python version:
D:\Desktop>python -V
Python 3.6.2
Installing selenium¶
Install selenium with pip:
D:\Desktop>pip install selenium
Collecting selenium
Downloading https://files.pythonhosted.org/packages/5e/1f/6c2204b9ae14eddab615c5e2ee4956c65ed533e0a9986c23eabd801ae849/selenium-3.11.0-py2.py3-none-any.whl (943kB)
100% |████████████████████████████████| 952kB 644kB/s
Installing collected packages: selenium
Successfully installed selenium-3.11.0
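Downloading PyCharm¶
Downloading the chromedriver driver¶
chromedriver download page: http://chromedriver.storage.googleapis.com/index.html
After downloading, copy the extracted chromedriver.exe into the Scripts folder of the Python installation directory, for example D:\Program Files (x86)\python3.6.2\Scripts.
The table below lists the ChromeDriver versions and the Chrome versions each one supports:
ChromeDriver version | Supported Chrome versions | Release date |
---|---|---|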
ChromeDriver v2.38 | v65-67 | 2018-04-17 |
ChromeDriver v2.37 | v64-66 | 2018-03-16 |
ChromeDriver v2.36 | v63-65 | 2018-03-02 |
ChromeDriver v2.35 | v62-64 | 2018-01-10 |
ChromeDriver v2.34 | v61-63 | 2017-12-10 |
ChromeDriver v2.33 | v60-62 | 2017-10-03 |
ChromeDriver v2.32 | v59-61 | 2017-08-30 |
ChromeDriver v2.31 | v58-60 | 2017-07-21 |
ChromeDriver v2.30 | v58-60 | 2017-06-07 |
ChromeDriver v2.29 | v56-58 | 2017-04-04 |
ChromeDriver v2.28 | v55-57 | 2017-03-09 |
ChromeDriver v2.27 | v54-56 | 2016-12-23 |
ChromeDriver v2.26 | v53-55 | 2016-12-09 |
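Choosing a driver from this table can be expressed as a small lookup. The sketch below copies the version ranges from the table above; COMPATIBILITY and newest_driver_for are hypothetical names, and the data covers only the ChromeDriver 2.x releases listed here.

```python
# (ChromeDriver version, lowest Chrome major, highest Chrome major), newest first.
COMPATIBILITY = [
    ("2.38", 65, 67),
    ("2.37", 64, 66),
    ("2.36", 63, 65),
    ("2.35", 62, 64),
    ("2.34", 61, 63),
    ("2.33", 60, 62),
    ("2.32", 59, 61),
    ("2.31", 58, 60),
    ("2.30", 58, 60),
    ("2.29", 56, 58),
    ("2.28", 55, 57),
    ("2.27", 54, 56),
    ("2.26", 53, 55),
]


def newest_driver_for(chrome_major):
    """Return the newest listed ChromeDriver that supports chrome_major, or None."""
    for driver, low, high in COMPATIBILITY:
        if low <= chrome_major <= high:
            return driver
    return None


print(newest_driver_for(65))
print(newest_driver_for(70))
```

Because the list is ordered newest first, the first match is also the most recent driver for that browser version; a result of None means the table has no entry for it.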
Test with the Chrome browser:
#!/usr/bin/env python
# coding=utf-8
"""
@python version: python3.6.2
@author: 'meichaohui'
@contact: mzh.whut@gmail.com
@software: PyCharm
@filename: openweb.py
@time: 2017/10/5 22:35
@version: V1.0
"""
# Import the webdriver module from the selenium testing tool
from selenium import webdriver
# Use the Chrome driver
browser = webdriver.Chrome()
print('窗口最大化')
browser.maximize_window()
# Open the Baidu home page
browser.get('http://www.baidu.com/')
# Print the page title and the current URL
print('当前页面标题',browser.title)
print('当前页面地址',browser.current_url)
# Find the input box with id "kw" and type the keyword
browser.find_element_by_id("kw").send_keys("hopewait")
# Find the button with id "su" and click it
browser.find_element_by_id('su').click()
print('退出!')
browser.quit()
After running it, the console prints:
"D:\Program Files (x86)\python3.6.2\python.exe" D:/data/python_scripts/seleniumProjects/openweb.py
窗口最大化
当前页面标题 百度一下,你就知道
当前页面地址 https://www.baidu.com/
退出!
进程已结束,退出代码0
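Modules: the uuid Module¶
Background¶
UUID stands for Universally Unique Identifier. A UUID is designed to be unique across both space and time. Depending on the version, it combines a MAC address, a timestamp, a namespace, and random or pseudo-random numbers to make collisions practically impossible, and it has a fixed size of 128 bits. Because of this, a new UUID can be generated without any registration process. UUIDs serve many purposes, from tagging a short-lived object to reliably identifying persistent objects across a network.
Why use UUIDs?¶
Many scenarios need an id that carries no particular meaning and only identifies an object. A common example is the id column of a database table. Another is front-end UI libraries, which usually create UI elements dynamically and need a unique id for each of them; a UUID fits this need.
For example, when building a website with Flask-WTF, uuid can generate a random file name under which an image uploaded by a user is saved.
Overview of the uuid module¶
Python's uuid module provides the UUID class and the functions uuid1(), uuid3(), uuid4() and uuid5(), which generate version 1, 3, 4 and 5 UUIDs. (Note that Python has no uuid2() function.) The most commonly used functions are summarized below: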
- uuid.uuid1([node[, clock_seq]])
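- Generates a UUID from the host ID, a sequence number and the current time, which makes it unique worldwide. Because the result contains the host's network address, it may compromise privacy.
- uuid.uuid3(namespace, name)
- Generates a UUID from the MD5 hash of a namespace and a name. Different names in the same namespace, and the same name in different namespaces, yield different UUIDs, while the same name in the same namespace always yields the same UUID.
- uuid.uuid4()
- Generates a UUID from random numbers. A collision is theoretically possible, but its probability is negligible in practice.
- uuid.uuid5(namespace, name)
- Generates a UUID from the SHA-1 hash of a namespace and a name; apart from the hash function, it works the same way as uuid.uuid3().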
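The difference between the name-based and the random functions can be checked directly with the standard library. The short sketch below uses only documented uuid calls; the variable names are chosen for illustration.

```python
import uuid

name = "python.org"
first = uuid.uuid3(uuid.NAMESPACE_DNS, name)
second = uuid.uuid3(uuid.NAMESPACE_DNS, name)
other_namespace = uuid.uuid3(uuid.NAMESPACE_URL, name)
sha1_based = uuid.uuid5(uuid.NAMESPACE_DNS, name)
random_based = uuid.uuid4()

# Same namespace and same name: the result is reproducible.
print(first == second)
# Same name in another namespace: a different UUID.
print(first == other_namespace)
# The version attribute reports which function produced the value.
print(first.version, sha1_based.version, random_based.version)
```

Reproducibility is what makes uuid3() and uuid5() suitable for deriving stable identifiers from names, while uuid4() suits cases where the id should carry no information at all.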
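Can UUIDs collide?¶
According to MIceRice's answer at https://bbs.csdn.net/topics/390045377, the probability of a collision is so small that it can be ignored.
For comparison with the odds of being hit by a meteorite: the annual risk of a given person being hit is estimated at one in 17 billion, a probability of about 0.00000000006 (6 × 10^-11), which equals the odds of creating a few tens of trillions of UUIDs in a year and getting one duplicate. In other words, generating 1 billion UUIDs per second for roughly a century (about 86 years under the birthday approximation for the 122 random bits of a version 4 UUID) gives a 50% probability of a single duplicate. The same source estimates that if every person on Earth owned 600 million UUIDs, the probability of one duplicate would be about 50%.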
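The order of magnitude of these figures can be checked with the birthday-bound approximation p ≈ 1 - exp(-n² / (2 · 2^bits)). The sketch below assumes 122 random bits, since 6 of the 128 bits of a version 4 UUID are fixed; the function names are hypothetical.

```python
import math

RANDOM_BITS = 122  # 128 bits minus the 6 fixed version/variant bits


def collision_probability(count, bits=RANDOM_BITS):
    """Approximate probability of at least one duplicate among count UUIDs."""
    return -math.expm1(-(count * count) / (2.0 * 2.0 ** bits))


def count_for_probability(p, bits=RANDOM_BITS):
    """Approximate number of UUIDs needed to reach collision probability p."""
    return math.sqrt(2.0 * 2.0 ** bits * math.log(1.0 / (1.0 - p)))


half = count_for_probability(0.5)
# Time needed to generate that many UUIDs at one billion per second.
years = half / 1e9 / (365.25 * 24 * 3600)
print(f"{half:.3e} UUIDs, about {years:.0f} years at 1e9 per second")
```

math.expm1 is used instead of 1 - math.exp(...) so that very small probabilities are not lost to floating-point rounding.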
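Using the uuid module¶
- UUID.bytes returns the UUID as a 16-byte string of type bytes.
- UUID.hex returns the UUID as a 32-character hexadecimal string of type str.
- uuid.NAMESPACE_DNS: when this namespace is specified, the name string is a fully qualified domain name.
- uuid.NAMESPACE_URL: when this namespace is specified, the name string is a URL.
Usage examples: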
In [1]: import uuid
In [2]: uuid.uuid1()
Out[2]: UUID('5618a8d0-e8e6-11e8-aff6-fddbb68775a4')
In [3]: uuid.uuid3(uuid.NAMESPACE_DNS, 'python.org')
Out[3]: UUID('6fa459ea-ee8a-3ca4-894e-db77e160355e')
In [4]: uuid.uuid5(uuid.NAMESPACE_DNS, 'python.org')
Out[4]: UUID('886313e1-3b8a-5372-9b90-0c9aee199e5d')
In [5]: uuid.uuid4()
Out[5]: UUID('348e2893-2e43-45a1-ad25-fcc7bdce796a')
In [6]: uuid.uuid4().hex
Out[6]: '7e6d9aa5e3424455b4a6533b3094fb4f'
In [7]: uuid.uuid4().bytes
Out[7]: b'[a\xda\xafk\xadEr\x9d*\rVq\xb4\xc1\xea'
Generate a random file name with uuid:
#!/usr/bin/python3
"""
@Author : 梅朝辉 Meizhaohui
@Email : mzh.whut@gmail.com
@Time : 2018/11/15 22:58
@File : use_uuid.py
@Version : 1.0
@Interpreter: Python3.6.2
@Software: PyCharm
@Description: Generate a random file name with the uuid module
"""
import os
import uuid
def random_filename(filename):
    # Get the file extension
    extension = os.path.splitext(filename)[1]
    # uuid.uuid4().hex is a random 32-character hexadecimal string
    # Build the random file name
    new_filename = uuid.uuid4().hex + extension
    return new_filename
if __name__ == '__main__':
    filename = 'python.png'
    print(random_filename(filename))
    # output like: 606ab1d636b04e139d54ddaf8a177cfe.png
References:
- Python_uuid 学习总结 https://www.cnblogs.com/lijingchn/p/5299000.html
- Official documentation https://docs.python.org/3/library/uuid.html
- UUID会重复吗 https://bbs.csdn.net/topics/390045377
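Pushing to GitHub without a password¶
Generating a key¶
Generate a key pair:
ssh-keygen -t rsa -C "mzh.whut@gmail.com"
Run the command above and press Enter at every prompt. When it finishes, a .ssh directory is created in your home directory with two files: id_rsa (the private key) and id_rsa.pub (the public key). Open id_rsa.pub in a text editor such as Notepad++ and copy its contents.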
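Saving the public key to GitHub¶
- After logging in to GitHub, open Settings -> SSH and GPG keys, click New SSH key, paste the copied contents of id_rsa.pub into the Key field, enter a Title, and click Add SSH key to save.
Configuring the local git environment¶
Setting the user name and email¶
Set the user name: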
git config --global user.name "Zhaohui Mei"
Set the email address:
git config --global user.email "mzh.whut@gmail.com"
Check that the user name took effect:
git config user.name
Check that the email address took effect:
git config user.email
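Setting up git credential storage¶
If you connect to the remote over SSH with a key that has no passphrase, you can transfer data securely without typing a user name and password. This is not possible over HTTP, where every connection needs a user name and password. The "store" mode saves the credentials to disk in plain text, and they never expire. This means that unless you change your password on the Git server, you never have to enter your credentials again.
Create the file that stores the credentials. In a git bash window, run: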
touch ~/.git-credentials
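Store the user name and password in the credentials file by adding the following line to ~/.git-credentials: https://username:password@github.com
GitHub no longer accepts account passwords for Git operations over HTTPS, so put a personal access token in the password position.
Configure git to use the credential store:
git config --global credential.helper store
Note: the git configuration file is located at ~/.gitconfig
If the password in ~/.git-credentials contains special characters, they must be URL-encoded; for example, @ must be written as %40. Some of the encodings are listed below: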
Character | URL-encoded |
---|---|
# | %23 |
$ | %24 |
+ | %2b |
@ | %40 |
: | %3a |
= | %3d |
? | %3f |
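Rather than encoding a password by hand, the standard library can do it. The sketch below uses urllib.parse.quote; encode_credential is a hypothetical helper name and the password is a made-up example. quote() emits uppercase hex digits (%2B rather than %2b), which is equivalent because percent-encoding is case-insensitive.

```python
from urllib.parse import quote


def encode_credential(text):
    """Percent-encode every reserved character, including '/'."""
    return quote(text, safe="")


password = "p@ss:w#rd+1"  # made-up example value
line = f"https://username:{encode_credential(password)}@github.com"
print(line)
```

Passing safe="" matters here: by default quote() leaves "/" unencoded, which would break a credential that contains a slash.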
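Cloning the remote repository and pushing a test change¶
Clone the remote repository:
git clone https://github.com/meizhaohui/hellopython.git
After cloning, make some changes to the files and save them.
Use git diff to review the changes:
git diff
Stage the modified files:
git add -A
Commit with a message:
git commit -m"commit log" (Note: do not wrap the commit message in single quotes here)
Show the remote repositories:
git remote -v
Show the local branches:
git branch
Push the local commits to the remote repository:
git push origin master:master (Note: the first master is the local branch, the second master is the remote branch)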
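Generating Fake Data with Faker¶
About Faker¶
During requirements analysis, development and testing, test data is often needed. It usually either comes from an existing system or has to be created by hand, and creating it by hand can take considerable effort. Generating fake data with faker reduces that work.
Importing and initializing¶
Import the package:
from faker import Faker
Create an instance:
fake = Faker() # use default locale en_US
or:
fake = Faker(locale='zh_CN') # use Chinese
Faker methods: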
>>> fake.
fake.add_provider( fake.first_name_male( fake.pydict(
fake.address( fake.first_romanized_name( fake.pyfloat(
fake.am_pm( fake.format( fake.pyint(
fake.ascii_company_email( fake.free_email( fake.pyiterable(
fake.ascii_email( fake.free_email_domain( fake.pylist(
fake.ascii_free_email( fake.future_date( fake.pyset(
fake.ascii_safe_email( fake.future_datetime( fake.pystr(
fake.bank_country( fake.get_formatter( fake.pystruct(
fake.bban( fake.get_providers( fake.pytuple(
fake.binary( fake.hex_color( fake.random
fake.boolean( fake.hexify( fake.random_choices(
fake.bothify( fake.hostname( fake.random_digit(
fake.bs( fake.iban( fake.random_digit_not_null(
fake.building_number( fake.image_url( fake.random_digit_not_null_or_empty(
fake.catch_phrase( fake.internet_explorer( fake.random_digit_or_empty(
fake.century( fake.ipv4( fake.random_element(
fake.chrome( fake.ipv4_network_class( fake.random_elements(
fake.city( fake.ipv4_private( fake.random_int(
fake.city_name( fake.ipv4_public( fake.random_letter(
fake.city_suffix( fake.ipv6( fake.random_letters(
fake.color_name( fake.isbn10( fake.random_lowercase_letter(
fake.company( fake.isbn13( fake.random_number(
fake.company_email( fake.iso8601( fake.random_sample(
fake.company_prefix( fake.job( fake.random_uppercase_letter(
fake.company_suffix( fake.language_code( fake.randomize_nb_elements(
fake.coordinate( fake.last_name( fake.rgb_color(
fake.country( fake.last_name_female( fake.rgb_css_color(
fake.country_code( fake.last_name_male( fake.romanized_name(
fake.credit_card_expire( fake.last_romanized_name( fake.safari(
fake.credit_card_full( fake.latitude( fake.safe_color_name(
fake.credit_card_number( fake.latlng( fake.safe_email(
fake.credit_card_provider( fake.lexify( fake.safe_hex_color(
fake.credit_card_security_code( fake.license_plate( fake.seed(
fake.cryptocurrency( fake.linux_platform_token( fake.seed_instance(
fake.cryptocurrency_code( fake.linux_processor( fake.sentence(
fake.cryptocurrency_name( fake.local_latlng( fake.sentences(
fake.currency( fake.locale( fake.set_formatter(
fake.currency_code( fake.location_on_land( fake.sha1(
fake.currency_name( fake.longitude( fake.sha256(
fake.date( fake.mac_address( fake.simple_profile(
fake.date_between( fake.mac_platform_token( fake.slug(
fake.date_between_dates( fake.mac_processor( fake.ssn(
fake.date_object( fake.md5( fake.street_address(
fake.date_of_birth( fake.mime_type( fake.street_name(
fake.date_this_century( fake.month( fake.street_suffix(
fake.date_this_decade( fake.month_name( fake.suffix(
fake.date_this_month( fake.msisdn( fake.suffix_female(
fake.date_this_year( fake.name( fake.suffix_male(
fake.date_time( fake.name_female( fake.text(
fake.date_time_ad( fake.name_male( fake.time(
fake.date_time_between( fake.null_boolean( fake.time_delta(
fake.date_time_between_dates( fake.numerify( fake.time_object(
fake.date_time_this_century( fake.opera( fake.time_series(
fake.date_time_this_decade( fake.paragraph( fake.timezone(
fake.date_time_this_month( fake.paragraphs( fake.tld(
fake.date_time_this_year( fake.parse( fake.unix_device(
fake.day_of_month( fake.password( fake.unix_partition(
fake.day_of_week( fake.past_date( fake.unix_time(
fake.district( fake.past_datetime( fake.uri(
fake.domain_name( fake.phone_number( fake.uri_extension(
fake.domain_word( fake.phonenumber_prefix( fake.uri_page(
fake.ean( fake.postcode( fake.uri_path(
fake.ean13( fake.prefix( fake.url(
fake.ean8( fake.prefix_female( fake.user_agent(
fake.email( fake.prefix_male( fake.user_name(
fake.file_extension( fake.profile( fake.uuid4(
fake.file_name( fake.provider( fake.windows_platform_token(
fake.file_path( fake.providers fake.word(
fake.firefox( fake.province( fake.words(
fake.first_name( fake.pybool( fake.year(
fake.first_name_female( fake.pydecimal(
Common methods¶
Some commonly used methods are listed below:
>>> fake.name() # generate a name
'田鑫'
>>> fake.address() # generate an address
'山东省阳市高坪吕路A座 998657'
>>> fake.country() # country
'意大利'
>>> fake.province() # province
'安徽省'
>>> fake.city() # city
'哈尔滨市'
>>> fake.district() # district
'徐汇'
>>> fake.street_address() # street address
'海门路I座'
>>> fake.random_int() # random integer, 0 to 9999 by default
2257
>>> fake.random_digit() # random digit from 0 to 9
6
>>> fake.random_number() # random number; the digits parameter sets the number of digits; returns random.randint(0, pow(10, digits) - 1)
3229
>>> fake.random_letter() # random letter
'Q'
>>> fake.random_lowercase_letter() # random lowercase letter
'z'
>>> fake.random_uppercase_letter() # random uppercase letter
'V'
>>> fake.color_name() # color name
'GoldenRod'
>>> fake.color_name()
'Chartreuse'
>>> fake.color_name()
'DeepPink'
>>> fake.color_name()
'MediumSpringGreen'
>>> fake.company() # random company name
'联通时科网络有限公司'
>>> fake.bs() # random company service description
'mesh bleeding-edge infrastructures'
>>> fake.company_suffix() # random company type suffix
'信息有限公司'
>>> fake.credit_card_number() # credit card number
'4803099375057291529'
>>> fake.credit_card_provider() # credit card type
'VISA 19 digit'
>>> fake.currency_code() # currency code
'EUR'
>>> fake.am_pm() # AM/PM
'AM'
>>> fake.date() # date
'1974-08-12'
>>> fake.date_this_year() # random date in the current year
datetime.date(2018, 5, 6)
>>> fake.date_this_month() # random date in the current month
datetime.date(2018, 11, 17)
>>> fake.month() # random month number
'09'
>>> fake.month_name() # random month name
'July'
>>> fake.date_time_this_year() # random datetime in the current year
datetime.datetime(2018, 7, 21, 7, 43, 58)
>>> fake.date_time() # random datetime
datetime.datetime(2007, 9, 13, 14, 15, 54)
>>> fake.time() # random 24-hour time, returned as a string
'23:28:47'
>>> fake.file_name() # file name
'更新.html'
>>> fake.file_path() # file path
'/的话/系列.docx'
>>> fake.file_extension() # file extension
'xlsx'
>>> fake.mime_type() # random MIME type
'video/ogg'
>>> fake.ascii_company_email() # random company email address
'guiying74@xiajun.net'
>>> fake.ascii_email() # random email address
'tangyan@gmail.com'
>>> fake.ipv4() # random IPv4 address
'126.162.176.179'
>>> fake.ipv6() # random IPv6 address
'9be4:c8c9:f589:f14b:24e6:2425:88c:bef9'
>>> fake.mac_address() # random MAC address
'7e:51:97:aa:8b:a1'
>>> fake.url() # random URL
'http://luo.cn/'
>>> fake.job() # random job title
'网络工程师'
>>> fake.paragraph() # paragraph
'准备帮助标题论坛.朋友开始类型网上这种.日本其他然后城市.'
>>> fake.sentence() # random sentence
'产品应用操作详细.'
>>> fake.word() # word
'参加'
>>> fake.boolean() # random boolean
False
>>> fake.phone_number() # random mobile phone number
'18071087230'
>>> fake.profile() # random profile
{'job': '银行柜员', 'company': '四通科技有限公司', 'ssn': '220200194905157548', 'residence': '山西省长沙市城东童街R座 486365', 'current_location': (Decimal('61.941104'), Decimal('-177.651444')), 'blood_group': 'A+', 'website': ['https://gong.cn/', 'https://www.xiuyingna.org/', 'http://xp.cn/', 'http://www.wei.org/'], 'username': 'wei07', 'name': '廉雪梅', 'sex': 'F', 'address': '山西省金凤市上街公路M座 409920', 'mail': 'taotian@gmail.com', 'birthdate': datetime.date(1911, 12, 12)}
>>> fake.ssn() # national ID number
'510726199311249157'
>>> fake.firefox() # random Firefox user_agent string
'Mozilla/5.0 (X11; Linux x86_64; rv:1.9.6.20) Gecko/2013-12-19 08:38:18 Firefox/13.0'
>>> fake.user_agent() # random user_agent string
'Mozilla/5.0 (iPod; U; CPU iPhone OS 4_1 like Mac OS X; ca-AD) AppleWebKit/531.14.4 (KHTML, like Gecko) Version/3.0.5 Mobile/8B113 Safari/6531.14.4'
Random passwords¶
Generate a random password:
>>> fake.password() # random password
's_3XwfSitx'
Source code:
def password(
        self,
        length=10,
        special_chars=True,
        digits=True,
        upper_case=True,
        lower_case=True):
    """
    Generates a random password.
    @param length: Integer. Length of a password
    @param special_chars: Boolean. Whether to use special characters !@#$%^&*()_+
    @param digits: Boolean. Whether to use digits
    @param upper_case: Boolean. Whether to use upper letters
    @param lower_case: Boolean. Whether to use lower letters
    @return: String. Random password
    """
- By default the generated password is 10 characters long and may contain special characters, digits, uppercase letters and lowercase letters.
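To make the effect of these parameters concrete, here is a standard-library sketch with the same parameter names as the docstring above. It is not Faker's implementation: sketch_password is a hypothetical function, and guaranteeing one character from every enabled class is an assumption of this sketch.

```python
import random
import string


def sketch_password(length=10, special_chars=True, digits=True,
                    upper_case=True, lower_case=True, rng=random):
    """Build a password containing at least one character of each enabled class."""
    pools = []
    if special_chars:
        pools.append("!@#$%^&*()_+")
    if digits:
        pools.append(string.digits)
    if upper_case:
        pools.append(string.ascii_uppercase)
    if lower_case:
        pools.append(string.ascii_lowercase)
    if not pools:
        raise ValueError("at least one character class must be enabled")
    if length < len(pools):
        raise ValueError("length is too short for the enabled classes")
    # One character from every enabled class, then fill from the union.
    chars = [rng.choice(pool) for pool in pools]
    alphabet = "".join(pools)
    chars += [rng.choice(alphabet) for _ in range(length - len(chars))]
    rng.shuffle(chars)
    return "".join(chars)


print(sketch_password())
print(sketch_password(length=6, special_chars=False, digits=False))
```

The random module is fine for test data like this; for real credentials the secrets module would be the appropriate source of randomness.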
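Generating fake database data with Faker¶
The SayHello project, for example, builds a flask command-line tool that fills the database with fake data. The code is as follows: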
#!/usr/bin/python3
"""
@Author : Zhaohui Mei(梅朝辉)
@Email : mzh.whut@gmail.com
@Time : 2018/11/17 9:01
@File : commands.py
@Version : 1.0
@Interpreter: Python3.6.2
@Software: PyCharm
@Description: Custom Flask commands
"""
"""
Usage:
In the cmder command line, change to the directory that contains commands.py, then set FLASK_APP=commands
$ set FLASK_APP=commands
$ flask initdb             initialize the database
$ flask forge --count=50   generate 50 fake records
"""
import click
from sayhello import app, db
from sayhello.models import Message
@app.cli.command()
def initdb():
    # Create the database tables
    db.create_all()
    click.echo('Initialized database.')
@app.cli.command()
@click.option('--count', default=20, help='Quantity of messages,default is 20.')
def forge(count):
    """Generate fake messages"""
    from faker import Faker
    db.drop_all()
    db.create_all()
    fake = Faker()  # Faker instance used to generate the fake data
    click.echo('Working...')
    for i in range(count):
        message = Message(
            name=fake.name(),
            body=fake.sentence(),
            timestamp=fake.date_time_this_year()
        )
        db.session.add(message)
    # Commit once, after all messages have been added to the session
    db.session.commit()
    click.echo(f'Created {count} fake messages!')
Create the fake data from the flask command line:
$ flask forge --count=10
Working...
Created 10 fake messages!
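The data in the MySQL database looks like this:

If a locale is given when fake is initialized, for example:
fake = Faker(locale='zh_CN')
the data looks like this:

Note: the available methods may differ between locales; consult the official documentation when using them.
This makes it quick to add many fake records and speeds up development.
Faker: https://github.com/joke2k/faker
SayHello project: http://github.com/greyli/sayhello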
Using Pipenv Virtual Environments¶
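pipenv is a tool created by Kenneth Reitz. It manages virtual environments for different Python versions as well as project packages, combining tools such as virtualenv and pip.
Advantages of Pipenv¶
- It automatically associates a project with its virtualenv and loads that virtualenv quickly.
- The pipenv command replaces pip and comes with a dependency list, Pipfile, and a dependency lock file, Pipfile.lock.
- Besides listing dependencies, Pipfile can pin the PyPI index URL and the Python version.
- Pipfile also supports a list of dev dependencies. Packages installed with pipenv install always use the index configured in Pipfile.
- The pipenv graph command shows the dependency tree.
- It can switch directly between Python 2 and Python 3.
- It can load environment variables automatically from a .env file, which simplifies the development workflow.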
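Installing Pipenv¶
This article uses Python 3.6.2 as the test environment. The Python 3.6.2 installer can be downloaded from https://www.python.org/downloads/release/python-362/
pip is installed together with Python; change the pip index URL beforehand.
Install Pipenv with pip: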
$ pip install pipenv
Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
Collecting pipenv
Downloading http://mirrors.aliyun.com/pypi/packages/13/b4/3ffa55f77161cff9a5220f162670f7c5eb00df52e00939e203f601b0f579/pipenv-2018.11.26-py3-none-any.whl (5.2MB)
100% |████████████████████████████████| 5.2MB 8.3MB/s
Requirement already satisfied: certifi in d:\program files (x86)\python3.6.2\lib\site-packages (from pipenv) (2018.1.18)
Requirement already satisfied: setuptools>=36.2.1 in d:\program files (x86)\python3.6.2\lib\site-packages (from pipenv) (40.6.2)
Requirement already satisfied: virtualenv in d:\program files (x86)\python3.6.2\lib\site-packages (from pipenv) (16.0.0)
Requirement already satisfied: virtualenv-clone>=0.2.5 in d:\program files (x86)\python3.6.2\lib\site-packages (from pipenv) (0.4.0)
Requirement already satisfied: pip>=9.0.1 in d:\program files (x86)\python3.6.2\lib\site-packages (from pipenv) (18.1)
Installing collected packages: pipenv
Successfully installed pipenv-2018.11.26
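Configuring Pipenv environment variables¶
Set the environment variable PIPENV_VENV_IN_PROJECT so that pipenv creates a .venv directory inside the current project directory and installs all packages there:
PIPENV_VENV_IN_PROJECT=1
Set PIPENV_PYPI_MIRROR to configure the PyPI index URL (in testing, this setting had no effect):
PIPENV_PYPI_MIRROR=https://mirrors.aliyun.com/pypi/simple
Set PIPENV_TEST_INDEX to configure the PyPI index URL:
PIPENV_TEST_INDEX=https://mirrors.aliyun.com/pypi/simple
If PIPENV_TEST_INDEX turns out to have no effect either, edit line 127 of the Pipenv source file python3.6.2\Lib\site-packages\pipenv\project.py and change u"https://pypi.org/simple" to u"https://mirrors.aliyun.com/pypi/simple".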
Creating a Pipenv virtual environment¶
Change to the project directory and create the virtual environment:
$ mkdir myproject
D:\data
$ cd myproject\
D:\data\myproject
$ ls
D:\data\myproject
$ pipenv install
Creating a virtualenv for this project…
Pipfile: D:\data\myproject\Pipfile
Using d:\program files (x86)\python3.6.2\python.exe (3.6.2) to create virtualenv…
[ ==] Creating virtual environment...Already using interpreter d:\program files (x86)\python3.6.2\python.exe
Using base prefix 'd:\\program files (x86)\\python3.6.2'
New python executable in D:\data\myproject\.venv\Scripts\python.exe
Installing setuptools, pip, wheel...done.
Successfully created virtual environment!
Virtualenv location: D:\data\myproject\.venv
Creating a Pipfile for this project…
Pipfile.lock not found, creating…
Locking [dev-packages] dependencies…
Locking [packages] dependencies…
Updated Pipfile.lock (ca72e7)!
Installing dependencies from Pipfile.lock (ca72e7)…
================================ 0/0 - 00:00:00
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.
D:\data\myproject
$
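After the virtual environment is initialized, Pipfile and Pipfile.lock are generated in the project directory along with the .venv directory, as shown below: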

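Pipfile and Pipfile.lock are pipenv's configuration files and replace the former requirements.txt.
When committing the project, commit both Pipfile and Pipfile.lock. Other developers can then clone the repository and run pipenv install --dev against this Pipfile to build their own virtual environment.
When a virtual environment is initialized with pipenv install, Pipenv looks for a locally installed Python version to base the environment on, and installs only three packages: setuptools, pip and wheel.
Running commands in the virtualenv¶
pipenv run command runs a command inside the virtual environment. For example, pipenv run pip list shows the installed packages: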
$ pipenv run pip list
Loading .env environment variables…
Package Version
---------- -------
pip 18.1
setuptools 40.6.2
wheel 0.32.3
Installing packages¶
Use pipenv install package_name to install a Python package:
$ pipenv install flask
Installing flask…
Adding flask to Pipfile's [packages]…
Installation Succeeded
Pipfile.lock (4a5fad) out of date, updating to (a8f5d4)…
Locking [dev-packages] dependencies…
Locking [packages] dependencies…
Success!
Updated Pipfile.lock (4a5fad)!
Installing dependencies from Pipfile.lock (4a5fad)…
================================ 6/6 - 00:00:01
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.
Now check the installed packages again:
$ pipenv run pip list
Loading .env environment variables…
Package Version
------------ -------
Click 7.0
Flask 1.0.2
itsdangerous 1.1.0
Jinja2 2.10
MarkupSafe 1.1.0
pip 18.1
setuptools 40.6.2
Werkzeug 0.14.1
wheel 0.32.3
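The command above only lists the installed packages, not the dependencies between them. Use pipenv graph to see the dependencies.
Viewing installed packages and their dependencies¶
Use pipenv graph to view the package dependencies: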
$ pipenv graph
Flask==1.0.2
- click [required: >=5.1, installed: 7.0]
- itsdangerous [required: >=0.24, installed: 1.1.0]
- Jinja2 [required: >=2.10, installed: 2.10]
- MarkupSafe [required: >=0.23, installed: 1.1.0]
- Werkzeug [required: >=0.14, installed: 0.14.1]
Exporting packages to requirements.txt¶
Use pipenv lock -r > requirements.txt to export the dependencies to a file:
$ pipenv lock -r > requirements.txt
$ cat requirements.txt
-i https://mirrors.aliyun.com/pypi/simple/
click==7.0
flask==1.0.2
itsdangerous==1.1.0
jinja2==2.10
markupsafe==1.1.0
werkzeug==0.14.1
Installing packages from requirements.txt¶
You can give requirements.txt to someone else, who can then install the packages from it:
$ mkdir ..\my_new_project
$ cp requirements.txt ..\my_new_project\
$ cd ..\my_new_project\
$ pipenv install -r requirements.txt
Creating a virtualenv for this project…
Pipfile: D:\data\my_new_project\Pipfile
Using d:\program files (x86)\python3.6.2\python.exe (3.6.2) to create virtualenv…
[ ] Creating virtual environment...Already using interpreter d:\program files (x86)\python3.6.2\python.exe
Using base prefix 'd:\\program files (x86)\\python3.6.2'
New python executable in D:\data\my_new_project\.venv\Scripts\python.exe
Installing setuptools, pip, wheel...done.
Successfully created virtual environment!
Virtualenv location: D:\data\my_new_project\.venv
Creating a Pipfile for this project…
Requirements file provided! Importing into Pipfile…
Pipfile.lock not found, creating…
Locking [dev-packages] dependencies…
Locking [packages] dependencies…
Success!
Updated Pipfile.lock (4c2105)!
Installing dependencies from Pipfile.lock (4c2105)…
================================ 6/6 - 00:00:02
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.
Check the packages installed in the new project:
$ pipenv run pip list
Package Version
------------ -------
Click 7.0
Flask 1.0.2
itsdangerous 1.1.0
Jinja2 2.10
MarkupSafe 1.1.0
pip 18.1
setuptools 40.6.2
Werkzeug 0.14.1
wheel 0.32.3
The packages are the same as in the original project.
Uninstalling packages¶
Use pipenv uninstall package_name to uninstall a package:
$ pipenv uninstall flask
Uninstalling flask…
Uninstalling Flask-1.0.2:
Successfully uninstalled Flask-1.0.2
Removing flask from Pipfile…
Locking [dev-packages] dependencies…
Locking [packages] dependencies…
Success!
Updated Pipfile.lock (48af14)!
Working in the Pipenv shell¶
Use pipenv shell to start a shell inside the virtual environment:
$ pipenv shell
Launching subshell in virtual environment…
Removing the virtual environment¶
Use pipenv --rm to remove the virtual environment:
$ pipenv --rm
Removing virtualenv (D:\data\my_new_project\.venv)…
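Note: removing the virtual environment only deletes the .venv directory; the Pipfile and Pipfile.lock in the project are not deleted.
Creating a virtual environment with a specific Python path¶
Suppose I want a Python 3.7 virtual environment and try to initialize one: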
$ pipenv --python 3.7
Warning: Python 3.7 was not found on your system…
You can specify specific versions of Python with:
$ pipenv --python path\to\python
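This shows that Python 3.7 is not installed on my system. A virtual environment can instead be initialized by giving the path to a Python interpreter, which is especially useful on Linux when a non-root user does not want to use the system default Python.
Create the virtual environment from a specific Python path: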
$ pipenv --python "D:\Program Files (x86)\python3.6.2\python.exe"
Creating a virtualenv for this project…
Pipfile: D:\data\my_newpro\Pipfile
Using D:\Program Files (x86)\python3.6.2\python.exe (3.6.2) to create virtualenv…
[== ] Creating virtual environment...Using base prefix 'D:\\Program Files (x86)\\python3.6.2'
New python executable in D:\data\my_newpro\.venv\Scripts\python.exe
Installing setuptools, pip, wheel...done.
Running virtualenv with interpreter D:\Program Files (x86)\python3.6.2\python.exe
Successfully created virtual environment!
Virtualenv location: D:\data\my_newpro\.venv
Creating a Pipfile for this project…
Pipenv help¶
Use pipenv -h to view Pipenv's help text:
$ pipenv -h
Usage: pipenv [OPTIONS] COMMAND [ARGS]...
Options:
--where Output project home information. # project directory information
--venv Output virtualenv information. # virtualenv directory information
--py Output Python interpreter information. # path of the Python interpreter
--envs Output Environment Variable options. # environment variables that can be set
--rm Remove the virtualenv. # remove the virtual environment
--bare Minimal output.
--completion Output completion (to be eval'd).
--man Display manpage.
--support Output diagnostic information for use in GitHub issues.
--site-packages Enable site-packages for the virtualenv. [env var:
PIPENV_SITE_PACKAGES]
--python TEXT Specify which version of Python virtualenv should use.
--three / --two Use Python 3/2 when creating virtualenv.
--clear Clears caches (pipenv, pip, and pip-tools). [env var:
PIPENV_CLEAR]
-v, --verbose Verbose mode.
--pypi-mirror TEXT Specify a PyPI mirror. # specify the PyPI index
--version Show the version and exit. # show the Pipenv version
-h, --help Show this message and exit.
Usage Examples:
Create a new project using Python 3.7, specifically:
$ pipenv --python 3.7
Remove project virtualenv (inferred from current directory):
$ pipenv --rm
Install all dependencies for a project (including dev):
$ pipenv install --dev
Create a lockfile containing pre-releases:
$ pipenv lock --pre
Show a graph of your installed dependencies:
$ pipenv graph
Check your installed dependencies for security vulnerabilities:
$ pipenv check
Install a local setup.py into your virtual environment/Pipfile:
$ pipenv install -e .
Use a lower-level pip command:
$ pipenv run pip freeze
Commands:
check Checks for security vulnerabilities and against PEP 508 markers
provided in Pipfile. # check for security vulnerabilities
clean Uninstalls all packages not specified in Pipfile.lock.
graph Displays currently-installed dependency graph information. # show the current dependency graph
install Installs provided packages and adds them to Pipfile, or (if no
packages are given), installs all packages from Pipfile. # install packages
lock Generates Pipfile.lock. # generate Pipfile.lock
open View a given module in your editor. # view a given module in your editor
run Spawns a command installed into the virtualenv. # run a command inside the virtualenv
shell Spawns a shell within the virtualenv. # enter a shell inside the virtual environment
sync Installs all packages specified in Pipfile.lock.
uninstall Un-installs a provided package and removes it from Pipfile. # uninstall a package
update Runs lock, then sync. # regenerate the lock file, then sync the installed packages to it
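Automatic loading of configuration files in Pipenv¶
If a .env file exists in the project directory, both pipenv shell and pipenv run load it automatically. This is very useful for keeping sensitive information.
Keep sensitive information in the .env file instead of hard-coding it in the project: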
$ cat .env
MAIL_USERNAME=mzh.whut@gmail.com
MAIL_PASSWORD=123456
SECRET_KEY=nobody know this
D:\data\myproject
$ pipenv shell
Loading .env environment variables…
Launching subshell in virtual environment…
$ python
Python 3.6.2 (v3.6.2:5fd33b5, Jul 8 2017, 04:57:36) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.environ.get('MAIL_USERNAME')
'mzh.whut@gmail.com'
>>> os.environ.get('MAIL_PASSWORD')
'123456'
>>> os.environ.get('SECRET_KEY')
'nobody know this'
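What "loading a .env file" amounts to can be sketched with the standard library alone. The code below is a simplified illustration, not how pipenv or python-dotenv actually implement it: parse_env_text is a hypothetical helper that handles only plain KEY=VALUE lines, without quoting or variable expansion.

```python
import os


def parse_env_text(text):
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '='.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


sample = """\
MAIL_USERNAME=mzh.whut@gmail.com
MAIL_PASSWORD=123456
SECRET_KEY=nobody know this
"""
settings = parse_env_text(sample)
# setdefault keeps any value that is already set in the real environment.
for key, value in settings.items():
    os.environ.setdefault(key, value)
print(os.environ.get("MAIL_USERNAME"))
```

Using setdefault rather than plain assignment means a variable exported in the shell takes precedence over the file, which is a design choice of this sketch.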
Loading the .env configuration file in Flask¶
The example file is as follows:
#!/usr/bin/python3
"""
@Author : Zhaohui Mei(梅朝辉)
@Email : mzh.whut@gmail.com
@Time : 2018/11/27 23:32
@File : myweb.py
@Version : 1.0
@Interpreter: Python3.6.2
@Software: PyCharm
@Description: Test loading configuration from a .env file
"""
import os
from flask import Flask
# Create an instance of the class; it is a WSGI application
app = Flask(__name__)
@app.route('/')
def index():
    MAIL_USERNAME = os.environ.get('MAIL_USERNAME')
    MAIL_PASSWORD = os.environ.get('MAIL_PASSWORD')
    return f'用户名:{MAIL_USERNAME},密码:{MAIL_PASSWORD}'
if __name__ == '__main__':
    # run() starts the application on a local server
    app.run(debug=True)
Run it directly. The command line shows the following:
D:\data\myproject\.venv\Scripts\python.exe D:/data/myproject/myweb.py
* Tip: There are .env files present. Do "pip install python-dotenv" to use them.
* Serving Flask app "myweb" (lazy loading)
* Environment: production
WARNING: Do not use the development server in a production environment.
Use a production WSGI server instead.
* Debug mode: on
* Restarting with stat
* Tip: There are .env files present. Do "pip install python-dotenv" to use them.
* Debugger is active!
* Debugger PIN: 174-500-507
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
127.0.0.1 - - [27/Nov/2018 23:35:13] "GET / HTTP/1.1" 200 -
Now open http://127.0.0.1:5000/ . The result is shown in the figure below:

Flask did not pick up the configuration values, so python-dotenv has to be installed. Install it in the virtual environment:
$ pipenv install python-dotenv
Installing python-dotenv…
Adding python-dotenv to Pipfile's [packages]…
Installation Succeeded
Pipfile.lock (d90202) out of date, updating to (4a5fad)…
Locking [dev-packages] dependencies…
Locking [packages] dependencies…
Success!
Updated Pipfile.lock (d90202)!
Installing dependencies from Pipfile.lock (d90202)…
================================ 7/7 - 00:00:02
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.
$ pipenv run pip list
Loading .env environment variables…
Package Version
------------- -------
Click 7.0
Flask 1.0.2
itsdangerous 1.1.0
Jinja2 2.10
MarkupSafe 1.1.0
pip 18.1
python-dotenv 0.9.1
setuptools 40.6.2
Werkzeug 0.14.1
wheel 0.32.3
After python-dotenv is installed, run the Flask project again and reload http://127.0.0.1:5000/ . The result is shown in the figure below:

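This shows that the .env configuration has been parsed successfully.
Note: when pushing the project to a GitHub repository, exclude the .env file by adding .env to the .gitignore file.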
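Because os.environ.get() returns None when a variable is not set, a missing .env value can go unnoticed until it appears in a response. One possible way to fail early is sketched below. The helper name require_env is hypothetical, and a plain dictionary stands in for os.environ in the demonstration.

```python
import os


def require_env(name, environ=None):
    """Return the value of an environment variable, or raise if it is not set."""
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if value is None:
        raise RuntimeError(f'missing required setting: {name}')
    return value


# A plain dict stands in for os.environ in this demonstration
fake_environ = {'MAIL_USERNAME': 'user@example.com'}
username = require_env('MAIL_USERNAME', fake_environ)
print(username)

try:
    require_env('MAIL_PASSWORD', fake_environ)
except RuntimeError as error:
    missing_message = str(error)
    print(missing_message)
```

Calling such a helper once at start-up, rather than inside each view function, would surface a forgotten setting before the first request is served.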
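ReviewBoard internationalization configuration¶
ReviewBoard is a code review system. By default it currently has no Chinese translation configured. This section explains how to localize ReviewBoard into Chinese.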
Basic information¶
Check the Django version:
[root@helloreview ~]# pip list|grep Django
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7. More details about Python 2 support in pip, can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support
Django 1.6.11
Django installation directory:
[root@helloreview ~]# ls -lah /usr/lib/python2.7/site-packages/django
total 36K
drwxr-xr-x 17 root root 255 Aug 27 11:01 .
drwxr-xr-x. 95 root root 8.0K Sep 5 20:23 ..
drwxr-xr-x 3 root root 217 Aug 27 11:01 bin
drwxr-xr-x 6 root root 168 Sep 5 22:38 conf
drwxr-xr-x 19 root root 317 Aug 27 11:01 contrib
drwxr-xr-x 10 root root 4.0K Aug 27 11:01 core
drwxr-xr-x 4 root root 153 Aug 27 11:01 db
drwxr-xr-x 2 root root 125 Aug 27 11:01 dispatch
drwxr-xr-x 3 root root 269 Aug 27 11:01 forms
drwxr-xr-x 2 root root 242 Aug 27 11:01 http
-rw-r--r-- 1 root root 270 Aug 27 11:00 __init__.py
-rw-r--r-- 1 root root 465 Aug 27 11:01 __init__.pyc
drwxr-xr-x 2 root root 4.0K Sep 1 09:52 middleware
drwxr-xr-x 2 root root 45 Aug 27 11:01 shortcuts
drwxr-xr-x 3 root root 4.0K Aug 27 11:01 template
drwxr-xr-x 2 root root 237 Aug 27 11:01 templatetags
drwxr-xr-x 2 root root 331 Aug 27 11:01 test
drwxr-xr-x 5 root root 4.0K Aug 27 11:01 utils
drwxr-xr-x 4 root root 247 Aug 27 11:01 views
Check the ReviewBoard version:
[root@helloreview ~]# pip list|grep -i Review
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7. More details about Python 2 support in pip, can be found at https://pip.pypa.io/en/latest/development/release-process/#python-2-support
ReviewBoard 3.0.12
ReviewBoard package directory:
[root@helloreview ~]# ls -lah /usr/lib64/python2.7/site-packages/reviewboard
total 240K
drwxr-xr-x 29 root root 4.0K Sep 6 13:00 .
drwxr-xr-x. 33 root root 4.0K Sep 6 14:00 ..
drwxr-xr-x 8 root root 4.0K Sep 2 19:58 accounts
drwxr-xr-x 5 root root 4.0K Sep 5 23:37 admin
drwxr-xr-x 5 root root 4.0K Aug 27 11:01 attachments
drwxr-xr-x 3 root root 254 Sep 5 23:41 avatars
drwxr-xr-x 3 root root 164 Aug 27 11:01 changedescs
drwxr-xr-x 3 root root 158 Aug 27 11:01 cmdline
drwxr-xr-x 2 root root 298 Sep 2 19:59 datagrids
-rw-r--r-- 1 root root 5.2K Aug 27 10:59 dependencies.py
-rw-r--r-- 1 root root 5.3K Aug 27 11:01 dependencies.pyc
-rw-r--r-- 1 root root 1.2K Aug 27 10:59 deprecation.py
-rw-r--r-- 1 root root 1.6K Aug 27 11:01 deprecation.pyc
drwxr-xr-x 7 root root 4.0K Sep 4 21:24 diffviewer
drwxr-xr-x 6 root root 281 Aug 27 11:01 extensions
drwxr-xr-x 2 root root 117 Aug 27 11:01 features
drwxr-xr-x 7 root root 4.0K Aug 27 11:01 hostingsvcs
drwxr-xr-x 4 root root 33 Aug 27 11:01 htdocs
-rw-r--r-- 1 root root 5.0K Aug 27 10:59 __init__.py
-rw-r--r-- 1 root root 4.4K Aug 27 11:01 __init__.pyc
drwxr-xr-x 3 root root 254 Aug 27 11:01 integrations
drwxr-xr-x 9 root root 91 Sep 1 10:17 locale
-rw-r--r-- 1 root root 12K Aug 27 10:59 manage.py
-rw-r--r-- 1 root root 8.5K Aug 27 11:01 manage.pyc
-rw-r--r-- 1 root root 179 Aug 27 10:59 nose.cfg
drwxr-xr-x 6 root root 288 Aug 27 11:01 notifications
drwxr-xr-x 3 root root 4.0K Aug 27 11:01 oauth
-rw-r--r-- 1 root root 949 Aug 27 10:59 rb_platform.py
-rw-r--r-- 1 root root 635 Aug 27 11:01 rb_platform.pyc
drwxr-xr-x 2 root root 117 Aug 27 11:01 registries
drwxr-xr-x 2 apache apache 29 Aug 30 21:52 reviewboardlog
drwxr-xr-x 8 root root 4.0K Sep 5 22:35 reviews
drwxr-xr-x 9 root root 4.0K Aug 27 11:01 scmtools
drwxr-xr-x 3 root root 4.0K Aug 27 11:01 search
-rw-r--r-- 1 root root 17K Sep 5 22:09 settings.py
-rw-r--r-- 1 root root 2.0K Aug 27 10:59 signals.py
-rw-r--r-- 1 root root 1.8K Aug 27 11:01 signals.pyc
drwxr-xr-x 5 root root 4.0K Aug 30 22:33 site
drwxr-xr-x 2 root root 253 Aug 27 11:01 ssh
drwxr-xr-x 4 root root 27 Aug 27 11:01 static
-rw-r--r-- 1 root root 23K Aug 27 10:59 staticbundles.py
-rw-r--r-- 1 root root 34K Aug 27 11:01 staticbundles.pyc
drwxr-xr-x 17 root root 326 Aug 27 11:01 templates
drwxr-xr-x 2 root root 176 Aug 27 11:01 testing
-rw-r--r-- 1 root root 1.7K Aug 27 10:59 test.py
-rw-r--r-- 1 root root 2.3K Aug 27 11:01 test.pyc
-rw-r--r-- 1 root root 1.7K Aug 27 10:59 tests.py
-rw-r--r-- 1 root root 2.3K Aug 27 11:01 tests.pyc
-rw-r--r-- 1 root root 3.9K Aug 27 10:59 urls.py
-rw-r--r-- 1 root root 3.5K Aug 27 11:01 urls.pyc
drwxr-xr-x 5 root root 4.0K Sep 4 23:27 webapi
ReviewBoard static file directory:
[root@helloreview ~]# ls -lah /var/www/html/reviewboard/
total 0
drwxr-xr-x 7 apache apache 67 Sep 1 20:28 .
drwxr-xr-x. 3 root root 34 Sep 6 13:15 ..
drwxr-xr-x 2 apache apache 98 Sep 5 21:45 conf
drwxr-xr-x 3 apache apache 25 Aug 27 11:52 data
drwxr-xr-x 5 apache apache 74 Aug 31 19:52 htdocs
drwxr-xr-x 2 apache apache 6 Aug 27 11:28 logs
drwxrwxrwx 2 apache apache 6 Aug 27 11:28 tmp
Configuration files¶
- Django default settings file:
/usr/lib/python2.7/site-packages/django/conf/global_settings.py
- ReviewBoard Django project settings file:
/usr/lib64/python2.7/site-packages/reviewboard/settings.py
- ReviewBoard site settings file:
/var/www/html/reviewboard/conf/settings_local.py
Django language and time zone settings:
[root@hellolinux ~]# cat -n /usr/lib/python2.7/site-packages/django/conf/global_settings.py|sed -n '36,144p'
36 # Local time zone for this installation. All choices can be found here:
37 # http://en.wikipedia.org/wiki/List_of_tz_zones_by_name (although not all
38 # systems may support all possibilities). When USE_TZ is True, this is
39 # interpreted as the default user time zone.
40 TIME_ZONE = 'America/Chicago'
41
42 # If you set this to True, Django will use timezone-aware datetimes.
43 USE_TZ = False
44
45 # Language code for this installation. All choices can be found here:
46 # http://www.i18nguy.com/unicode/language-identifiers.html
47 LANGUAGE_CODE = 'en-us'
48
49 # Languages we provide translations for, out of the box.
50 LANGUAGES = (
51 ('af', gettext_noop('Afrikaans')),
52 ('ar', gettext_noop('Arabic')),
53 ('az', gettext_noop('Azerbaijani')),
54 ('bg', gettext_noop('Bulgarian')),
55 ('be', gettext_noop('Belarusian')),
56 ('bn', gettext_noop('Bengali')),
57 ('br', gettext_noop('Breton')),
58 ('bs', gettext_noop('Bosnian')),
59 ('ca', gettext_noop('Catalan')),
60 ('cs', gettext_noop('Czech')),
61 ('cy', gettext_noop('Welsh')),
62 ('da', gettext_noop('Danish')),
63 ('de', gettext_noop('German')),
64 ('el', gettext_noop('Greek')),
65 ('en', gettext_noop('English')),
66 ('en-gb', gettext_noop('British English')),
67 ('eo', gettext_noop('Esperanto')),
68 ('es', gettext_noop('Spanish')),
69 ('es-ar', gettext_noop('Argentinian Spanish')),
70 ('es-mx', gettext_noop('Mexican Spanish')),
71 ('es-ni', gettext_noop('Nicaraguan Spanish')),
72 ('es-ve', gettext_noop('Venezuelan Spanish')),
73 ('et', gettext_noop('Estonian')),
74 ('eu', gettext_noop('Basque')),
75 ('fa', gettext_noop('Persian')),
76 ('fi', gettext_noop('Finnish')),
77 ('fr', gettext_noop('French')),
78 ('fy-nl', gettext_noop('Frisian')),
79 ('ga', gettext_noop('Irish')),
80 ('gl', gettext_noop('Galician')),
81 ('he', gettext_noop('Hebrew')),
82 ('hi', gettext_noop('Hindi')),
83 ('hr', gettext_noop('Croatian')),
84 ('hu', gettext_noop('Hungarian')),
85 ('ia', gettext_noop('Interlingua')),
86 ('id', gettext_noop('Indonesian')),
87 ('is', gettext_noop('Icelandic')),
88 ('it', gettext_noop('Italian')),
89 ('ja', gettext_noop('Japanese')),
90 ('ka', gettext_noop('Georgian')),
91 ('kk', gettext_noop('Kazakh')),
92 ('km', gettext_noop('Khmer')),
93 ('kn', gettext_noop('Kannada')),
94 ('ko', gettext_noop('Korean')),
95 ('lb', gettext_noop('Luxembourgish')),
96 ('lt', gettext_noop('Lithuanian')),
97 ('lv', gettext_noop('Latvian')),
98 ('mk', gettext_noop('Macedonian')),
99 ('ml', gettext_noop('Malayalam')),
100 ('mn', gettext_noop('Mongolian')),
101 ('my', gettext_noop('Burmese')),
102 ('nb', gettext_noop('Norwegian Bokmal')),
103 ('ne', gettext_noop('Nepali')),
104 ('nl', gettext_noop('Dutch')),
105 ('nn', gettext_noop('Norwegian Nynorsk')),
106 ('os', gettext_noop('Ossetic')),
107 ('pa', gettext_noop('Punjabi')),
108 ('pl', gettext_noop('Polish')),
109 ('pt', gettext_noop('Portuguese')),
110 ('pt-br', gettext_noop('Brazilian Portuguese')),
111 ('ro', gettext_noop('Romanian')),
112 ('ru', gettext_noop('Russian')),
113 ('sk', gettext_noop('Slovak')),
114 ('sl', gettext_noop('Slovenian')),
115 ('sq', gettext_noop('Albanian')),
116 ('sr', gettext_noop('Serbian')),
117 ('sr-latn', gettext_noop('Serbian Latin')),
118 ('sv', gettext_noop('Swedish')),
119 ('sw', gettext_noop('Swahili')),
120 ('ta', gettext_noop('Tamil')),
121 ('te', gettext_noop('Telugu')),
122 ('th', gettext_noop('Thai')),
123 ('tr', gettext_noop('Turkish')),
124 ('tt', gettext_noop('Tatar')),
125 ('udm', gettext_noop('Udmurt')),
126 ('uk', gettext_noop('Ukrainian')),
127 ('ur', gettext_noop('Urdu')),
128 ('vi', gettext_noop('Vietnamese')),
129 ('zh-cn', gettext_noop('Simplified Chinese')),
130 ('zh-tw', gettext_noop('Traditional Chinese')),
131 )
132
133 # Languages using BiDi (right-to-left) layout
134 LANGUAGES_BIDI = ("he", "ar", "fa", "ur")
135
136 # If you set this to False, Django will make some optimizations so as not
137 # to load the internationalization machinery.
138 USE_I18N = True
139 LOCALE_PATHS = ()
140 LANGUAGE_COOKIE_NAME = 'django_language'
141
142 # If you set this to True, Django will format dates, numbers and calendars
143 # according to user current locale.
144 USE_L10N = False
ReviewBoard language and time zone settings:
[root@hellolinux ~]# cat -n /usr/lib64/python2.7/site-packages/reviewboard/settings.py|sed -n '29,70p'
29 # Time zone support. If enabled, Django stores date and time information as
30 # UTC in the database, uses time zone-aware datetime objects, and translates
31 # them to the user's time zone in templates and forms.
32 USE_TZ = True
33
34 # Local time zone for this installation. All choices can be found here:
35 # http://www.postgresql.org/docs/8.1/static/datetime-keywords.html#DATETIME-TIMEZONE-SET-TABLE
36 # When USE_TZ is enabled, this is used as the default time zone for datetime
37 # objects
38 TIME_ZONE = 'UTC'
39
40 # Language code for this installation. All choices can be found here:
41 # http://www.w3.org/TR/REC-html40/struct/dirlang.html#langcodes
42 # http://blogs.law.harvard.edu/tech/stories/storyReader$15
43 LANGUAGE_CODE = 'en-us'
44
45 # This should match the ID of the Site object in the database. This is used to
46 # figure out URLs to stick in e-mails and related pages.
47 SITE_ID = 1
48
49 # The prefix for e-mail subjects sent to administrators.
50 EMAIL_SUBJECT_PREFIX = "[Review Board] "
51
52 # Whether to allow for smart spoofing of From addresses for e-mails.
53 #
54 # If enabled (default), DMARC records will be looked up before determining
55 # whether to use the user's e-mail address as the From address.
56 #
57 # If disabled, the old, dumb approach of assuming we can spoof will be used.
58 EMAIL_ENABLE_SMART_SPOOFING = True
59
60 # Default name of the service used in From e-mail when not spoofing.
61 #
62 # This should generally not be overridden unless one needs to thoroughly
63 # distinguish between two different Review Board servers AND DMARC is causing
64 # issues for e-mails.
65 EMAIL_DEFAULT_SENDER_SERVICE_NAME = 'Review Board'
66
67 # If you set this to False, Django will make some optimizations so as not
68 # to load the internationalization machinery.
69 USE_I18N = True
70
At this point the ReviewBoard login page looks like this:

Changing the configuration¶
Change the Django settings:
TIME_ZONE = 'Asia/Shanghai' # set the time zone to Asia/Shanghai
USE_TZ = True # enable time zone support
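With time zone support enabled, timestamps are stored in UTC and converted to the configured zone for display. The sketch below illustrates that conversion using only the standard library. It assumes Python 3 and models Asia/Shanghai as a fixed UTC+8 offset, which is a simplification. It is an illustration, not part of the ReviewBoard or Django configuration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Asia/Shanghai is modelled as a fixed UTC+8 offset (no daylight saving time)
SHANGHAI = timezone(timedelta(hours=8), 'Asia/Shanghai')


def to_shanghai(utc_dt):
    """Convert a timezone-aware UTC datetime to Shanghai local time."""
    return utc_dt.astimezone(SHANGHAI)


# A timestamp as it would be stored, in UTC
stored = datetime(2019, 9, 8, 3, 0, tzinfo=timezone.utc)
local = to_shanghai(stored)
print(local.isoformat())  # 2019-09-08T11:00:00+08:00
```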
Change the ReviewBoard settings:
[root@helloreview reviewboard]# cat -n /usr/lib64/python2.7/site-packages/reviewboard/settings.py|sed -n '29,80p'
29 # Time zone support. If enabled, Django stores date and time information as
30 # UTC in the database, uses time zone-aware datetime objects, and translates
31 # them to the user's time zone in templates and forms.
32 USE_TZ = True
33
34 # Local time zone for this installation. All choices can be found here:
35 # http://www.postgresql.org/docs/8.1/static/datetime-keywords.html#DATETIME-TIMEZONE-SET-TABLE
36 # When USE_TZ is enabled, this is used as the default time zone for datetime
37 # objects
38 TIME_ZONE = 'Asia/Shanghai' # <-- this line was modified
39
40 # Language code for this installation. All choices can be found here:
41 # http://www.w3.org/TR/REC-html40/struct/dirlang.html#langcodes
42 # http://blogs.law.harvard.edu/tech/stories/storyReader$15
43 LANGUAGE_CODE = 'zh-CN' # en-us,zh-TW,zh-CN # <-- this line was modified
44
45 gettext_noop = lambda s: s # <-- this line was added
46 LANGUAGES = ( # <-- this line was added
47 ('zh-cn', gettext_noop('Simplified Chinese')), # <-- this line was added
48 #('zh-tw', gettext_noop('Traditional Chinese')), # <-- this line was added
49 ) # <-- this line was added
50
51 # This should match the ID of the Site object in the database. This is used to
52 # figure out URLs to stick in e-mails and related pages.
53 SITE_ID = 1
54
55 # The prefix for e-mail subjects sent to administrators.
56 EMAIL_SUBJECT_PREFIX = "[Review Board] "
57
58 # Whether to allow for smart spoofing of From addresses for e-mails.
59 #
60 # If enabled (default), DMARC records will be looked up before determining
61 # whether to use the user's e-mail address as the From address.
62 #
63 # If disabled, the old, dumb approach of assuming we can spoof will be used.
64 EMAIL_ENABLE_SMART_SPOOFING = True
65
66 # Default name of the service used in From e-mail when not spoofing.
67 #
68 # This should generally not be overridden unless one needs to thoroughly
69 # distinguish between two different Review Board servers AND DMARC is causing
70 # issues for e-mails.
71 EMAIL_DEFAULT_SENDER_SERVICE_NAME = 'Review Board'
72
73 # If you set this to False, Django will make some optimizations so as not
74 # to load the internationalization machinery.
75 USE_I18N = True
76 BASE_DIR = os.path.dirname(os.path.dirname(__file__)) # <-- this line was added
77 LOCALE_PATHS = ( # <-- this line was added
78 os.path.join(BASE_DIR, 'locale'), # <-- this line was added
79 ) # <-- this line was added
80
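Explanation:
- TIME_ZONE = 'Asia/Shanghai': sets the time zone to Asia/Shanghai.
- LANGUAGE_CODE = 'zh-CN': sets the language code to Simplified Chinese (the other values noted in the comment are en-us and zh-TW).
- gettext_noop = lambda s: s: defines the placeholder function used to mark strings for translation.
- LANGUAGES = (('zh-cn', gettext_noop('Simplified Chinese')),): adds Simplified Chinese to the supported languages.
- ('zh-tw', gettext_noop('Traditional Chinese')): would add Traditional Chinese support. This line is commented out.
- LOCALE_PATHS = (os.path.join(BASE_DIR, 'locale'),): specifies the directory that holds the local translation files.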
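One Python detail is worth keeping in mind for tuple-valued settings such as LANGUAGES and LOCALE_PATHS: parentheses alone do not create a tuple, so a one-element tuple needs a trailing comma. The sketch below shows the difference. The BASE_DIR value is a hypothetical path used only for illustration.

```python
import os

BASE_DIR = '/opt/example'  # hypothetical path, for illustration only

# Parentheses around a single expression are just grouping: this is a string
not_a_tuple = (os.path.join(BASE_DIR, 'locale'))

# The trailing comma makes this a tuple containing one path
one_tuple = (os.path.join(BASE_DIR, 'locale'),)

print(type(not_a_tuple).__name__, type(one_tuple).__name__)
print(len(one_tuple))
```

Code that iterates over the setting would see individual characters in the first case and a single directory path in the second.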
Copying the translation directory¶
Copy the existing zh_TW locale directory to zh_CN:
[root@helloreview ~]# cd /usr/lib64/python2.7/site-packages/reviewboard/locale
[root@helloreview locale]# cp -r zh_TW zh_CN
[root@helloreview locale]# ls -lah
total 4.0K
drwxr-xr-x 10 root root 106 Sep 8 11:03 .
drwxr-xr-x 29 root root 4.0K Sep 8 11:02 ..
drwxr-xr-x 3 root root 25 Aug 27 11:01 en
drwxr-xr-x 3 root root 25 Aug 27 11:01 es
drwxr-xr-x 3 root root 25 Aug 27 11:01 it_IT
drwxr-xr-x 3 root root 25 Aug 27 11:01 ko_KR
drwxr-xr-x 3 root root 25 Aug 27 11:01 pt_BR
drwxr-xr-x 3 root root 25 Sep 1 10:17 zh_CN
drwxr-xr-x 3 root root 25 Aug 27 11:01 zh_TW
[root@helloreview locale]# cd zh_CN/LC_MESSAGES
[root@helloreview LC_MESSAGES]# ls -lah
total 220K
drwxr-xr-x 2 root root 78 Sep 6 13:14 .
drwxr-xr-x 3 root root 25 Sep 1 10:17 ..
-rw-r--r-- 1 root root 13K Sep 8 10:39 djangojs.mo
-rw-r--r-- 1 root root 29K Sep 6 13:14 djangojs.po
-rw-r--r-- 1 root root 48K Sep 8 10:39 django.mo
-rw-r--r-- 1 root root 121K Sep 6 11:13 django.po
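The settings use language codes such as zh-cn, while the translation directories use locale names such as zh_CN. Django ships its own helper for this conversion (django.utils.translation.to_locale). The function below is a simplified, hypothetical stand-in that only illustrates the naming rule and does not cover every case Django handles.

```python
def locale_dir_name(language_code):
    """Map a language code such as 'zh-cn' to a locale directory name such as 'zh_CN'."""
    lang, sep, region = language_code.partition('-')
    if not sep:
        # No region part, for example 'en'
        return lang.lower()
    # Lower-case language, underscore, upper-case region
    return lang.lower() + '_' + region.upper()


for code in ('zh-cn', 'zh-tw', 'pt-br', 'en'):
    print(code, '->', locale_dir_name(code))
```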
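django.po and djangojs.po are the translation files. The Chinese translations of the English strings in the ReviewBoard interface go into these two files.
The files contain entries like the following: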
[root@helloreview LC_MESSAGES]# tail -11 django.po
#: templates/datagrids/hideable_listview.html:9
msgid "Show archived"
msgstr "显示归档的评审请求"

#: templates/datagrids/columns.py:716 templates/datagrids/columns.py:717
msgid "Ship It!/Issue Counts"
msgstr "评审通过/问题数量"

#: reviews/default_actions:200
msgid "Add General Comment"
msgstr "新增普通评论"
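Each entry in a .po file pairs an English msgid with its translated msgstr. As a rough illustration of that structure, the sketch below collects the pairs from single-line entries and lists the ones that still have an empty translation. The function name parse_po_text and the "Log out" entry are hypothetical, and the parser is deliberately simplified (multi-line strings, plural forms and escaped quotes are not handled).

```python
import re

# Matches single-line entries such as: msgid "Log In"
ENTRY_RE = re.compile(r'^(msgid|msgstr) "(.*)"$')


def parse_po_text(text):
    """Collect msgid -> msgstr pairs from single-line .po entries (simplified)."""
    translations = {}
    current_id = None
    for line in text.splitlines():
        match = ENTRY_RE.match(line.strip())
        if not match:
            continue  # comments, blank lines, unsupported syntax
        kind, value = match.groups()
        if kind == 'msgid':
            current_id = value
        elif current_id is not None:
            translations[current_id] = value
            current_id = None
    return translations


sample = '''#: templates/accounts/login.html:10
msgid "Log in to Review Board"
msgstr "登陆Review Board"

#: hypothetical/entry.html:1
msgid "Log out"
msgstr ""
'''
pairs = parse_po_text(sample)
untranslated = [msgid for msgid, msgstr in pairs.items() if not msgstr]
print(pairs)
print(untranslated)
```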
Setting up aliases¶
The following frequently used commands can be added to ~/.bashrc as aliases:
[root@helloreview ~]# echo "alias cdd='cd /usr/lib/python2.7/site-packages/django'" >> ~/.bashrc
[root@helloreview ~]# echo "alias cdr='cd /usr/lib64/python2.7/site-packages/reviewboard'" >> ~/.bashrc
[root@helloreview ~]# echo "alias cdrr='cd /var/www/html/reviewboard'" >> ~/.bashrc
[root@helloreview ~]# echo "alias rcc='pushd /usr/lib64/python2.7/site-packages/reviewboard && django-admin.py compilemessages && popd'" >> ~/.bashrc
[root@helloreview ~]# echo "alias rhttpd='systemctl restart httpd'" >> ~/.bashrc
Reload the shell configuration:
[root@helloreview ~]# source ~/.bashrc
Finding the file that contains the English text to translate¶
Change to the ReviewBoard app directory:
[root@helloreview ~]# cdr
[root@helloreview reviewboard]# pwd
/usr/lib64/python2.7/site-packages/reviewboard
Find the file that contains the English text to translate:
[root@helloreview reviewboard]# grep -Rn 'Log in to Review Board' * > ../a
[root@helloreview reviewboard]# vi ../a
1 Binary file locale/en/LC_MESSAGES/django.mo matches
2 locale/en/LC_MESSAGES/django.po:3098:msgid "Log in to Review Board"
3 Binary file locale/es/LC_MESSAGES/django.mo matches
4 locale/es/LC_MESSAGES/django.po:3108:msgid "Log in to Review Board"
5 Binary file locale/it_IT/LC_MESSAGES/django.mo matches
6 locale/it_IT/LC_MESSAGES/django.po:3113:msgid "Log in to Review Board"
7 locale/ko_KR/LC_MESSAGES/django.po:3098:msgid "Log in to Review Board"
8 locale/pt_BR/LC_MESSAGES/django.po:3114:msgid "Log in to Review Board"
9 Binary file locale/zh_TW/LC_MESSAGES/django.mo matches
10 locale/zh_TW/LC_MESSAGES/django.po:3138:msgid "Log in to Review Board"
11 Binary file locale/zh_CN/LC_MESSAGES/django.mo matches
12 locale/zh_CN/LC_MESSAGES/django.po:3211:msgid "Log in to Review Board"
13 templates/accounts/login.html:10: <h1>{% trans "Log in to Review Board" %}</h1>
This shows that the text "Log in to Review Board" comes from line 10 of templates/accounts/login.html. Let's look at that file.
View the source file to be translated:
[root@helloreview reviewboard]# cat -n templates/accounts/login.html|sed -n '6,12p'
6 {% block auth_content %}
7 {% template_hook_point "before-login-form" %}
8
9 <div class="auth-header">
10 <h1>{% trans "Log in to Review Board" %}</h1>
11 {% if auth_backends.0.login_instructions %}
12 <p>{{auth_backends.0.login_instructions}}</p>
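The same search can be sketched in Python, which makes it easy to skip the compiled .mo files that grep reports as binary matches. This is a minimal sketch assuming Python 3. The function name find_text is hypothetical, and the demonstration runs against a throwaway directory rather than a real ReviewBoard installation.

```python
import os
import tempfile


def find_text(root, needle, skip_suffixes=('.mo', '.pyc')):
    """Yield (relative_path, line_number, line) for text lines containing needle."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in sorted(filenames):
            if filename.endswith(skip_suffixes):
                continue  # skip compiled or binary files
            path = os.path.join(dirpath, filename)
            with open(path, encoding='utf-8', errors='replace') as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in line:
                        yield os.path.relpath(path, root), number, line.strip()


# Demonstration on a throwaway directory with a hypothetical template file
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'templates'))
    template = os.path.join(root, 'templates', 'login.html')
    with open(template, 'w', encoding='utf-8') as handle:
        handle.write('<div>\n<h1>{% trans "Log in to Review Board" %}</h1>\n')
    hits = list(find_text(root, 'Log in to Review Board'))

print(hits)
```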
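Modifying the file that contains the English text¶
To confirm this, change the text and see whether the page changes, for example by replacing "Log in to Review Board" with "Log in to Review Board meizhaohui".
Modify the source file and restart Apache: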
[root@helloreview reviewboard]# cat -n templates/accounts/login.html|sed -n '6,12p'
6 {% block auth_content %}
7 {% template_hook_point "before-login-form" %}
8
9 <div class="auth-header">
10 <h1>{% trans "Log in to Review Board meizhaohui" %}</h1>
11 {% if auth_backends.0.login_instructions %}
12 <p>{{auth_backends.0.login_instructions}}</p>
[root@helloreview reviewboard]# rhttpd
Now refresh the page and check whether it has changed.

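After the refresh, the page shows "Log in to Review Board meizhaohui", which confirms that this template is the source of the login page text.
Restore templates/accounts/login.html to its original state, that is, change line 10 back to <h1>{% trans "Log in to Review Board" %}</h1>.
View the restored source file: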
[root@helloreview reviewboard]# cat -n templates/accounts/login.html|sed -n '6,12p'
6 {% block auth_content %}
7 {% template_hook_point "before-login-form" %}
8
9 <div class="auth-header">
10 <h1>{% trans "Log in to Review Board" %}</h1>
11 {% if auth_backends.0.login_instructions %}
12 <p>{{auth_backends.0.login_instructions}}</p>
Adding the translated text¶
Add the translation to the translation file:
[root@helloreview reviewboard]# cat -n locale/zh_CN/LC_MESSAGES/django.po|sed -n '3206,3216p'
3206 #: templates/accounts/login.html:4
3207 msgid "Log In"
3208 msgstr "登入"
3209
3210 #: templates/accounts/login.html:10
3211 msgid "Log in to Review Board"
3212 msgstr "登陆Review Board"
3213
3214 #: templates/accounts/login.html:40 templates/base/headerbar.html:32
3215 msgid "Log in"
3216 msgstr "登入"
Compiling the .mo files¶
Compile the .mo files:
[root@helloreview reviewboard]# django-admin.py compilemessages
processing file djangojs.po in /usr/lib64/python2.7/site-packages/reviewboard/locale/en/LC_MESSAGES
processing file django.po in /usr/lib64/python2.7/site-packages/reviewboard/locale/en/LC_MESSAGES
processing file djangojs.po in /usr/lib64/python2.7/site-packages/reviewboard/locale/es/LC_MESSAGES
processing file django.po in /usr/lib64/python2.7/site-packages/reviewboard/locale/es/LC_MESSAGES
processing file django.po in /usr/lib64/python2.7/site-packages/reviewboard/locale/it_IT/LC_MESSAGES
processing file djangojs.po in /usr/lib64/python2.7/site-packages/reviewboard/locale/it_IT/LC_MESSAGES
processing file django.po in /usr/lib64/python2.7/site-packages/reviewboard/locale/ko_KR/LC_MESSAGES
processing file djangojs.po in /usr/lib64/python2.7/site-packages/reviewboard/locale/ko_KR/LC_MESSAGES
processing file django.po in /usr/lib64/python2.7/site-packages/reviewboard/locale/pt_BR/LC_MESSAGES
processing file djangojs.po in /usr/lib64/python2.7/site-packages/reviewboard/locale/pt_BR/LC_MESSAGES
processing file djangojs.po in /usr/lib64/python2.7/site-packages/reviewboard/locale/zh_TW/LC_MESSAGES
processing file django.po in /usr/lib64/python2.7/site-packages/reviewboard/locale/zh_TW/LC_MESSAGES
processing file djangojs.po in /usr/lib64/python2.7/site-packages/reviewboard/locale/zh_CN/LC_MESSAGES
processing file django.po in /usr/lib64/python2.7/site-packages/reviewboard/locale/zh_CN/LC_MESSAGES
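Edits to a .po file have no effect until the matching .mo file is compiled again, so it can help to check which catalogs are out of date. The sketch below compares modification times. It assumes Python 3, the function name stale_catalogs is hypothetical, and the demonstration uses empty placeholder files in a temporary directory instead of real catalogs.

```python
import os
import tempfile


def stale_catalogs(locale_root):
    """Return .po paths whose .mo file is missing or older than the .po file."""
    stale = []
    for dirpath, _dirnames, filenames in os.walk(locale_root):
        for filename in sorted(filenames):
            if not filename.endswith('.po'):
                continue
            po_path = os.path.join(dirpath, filename)
            mo_path = po_path[:-3] + '.mo'
            if (not os.path.exists(mo_path)
                    or os.path.getmtime(mo_path) < os.path.getmtime(po_path)):
                stale.append(po_path)
    return stale


with tempfile.TemporaryDirectory() as root:
    messages = os.path.join(root, 'zh_CN', 'LC_MESSAGES')
    os.makedirs(messages)
    po_path = os.path.join(messages, 'django.po')
    mo_path = os.path.join(messages, 'django.mo')
    for path in (po_path, mo_path):
        open(path, 'w').close()  # empty placeholder files
    # Pretend the .po file was edited after the last compile
    os.utime(mo_path, (1000, 1000))
    os.utime(po_path, (2000, 2000))
    stale_before = stale_catalogs(root)
    # Pretend the catalog was compiled again
    os.utime(mo_path, (3000, 3000))
    stale_after = stale_catalogs(root)

print(len(stale_before), len(stale_after))
```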
Use the aliases to regenerate the .mo files and restart Apache:
[root@helloreview reviewboard]# rcc && rhttpd
/usr/lib64/python2.7/site-packages/reviewboard /usr/lib64/python2.7/site-packages/reviewboard
processing file djangojs.po in /usr/lib64/python2.7/site-packages/reviewboard/locale/en/LC_MESSAGES
processing file django.po in /usr/lib64/python2.7/site-packages/reviewboard/locale/en/LC_MESSAGES
processing file djangojs.po in /usr/lib64/python2.7/site-packages/reviewboard/locale/es/LC_MESSAGES
processing file django.po in /usr/lib64/python2.7/site-packages/reviewboard/locale/es/LC_MESSAGES
processing file django.po in /usr/lib64/python2.7/site-packages/reviewboard/locale/it_IT/LC_MESSAGES
processing file djangojs.po in /usr/lib64/python2.7/site-packages/reviewboard/locale/it_IT/LC_MESSAGES
processing file django.po in /usr/lib64/python2.7/site-packages/reviewboard/locale/ko_KR/LC_MESSAGES
processing file djangojs.po in /usr/lib64/python2.7/site-packages/reviewboard/locale/ko_KR/LC_MESSAGES
processing file django.po in /usr/lib64/python2.7/site-packages/reviewboard/locale/pt_BR/LC_MESSAGES
processing file djangojs.po in /usr/lib64/python2.7/site-packages/reviewboard/locale/pt_BR/LC_MESSAGES
processing file djangojs.po in /usr/lib64/python2.7/site-packages/reviewboard/locale/zh_TW/LC_MESSAGES
processing file django.po in /usr/lib64/python2.7/site-packages/reviewboard/locale/zh_TW/LC_MESSAGES
processing file djangojs.po in /usr/lib64/python2.7/site-packages/reviewboard/locale/zh_CN/LC_MESSAGES
processing file django.po in /usr/lib64/python2.7/site-packages/reviewboard/locale/zh_CN/LC_MESSAGES
/usr/lib64/python2.7/site-packages/reviewboard
Refresh ReviewBoard again and the Chinese translation appears:
