Sunday, April 29, 2012

How to find all names defined in a module: variables, functions, and modules


Using the dir() function


The built-in dir() function is used to find out which names a module defines. It returns a sorted list of strings:
>>> import fibo, sys
>>> dir(fibo)
['__name__', 'fib', 'fib2']
>>> dir(sys)
['__name__', 'argv', 'builtin_module_names', 'copyright', 'exit',
'maxint', 'modules', 'path', 'ps1', 'ps2', 'setprofile', 'settrace',
'stderr', 'stdin', 'stdout', 'version']
Without arguments, dir() lists the names you have defined currently:
>>> a = [1, 2, 3, 4, 5]
>>> import fibo, sys
>>> fib = fibo.fib
>>> dir()
['__name__', 'a', 'fib', 'fibo', 'sys']
Note that it lists names of all types: variables, functions, modules, and so on.
dir() does not list the names of built-in functions and variables.
To list those as well, use dir(__builtin__):
>>> import __builtin__
>>> dir(__builtin__)
['AccessError', 'AttributeError', 'ConflictError', 'EOFError', 'IOError',
'ImportError', 'IndexError', 'KeyError', 'KeyboardInterrupt',
'MemoryError', 'NameError', 'None', 'OverflowError', 'RuntimeError',
'SyntaxError', 'SystemError', 'SystemExit', 'TypeError', 'ValueError',
'ZeroDivisionError', '__name__', 'abs', 'apply', 'chr', 'cmp', 'coerce',
'compile', 'dir', 'divmod', 'eval', 'execfile', 'filter', 'float',
'getattr', 'hasattr', 'hash', 'hex', 'id', 'input', 'int', 'len', 'long',
'map', 'max', 'min', 'oct', 'open', 'ord', 'pow', 'range', 'raw_input',
'reduce', 'reload', 'repr', 'round', 'setattr', 'str', 'type', 'xrange']
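The listing above is from Python 2; in Python 3 the `__builtin__` module was renamed `builtins`. A quick check:

```python
import builtins   # Python 3's name for the __builtin__ module shown above

names = dir(builtins)
print('abs' in names, 'len' in names)   # True True
print(names[:4])   # dir() returns the names already sorted
```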

Things to watch out for with Python imports



**Platform import pitfalls
On the Mac and Windows platforms, file name case is not consistent.
On these platforms there is no guarantee whether the file ECHO.PY will be imported as echo, Echo, or ECHO (for example, Windows 95 has the annoying habit of capitalizing the first letter of every file name).


The DOS 8+3 file name limit mangles long module names.


**Why define __all__

If __all__ is not defined, the statement from Sound.Effects import * does *not* import all modules from the Sound.Effects package into the current namespace. The only guarantee is that the Sound.Effects package itself has been imported (possibly running the initialization code in __init__.py), and that the names defined in the package are imported. The names defined by a package include any names defined in __init__.py (and any modules it imports), as well as any modules of the package that were explicitly imported by earlier import statements. For example:



Sound/                          Top-level package
      __init__.py               Initialize the sound package
      Formats/                  Subpackage for file format conversions
              __init__.py
              wavread.py
              wavwrite.py
              aiffread.py
              aiffwrite.py
              auread.py
              auwrite.py
              ...
      Effects/                  Subpackage for sound effects
              __init__.py
              echo.py
              surround.py
              reverse.py
              ...
      Filters/                  Subpackage for filters
              __init__.py
              equalizer.py
              vocoder.py
              karaoke.py
              ...



In this example, the echo and surround modules are imported into the current namespace because they are already defined in the package when the from...import statement executes (you can also control this with __all__). Points worth noting:

  1. Using import * is generally discouraged, because it hurts readability.
  2. from Package import specific_submodule is the recommended form.
  3. The __init__.py files above are required, to keep the interpreter from mistaking ordinary directories for modules.
  4. Define __all__ = ["echo", "surround", "reverse"] in __init__.py.



The __init__.py files are required to make Python treat the directory structure as a package. This prevents directories with common names (such as "string") from confusing the interpreter and hiding proper modules that occur later on the module search path. In the simplest case, __init__.py can be an empty file, but it can also execute package initialization code or set the __all__ variable (described later).
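The effect of __all__ can be checked directly. The sketch below builds a minimal, throwaway Sound/Effects package in a temporary directory, with an __init__.py whose __all__ lists only "echo", and then runs from Sound.Effects import *; only the listed submodule ends up in the namespace:

```python
import os
import sys
import tempfile

# Build a minimal Sound/Effects package on disk for the experiment.
root = tempfile.mkdtemp()
effects = os.path.join(root, "Sound", "Effects")
os.makedirs(effects)
open(os.path.join(root, "Sound", "__init__.py"), "w").close()
with open(os.path.join(effects, "__init__.py"), "w") as f:
    f.write('__all__ = ["echo"]\n')        # surround is deliberately omitted
for name in ("echo", "surround"):
    with open(os.path.join(effects, name + ".py"), "w") as f:
        f.write("effect = %r\n" % name)

sys.path.insert(0, root)
ns = {}
exec("from Sound.Effects import *", ns)    # import * at module level, via exec

print("echo" in ns)       # True  -- listed in __all__
print("surround" in ns)   # False -- not listed, so not imported
```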

python package naming conventions

dotted module names: A.B.C


The file hierarchy of a package

Sound/                          Top-level package
      __init__.py               Initialize the sound package
      Formats/                  Subpackage for file format conversions
              __init__.py
              wavread.py
              wavwrite.py
              aiffread.py
              aiffwrite.py
              auread.py
              auwrite.py
              ...
      Effects/                  Subpackage for sound effects
              __init__.py
              echo.py
              surround.py
              reverse.py
              ...
      Filters/                  Subpackage for filters
              __init__.py
              equalizer.py
              vocoder.py
              karaoke.py
              ...




**Why __init__.py is required
The __init__.py files are required to make Python treat the directory structure as a package. This prevents directories with common names (such as "string") from confusing the interpreter and hiding proper modules that occur later on the module search path. In the simplest case, __init__.py can be an empty file, but it can also execute package initialization code or set the __all__ variable (described later).




**Key points of the import statement

Note that when you write from package import item, the item can be a module (or subpackage) of the package, or a name defined in a module, such as a variable, class, or function. The import statement first tests whether the item is defined in the package; if not, it assumes the item is a module and tries to load it. If that also fails, an ImportError exception is raised.


Conversely, when you write import item.subitem.subsubitem, every item except the last must be a package. The last item can be a module or a package, but it cannot be a class, variable, or function defined in a module.
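Both rules can be checked against the standard library's email package, where email and email.mime are packages, text is a module, and MIMEText is a class inside it:

```python
import email.mime.text                 # every dotted name before the last is a package
from email.mime.text import MIMEText   # "from package import item": item may be a class

try:
    # The last component of a dotted import must be a module or a package,
    # never a class defined inside a module:
    import email.mime.text.MIMEText
except ImportError as e:
    print("ImportError:", e)
```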


Source code of freeze.py, the script that compiles files into a binary


#! /usr/bin/env python

"""Freeze a Python script into a binary.

usage: freeze [options...] script [module]...

Options:
-p prefix:    This is the prefix used when you ran ``make install''
              in the Python build directory.
              (If you never ran this, freeze won't work.)
              The default is whatever sys.prefix evaluates to.
              It can also be the top directory of the Python source
              tree; then -P must point to the build tree.

-P exec_prefix: Like -p but this is the 'exec_prefix', used to
                install objects etc.  The default is whatever sys.exec_prefix
                evaluates to, or the -p argument if given.
                If -p points to the Python source tree, -P must point
                to the build tree, if different.

-e extension: A directory containing additional .o files that
              may be used to resolve modules.  This directory
              should also have a Setup file describing the .o files.
              On Windows, the name of a .INI file describing one
              or more extensions is passed.
              More than one -e option may be given.

-o dir:       Directory where the output files are created; default '.'.

-m:           Additional arguments are module names instead of filenames.

-a package=dir: Additional directories to be added to the package's
                __path__.  Used to simulate directories added by the
                package at runtime (eg, by OpenGL and win32com).
                More than one -a option may be given for each package.

-l file:      Pass the file to the linker (windows only)

-d:           Debugging mode for the module finder.

-q:           Make the module finder totally quiet.

-h:           Print this help message.

-x module     Exclude the specified module. It will still be imported
              by the frozen binary if it exists on the host system.

-X module     Like -x, except the module can never be imported by
              the frozen binary.

-E:           Freeze will fail if any modules can't be found (that
              were not excluded using -x or -X).

-i filename:  Include a file with additional command line options.  Used
              to prevent command lines growing beyond the capabilities of
              the shell/OS.  All arguments specified in filename
              are read and the -i option replaced with the parsed
              params (note - quoting args in this file is NOT supported)

-s subsystem: Specify the subsystem (For Windows only.); 
              'console' (default), 'windows', 'service' or 'com_dll'
              
-w:           Toggle Windows (NT or 95) behavior.
              (For debugging only -- on a win32 platform, win32 behavior
              is automatic.)

-r prefix=f:  Replace path prefix.
              Replace prefix with f in the source path references 
              contained in the resulting binary.

Arguments:

script:       The Python script to be executed by the resulting binary.

module ...:   Additional Python modules (referenced by pathname)
              that will be included in the resulting binary.  These
              may be .py or .pyc files.  If -m is specified, these are
              module names that are search in the path instead.

NOTES:

In order to use freeze successfully, you must have built Python and
installed it ("make install").

The script should not use modules provided only as shared libraries;
if it does, the resulting binary is not self-contained.
"""


# Import standard modules

import modulefinder
import getopt
import os
import sys


# Import the freeze-private modules

import checkextensions
import makeconfig
import makefreeze
import makemakefile
import parsesetup
import bkfile


# Main program

def main():
    # overridable context
    prefix = None                       # settable with -p option
    exec_prefix = None                  # settable with -P option
    extensions = []
    exclude = []                        # settable with -x option
    addn_link = []      # settable with -l, but only honored under Windows.
    path = sys.path[:]
    modargs = 0
    debug = 1
    odir = ''
    win = sys.platform[:3] == 'win'
    replace_paths = []                  # settable with -r option
    error_if_any_missing = 0

    # default the exclude list for each platform
    if win: exclude = exclude + [
        'dos', 'dospath', 'mac', 'macpath', 'macfs', 'MACFS', 'posix',
        'os2', 'ce', 'riscos', 'riscosenviron', 'riscospath',
        ]

    fail_import = exclude[:]

    # output files
    frozen_c = 'frozen.c'
    config_c = 'config.c'
    target = 'a.out'                    # normally derived from script name
    makefile = 'Makefile'
    subsystem = 'console'

    # parse command line by first replacing any "-i" options with the
    # file contents.
    pos = 1
    while pos < len(sys.argv)-1:
        # last option can not be "-i", so this ensures "pos+1" is in range!
        if sys.argv[pos] == '-i':
            try:
                options = open(sys.argv[pos+1]).read().split()
            except IOError, why:
                usage("File name '%s' specified with the -i option "
                      "can not be read - %s" % (sys.argv[pos+1], why) )
            # Replace the '-i' and the filename with the read params.
            sys.argv[pos:pos+2] = options
            pos = pos + len(options) - 1 # Skip the name and the included args.
        pos = pos + 1

    # Now parse the command line with the extras inserted.
    try:
        opts, args = getopt.getopt(sys.argv[1:], 'r:a:dEe:hmo:p:P:qs:wX:x:l:')
    except getopt.error, msg:
        usage('getopt error: ' + str(msg))

    # process option arguments
    for o, a in opts:
        if o == '-h':
            print __doc__
            return
        if o == '-d':
            debug = debug + 1
        if o == '-e':
            extensions.append(a)
        if o == '-m':
            modargs = 1
        if o == '-o':
            odir = a
        if o == '-p':
            prefix = a
        if o == '-P':
            exec_prefix = a
        if o == '-q':
            debug = 0
        if o == '-w':
            win = not win
        if o == '-s':
            if not win:
                usage("-s subsystem option only on Windows")
            subsystem = a
        if o == '-x':
            exclude.append(a)
        if o == '-X':
            exclude.append(a)
            fail_import.append(a)
        if o == '-E':
            error_if_any_missing = 1
        if o == '-l':
            addn_link.append(a)
        if o == '-a':
            apply(modulefinder.AddPackagePath, tuple(a.split("=", 2)))
        if o == '-r':
            f,r = a.split("=", 2)
            replace_paths.append( (f,r) )

    # modules that are imported by the Python runtime
    implicits = []
    for module in ('site', 'warnings',):
        if module not in exclude:
            implicits.append(module)

    # default prefix and exec_prefix
    if not exec_prefix:
        if prefix:
            exec_prefix = prefix
        else:
            exec_prefix = sys.exec_prefix
    if not prefix:
        prefix = sys.prefix

    # determine whether -p points to the Python source tree
    ishome = os.path.exists(os.path.join(prefix, 'Python', 'ceval.c'))

    # locations derived from options
    version = sys.version[:3]
    if win:
        extensions_c = 'frozen_extensions.c'
    if ishome:
        print "(Using Python source directory)"
        binlib = exec_prefix
        incldir = os.path.join(prefix, 'Include')
        config_h_dir = exec_prefix
        config_c_in = os.path.join(prefix, 'Modules', 'config.c.in')
        frozenmain_c = os.path.join(prefix, 'Python', 'frozenmain.c')
        makefile_in = os.path.join(exec_prefix, 'Modules', 'Makefile')
        if win:
            frozendllmain_c = os.path.join(exec_prefix, 'Pc\\frozen_dllmain.c')
    else:
        binlib = os.path.join(exec_prefix,
                              'lib', 'python%s' % version, 'config')
        incldir = os.path.join(prefix, 'include', 'python%s' % version)
        config_h_dir = os.path.join(exec_prefix, 'include',
                                    'python%s' % version)
        config_c_in = os.path.join(binlib, 'config.c.in')
        frozenmain_c = os.path.join(binlib, 'frozenmain.c')
        makefile_in = os.path.join(binlib, 'Makefile')
        frozendllmain_c = os.path.join(binlib, 'frozen_dllmain.c')
    supp_sources = []
    defines = []
    includes = ['-I' + incldir, '-I' + config_h_dir]

    # sanity check of directories and files
    check_dirs = [prefix, exec_prefix, binlib, incldir]
    if not win:
        # These are not directories on Windows.
        check_dirs = check_dirs + extensions
    for dir in check_dirs:
        if not os.path.exists(dir):
            usage('needed directory %s not found' % dir)
        if not os.path.isdir(dir):
            usage('%s: not a directory' % dir)
    if win:
        files = supp_sources + extensions # extensions are files on Windows.
    else:
        files = [config_c_in, makefile_in] + supp_sources
    for file in supp_sources:
        if not os.path.exists(file):
            usage('needed file %s not found' % file)
        if not os.path.isfile(file):
            usage('%s: not a plain file' % file)
    if not win:
        for dir in extensions:
            setup = os.path.join(dir, 'Setup')
            if not os.path.exists(setup):
                usage('needed file %s not found' % setup)
            if not os.path.isfile(setup):
                usage('%s: not a plain file' % setup)

    # check that enough arguments are passed
    if not args:
        usage('at least one filename argument required')

    # check that file arguments exist
    for arg in args:
        if arg == '-m':
            break
        # if user specified -m on the command line before _any_
        # file names, then nothing should be checked (as the
        # very first file should be a module name)
        if modargs:
            break
        if not os.path.exists(arg):
            usage('argument %s not found' % arg)
        if not os.path.isfile(arg):
            usage('%s: not a plain file' % arg)

    # process non-option arguments
    scriptfile = args[0]
    modules = args[1:]

    # derive target name from script name
    base = os.path.basename(scriptfile)
    base, ext = os.path.splitext(base)
    if base:
        if base != scriptfile:
            target = base
        else:
            target = base + '.bin'

    # handle -o option
    base_frozen_c = frozen_c
    base_config_c = config_c
    base_target = target
    if odir and not os.path.isdir(odir):
        try:
            os.mkdir(odir)
            print "Created output directory", odir
        except os.error, msg:
            usage('%s: mkdir failed (%s)' % (odir, str(msg)))
    base = ''
    if odir:
        base = os.path.join(odir, '')
        frozen_c = os.path.join(odir, frozen_c)
        config_c = os.path.join(odir, config_c)
        target = os.path.join(odir, target)
        makefile = os.path.join(odir, makefile)
        if win: extensions_c = os.path.join(odir, extensions_c)

    # Handle special entry point requirements
    # (on Windows, some frozen programs do not use __main__, but
    # import the module directly.  Eg, DLLs, Services, etc
    custom_entry_point = None  # Currently only used on Windows
    python_entry_is_main = 1   # Is the entry point called __main__?
    # handle -s option on Windows
    if win:
        import winmakemakefile
        try:
            custom_entry_point, python_entry_is_main = \
                winmakemakefile.get_custom_entry_point(subsystem)
        except ValueError, why:
            usage(why)
            

    # Actual work starts here...

    # collect all modules of the program
    dir = os.path.dirname(scriptfile)
    path[0] = dir
    mf = modulefinder.ModuleFinder(path, debug, exclude, replace_paths)
    
    if win and subsystem=='service':
        # If a Windows service, then add the "built-in" module.
        mod = mf.add_module("servicemanager")
        mod.__file__="dummy.pyd" # really built-in to the resulting EXE

    for mod in implicits:
        mf.import_hook(mod)
    for mod in modules:
        if mod == '-m':
            modargs = 1
            continue
        if modargs:
            if mod[-2:] == '.*':
                mf.import_hook(mod[:-2], None, ["*"])
            else:
                mf.import_hook(mod)
        else:
            mf.load_file(mod)

    # Add the main script as either __main__, or the actual module name.
    if python_entry_is_main:
        mf.run_script(scriptfile)
    else:
        mf.load_file(scriptfile)

    if debug > 0:
        mf.report()
        print
    dict = mf.modules

    if error_if_any_missing:
        missing = mf.any_missing()
        if missing:
            sys.exit("There are some missing modules: %r" % missing)

    # generate output for frozen modules
    files = makefreeze.makefreeze(base, dict, debug, custom_entry_point,
                                  fail_import)

    # look for unfrozen modules (builtin and of unknown origin)
    builtins = []
    unknown = []
    mods = dict.keys()
    mods.sort()
    for mod in mods:
        if dict[mod].__code__:
            continue
        if not dict[mod].__file__:
            builtins.append(mod)
        else:
            unknown.append(mod)

    # search for unknown modules in extensions directories (not on Windows)
    addfiles = []
    frozen_extensions = [] # Windows list of modules.
    if unknown or (not win and builtins):
        if not win:
            addfiles, addmods = \
                      checkextensions.checkextensions(unknown+builtins,
                                                      extensions)
            for mod in addmods:
                if mod in unknown:
                    unknown.remove(mod)
                    builtins.append(mod)
        else:
            # Do the windows thang...
            import checkextensions_win32
            # Get a list of CExtension instances, each describing a module 
            # (including its source files)
            frozen_extensions = checkextensions_win32.checkextensions(
                unknown, extensions, prefix)
            for mod in frozen_extensions:
                unknown.remove(mod.name)

    # report unknown modules
    if unknown:
        sys.stderr.write('Warning: unknown modules remain: %s\n' %
                         ' '.join(unknown))

    # windows gets different treatment
    if win:
        # Taking a shortcut here...
        import winmakemakefile, checkextensions_win32
        checkextensions_win32.write_extension_table(extensions_c,
                                                    frozen_extensions)
        # Create a module definition for the bootstrap C code.
        xtras = [frozenmain_c, os.path.basename(frozen_c),
                 frozendllmain_c, os.path.basename(extensions_c)] + files
        maindefn = checkextensions_win32.CExtension( '__main__', xtras )
        frozen_extensions.append( maindefn )
        outfp = open(makefile, 'w')
        try:
            winmakemakefile.makemakefile(outfp,
                                         locals(),
                                         frozen_extensions,
                                         os.path.basename(target))
        finally:
            outfp.close()
        return

    # generate config.c and Makefile
    builtins.sort()
    infp = open(config_c_in)
    outfp = bkfile.open(config_c, 'w')
    try:
        makeconfig.makeconfig(infp, outfp, builtins)
    finally:
        outfp.close()
    infp.close()

    cflags = ['$(OPT)']
    cppflags = defines + includes
    libs = [os.path.join(binlib, 'libpython$(VERSION).a')]

    somevars = {}
    if os.path.exists(makefile_in):
        makevars = parsesetup.getmakevars(makefile_in)
    for key in makevars.keys():
        somevars[key] = makevars[key]

    somevars['CFLAGS'] = ' '.join(cflags) # override
    somevars['CPPFLAGS'] = ' '.join(cppflags) # override
    files = [base_config_c, base_frozen_c] + \
            files + supp_sources +  addfiles + libs + \
            ['$(MODLIBS)', '$(LIBS)', '$(SYSLIBS)']

    outfp = bkfile.open(makefile, 'w')
    try:
        makemakefile.makemakefile(outfp, somevars, files, base_target)
    finally:
        outfp.close()

    # Done!

    if odir:
        print 'Now run "make" in', odir,
        print 'to build the target:', base_target
    else:
        print 'Now run "make" to build the target:', base_target


# Print usage message and exit

def usage(msg):
    sys.stdout = sys.stderr
    print "Error:", msg
    print "Use ``%s -h'' for help" % sys.argv[0]
    sys.exit(2)


main()

Saturday, April 28, 2012

Compiled Python files



For small programs that use many standard modules and need to start faster, you can use compiled Python files. When you look for spam.py, if the directory where it is found also contains a file called spam.pyc, that file holds an already byte-compiled version of the spam module. spam.pyc also records the modification time of the spam.py used to create it; if the recorded time does not match the current .py file's modification time, the .pyc file is not used.
Normally you do not need to do anything to create spam.pyc. Whenever spam.py is successfully compiled, an attempt is made to write spam.pyc into the same directory. It is not an error if this attempt fails; if for any reason the file is not written out completely, the resulting spam.pyc is simply recognized as invalid and ignored. The contents of spam.pyc are platform independent, so a directory of Python modules can be shared by machines of different architectures.
Some tips for experts:

  1. When the Python interpreter is started with the -O flag, it generates optimized code and stores it in .pyo files. The optimizer currently does not do much; it only removes assert statements and SET_LINENO instructions. With -O, all bytecode is optimized: .pyc files are ignored and all .py files are compiled to optimized bytecode.
  2. Passing two -O flags to the interpreter (-OO) performs optimizations that, in rare cases, can make a program malfunction. Currently it only strips __doc__ strings from the bytecode, making the .pyo files more compact. Since some programs rely on having those strings available, use this option only if you know what you are doing.
  3. A program does not run any faster when read from a .pyc or .pyo file than from a .py file; the only difference is the speed with which it is loaded.
  4. When a script is run from the command line by name, its bytecode is never written to a .pyc or .pyo file. You can therefore reduce a script's startup time by moving most of its code into a module and using a small bootstrap script that imports that module. It is also possible to run a .pyc or .pyo file directly from the command line.
  5. You can ship spam.pyc (or spam.pyo when -O is used) without the spam.py source. When you distribute your library this way, it is harder to reverse engineer.
  6. The compileall module can compile every module in a directory into .pyc files (or .pyo files when -O is used):
$ python2.4 -c \
 "from compileall import compile_dir;compile_dir('.')"

The compileall module in the Python standard library provides two functions: compile_path and compile_dir.
compile_path is aggressive: it byte-compiles every .py file found on every path in sys.path, which is usually more than you want. compile_dir is more practical for Python developers.
While developing a Python program, compile_dir can compile all .py files under an arbitrary directory into .pyc files; the command-line one-liner above does so recursively for the current directory. When is this useful? For a development version of a web application, for instance: when the httpd user differs from the owner of the Python files, the .pyc files are not generated automatically because of file permissions.
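The same thing can be done from Python code. Note that the .pyc placement described above is Python 2 behavior; in Python 3 the bytecode goes into a __pycache__ subdirectory instead of a spam.pyc next to spam.py. A sketch using a throwaway directory:

```python
import compileall
import os
import tempfile

# Create a temporary directory with one module to compile.
d = tempfile.mkdtemp()
with open(os.path.join(d, "spam.py"), "w") as f:
    f.write("x = 1\n")

compileall.compile_dir(d, quiet=1)   # byte-compile everything under d

# Python 3 writes the bytecode into __pycache__/, tagged with the
# interpreter version, rather than as spam.pyc alongside spam.py.
cached = os.listdir(os.path.join(d, "__pycache__"))
print(cached)
```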



How do you pack a Python project into a package or an executable?
On Windows there is py2exe.
On Linux/Un*x what you need is freeze.py.



Test compile: python /path/to/freeze.py [options] script.py

# python /path/to/freeze.py -o dist test1.py
# cd dist
# make
Run the compiled Linux executable:
# ./test1

---
Error message:
python2.6/config/config.c.in is missing.
Install the following package to fix it:
apt-get install python2.6-dev

Error message:
/usr/lib/python2.6/config/libpython2.6.a(posixmodule.o): In function `posix_tmpnam':
(.text+0x783): warning: the use of `tmpnam_r' is dangerous, better use `mkstemp'
/usr/lib/python2.6/config/libpython2.6.a(posixmodule.o): In function `posix_tempnam':
(.text+0x865): warning: the use of `tempnam' is dangerous, better use `mkstemp'
config.o:(.data+0x98): undefined reference to `init_warnings'
collect2: ld returned 1 exit status
make: *** [client] Error 1

Install the following package to fix it:
apt-get install ?????

---
Installing the psyco acceleration module
# wget http://downloads.sourceforge.net/project/psyco/psyco/1.6/psyco-1.6-linux.i386-2.5.tar.gz?use_mirror=nchc
# tar xzvf psyco-1.6-linux.i386-2.5.tar.gz
# cd psyco-1.6
# cp -rf psyco /usr/lib/python2.5/site-packages/

# add: import psyco
try:
    import psyco
    psyco.profile()
except ImportError:
    pass

With the psyco module added, compiling with freeze again produces a working executable.

How Python searches for modules


See the later section on standard modules.

Modules are searched for in the current directory (.)
then in the directories listed in the environment variable $PYTHONPATH
then in the installation-dependent default directory (/usr/local/lib/python)

NOTE:
sys.path is initialized from the current directory, the directories in the environment variable, and the installation default.
You can modify the contents of the sys.path list to change where the interpreter searches for modules.
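For example, appending a directory to sys.path makes the interpreter consider it when resolving subsequent imports (the /opt/mylibs path below is just a placeholder):

```python
import sys

print(sys.path[0])              # typically the script's directory ('' when interactive)
sys.path.append("/opt/mylibs")  # hypothetical extra directory for module lookup
print("/opt/mylibs" in sys.path)   # True
```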

Python's special method names


class Rational:
    def __init__(self, n, d):  # initialization performed right after the object is created
        self.numer = n
        self.denom = d

    def __str__(self):   # defines the string description of the object
        return str(self.numer) + '/' + str(self.denom)

    def __add__(self, that):  # defines the + operation
        return Rational(self.numer * that.denom + that.numer * self.denom,
                        self.denom * that.denom)

    def __sub__(self, that):  # defines the - operation
        return Rational(self.numer * that.denom - that.numer * self.denom,
                        self.denom * that.denom)

    def __mul__(self, that):  # defines the * operation
        return Rational(self.numer * that.numer,
                        self.denom * that.denom)

    def __truediv__(self, that):   # defines the / operation
        return Rational(self.numer * that.denom,
                        self.denom * that.numer)

    def __eq__(self, that):   # defines the == operation
        return self.numer * that.denom == that.numer * self.denom

x = Rational(1, 2)
y = Rational(2, 3)
z = Rational(2, 3)
print(x)       # 1/2
print(y)       # 2/3
print(x + y)   # 7/6
print(x - y)   # -1/6
print(x * y)   # 2/6
print(x / y)   # 3/4
print(x == y)  # False
print(y == z)  # True

__str__() defines the descriptive string returned for an object, usually user-friendly explanatory text; calling str() on an object invokes its __str__(). To define a description that is more meaningful to developers, such as one that includes the name of the class that produced the instance, define __repr__(); calling repr() on an object invokes its __repr__().
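A minimal sketch of the difference (the Point class here is invented for illustration):

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __str__(self):            # user-friendly description, used by str() and print()
        return "(%d, %d)" % (self.x, self.y)

    def __repr__(self):           # developer-oriented description, used by repr()
        return "Point(%d, %d)" % (self.x, self.y)

p = Point(1, 2)
print(str(p))    # (1, 2)
print(repr(p))   # Point(1, 2)
```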

The examples seen in the section on attribute namespaces were:

__getattr__(): defines what happens when an attribute of an instance is read
__setattr__(): defines what happens when an attribute of an instance is assigned. For example:
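A hypothetical Traced class that intercepts both operations (note that __setattr__ must delegate to object.__setattr__ to avoid infinite recursion, and __getattr__ is only called when normal lookup fails):

```python
class Traced:
    def __getattr__(self, name):
        # Called only when the attribute is NOT found by normal lookup.
        return "missing:" + name

    def __setattr__(self, name, value):
        # Called on every attribute assignment; store strings uppercased.
        if isinstance(value, str):
            value = value.upper()
        object.__setattr__(self, name, value)

t = Traced()
t.color = "red"
print(t.color)   # RED           -- stored (transformed) by __setattr__
print(t.size)    # missing:size  -- never assigned, so __getattr__ answers
```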

__getitem__() and __setitem__() define the behavior of the [] operator. For example:

>>> class Some:
...     def __init__(self):
...         self.inner = {}
...     def __setitem__(self, name, value):
...         self.inner[name] = value
...     def __getitem__(self, name):
...         return self.inner[name]
In Python, an object can implement the __iter__() method, which returns an iterator: an object with a __next__() method. The iterator walks over the collection; each call to the iterator's __next__() method must return the next element of the collection, and when there is no next element it raises StopIteration. For example:
class Some:
    class Iterator:
        def __init__(self, length):
            self.length = length
            self.number = -1
        def __next__(self):
            self.number = self.number + 1
            if self.number == self.length:
                raise StopIteration
            return self.number
    
    def __init__(self, length):
        self.length = length

    def __iter__(self):
        return Some.Iterator(self.length)

s = Some(3)
it = iter(s)
print(next(it))   # 0
print(next(it))   # 1
print(next(it))   # 2
print(next(it))   # StopIteration
In fact, iter() calls the object's __iter__() method on your behalf, and next() calls the object's __next__() method. You can also combine this with a for-in loop to consume the object: the loop obtains an iterator through __iter__(), calls its __next__() method on every pass, and leaves the loop once StopIteration is raised. For example:
for n in Some(10):
    print(n)       # prints 0 through 9
These were brief introductions to some simple special method names; for the details, see Special method names.
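As an aside, the same Some(3) behavior can be obtained more concisely by writing __iter__() as a generator function; Python then supplies the __next__() method and the StopIteration automatically. A sketch:

```python
class Some:
    def __init__(self, length):
        self.length = length

    def __iter__(self):
        # A generator function: calling it returns an iterator object
        # whose __next__() resumes execution at each yield.
        n = 0
        while n < self.length:
            yield n
            n += 1

print(list(Some(3)))   # [0, 1, 2]
```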
There are also descriptors, with the methods def __get__(self, instance, owner), def __set__(self, instance, value), and def __delete__(self, instance); for detailed usage see the python descriptor article and the [advanced] section below.

The attribute lookup order in Python

If the instance's __dict__ does not contain the attribute, the __dict__ of the class that produced the instance is searched.




>>> class Math:
...     PI = 3.1415926
...
>>> m = Math()

>>> vars(m)  ## m itself has no attributes yet
{}


## Reading m.PI first searches m's __dict__ for a PI attribute;
## if it is not found there, the class Math is searched:

>>> m.PI
3.1415926

## If we rebind m.PI through the instance:
>>> m.PI = 2
## PI now shows up in m's attribute namespace __dict__:
>>> vars(m)
{'PI': 2}

Summary:
When you try to read an attribute through an instance, if it is not in the instance's __dict__, the __dict__ of the class that produced the instance is searched. If the class __dict__ does not have it either, __getattr__() is called to supply the value; if __getattr__() is not defined, AttributeError is raised, and if it is defined, the result is whatever __getattr__() decides.



Flow:
1) Search the instance m's __dict__ for the attribute
2) Then search the class Math for the attribute
3) Call __getattr__() and let it decide
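The three steps can be observed directly, reusing the Math class from above with a __getattr__ added:

```python
class Math:
    PI = 3.1415926

    def __getattr__(self, name):
        # Step 3: only reached when steps 1 and 2 both fail.
        return "no attribute " + name

m = Math()
m.E = 2.71828
print(m.E)     # step 1: found in the instance's __dict__
print(m.PI)    # step 2: found in the class __dict__
print(m.TAU)   # step 3: handled by __getattr__
```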







***One more example of how this works in practice
If you try to call a method on an instance and the instance has no bound method of that name (for example, a function decorated with @staticmethod or @classmethod), the class __dict__ is searched and the function is executed as a class-level call. For example:

>>> class Some:
...     @staticmethod
...     def service():
...         print('XD')
...
>>> s = Some()
>>> s.service()
XD
>>>


In the example above, executing s.service() finds no bound method service() on s (because it is decorated with @staticmethod), so Some.service() is looked up and executed instead.



[Advanced] Attribute lookup order, from 良葛格
An object that has a __get__() method (required), and optionally __set__() and __delete__() methods, can act as a descriptor:
def __get__(self, instance, owner)
def __set__(self, instance, value)
def __delete__(self, instance)

In Python, a descriptor is an object that describes how getting, setting, and deleting an attribute should be handled. That is, when a descriptor is an attribute member of a class, getting, setting, or deleting that class attribute is delegated to the descriptor (except for built-in attributes such as __class__). For example:
class Descriptor:
    def __get__(self, instance, owner):
        print(self, instance, owner)
    def __set__(self, instance, value):
        print(self, instance, value)
    def __delete__(self, instance):
        print(self, instance)

class Some:
    x = Descriptor()

Given the example above, if you execute:

s = Some()
s.x

s.x = 10
del s.x

it is actually equivalent to doing:

s = Some()
Some.__dict__['x'].__get__(s, Some)

Some.__dict__['x'].__set__(s, 10)
Some.__dict__['x'].__delete__(s)

Whereas if you do:

Some.x

it is equivalent to:

Some.__dict__['x'].__get__(None, Some)

The attribute namespace section covered the attribute search order; restating it, the lookup order is:
  1. Search the instance's __dict__ for a matching attribute name
  2. Search the __dict__ of the class that produced the instance for a matching attribute name
  3. If __getattr__() is defined, let __getattr__() handle the lookup
  4. If __getattr__() is not defined, raise AttributeError

With descriptors added, the lookup order becomes:
  1. Search the __dict__ of the class that produced the instance for a matching attribute name. If a match is found and it is a descriptor instance (it has a __get__() method) that also has __set__() or __delete__(): for a read, return the value of __get__(); for a write, call __set__() (raising AttributeError if that method is missing); for a delete, call __delete__() (raising AttributeError if that method is missing). If the descriptor only has __get__(), go to step 2 first
  2. Search the instance's __dict__ for a matching attribute name
  3. Search the __dict__ of the class that produced the instance for a matching attribute name. If the match is not a descriptor, return the attribute value directly; if it is a descriptor (at this point necessarily one with only __get__()), return the value of __get__()
  4. If __getattr__() is defined, let __getattr__() handle the lookup
  5. If __getattr__() is not defined, raise AttributeError

This flow can be verified with a simple experiment:
>>> class Desc:
...     def __get__(self, instance, owner):
...         print('instance', instance, 'owner', owner)
...     def __set__(self, instance, value):
...         print('instance', instance, 'value', value)
...
>>> class X:
...     x = Desc()
...
>>> x = X()
>>> x.x
instance <__main__.X object at 0x01E01C10> owner <class '__main__.X'>
>>> x.x = 10
instance <__main__.X object at 0x01E01C10> value 10
>>> x.__dict__['x'] = 10
>>> x.x
instance <__main__.X object at 0x01E01C10> owner <class '__main__.X'>
>>> x.__dict__['x']
10
>>> del x.x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: __delete__
>>>

A descriptor that, besides __get__(), also has __set__() or __delete__() (or both) is called a data descriptor; it behaves differently from a non-data descriptor, which has only __get__(). For example, the following shows the behavior of a non-data descriptor:


Data descriptor: __get__(), plus __set__() and/or __delete__()
Non-data descriptor: __get__() only
>>> class Desc:
...     def __get__(self, instance, owner):
...         print('instance', instance, 'owner', owner)
...
>>> class X:
...     x = Desc()
...
>>> x = X()
>>> x.x
instance <__main__.X object at 0x01E01FD0> owner <class '__main__.X'>
>>> x.x = 10
>>> x.x
10
>>> del x.x
>>> x.x
instance <__main__.X object at 0x01E01FD0> owner <class '__main__.X'>
>>>

In short, a data descriptor lets you intercept attribute get, set, and delete operations on an instance, while a non-data descriptor only lets you intercept reads of a class attribute made through an instance.

Recall the property() function: getting, setting, and deleting an attribute on an instance is forwarded to the functions you specify. As you would expect, this is data-descriptor behavior, so to implement the behavior of property() yourself you could write:
def prop(getter, setter, deleter):
    class PropDesc:
        def __get__(self, instance, owner):
            return getter(instance)
        def __set__(self, instance, value):
            setter(instance, value)
        def __delete__(self, instance):
            deleter(instance)
            
    return PropDesc()

With this, the example that used property() in the property() function section can use the prop() function above instead:
class Ball:
    def __init__(self, radius):
        if radius <= 0:
            raise ValueError('radius must be positive')
        self.__radius = radius
    
    def getRadius(self):
        return self.__radius
        
    def setRadius(self, radius):
        self.__radius = radius
        
    def delRadius(self):
        del self.__radius
        
    radius = prop(getRadius, setRadius, delRadius)
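To check that the prop()-based descriptor behaves like the property() version, here is a self-contained run (it repeats the prop() and Ball definitions above so it can be executed on its own):

```python
def prop(getter, setter, deleter):
    class PropDesc:
        def __get__(self, instance, owner):
            return getter(instance)
        def __set__(self, instance, value):
            setter(instance, value)
        def __delete__(self, instance):
            deleter(instance)
    return PropDesc()

class Ball:
    def __init__(self, radius):
        if radius <= 0:
            raise ValueError('radius must be positive')
        self.__radius = radius
    def getRadius(self):
        return self.__radius
    def setRadius(self, radius):
        self.__radius = radius
    def delRadius(self):
        del self.__radius
    radius = prop(getRadius, setRadius, delRadius)

b = Ball(5)
print(b.radius)    # goes through PropDesc.__get__ -> getRadius
b.radius = 10      # goes through PropDesc.__set__ -> setRadius
print(b.radius)
del b.radius       # goes through PropDesc.__delete__ -> delRadius
```

Because PropDesc defines __set__() and __delete__(), it is a data descriptor, so the instance's __dict__ can never shadow the radius attribute.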

As discussed in the section on static methods and class methods, when a class instance invokes a method defined on the class, the method's first parameter is bound to the instance; a method accessed through an instance this way is called a bound method. For example:

>>> class Some:
...     def doSome(self):
...         print('something...', self)
...
>>> s = Some()
>>> s.doSome()
something... <__main__.Some object at 0x01DA1C50>
>>> s.doSome
<bound method Some.doSome of <__main__.Some object at 0x01DA1C50>>
>>> Some.doSome('arguments')
something... arguments
>>> Some.doSome
<function doSome at 0x01D303D8>
>>>


Clearly, a method accessed through an instance is not the same object as the function originally defined on the Some class. In fact, you can do the following:

>>> Some.__dict__['doSome'].__get__(s, Some)()
something... <__main__.Some object at 0x01DA1C50>
>>> Some.__dict__['doSome'].__get__(s, Some)
<bound method Some.doSome of <__main__.Some object at 0x01DA1C50>>
>>> Some.__dict__['doSome'].__get__(None, Some)()

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: doSome() takes exactly 1 positional argument (0 given)
>>> Some.__dict__['doSome'].__get__(None, Some)('arguments')
something... arguments
>>> Some.__dict__['doSome'].__get__(None, Some)
<function doSome at 0x01D303D8>
>>>


Evidently, the object referenced by the doSome attribute on the Some class has a __get__() method; that is, doSome is actually a descriptor. A function defined in a Python class is really an attribute name that refers to a non-data descriptor.

Suppose you have a class like this:

class Some:
    def doSome(self, arg):
        print(self, arg)

s = Some()
s.doSome(10)
Some.doSome(10, 20)

You can try using a descriptor yourself to "simulate" the behavior of doSome on the Some class above, which gives a rough idea of how bound methods work in Python:
class DoSomeDesc:
    def doSome(self, arg):
        print(self, arg)
        
    def __get__(self, instance, owner):
        if instance:
            return lambda arg: DoSomeDesc.doSome(instance, arg)
        else:
            return DoSomeDesc.doSome       
    
class Some:
    doSome = DoSomeDesc()
    
s = Some()
s.doSome(10)
Some.doSome(10, 20)






Python's vars() function returns the instance's __dict__ on its behalf:

>>> class Some:
...     def __init__(self):
...         self.x = 10
...         self.y = 20
...
>>> s = Some()
>>> vars(s)
{'y': 20, 'x': 10}
>>>


vars() lets you inspect the attributes of s; it returns a dict object.

Python can add attributes to a class dynamically

Because Python can add attributes to a class dynamically, even objects created before an attribute was added will pick it up through Python's namespace lookup order. In this way, you can dynamically add methods to a class. For example:
class Some:
    def __init__(self, x):
        self.x = x
        
s = Some(1)
Some.service = lambda self, y: print('do service...', self.x + y)
s.service(2)    # do service... 3

2012年4月9日 星期一

Python scientific computing

http://scipy-lectures.github.com/intro/numpy/numpy.html


1.3. NumPy: creating and manipulating numerical data

Authors: Emmanuelle Gouillart, Didrik Pinte, Gaël Varoquaux, and Pauli Virtanen

1.3.1. Intro

1.3.1.1. What is Numpy

Python has:
  • built-in: lists, integers, floating point
  • for numerics — more is needed (efficiency, convenience)
Numpy is:
  • extension package to Python for multidimensional arrays
  • closer to hardware (efficiency)
  • designed for scientific computation (convenience)
For example:
An array containing —
  • discretized time of an experiment/simulation
  • signal recorded by a measurement device
  • pixels of an image
  • ...

1.3.2. 1. Basics I

1.3.2.1. Getting started

>>> import numpy as np
>>> a = np.array([0, 1, 2, 3])
>>> a
array([0, 1, 2, 3])

1.3.2.2. Reference documentation

  • Interactive help:
    >>> help(np.array)
    array(object, dtype=None, copy=True, order=None, subok=False, ndmin=0)
    
    Create an array.
    
    Parameters
    ----------
    object : array_like
    ...
    
    Examples
    --------
    >>> np.array([1, 2, 3])
    array([1, 2, 3])
    ...
    
  • Looking for something:
    >>> np.lookfor('create array')
    Search results for 'create array'
    ---------------------------------
    numpy.array
        Create an array.
    numpy.memmap
        Create a memory-map to an array stored in a *binary* file on disk.
    ...
    
    >>> help(np.lookfor)
    ...
    

1.3.2.3. Creating arrays

1-D
>>> a = np.array([0, 1, 2, 3])
>>> a
array([0, 1, 2, 3])
>>> a.ndim
1
>>> a.shape
(4,)
>>> len(a)
4
2-D, 3-D, ...
>>> b = np.array([[0, 1, 2], [3, 4, 5]])    # 2 x 3 array
>>> b
array([[ 0,  1,  2],
       [ 3,  4,  5]])
>>> b.ndim
2
>>> b.shape
(2, 3)
>>> len(b)     # returns the size of the first dimension
2

>>> c = np.array([[[1], [2]], [[3], [4]]])
>>> c
array([[[1],
        [2]],

       [[3],
        [4]]])
>>> c.shape
(2, 2, 1)
In practice, we rarely enter items one by one...
  • Evenly spaced:
    >>> import numpy as np
    >>> a = np.arange(10) # 0 .. n-1  (!)
    >>> a
    array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    >>> b = np.arange(1, 9, 2) # start, end (exclusive), step
    >>> b
    array([1, 3, 5, 7])
    
    or by number of points:
    >>> c = np.linspace(0, 1, 6)   # start, end, num-points
    >>> c
    array([ 0. ,  0.2,  0.4,  0.6,  0.8,  1. ])
    >>> d = np.linspace(0, 1, 5, endpoint=False)
    >>> d
    array([ 0. ,  0.2,  0.4,  0.6,  0.8])
    
  • Common arrays:
    >>> a = np.ones((3, 3))  # reminder: (3, 3) is a tuple
    >>> a
    array([[ 1.,  1.,  1.],
           [ 1.,  1.,  1.],
           [ 1.,  1.,  1.]])
    >>> b = np.zeros((2, 2))
    >>> b
    array([[ 0.,  0.],
           [ 0.,  0.]])
    >>> c = np.eye(3)
    >>> c
    array([[ 1.,  0.,  0.],
           [ 0.,  1.,  0.],
           [ 0.,  0.,  1.]])
    >>> d = np.diag(np.array([1, 2, 3, 4, 5]))
    >>> d
    array([[1, 0, 0, 0, 0],
           [0, 2, 0, 0, 0],
           [0, 0, 3, 0, 0],
           [0, 0, 0, 4, 0],
           [0, 0, 0, 0, 5]])
    
  • Random numbers (Mersenne Twister PRNG):
    >>> a = np.random.rand(4)              # uniform in [0, 1]
    >>> a
    array([ 0.58597729,  0.86110455,  0.9401114 ,  0.54264348])
    
    >>> b = np.random.randn(4)             # gaussian
    >>> b
    array([-2.56844807,  0.06798064, -0.36823781,  0.86966886])
    
    >>> c = np.random.rand(3, 3)
    >>> c
    array([[ 0.31976645,  0.64807526,  0.74770801],
           [ 0.8280203 ,  0.8669403 ,  0.07663683],
           [ 0.11527489,  0.11494884,  0.13503285]])
    
    >>> d = np.random.zipf(1.5, size=(2, 8))  # Zipf distribution (s=1.5)
    >>> d
    array([[5290,    1,    6,    9,    1,    1,    1,    2],
           [   1,    5,    1,   13,    1,    1,    2,    1]])
    
    >>> np.random.seed(1234)                  # Setting the random seed
    >>> np.random.rand(3)
    array([ 0.19151945,  0.62210877,  0.43772774])
    >>> np.random.seed(1234)
    >>> np.random.rand(5)
    array([ 0.19151945,  0.62210877,  0.43772774,  0.78535858,  0.77997581])
    

1.3.2.4. Basic data types

You probably noted the 1 and 1. above. These are different data types:
>>> a = np.array([1, 2, 3])
>>> a.dtype
dtype('int64')
>>> b = np.array([1., 2., 3.])
>>> b.dtype
dtype('float64')
Much of the time you don’t necessarily need to care, but remember they are there.

You can also choose which one you want:
>>> c = np.array([1, 2, 3], dtype=float)
>>> c.dtype
dtype('float64')
The default data type is floating point:
>>> a = np.ones((3, 3))
>>> a.dtype
dtype('float64')
>>> b = np.linspace(0, 1, 6)
>>> b.dtype
dtype('float64')
There are also other types:
>>> d = np.array([1+2j, 3+4j, 5+6*1j])
>>> d.dtype
dtype('complex128')
>>> e = np.array([True, False, False, True])
>>> e.dtype
dtype('bool')
>>> f = np.array(['Bonjour', 'Hello', 'Hallo', 'Terve', 'Hej'])
>>> f.dtype
dtype('S7')         # <--- strings containing max. 7 letters

1.3.2.5. Basic visualization

Now that we have our first data arrays, we are going to visualize them.
Matplotlib is a 2D plotting package. We can import its functions as below:
>>> import matplotlib.pyplot as plt  # the tidy way
>>> # ... or ...
>>> from matplotlib.pyplot import *  # imports everything in the namespace
If you launched Ipython with python(x,y), or with ipython -pylab (under Linux), both of the above commands have been run. In the remainder of this tutorial, we assume you have run
>>> import matplotlib.pyplot as plt
or are using ipython -pylab which does it automatically.
1D plotting
>>> x = np.linspace(0, 3, 20)
>>> y = np.linspace(0, 9, 20)
>>> plt.plot(x, y)       # line plot
>>> plt.plot(x, y, 'o')  # dot plot
>>> plt.show()           # <-- shows the plot (not needed with Ipython)
../../_images/numpy-1.png
2D arrays (such as images)
>>> image = np.random.rand(30, 30)
>>> plt.imshow(image)
>>> plt.gray()
>>> plt.show()
../../_images/numpy-2_00_00.png
>>> plt.pcolor(image)
>>> plt.hot()
>>> plt.colorbar()
>>> plt.show()
../../_images/numpy-2_01_00.png
See also
 
More on matplotlib in the tutorial by Mike Müller tomorrow!
3D plotting
For 3D visualization, we can use another package: Mayavi. A quick example: start by relaunching IPython with these options: ipython -pylab -wthread (or ipython --pylab=wx in IPython >= 0.10).
In [59]: from enthought.mayavi import mlab
In [60]: mlab.figure()
get fences failed: -1
param: 6, val: 0
Out[60]: <enthought.mayavi.core.scene.Scene object at 0xcb2677c>
In [61]: mlab.surf(image)
Out[61]: <enthought.mayavi.modules.surface.Surface object at 0xd0862fc>
In [62]: mlab.axes()
Out[62]: <enthought.mayavi.modules.axes.Axes object at 0xd07892c>
../../_images/surf1.png
The mayavi/mlab window that opens is interactive : by clicking on the left mouse button you can rotate the image, zoom with the mouse wheel, etc.

1.3.2.6. Indexing and slicing

The items of an array can be accessed and assigned to the same way as other Python sequences (list, tuple):
>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a[0], a[2], a[-1]
(0, 2, 9)
Warning
 
Indices begin at 0, like other Python sequences (and C/C++). In contrast, in Fortran or Matlab, indices begin at 1.
For multidimensional arrays, indexes are tuples of integers:
>>> a = np.diag(np.arange(5))
>>> a
array([[0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 0, 2, 0, 0],
       [0, 0, 0, 3, 0],
       [0, 0, 0, 0, 4]])
>>> a[1,1]
1
>>> a[2,1] = 10 # third line, second column
>>> a
array([[ 0,  0,  0,  0,  0],
       [ 0,  1,  0,  0,  0],
       [ 0, 10,  2,  0,  0],
       [ 0,  0,  0,  3,  0],
       [ 0,  0,  0,  0,  4]])
>>> a[1]
array([0, 1, 0, 0, 0])
Note that:
  • In 2D, the first dimension corresponds to rows, the second to columns.
  • for a multidimensional array `a`, `a[0]` is interpreted by taking all elements in the unspecified dimensions.
Slicing
Arrays, like other Python sequences can also be sliced:
>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> a[2:9:3] # [start:end:step]
array([2, 5, 8])
Note that the last index is not included!:
>>> a[:4]
array([0, 1, 2, 3])
start:end:step is a slice object which represents the set of indexes range(start, end, step). A slice can be explicitly created:
>>> sl = slice(1, 9, 2)
>>> a = np.arange(10)
>>> b = np.arange(1, 20, 2)
>>> a, b
(array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19]))
>>> a[sl], b[sl]
(array([1, 3, 5, 7]), array([ 3,  7, 11, 15]))
Not all three slice components are required: by default, start is 0, end is the last element, and step is 1:
>>> a[1:3]
array([1, 2])
>>> a[::2]
array([0, 2, 4, 6, 8])
>>> a[3:]
array([3, 4, 5, 6, 7, 8, 9])
Of course, it works with multidimensional arrays:
>>> a = np.eye(5)
>>> a
array([[ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  1.]])
>>> a[2:4,:3] # 3rd and 4th rows, 3 first columns
array([[ 0.,  0.,  1.],
       [ 0.,  0.,  0.]])
All elements specified by a slice can be easily modified:
>>> a[:3,:3] = 4
>>> a
array([[ 4.,  4.,  4.,  0.,  0.],
       [ 4.,  4.,  4.,  0.,  0.],
       [ 4.,  4.,  4.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  1.]])
A small illustrated summary of Numpy indexing and slicing...
../../_images/numpy_indexing.png

1.3.2.7. Copies and views

A slicing operation creates a view on the original array, which is just a way of accessing array data. Thus the original array is not copied in memory.
When modifying the view, the original array is modified as well:
>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> b = a[::2]; b
array([0, 2, 4, 6, 8])
>>> b[0] = 12
>>> b
array([12,  2,  4,  6,  8])
>>> a   # (!)
array([12,  1,  2,  3,  4,  5,  6,  7,  8,  9])

>>> a = np.arange(10)
>>> b = a[::2].copy()  # force a copy
>>> b[0] = 12
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
This behavior can be surprising at first sight... but it allows saving both memory and time.
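If you are unsure whether two arrays share data, np.may_share_memory provides a quick (if conservative) check:

```python
import numpy as np

a = np.arange(10)
b = a[::2]                 # a view on a: shares a's data
c = a[::2].copy()          # an independent copy

print(np.may_share_memory(a, b))   # True
print(np.may_share_memory(a, c))   # False
```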

1.3.2.8. Data files

Text files
Example: populations.txt:
1900        30e3    4e3     51300
1901        47.2e3  6.1e3   48200
1902        70.2e3  9.8e3   41500
...
>>> data = np.loadtxt('populations.txt')    # if in current directory
>>> data
array([[  1900.,  30000.,   4000.,  51300.],
       [  1901.,  47200.,   6100.,  48200.],
       [  1902.,  70200.,   9800.,  41500.],
...
>>> np.savetxt('pop2.txt', data)
>>> data2 = np.loadtxt('pop2.txt')
Note
 
If you have a complicated text file, what you can try are:
  • np.genfromtxt
  • Using Python’s I/O functions and e.g. regexps for parsing (Python is quite well suited for this)
Navigating the filesystem in Python shells
Ipython
In [1]: pwd      # show current directory
'/home/user/stuff/2011-numpy-tutorial'
In [2]: cd ex
'/home/user/stuff/2011-numpy-tutorial/ex'
In [3]: ls
populations.txt  species.txt
Python (here’s yet another reason to use Ipython for interactive use :)
>>> import os
>>> os.getcwd()
'/home/user/stuff/2011-numpy-tutorial'
>>> os.chdir('ex')
>>> os.getcwd()
'/home/user/stuff/2011-numpy-tutorial/ex'
>>> os.listdir('.')
['populations.txt',
 'species.txt',
 ...
Images
>>> img = plt.imread('../../data/elephant.png')
>>> img.shape, img.dtype
((200, 300, 3), dtype('float32'))
>>> plt.imshow(img)
>>> plt.savefig('plot.png')
>>> plt.show()
../../_images/numpy-3_00_00.png
>>> plt.imsave('red_elephant', img[:,:,0], cmap=plt.cm.gray)
This saved only one channel (of RGB)
>>> plt.imshow(plt.imread('red_elephant.png'))
>>> plt.show()
../../_images/numpy-3_01_00.png
Other libraries:
>>> from scipy.misc import imsave
>>> imsave('tiny_elephant.png', img[::6,::6])
>>> plt.imshow(plt.imread('tiny_elephant.png'), interpolation='nearest')
>>> plt.show()
../../_images/numpy-3_02_00.png
Numpy’s own format
>>> np.save('pop.npy', data)
>>> data3 = np.load('pop.npy')
Well-known (& more obscure) file formats
  • HDF5: h5py, PyTables
  • NetCDF: scipy.io.netcdf_file, netcdf4-python, ...
  • Matlab: scipy.io.loadmat, scipy.io.savemat
  • MatrixMarket: scipy.io.mmread, scipy.io.mmwrite
... if somebody uses it, there’s probably also a Python library for it.

1.3.2.9. Summary & Exercises

  • Creating arrays: array, linspace, arange, zeros, ones, rand
  • Data types: integers, floats, complex floats, and strings
  • Simple plotting with Matplotlib: plot(x, y)
  • Indexing, slicing, and assignment into arrays — slicing creates views
  • Reading data from files: loadtxt, savetxt, et al.

Worked example: Prime number sieve
../../_images/prime-sieve.png
Compute prime numbers in 0–99, with a sieve
  • Construct a shape (100,) boolean array is_prime, filled with True in the beginning:
    >>> is_prime = np.ones((100,), dtype=bool)
    
  • Cross out 0 and 1 which are not primes
    >>> is_prime[:2] = 0
    
  • For each integer j starting from 2, cross out its higher multiples
    >>> N_max = int(np.sqrt(len(is_prime)))
    >>> for j in range(2, N_max):
    ...     is_prime[2*j::j] = False
    
  • Skim through help(np.nonzero), and print the prime numbers
  • Follow-up:
    • Move the above code into a script file named prime_sieve.py
    • Run it to check it works
    • Convert the simple sieve to the sieve of Eratosthenes:
      1. Skip j which are already known to not be primes
      2. The first number to cross out is j^2
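A sketch of the full sieve of Eratosthenes that the follow-up asks for, with both refinements applied:

```python
import numpy as np

is_prime = np.ones((100,), dtype=bool)
is_prime[:2] = False                  # 0 and 1 are not primes
N_max = int(np.sqrt(len(is_prime)))
for j in range(2, N_max + 1):
    if is_prime[j]:                   # skip j already known not to be prime
        is_prime[j*j::j] = False      # the first multiple to cross out is j**2
print(np.nonzero(is_prime)[0])        # indices still True = the primes below 100
```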
Reminder – Python scripts:
A very basic script file contains:
import numpy as np
import matplotlib.pyplot as plt  # <- if you use it

# any numpy-using commands follow

a = np.array([1, 2, 3, 4])
print a
Run on command line:
python my_script_1_2.py
Or, in Python shell:
In [10]: %run my_script_1_2.py

>>> execfile('2_2_data_statistics.py')

Exercise 1.1: Certain arrays
Create the following arrays (with correct data types):
[[ 1  1  1  1]
 [ 1  1  1  1]
 [ 1  1  1  2]
 [ 1  6  1  1]]

[[0. 0. 0. 0. 0.]
 [2. 0. 0. 0. 0.]
 [0. 3. 0. 0. 0.]
 [0. 0. 4. 0. 0.]
 [0. 0. 0. 5. 0.]
 [0. 0. 0. 0. 6.]]
Par on course: 3 statements for each (53 & 54 characters)
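One possible short solution per array (other constructions work equally well; note the first array is integer and the second is float):

```python
import numpy as np

a = np.ones((4, 4), dtype=int)
a[2, 3] = 2
a[3, 1] = 6

b = np.diag(np.arange(2., 7.), k=-1)   # 6 x 6, values 2..6 below the diagonal
b = b[:, :-1]                          # drop the all-zero last column -> 6 x 5
print(a)
print(b)
```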

Exercise 1.2: Text data files
Write a Python script that loads data from populations.txt, drops the last column and the first 5 rows, and saves the smaller dataset to pop2.txt.
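A sketch of one solution; a small synthetic stand-in is written first so the script runs even without populations.txt in the current directory:

```python
import numpy as np

# Synthetic stand-in for populations.txt (8 rows, 4 columns).
np.savetxt('populations_demo.txt', np.arange(32.0).reshape(8, 4))

data = np.loadtxt('populations_demo.txt')
smaller = data[5:, :-1]          # drop the first 5 rows and the last column
np.savetxt('pop2.txt', smaller)

print(smaller.shape)
```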

Exercise 1.3: Tiling
Skim through the documentation for np.tile, and use this function to construct the array:
[[4 3 4 3 4 3]
 [2 1 2 1 2 1]
 [4 3 4 3 4 3]
 [2 1 2 1 2 1]]
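One way to build it with np.tile, starting from the 2x2 block in the upper-left corner:

```python
import numpy as np

block = np.array([[4, 3],
                  [2, 1]])
result = np.tile(block, (2, 3))   # repeat 2x vertically, 3x horizontally
print(result)
```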

1.3.3. 2. Basics II

1.3.3.1. Elementwise operations

With scalars:
>>> a = np.array([1, 2, 3, 4])
>>> a + 1
array([2, 3, 4, 5])
>>> 2**a
array([ 2,  4,  8, 16])
All arithmetic operates elementwise:
>>> b = np.ones(4) + 1
>>> a - b
array([-1.,  0.,  1.,  2.])
>>> a * b
array([ 2.,  4.,  6.,  8.])
>>> j = np.arange(5)
>>> 2**(j + 1) - j
array([ 2,  3,  6, 13, 28])
Warning
 
... including multiplication
>>> c = np.ones((3, 3))
>>> c * c                   # NOT matrix multiplication!
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])
Note
 
Matrix multiplication:
>>> c.dot(c)
array([[ 3.,  3.,  3.],
       [ 3.,  3.,  3.],
       [ 3.,  3.,  3.]])
Comparisons:
>>> a = np.array([1, 2, 3, 4])
>>> b = np.array([4, 2, 2, 4])
>>> a == b
array([False,  True, False,  True], dtype=bool)
>>> a > b
array([False, False,  True, False], dtype=bool)
Logical operations:
>>> a = np.array([1, 1, 0, 0], dtype=bool)
>>> b = np.array([1, 0, 1, 0], dtype=bool)
>>> a | b
array([ True,  True,  True, False], dtype=bool)
>>> a & b
array([ True, False, False, False], dtype=bool)
Note
 
For arrays: "&" and "|" for logical operations, not "and" and "or".
Shape mismatches:
>>> a
array([1, 2, 3, 4])
>>> a + np.array([1, 2])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: shape mismatch: objects cannot be broadcast to a single shape
‘Broadcast’? We’ll return to that later.

1.3.3.2. Basic linear algebra

Matrix multiplication:
>>> a = np.triu(np.ones((3, 3)), 1)   # see help(np.triu)
>>> a
array([[ 0.,  1.,  1.],
       [ 0.,  0.,  1.],
       [ 0.,  0.,  0.]])
>>> b = np.diag([1, 2, 3])
>>> a.dot(b)
array([[ 0.,  2.,  3.],
       [ 0.,  0.,  3.],
       [ 0.,  0.,  0.]])
>>> np.dot(a, a)
array([[0, 0, 1],
       [0, 0, 0],
       [0, 0, 0]])
Transpose:
>>> a.T
array([[ 0.,  0.,  0.],
       [ 1.,  0.,  0.],
       [ 1.,  1.,  0.]])
Inverses and linear equation systems:
>>> A = a + b
>>> A
array([[ 1.,  1.,  1.],
       [ 0.,  2.,  1.],
       [ 0.,  0.,  3.]])
>>> B = np.linalg.inv(A)
>>> B.dot(A)
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])
>>> x = np.linalg.solve(A, [1, 2, 3])
>>> x
array([-0.5,  0.5,  1. ])
>>> A.dot(x)
array([ 1.,  2.,  3.])
Eigenvalues:
>>> np.linalg.eigvals(A)
array([ 1.,  2.,  3.])
... and so on, see help(np.linalg)

1.3.3.3. Basic reductions

Computing sums:
>>> x = np.array([1, 2, 3, 4])
>>> np.sum(x)
10
>>> x.sum()
10
Sum by rows and by columns:
../../_images/reductions.png
>>> x = np.array([[1, 1], [2, 2]])
>>> x
array([[1, 1],
       [2, 2]])
>>> x.sum(axis=0)   # columns (first dimension)
array([3, 3])
>>> x[:,0].sum(), x[:,1].sum()
(3, 3)
>>> x.sum(axis=1)   # rows (second dimension)
array([2, 4])
>>> x[0,:].sum(), x[1,:].sum()
(2, 4)
Same idea in higher dimensions:
>>> x = np.random.rand(2, 2, 2)
>>> x.sum(axis=2)[0,1]
1.1600112273698793
>>> x[0,1,:].sum()
1.1600112273698793
Other reductions work the same way (and take axis=):
  • Statistics:
    >>> x = np.array([1, 2, 3, 1])
    >>> y = np.array([[1, 2, 3], [5, 6, 1]])
    >>> x.mean()
    1.75
    >>> np.median(x)
    1.5
    >>> np.median(y, axis=-1) # last axis
    array([ 2.,  5.])
    
    >>> x.std()          # full population standard dev.
    0.82915619758884995
    >>> x.std(ddof=1)    # sample std (with N-1 in divisor)
    0.9574271077563381
    
  • Extrema:
    >>> x = np.array([1, 3, 2])
    >>> x.min()
    1
    >>> x.max()
    3
    
    >>> x.argmin()  # index of minimum
    0
    >>> x.argmax()  # index of maximum
    1
    
  • Logical operations:
    >>> np.all([True, True, False])
    False
    >>> np.any([True, True, False])
    True
    
    Note
     
    Can be used for array comparisons:
    >>> a = np.zeros((100, 100))
    >>> np.any(a != 0)
    False
    >>> np.all(a == a)
    True
    
    >>> a = np.array([1, 2, 3, 2])
    >>> b = np.array([2, 2, 3, 2])
    >>> c = np.array([6, 4, 4, 5])
    >>> ((a <= b) & (b <= c)).all()
    True
    
  • ... and many more (best to learn as you go).
Example: data statistics
Data in populations.txt describes the populations of hares and lynxes (and carrots) in northern Canada during 20 years.
We can first plot the data:
>>> data = np.loadtxt('../../data/populations.txt')
>>> year, hares, lynxes, carrots = data.T  # trick: columns to variables
>>> plt.axes([0.2, 0.1, 0.5, 0.8])
>>> plt.plot(year, hares, year, lynxes, year, carrots)
>>> plt.legend(('Hare', 'Lynx', 'Carrot'), loc=(1.05, 0.5))
>>> plt.show()
../../_images/numpy-4_00_00.png
The mean populations over time:
>>> populations = data[:,1:]
>>> populations.mean(axis=0)
array([ 34080.95238095,  20166.66666667,  42400.        ])
The sample standard deviations:
>>> populations.std(axis=0, ddof=1)
array([ 21413.98185877,  16655.99991995,   3404.55577132])
Which species has the highest population each year?
>>> np.argmax(populations, axis=1)
array([2, 2, 0, 0, 1, 1, 2, 2, 2, 2, 2, 2, 0, 0, 0, 1, 2, 2, 2, 2, 2])
Example: diffusion simulation using a random walk algorithm
../../_images/random_walk.png
What is the typical distance from the origin of a random walker after t left or right jumps?
../../_images/random_walk_schema.png
>>> n_stories = 1000 # number of walkers
>>> t_max = 200      # time during which we follow the walker
We randomly choose all the steps 1 or -1 of the walk
>>> t = np.arange(t_max)
>>> steps = 2 * np.random.random_integers(0, 1, (n_stories, t_max)) - 1
>>> np.unique(steps) # Verification: all steps are 1 or -1
array([-1,  1])
We build the walks by summing steps along the time
>>> positions = np.cumsum(steps, axis=1) # axis = 1: dimension of time
>>> sq_distance = positions**2
We get the mean in the axis of the stories
>>> mean_sq_distance = np.mean(sq_distance, axis=0)
Plot the results:
>>> plt.figure(figsize=(4, 3))
>>> plt.plot(t, np.sqrt(mean_sq_distance), 'g.', t, np.sqrt(t), 'y-')
>>> plt.xlabel(r"$t$")
>>> plt.ylabel(r"$\sqrt{\langle (\delta x)^2 \rangle}$")
>>> plt.show()
../../_images/numpy-5.png
The RMS distance grows as the square root of the time!

1.3.3.4. Broadcasting

  • Basic operations on numpy arrays (addition, etc.) are elementwise
  • This works on arrays of the same size.
    Nevertheless, it’s also possible to do operations on arrays of different
    sizes if Numpy can transform these arrays so that they all have
    the same size: this conversion is called broadcasting.
The image below gives an example of broadcasting:
../../_images/numpy_broadcasting.png
Let’s verify:
>>> a = np.tile(np.arange(0, 40, 10), (3, 1)).T
>>> a
array([[ 0,  0,  0],
       [10, 10, 10],
       [20, 20, 20],
       [30, 30, 30]])
>>> b = np.array([0, 1, 2])
>>> a + b
array([[ 0,  1,  2],
       [10, 11, 12],
       [20, 21, 22],
       [30, 31, 32]])
A useful trick:
>>> a = np.arange(0, 40, 10)
>>> a.shape
(4,)
>>> a = a[:,np.newaxis]         # adds a new axis -> 2D array
>>> a.shape
(4, 1)
>>> a
array([[ 0],
       [10],
       [20],
       [30]])
>>> a + b
array([[ 0,  1,  2],
       [10, 11, 12],
       [20, 21, 22],
       [30, 31, 32]])
We have already used broadcasting without knowing it!:
>>> a = np.ones((4,5))
>>> a[0] = 2 # we assign an array of dimension 0 to an array of dimension 1
>>> a
array([[ 2.,  2.,  2.,  2.,  2.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.]])
Broadcasting seems a bit magical, but it is actually quite natural to use it when we want to solve a problem whose output data is an array with more dimensions than input data.
Example
Let’s construct an array of distances (in miles) between cities of Route 66: Chicago, Springfield, Saint-Louis, Tulsa, Oklahoma City, Amarillo, Santa Fe, Albuquerque, Flagstaff and Los Angeles.
>>> mileposts = np.array([0, 198, 303, 736, 871, 1175, 1475, 1544,
...        1913, 2448])
>>> distance_array = np.abs(mileposts - mileposts[:,np.newaxis])
>>> distance_array
array([[   0,  198,  303,  736,  871, 1175, 1475, 1544, 1913, 2448],
       [ 198,    0,  105,  538,  673,  977, 1277, 1346, 1715, 2250],
       [ 303,  105,    0,  433,  568,  872, 1172, 1241, 1610, 2145],
       [ 736,  538,  433,    0,  135,  439,  739,  808, 1177, 1712],
       [ 871,  673,  568,  135,    0,  304,  604,  673, 1042, 1577],
       [1175,  977,  872,  439,  304,    0,  300,  369,  738, 1273],
       [1475, 1277, 1172,  739,  604,  300,    0,   69,  438,  973],
       [1544, 1346, 1241,  808,  673,  369,   69,    0,  369,  904],
       [1913, 1715, 1610, 1177, 1042,  738,  438,  369,    0,  535],
       [2448, 2250, 2145, 1712, 1577, 1273,  973,  904,  535,    0]])
../../_images/route66.png
Good practices
  • Explicit variable names (no need of a comment to explain what is in the variable)
  • Style: spaces after commas, around =, etc.
    A certain number of rules for writing “beautiful” code (and, more importantly, using the same conventions as everybody else!) are given in the Style Guide for Python Code and the Docstring Conventions page (to manage help strings).
  • Except in some rare cases, write variable names and comments in English.
A lot of grid-based or network-based problems can also use broadcasting. For instance, if we want to compute the distance from the origin of points on a 10x10 grid, we can do:
>>> x, y = np.arange(5), np.arange(5)
>>> distance = np.sqrt(x**2 + y[:, np.newaxis]**2)
>>> distance
array([[ 0.        ,  1.        ,  2.        ,  3.        ,  4.        ],
       [ 1.        ,  1.41421356,  2.23606798,  3.16227766,  4.12310563],
       [ 2.        ,  2.23606798,  2.82842712,  3.60555128,  4.47213595],
       [ 3.        ,  3.16227766,  3.60555128,  4.24264069,  5.        ],
       [ 4.        ,  4.12310563,  4.47213595,  5.        ,  5.65685425]])
Or in color:
>>> plt.pcolor(distance)
>>> plt.colorbar()
>>> plt.axis('equal')
>>> plt.show()            # <-- again, not needed in interactive Python
../../_images/numpy-6.png
Remark: the numpy.ogrid function lets you directly create the vectors x and y of the previous example, with two “significant dimensions”:
>>> x, y = np.ogrid[0:5, 0:5]
>>> x, y
(array([[0],
       [1],
       [2],
       [3],
       [4]]), array([[0, 1, 2, 3, 4]]))
>>> x.shape, y.shape
((5, 1), (1, 5))
>>> distance = np.sqrt(x**2 + y**2)
So, np.ogrid is very useful as soon as we have to handle computations on a grid. On the other hand, np.mgrid directly provides matrices full of indices for cases where we can’t (or don’t want to) benefit from broadcasting:
>>> x, y = np.mgrid[0:4, 0:4]
>>> x
array([[0, 0, 0, 0],
       [1, 1, 1, 1],
       [2, 2, 2, 2],
       [3, 3, 3, 3]])
>>> y
array([[0, 1, 2, 3],
       [0, 1, 2, 3],
       [0, 1, 2, 3],
       [0, 1, 2, 3]])
However, in practice, this is rarely needed!

1.3.3.5. Array shape manipulation

Flattening
>>> a = np.array([[1, 2, 3], [4, 5, 6]])
>>> a.ravel()
array([1, 2, 3, 4, 5, 6])
>>> a.T
array([[1, 4],
       [2, 5],
       [3, 6]])
>>> a.T.ravel()
array([1, 4, 2, 5, 3, 6])
Higher dimensions: last dimensions ravel out “first”.
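The order is easiest to see by comparing the default C order with Fortran order:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
print(a.ravel())            # C order, last axis varies fastest: [1 2 3 4 5 6]
print(a.ravel(order='F'))   # Fortran order, first axis fastest: [1 4 2 5 3 6]
```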
Reshaping
The inverse operation to flattening:
>>> a.shape
(2, 3)
>>> b = a.ravel()
>>> b.reshape((2, 3))
array([[1, 2, 3],
       [4, 5, 6]])
Creating an array with a different shape, from another array:
>>> a = np.arange(36)
>>> b = a.reshape((6, 6))
>>> b
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])
Or,
>>> b = a.reshape((6, -1))    # unspecified (-1) value is inferred
Copies or views
ndarray.reshape may return a view (cf help(np.reshape))), not a copy:
>>> b[0,0] = 99
>>> a
array([99,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35])
Beware!
>>> a = np.zeros((3,2))
>>> b = a.T.reshape(3*2)
>>> b[0] = 9
>>> a
array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])
To understand, see “Under the hood” below.
Dimension shuffling
>>> a = np.arange(4*3*2).reshape(4, 3, 2)
>>> a.shape
(4, 3, 2)
>>> a[0,2,1]
5
>>> b = a.transpose(1, 2, 0)
>>> b.shape
(3, 2, 4)
>>> b[2,1,0]
5
Also creates a view:
>>> b[2,1,0] = -1
>>> a[0,2,1]
-1
Resizing
Size of an array can be changed with ndarray.resize:
>>> a = np.arange(4)
>>> a.resize((8,))
>>> a
array([0, 1, 2, 3, 0, 0, 0, 0])
However, it must not be referred to somewhere else:
>>> b = a
>>> a.resize((4,))
...
ValueError: cannot resize an array that references or is referenced
by another array in this way.  Use the resize function
Some examples of real-world use cases
Case 2.a: Calling (legacy) Fortran code
Shape-preserving functions with elementwise non-Python routines. For instance, Fortran
! 2_a_fortran_module.f90
subroutine some_function(n, a, b)
  integer :: n
  double precision, dimension(n), intent(in) :: a
  double precision, dimension(n), intent(out) :: b
  b = a + 1
end subroutine some_function
f2py -c -m fortran_module 2_a_fortran_module.f90
import numpy as np
import fortran_module

def some_function(input):
    """
    Call a Fortran routine, and preserve input shape
    """
    input = np.asarray(input)
    # fortran_module.some_function() takes 1-D arrays!
    output = fortran_module.some_function(input.ravel())
    return output.reshape(input.shape)

print some_function(np.array([1, 2, 3]))
print some_function(np.array([[1, 2], [3, 4]]))

# ->
# [ 2.  3.  4.]
# [[ 2.  3.]
#  [ 4.  5.]]
Case 2.b: Block matrices and vectors (and tensors)
Vector space: quantum level \otimes spin
\check{\psi}
=
\begin{pmatrix}
\hat{\psi}_1 \\ \hat{\psi}_2
\end{pmatrix}
\,,
\qquad
\hat{\psi}_{1} =
\begin{pmatrix}
  \psi_{1\uparrow} \\ \psi_{1\downarrow}
\end{pmatrix}
\qquad
\hat{\psi}_{2} =
\begin{pmatrix}
  \psi_{2\uparrow} \\ \psi_{2\downarrow}
\end{pmatrix}
In short: for block matrices and vectors, it can be useful to preserve the block structure.
In Numpy:
>>> psi = np.zeros((2, 2))   # dimensions: level, spin
>>> psi[0,1] # <-- psi_{1,downarrow}
Linear operators on such block vectors have similar block structure:
\check{H} = \begin{pmatrix}
\hat{h}_{11} & \hat{V} \\
\hat{V}^\dagger & \hat{h}_{22} \\
\end{pmatrix}
\,,
\qquad
\hat{h}_{11}
=
\begin{pmatrix}
\epsilon_{1,\uparrow}
& 0
\\
0 & \epsilon_{1,\downarrow}
\end{pmatrix}
\,,
\qquad
\ldots
>>> H = np.zeros((2, 2, 2, 2)) # dimensions: level1, level2, spin1, spin2
>>> h_11 = H[0,0,:,:]
>>> V = H[0,1]
Doing the matrix product: get rid of the block structure, do the 4x4 matrix product, then put it back
\check{H}\check{\psi}
>>> def mdot(operator, psi):
...     return operator.transpose(0, 2, 1, 3).reshape(4, 4).dot(
...                psi.reshape(4)).reshape(2, 2)
I.e., reorder dimensions first to level1, spin1, level2, spin2 and then reshape => correct matrix product.

1.3.3.6. Fancy indexing

Numpy arrays can be indexed with slices, but also with boolean or integer arrays (masks). This method is called fancy indexing.
Masks
>>> np.random.seed(3)
>>> a = np.random.random_integers(0, 20, 15)
>>> a
array([10,  3,  8,  0, 19, 10, 11,  9, 10,  6,  0, 20, 12,  7, 14])
>>> (a % 3 == 0)
array([False,  True, False,  True, False, False, False,  True, False,
        True,  True, False,  True, False, False], dtype=bool)
>>> mask = (a % 3 == 0)
>>> extract_from_a = a[mask] # or,  a[a%3==0]
>>> extract_from_a           # extract a sub-array with the mask
array([ 3,  0,  9,  6,  0, 12])
Extracting a sub-array using a mask produces a copy of this sub-array, not a view like slicing:
>>> extract_from_a[0] = -1
>>> a
array([10,  3,  8,  0, 19, 10, 11,  9, 10,  6,  0, 20, 12,  7, 14])
Indexing with a mask can be very useful to assign a new value to a sub-array:
>>> a[a % 3 == 0] = -1
>>> a
array([10, -1,  8, -1, 19, 10, 11, -1, 10, -1, -1, 20, -1,  7, 14])
Indexing with an array of integers
>>> a = np.arange(10)
>>> a[::2] += 3 # to avoid having always the same np.arange(10)...
>>> a
array([ 3,  1,  5,  3,  7,  5,  9,  7, 11,  9])
>>> a[[2, 5, 1, 8]] # or, a[np.array([2, 5, 1, 8])]
array([ 5,  5,  1, 11])
Indexing can be done with an array of integers, where the same index is repeated several times:
>>> a[[2, 3, 2, 4, 2]]  # note: [2, 3, 2, 4, 2] is a Python list
array([5, 3, 5, 7, 5])
New values can be assigned with this kind of indexing:
>>> a[[9, 7]] = -10
>>> a
array([  3,   1,   5,   3,   7,   5,   9, -10,  11, -10])
>>> a[[2, 3, 2, 4, 2]] += 1
>>> a
array([  3,   1,   6,   4,   8,   5,   9, -10,  11, -10])
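Note in the output above that the repeated index 2 was incremented only once, not three times: in-place operations through fancy indexing buffer the result, so each index is updated once. When true accumulation over repeated indices is needed, np.add.at does it; a small sketch:

```python
import numpy as np

a = np.zeros(4, dtype=int)
a[[0, 0, 1]] += 1           # index 0 appears twice but is incremented once
buffered = a.copy()         # -> [1, 1, 0, 0]

b = np.zeros(4, dtype=int)
np.add.at(b, [0, 0, 1], 1)  # accumulates over repeated indices
# b -> [2, 1, 0, 0]
```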
When a new array is created by indexing with an array of integers, the new array has the same shape as the array of integers:
>>> a = np.arange(10)
>>> idx = np.array([[3, 4], [9, 7]])
>>> a[idx]
array([[3, 4],
       [9, 7]])

>>> a = np.arange(12).reshape(3, 4)
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> i = np.array([0, 1, 1, 2])
>>> j = np.array([2, 1, 3, 3])
>>> a[i, j]
array([ 2,  5,  7, 11])

>>> i = np.array([[0, 1], [1, 2]])
>>> j = np.array([[2, 1], [3, 3]])
>>> i
array([[0, 1],
       [1, 2]])
>>> j
array([[2, 1],
       [3, 3]])
>>> a[i, j]
array([[ 2,  5],
       [ 7, 11]])
../../_images/numpy_fancy_indexing.png
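The paired form a[i, j] above picks one element per index pair. To take all combinations of a set of rows and a set of columns instead, np.ix_ builds the appropriate open index grids; a short sketch:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
paired = a[[0, 1], [2, 3]]          # elements (0,2) and (1,3) -> [2, 7]
cross = a[np.ix_([0, 1], [2, 3])]   # rows {0, 1} x columns {2, 3}
# cross -> [[2, 3],
#           [6, 7]]
```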
We can even use fancy indexing and broadcasting at the same time:
>>> a = np.arange(12).reshape(3,4)
>>> a
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
>>> i = np.array([[0, 1], [1, 2]])
>>> a[i, 2] # same as a[i, 2*np.ones((2,2), dtype=int)]
array([[ 2,  6],
       [ 6, 10]])

1.3.3.7. Sorting data

Sorting along an axis:
>>> a = np.array([[4, 3, 5], [1, 2, 1]])
>>> b = np.sort(a, axis=1)
>>> b
array([[3, 4, 5],
       [1, 1, 2]])
Note
 
Sorts each row separately!
In-place sort:
>>> a.sort(axis=1)
>>> a
array([[3, 4, 5],
       [1, 1, 2]])
Sorting with fancy indexing:
>>> a = np.array([4, 3, 1, 2])
>>> j = np.argsort(a)
>>> j
array([2, 3, 1, 0])
>>> a[j]
array([1, 2, 3, 4])
Finding minima and maxima:
>>> a = np.array([4, 3, 1, 2])
>>> j_max = np.argmax(a)
>>> j_min = np.argmin(a)
>>> j_max, j_min
(0, 2)
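argsort is also handy for reordering a companion array by the sort order of another; a minimal sketch (the names array is made up for illustration):

```python
import numpy as np

a = np.array([4, 3, 1, 2])
names = np.array(['d', 'c', 'a', 'b'])   # hypothetical labels, one per value
order = np.argsort(a)
sorted_a = a[order]            # -> [1, 2, 3, 4]
sorted_names = names[order]    # labels carried along: ['a', 'b', 'c', 'd']
```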

1.3.3.8. Summary & Exercises

  • Arithmetic etc. are elementwise operations
  • Basic linear algebra, .dot()
  • Reductions: sum(axis=1), std(), all(), any()
  • Broadcasting: a = np.arange(4); a[:,np.newaxis] + a[np.newaxis,:]
  • Shape manipulation: a.ravel(), a.reshape(2, 2)
  • Fancy indexing: a[a > 3], a[[2, 3]]
  • Sorting data: .sort(), np.sort, np.argsort, np.argmax

Worked example: Framing Lena
Let’s do some manipulations on numpy arrays by starting with the famous image of Lena (http://www.cs.cmu.edu/~chuck/lennapg/). scipy provides a 2D array of this image with the scipy.lena function:
>>> import scipy
>>> lena = scipy.lena()
Here are a few images we will be able to obtain with our manipulations: use different colormaps, crop the image, change some parts of the image.
../../_images/lenas.png
  • Let’s use the imshow function of pylab to display the image.
    In [3]: import pylab as plt
    In [4]: lena = scipy.lena()
    In [5]: plt.imshow(lena)
    
  • Lena is then displayed in false colors. A colormap must be specified for her to be displayed in grey.
    In [6]: plt.imshow(lena, plt.cm.gray)
    In [7]: # or,
    In [7]: plt.gray()
    
  • Create an array of the image with a narrower centering: for example, remove 30 pixels from all the borders of the image. To check the result, display this new array with imshow.
    In [9]: crop_lena = lena[30:-30,30:-30]
    
  • We will now frame Lena’s face with a black locket. For this, we need to
    • create a mask corresponding to the pixels we want to be black. The mask is defined by the condition (y-256)**2 + (x-256)**2 > 230**2:
    In [15]: y, x = np.ogrid[0:512,0:512] # x and y indices of pixels
    In [16]: y.shape, x.shape
    Out[16]: ((512, 1), (1, 512))
    In [17]: centerx, centery = (256, 256) # center of the image
    In [18]: mask = ((y - centery)**2 + (x - centerx)**2) > 230**2 # circle
    
    then
    • assign the value 0 to the pixels of the image corresponding to the mask. The syntax is extremely simple and intuitive:
    In [19]: lena[mask] = 0
    In [20]: plt.imshow(lena)
    Out[20]: <matplotlib.image.AxesImage object at 0xa36534c>
    
  • Follow-up: copy all instructions of this exercise in a script called lena_locket.py then execute this script in IPython with %run lena_locket.py.
    Change the circle to an ellipse.

Exercise 2.1: Matrix manipulations
  1. Form the 2-D array (without typing it in explicitly):
    1  6 11
    2  7 12
    3  8 13
    4  9 14
    5 10 15
    and generate a new array containing its 2nd and 4th rows.
  2. Divide each column of the array
    >>> a = np.arange(25).reshape(5, 5)
    
    elementwise with the array b = np.array([1., 5, 10, 15, 20]). (Hint: np.newaxis).
  3. Harder one: Generate a 10 x 3 array of random numbers (in range [0,1]). For each row, pick the number closest to 0.5.
    • Use abs and argsort to find the column j closest for each row.
    • Use fancy indexing to extract the numbers. (Hint: a[i,j] – the array i must contain the row numbers corresponding to stuff in j.)

Exercise 2.2: Data statistics
The data in populations.txt describes the populations of hares and lynxes (and carrots) in northern Canada during 20 years:
>>> data = np.loadtxt('../../data/populations.txt')
>>> year, hares, lynxes, carrots = data.T  # trick: columns to variables
>>> plt.axes([0.2, 0.1, 0.5, 0.8])
>>> plt.plot(year, hares, year, lynxes, year, carrots)
>>> plt.legend(('Hare', 'Lynx', 'Carrot'), loc=(1.05, 0.5))
>>> plt.show()
../../_images/numpy-7.png
Compute and print, based on the data in populations.txt...
  1. The mean and std of the populations of each species for the years in the period.
  2. Which year each species had the largest population.
  3. Which species has the largest population for each year. (Hint: argsort & fancy indexing of np.array(['H', 'L', 'C']))
  4. Which years any of the populations was above 50000. (Hint: comparisons and np.any)
  5. The top 2 years for each species when they had the lowest populations. (Hint: argsort, fancy indexing)
  6. Compare (plot) the change in hare population (see help(np.gradient)) and the number of lynxes. Check correlation (see help(np.corrcoef)).
... all without for-loops.

Exercise 2.3: Crude integral approximations
Write a function f(a, b, c) that returns a^b - c. Form a 24x12x6 array containing its values in parameter ranges [0,1] x [0,1] x [0,1].
Approximate the 3-d integral
\int_0^1\int_0^1\int_0^1(a^b-c)da\,db\,dc
over this volume with the mean. The exact result is \ln 2 - \frac{1}{2} \approx 0.1931\ldots; what is your relative error?
(Hints: use elementwise operations and broadcasting. You can make np.ogrid give a number of points in given range with np.ogrid[0:1:20j].)
Reminder – Python functions
def f(a, b, c):
    return some_result

Exercise 2.4: Mandelbrot set
../../_images/2_4_mandelbrot.png
Write a script that computes the Mandelbrot fractal. The Mandelbrot iteration:
N_max = 50
some_threshold = 50

c = x + 1j*y

z = 0
for j in xrange(N_max):
    z = z**2 + c
Point (x, y) belongs to the Mandelbrot set if |z| < some_threshold.
Do this computation by:
  1. Construct a grid of c = x + 1j*y values in range [-2, 1] x [-1.5, 1.5]
  2. Do the iteration
  3. Form the 2-d boolean mask indicating which points are in the set
  4. Save the result to an image with:
    >>> import matplotlib.pyplot as plt
    >>> plt.imshow(mask.T, extent=[-2, 1, -1.5, 1.5])
    >>> plt.gray()
    >>> plt.savefig('mandelbrot.png')
    

Exercise 2.5: Markov chain
../../_images/markov-chain.png
Markov chain transition matrix P, and probability distribution on the states p:
  1. 0 <= P[i,j] <= 1: probability to go from state i to state j
  2. Transition rule: p_{new} = P^T p_{old}
  3. all(sum(P, axis=1) == 1), p.sum() == 1: normalization
Write a script that works with 5 states, and:
  • Constructs a random matrix, and normalizes each row so that it is a transition matrix.
  • Starts from a random (normalized) probability distribution p and takes 50 steps => p_50
  • Computes the stationary distribution: the eigenvector of P.T with eigenvalue 1 (numerically: closest to 1) => p_stationary
    Remember to normalize the eigenvector — I didn’t...
  • Checks if p_50 and p_stationary are equal to tolerance 1e-5
Toolbox: np.random.rand, .dot(), np.linalg.eig, reductions, abs(), argmin, comparisons, all, np.linalg.norm, etc.

1.3.3.9. Conclusions

What do you need to know to get started?
  • Know how to create arrays: array, arange, ones, zeros.
  • Know the shape of the array with array.shape, then use slicing to obtain different views of the array: array[::2], etc. Adjust the shape of the array using reshape or flatten it with ravel.
  • Obtain a subset of the elements of an array and/or modify their values with masks:
    >>> a[a < 0] = 0
    
  • Know miscellaneous operations on arrays, such as finding the mean or max (array.max(), array.mean()). No need to retain everything, but have the reflex to search in the documentation (online docs, help(), lookfor())!!
  • For advanced use: master the indexing with arrays of integers, as well as broadcasting. Know more Numpy functions to handle various array operations.

1.3.4. 3. Moving on

1.3.4.1. More data types

Casting
“Bigger” type wins in mixed-type operations:
>>> np.array([1, 2, 3]) + 1.5
array([ 2.5,  3.5,  4.5])
Assignment never changes the type!
>>> a = np.array([1, 2, 3])
>>> a.dtype
dtype('int64')
>>> a[0] = 1.9     # <-- float is truncated to integer
>>> a
array([1, 2, 3])
Forced casts:
>>> a = np.array([1.7, 1.2, 1.6])
>>> b = a.astype(int)  # <-- truncates to integer
>>> b
array([1, 1, 1])
Rounding:
>>> a = np.array([1.7, 1.2, 1.6])
>>> b = np.around(a)
>>> b                    # still floating-point
array([ 2.,  1.,  2.])
>>> c = np.around(a).astype(int)
>>> c
array([2, 1, 2])
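The promotion rule ("bigger" type wins) can be queried directly with np.result_type, without creating any arrays; a brief sketch:

```python
import numpy as np

t1 = np.result_type(np.int8, np.float64)   # float wins -> float64
t2 = np.result_type(np.int64, np.float32)  # int64 does not fit in float32 -> float64
```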
Different data type sizes
Integers (signed):
int8: 8 bits
int16: 16 bits
int32: 32 bits (same as int on 32-bit platform)
int64: 64 bits (same as int on 64-bit platform)
>>> np.array([1], dtype=int).dtype
dtype('int64')
>>> np.iinfo(np.int32).max, 2**31 - 1
(2147483647, 2147483647)
>>> np.iinfo(np.int64).max, 2**63 - 1
(9223372036854775807, 9223372036854775807L)
Unsigned integers:
uint8: 8 bits
uint16: 16 bits
uint32: 32 bits
uint64: 64 bits
>>> np.iinfo(np.uint32).max, 2**32 - 1
(4294967295, 4294967295)
>>> np.iinfo(np.uint64).max, 2**64 - 1
(18446744073709551615L, 18446744073709551615L)
Floating-point numbers:
float16: 16 bits
float32: 32 bits
float64: 64 bits (same as float)
float96: 96 bits, platform-dependent (same as np.longdouble)
float128: 128 bits, platform-dependent (same as np.longdouble)
>>> np.finfo(np.float32).eps
1.1920929e-07
>>> np.finfo(np.float64).eps
2.2204460492503131e-16
>>> np.float32(1e-8) + np.float32(1) == 1
True
>>> np.float64(1e-8) + np.float64(1) == 1
False
Complex floating-point numbers:
complex64: two 32-bit floats
complex128: two 64-bit floats
complex192: two 96-bit floats, platform-dependent
complex256: two 128-bit floats, platform-dependent
Smaller data types
If you don’t know you need special data types, then you probably don’t.
Comparison on using float32 instead of float64:
  • Half the size in memory and on disk
  • Half the memory bandwidth required (may be a bit faster in some operations)
    In [1]: a = np.zeros((1e6,), dtype=np.float64)
    
    In [2]: b = np.zeros((1e6,), dtype=np.float32)
    
    In [3]: %timeit a*a
    1000 loops, best of 3: 1.78 ms per loop
    
    In [4]: %timeit b*b
    1000 loops, best of 3: 1.07 ms per loop
    
  • But: bigger rounding errors — sometimes in surprising places (i.e., don’t use them unless you really need them)

1.3.4.2. Structured data types

Composite data types
sensor_code (4-character string) 
position (float) 
value (float) 
>>> samples = np.zeros((6,), dtype=[('sensor_code', 'S4'),
...                                 ('position', float), ('value', float)])
>>> samples.ndim
1
>>> samples.shape
(6,)
>>> samples.dtype.names
('sensor_code', 'position', 'value')
>>> samples[:] = [('ALFA', 1, 0.35), ('BETA', 1, 0.11), ('TAU', 1, 0.39),
...               ('ALFA', 1.5, 0.35), ('ALFA', 2.1, 0.11), ('TAU', 1.2, 0.39)]
>>> samples
array([('ALFA', 1.0, 0.35), ('BETA', 1.0, 0.11), ('TAU', 1.0, 0.39),
       ('ALFA', 1.5, 0.35), ('ALFA', 2.1, 0.11), ('TAU', 1.2, 0.39)],
      dtype=[('sensor_code', '|S4'), ('position', '<f8'), ('value', '<f8')])
Field access works by indexing with field names:
>>> samples['sensor_code']
array(['ALFA', 'BETA', 'TAU', 'ALFA', 'ALFA', 'TAU'],
      dtype='|S4')
>>> samples['value']
array([ 0.35,  0.11,  0.39,  0.35,  0.11,  0.39])
>>> samples[0]
('ALFA', 1.0, 0.35)
>>> samples[0]['sensor_code'] = 'TAU'
>>> samples[0]
('TAU', 1.0, 0.35)
Multiple fields at once:
>>> samples[['position', 'value']]
array([(1.0, 0.35), (1.0, 0.11), (1.0, 0.39), (1.5, 0.35), (2.1, 0.11),
       (1.2, 0.39)],
      dtype=[('position', '<f8'), ('value', '<f8')])
Fancy indexing works, as usual:
>>> samples[samples['sensor_code'] == 'ALFA']
array([('ALFA', 1.0, 0.35), ('ALFA', 1.5, 0.35), ('ALFA', 2.1, 0.11)],
      dtype=[('sensor_code', '|S4'), ('position', '<f8'), ('value', '<f8')])
Note
 
There are a bunch of other syntaxes for constructing structured arrays, see here and here.
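np.sort also understands structured arrays: its order argument names the field (or fields) to sort by. A short sketch reusing the dtype above:

```python
import numpy as np

samples = np.zeros(3, dtype=[('sensor_code', 'S4'), ('position', float),
                             ('value', float)])
samples[:] = [(b'TAU', 1.2, 0.39), (b'ALFA', 1.0, 0.35), (b'BETA', 1.5, 0.11)]
by_code = np.sort(samples, order='sensor_code')
# by_code['sensor_code'] -> [b'ALFA', b'BETA', b'TAU']
```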

1.3.4.3. Fourier transforms

Numpy contains 1-D, 2-D, and N-D fast discrete Fourier transform routines, which compute:
A_k = \sum_{m=0}^{n-1} a_m \exp\left\{-2\pi i \frac{mk}{n}\right\} \qquad k = 0, \ldots, n-1.
A full discussion of when such standard routines are useful is beyond the scope of this tutorial. Nevertheless, there they are, if you need them:
>>> a = np.exp(2j*np.pi*np.arange(10))
>>> fa = np.fft.fft(a)
>>> np.set_printoptions(suppress=True) # print small number as 0
>>> fa
array([ 10.-0.j,   0.+0.j,   0.+0.j,   0.+0.j,   0.+0.j,   0.+0.j,
        -0.+0.j,  -0.+0.j,  -0.+0.j,  -0.+0.j])
>>> a = np.exp(2j*np.pi*np.arange(3))
>>> b = a[:,np.newaxis] + a[np.newaxis,:]
>>> np.fft.fftn(b)
array([[ 18.-0.j,   0.+0.j,  -0.+0.j],
       [  0.+0.j,   0.+0.j,   0.+0.j],
       [ -0.+0.j,   0.+0.j,   0.+0.j]])
See help(np.fft) and help(np.fft.fft) for more. These functions in general take the axes argument, and you can additionally specify padding etc.
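As a quick sanity check, np.fft.ifft inverts np.fft.fft up to floating-point rounding:

```python
import numpy as np

sig = np.array([0.0, 1.0, 0.0, -1.0])
spectrum = np.fft.fft(sig)
roundtrip = np.fft.ifft(spectrum)      # complex result, imaginary parts ~ 0
ok = bool(np.allclose(roundtrip, sig))
```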
Worked example: Crude periodicity finding
"""
Discover the periods in ../../../data/populations.txt
"""
import numpy as np
import matplotlib.pyplot as plt

data = np.loadtxt('../../../data/populations.txt')
years = data[:,0]
populations = data[:,1:]

ft_populations = np.fft.fft(populations, axis=0)
frequencies = np.fft.fftfreq(populations.shape[0], years[1] - years[0])

plt.figure()
plt.plot(years, populations)
plt.figure()
plt.plot(1/frequencies, abs(ft_populations), 'o')
plt.xlim(0, 20)
plt.show()
../../_images/4_a_periodicity_00_00.png
../../_images/4_a_periodicity_00_01.png
# There's probably a period of around 10 years (obvious from the
# plot), but for this crude a method, there's not enough data to say
# much more.
Worked example: Gaussian image blur
Convolution:
f_1(t) = \int dt'\, K(t-t') f_0(t')
\tilde{f}_1(\omega) = \tilde{K}(\omega) \tilde{f}_0(\omega)
"""
Simple image blur by convolution with a Gaussian kernel
"""

import numpy as np
from numpy import newaxis
import matplotlib.pyplot as plt

# read image
img = plt.imread('../../../data/elephant.png')

# prepare an 1-D Gaussian convolution kernel
t = np.linspace(-10, 10, 30)
bump = np.exp(-0.1*t**2)
bump /= np.trapz(bump) # normalize the integral to 1

# make a 2-D kernel out of it
kernel = bump[:,newaxis] * bump[newaxis,:]

# padded fourier transform, with the same shape as the image
kernel_ft = np.fft.fft2(kernel, s=img.shape[:2], axes=(0, 1))

# convolve
img_ft = np.fft.fft2(img, axes=(0, 1))
img2_ft = kernel_ft[:,:,newaxis] * img_ft
img2 = np.fft.ifft2(img2_ft, axes=(0, 1)).real

# clip values to range
img2 = np.clip(img2, 0, 1)

# plot output
plt.imshow(img2)
plt.show()
../../_images/4_b_image_blur_00_00.png
# Further exercise (only if you are familiar with this stuff):
#
# A "wrapped border" appears in the upper left and top edges of the
# image. This is because the padding is not done correctly, and does
# not take the kernel size into account (so the convolution "flows out
# of bounds of the image").  Try to remove this artifact.

1.3.4.4. Masked arrays

Masked arrays are arrays that may have missing or invalid entries.
For example, suppose we have an array where the fourth entry is invalid:
>>> x = np.array([1, 2, 3, -99, 5])
One way to describe this is to create a masked array:
>>> mx = np.ma.masked_array(x, mask=[0, 0, 0, 1, 0])
>>> mx
masked_array(data = [1 2 3 -- 5],
             mask = [False False False  True False],
       fill_value = 999999)
Masked mean ignores masked data:
>>> mx.mean()
2.75
>>> np.mean(mx)
2.75
Warning
 
Not all Numpy functions respect masks, for instance np.dot, so check the return types.
The masked_array returns a view of the original array:
>>> mx[1] = 9
>>> x
array([  1,   9,   3, -99,   5])
The mask
You can modify the mask by assigning:
>>> mx[1] = np.ma.masked
>>> mx
masked_array(data = [1 -- 3 -- 5],
            mask = [False  True False  True False],
    fill_value = 999999)
The mask is cleared on assignment:
>>> mx[1] = 9
>>> mx
masked_array(data = [1 9 3 -- 5],
            mask = [False False False  True False],
    fill_value = 999999)
The mask is also available directly:
>>> mx.mask
array([False, False, False,  True, False], dtype=bool)
The masked entries can be filled with a given value to get an ordinary array back:
>>> x2 = mx.filled(-1)
>>> x2
array([ 1,  9,  3, -1,  5])
The mask can also be cleared:
>>> mx.mask = np.ma.nomask
>>> mx
masked_array(data = [1 9 3 -99 5],
            mask = [False False False False False],
    fill_value = 999999)
Domain-aware functions
The masked array package also contains domain-aware functions:
>>> np.ma.log(np.array([1, 2, -1, -2, 3, -5]))
masked_array(data = [0.0 0.69314718056 -- -- 1.09861228867 --],
            mask = [False False  True  True False  True],
    fill_value = 1e+20)
Note
 
Streamlined and more seamless support for dealing with missing data in arrays is making its way into Numpy 1.7. Stay tuned!
Example: Masked statistics
Canadian rangers were distracted when counting hares and lynxes in 1903-1910 and 1917-1918, and got the numbers wrong. (Carrot farmers stayed alert, though.) Compute the mean populations over time, ignoring the invalid numbers.
>>> data = np.loadtxt('../../data/populations.txt')
>>> populations = np.ma.masked_array(data[:,1:])
>>> year = data[:,0]
>>> bad_years = (((year >= 1903) & (year <= 1910))
...            | ((year >= 1917) & (year <= 1918)))
>>> populations[bad_years,0] = np.ma.masked
>>> populations[bad_years,1] = np.ma.masked
>>> populations.mean(axis=0)
masked_array(data = [40472.7272727 18627.2727273 42400.0],
             mask = [False False False],
      fill_value = 1e+20)
>>> populations.std(axis=0)
masked_array(data = [21087.656489 15625.7998142 3322.50622558],
             mask = [False False False],
       fill_value = 1e+20)
Note that Matplotlib knows about masked arrays:
>>> plt.plot(year, populations, 'o-')
>>> plt.show()
../../_images/numpy-8.png

1.3.4.5. Polynomials

Numpy also contains polynomials in different bases:
For example, 3x^2 + 2x - 1
>>> p = np.poly1d([3, 2, -1])
>>> p(0)
-1
>>> p.roots
array([-1.        ,  0.33333333])
>>> p.order
2
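poly1d objects also support simple calculus; a brief sketch:

```python
import numpy as np

p = np.poly1d([3, 2, -1])   # 3x^2 + 2x - 1
dp = p.deriv()              # derivative: 6x + 2
P = p.integ()               # antiderivative: x^3 + x^2 - x (constant = 0)
```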
>>> x = np.linspace(0, 1, 20)
>>> y = np.cos(x) + 0.3*np.random.rand(20)
>>> p = np.poly1d(np.polyfit(x, y, 3))
>>> t = np.linspace(0, 1, 200)
>>> plt.plot(x, y, 'o', t, p(t), '-')
>>> plt.show()
../../_images/numpy-9.png
More polynomials (with more bases)
Numpy also has a more sophisticated polynomial interface, which supports e.g. the Chebyshev basis.
3x^2 + 2x - 1
>>> p = np.polynomial.Polynomial([-1, 2, 3]) # coefs in different order!
>>> p(0)
-1.0
>>> p.roots()
array([-1.        ,  0.33333333])
>>> p.order
2
Example using polynomials in Chebyshev basis, for polynomials in range [-1, 1]:
>>> x = np.linspace(-1, 1, 2000)
>>> y = np.cos(x) + 0.3*np.random.rand(2000)
>>> p = np.polynomial.Chebyshev.fit(x, y, 90)
>>> t = np.linspace(-1, 1, 200)
>>> plt.plot(x, y, 'r.')
>>> plt.plot(t, p(t), 'k-', lw=3)
>>> plt.show()
../../_images/numpy-10.png
The Chebyshev polynomials have some advantages in interpolation.

1.3.4.6. Summary & Exercises

  • There are a number of data types with different precisions. In some special cases you may need to care about this.
  • Structured arrays contain data of a composite type. Lumping pieces of data together this way has various possible uses.
  • Fourier transform routines are under np.fft
  • Masked arrays can be used for missing data
  • Polynomials are available in various bases

1.3.5. 4. Under the hood

1.3.5.1. It’s...

ndarray =
block of memory + indexing scheme + data type descriptor
  • raw data
  • how to locate an element
  • how to interpret an element
../../_images/threefundamental1.png

1.3.5.2. Block of memory

>>> x = np.array([1, 2, 3, 4], dtype=np.int32)
>>> x.data
<read-write buffer for 0xa37bfd8, size 16, offset 0 at 0xa4eabe0>
>>> str(x.data)
'\x01\x00\x00\x00\x02\x00\x00\x00\x03\x00\x00\x00\x04\x00\x00\x00'
Memory address of the data:
>>> x.__array_interface__['data'][0]
159755776
Reminder: two ndarrays may share the same memory:
>>> x = np.array([1,2,3,4])
>>> y = x[:]
>>> x[0] = 9
>>> y
array([9, 2, 3, 4])
>>> y.base is x
True
Memory does not need to be owned by an ndarray:
>>> x = '\x01\x02\x03\x04'
>>> y = np.frombuffer(x, dtype=np.int8)
>>> y
array([1, 2, 3, 4], dtype=int8)
>>> y.data
<read-only buffer for 0xa588ba8, size 4, offset 0 at 0xa55cd60>
>>> y.base is x
True
>>> y.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : False
  ALIGNED : True
  UPDATEIFCOPY : False
The owndata and writeable flags indicate status of the memory block.

1.3.5.3. Indexing scheme: strides

The question
>>> x = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]], dtype=np.int8)
>>> str(x.data)
'\x01\x02\x03\x04\x05\x06\x07\x08\x09'
At which byte in x.data does the item x[1,2] begin?
The answer (in Numpy)
  • strides: the number of bytes to jump to find the next element
  • 1 stride per dimension
>>> x.strides
(3, 1)
>>> byte_offset = 3*1 + 1*2   # to find x[1,2]
>>> x.data[byte_offset]
'\x06'
>>> x[1,2]
6
  • simple, flexible
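The stride rule generalizes to any number of dimensions; a tiny helper sketch (byte_offset is a made-up name for illustration):

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]], dtype=np.int8)

def byte_offset(arr, indices):
    # byte position of arr[indices] inside the data block
    return sum(s * i for s, i in zip(arr.strides, indices))

offset = byte_offset(x, (1, 2))   # 3*1 + 1*2 = 5
```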
C and Fortran order
>>> x = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]], dtype=np.int16, order='C')
>>> x.strides
(6, 2)
>>> str(x.data)
'\x01\x00\x02\x00\x03\x00\x04\x00\x05\x00\x06\x00\x07\x00\x08\x00\t\x00'
  • Need to jump 6 bytes to find the next row
  • Need to jump 2 bytes to find the next column
>>> y = np.array(x, order='F')
>>> y.strides
(2, 6)
>>> str(y.data)
'\x01\x00\x04\x00\x07\x00\x02\x00\x05\x00\x08\x00\x03\x00\x06\x00\t\x00'
  • Need to jump 2 bytes to find the next row
  • Need to jump 6 bytes to find the next column
  • Similarly to higher dimensions:
    • C: last dimensions vary fastest (= smaller strides)
    • F: first dimensions vary fastest
    \mathrm{shape} = (d_1, d_2, ..., d_n), \qquad \mathrm{strides} = (s_1, s_2, ..., s_n)
    s_j^C = d_{j+1} d_{j+2} ... d_{n} \times \mathrm{itemsize}
    s_j^F = d_{1} d_{2} ... d_{j-1} \times \mathrm{itemsize}
Slicing
  • Everything can be represented by changing only shape, strides, and possibly adjusting the data pointer!
  • Never makes copies of the data
>>> x = np.array([1, 2, 3, 4, 5, 6], dtype=np.int32)
>>> y = x[::-1]
>>> y
array([6, 5, 4, 3, 2, 1])
>>> y.strides
(-4,)
>>> y = x[2:]
>>> y.__array_interface__['data'][0] - x.__array_interface__['data'][0]
8
>>> x = np.zeros((10, 10, 10), dtype=np.float)
>>> x.strides
(800, 80, 8)
>>> x[::2,::3,::4].strides
(1600, 240, 32)
  • Similarly, transposes never make copies (it just swaps strides)
>>> x = np.zeros((10, 10, 10), dtype=np.float)
>>> x.strides
(800, 80, 8)
>>> x.T.strides
(8, 80, 800)
Reshaping
But: not all reshaping operations can be represented by playing with strides.
>>> a = np.arange(6, dtype=np.int8).reshape(3, 2)
>>> b = a.T
>>> b.strides
(1, 2)
So far, so good. However:
>>> str(a.data)
'\x00\x01\x02\x03\x04\x05'
>>> b
array([[0, 2, 4],
       [1, 3, 5]], dtype=int8)
>>> c = b.reshape(3*2)
>>> c
array([0, 2, 4, 1, 3, 5], dtype=int8)
Here, there is no way to represent the array c given one stride and the block of memory for a. Therefore, the reshape operation needs to make a copy here.

1.3.5.4. Summary

  • Numpy array: block of memory + indexing scheme + data type description
  • Indexing: strides
    byte_position = np.sum(arr.strides * indices)
  • Various tricks you can do by playing with the strides (stuff for an advanced tutorial it is)
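One such trick, sketched here with np.lib.stride_tricks.as_strided: overlapping windows over the same block of memory, with no copy (no bounds checking is done, so the shape and strides must be chosen carefully):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.arange(8, dtype=np.int32)
# three windows of length 4, starting every 2 elements: same memory, new strides
windows = as_strided(x, shape=(3, 4),
                     strides=(2 * x.itemsize, x.itemsize))
# windows -> [[0, 1, 2, 3],
#             [2, 3, 4, 5],
#             [4, 5, 6, 7]]
```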