#149 Python's small object allocator and other memory features
Published September 25, 2019
37 min
    Show notes

    Sponsored by Datadog: pythonbytes.fm/datadog

    Brian #1: Dropbox: Our journey to type checking 4 million lines of Python

    • A continuing saga, but this is a cool write-up.
    • Benefits
      • “Experience tells us that understanding code becomes the key to maintaining developer productivity. Without type annotations, basic reasoning such as figuring out the valid arguments to a function, or the possible return value types, becomes a hard problem. Here are typical questions that are often tricky to answer without type annotations:
        • Can this function return None?
        • What is this items argument supposed to be?
        • What is the type of the id attribute: is it int, str, or perhaps some custom type?
        • Does this argument need to be a list, or can I give a tuple or a set?”
      • A type checker finds many subtle bugs.
      • Refactoring is easier.
      • Running type checking is faster than running large suites of unit tests, so feedback can be faster.
      • Typing helps IDEs with better completion, static error checking, and more.
    • Long story, but really cool learnings of how and why to tackle adding type hints to a large project with many developers.
    • Conclusion: mypy is great now because Dropbox needed it to be.
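
The questions quoted above are the kind that annotations answer at a glance. Here is a minimal sketch; the function names and data are hypothetical and not taken from the Dropbox post. Running mypy over a file like this would flag a caller that forgets the `None` case.

```python
from typing import Dict, Iterable, Optional


def find_user_id(users: Dict[str, int], name: str) -> Optional[int]:
    """Return the id for name, or None when the user is unknown.

    The Optional[int] return type answers "can this function return None?"
    and the Dict[str, int] parameter answers "what type is the id?".
    """
    return users.get(name)


def total_length(items: Iterable[str]) -> int:
    """Accept any iterable of strings: a list, a tuple, or a set all work."""
    return sum(len(item) for item in items)


if __name__ == "__main__":
    users = {"ann": 1}  # hypothetical sample data
    print(find_user_id(users, "ann"))
    print(find_user_id(users, "bob"))
    print(total_length(("ab", "c")))
```

Annotating the parameter as `Iterable[str]` rather than `List[str]` is a deliberate choice: it tells both the reader and the type checker that the function only loops over its argument.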

    Michael #2: Setting Up a Flask Application in Visual Studio Code

    • Video, but also as a post
    • A follow-on to the same walkthrough in PyCharm
    • Steps outside VS Code
      • Clone repo
      • Create a virtual env (via venv)
      • Install requirements (via requirements.txt)
      • Set the FLASK_APP environment variable
      • Run flask deploy ← custom command for DB
    • VS Code
      • Open the folder where the repo and venv live
      • Open any Python file to trigger the Python subsystem
      • Ensure the correct virtual environment is selected (bottom left)
      • Open the debugger tab, add config, pick Flask, choose your app.py file
      • Debug menu, start without debugging (or with)
    • Adding tests via VS Code
      • Open the command palette (Cmd+Shift+P), run Python: Discover Tests, then select the framework, the directory of tests, and the file pattern. A new tests bottle icon appears on the right bar.

    Brian #3: Multiprocessing vs. Threading in Python: What Every Data Scientist Needs to Know

    • How data scientists can choose between multiprocessing and threading, and which factors to keep in mind while doing so.
    • Does not consider async, but still some great info.
    • Overview of both concepts in general and some of the pitfalls of parallel computing.
    • The specifics in Python, with the GIL
    • Use threads for waiting on IO or waiting on users.
    • Use multiprocessing for CPU intensive work.
    • The surprising bit for me was the benchmarks
      • Using either approach speeds up the code. That’s obvious.
      • The difference between the two isn’t as great as I would have expected.
    • A discussion of the merits of both, from the perspective of data science.
    • A few more examples, with code, included.
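
The rule of thumb above (threads for waiting, processes for CPU work) can be sketched with the standard library's `concurrent.futures`. This is an illustration under assumptions, not the article's benchmark code: the task functions, worker counts, and workload sizes are all made up, and timings will vary by machine.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def wait_for_io(delay: float) -> float:
    """Stand-in for an I/O-bound task (network call, disk read, user input)."""
    time.sleep(delay)  # sleeping releases the GIL, so threads can overlap here
    return delay


def sum_of_squares(n: int) -> int:
    """Stand-in for a CPU-bound task that holds the GIL while it runs."""
    return sum(i * i for i in range(n))


def run_io_in_threads(delays, workers=4):
    """Run the waiting tasks concurrently in a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(wait_for_io, delays))


def run_cpu_in_processes(sizes, workers=4):
    """Run the CPU-heavy tasks in separate processes, each with its own GIL."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(sum_of_squares, sizes))


if __name__ == "__main__":
    start = time.perf_counter()
    run_io_in_threads([0.2] * 4)
    print(f"threads, I/O-bound: {time.perf_counter() - start:.2f}s")

    start = time.perf_counter()
    run_cpu_in_processes([200_000] * 4)
    print(f"processes, CPU-bound: {time.perf_counter() - start:.2f}s")
```

The `if __name__ == "__main__":` guard matters for the process pool: on platforms that start workers by re-importing the main module, it keeps the workers from re-running the benchmark themselves.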

    Michael #4: ORM - async ORM

    • And https://github.com/encode/databases
    • The orm package is an async ORM for Python, with support for Postgres, MySQL, and SQLite.
    • SQLAlchemy core for query building.
    • databases for cross-database async support.
    • typesystem for data validation.
    • Because ORM is built on SQLAlchemy core, you can use Alembic to provide database migrations.
    • You need to be pretty async-savvy.

    Brian #5: Getting Started with APIs

    • dataquest.io post
    • Conceptual introduction of web APIs
    • Discussion of the status codes returned for GET requests, including a nice list with descriptions.
      • examples:
        • 301: The server is redirecting you to a different endpoint. This can happen when a company switches domain names, or an endpoint name is changed.
        • 400: The server thinks you made a bad request. This can happen when you don’t send along the right data, among other things.
    • endpoints
    • endpoints that take query parameters
    • JSON data
    • Examples in Python for using:
      • requests to query endpoints.
      • json to load and dump JSON data.
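
The pieces listed above (endpoints, query parameters, status codes, JSON) fit together roughly as in the sketch below. The post itself uses `requests`; this sketch swaps in the standard library's `urllib` so it has no third-party dependency. The endpoint URL, parameter names, and the wording of the status descriptions are placeholders, not taken from the post.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Short descriptions for a few common status codes (paraphrased).
STATUS_MEANINGS = {
    200: "OK: the request succeeded",
    301: "Redirect: the endpoint has moved",
    400: "Bad request: the server could not use what was sent",
    404: "Not found: no resource at this endpoint",
}


def build_url(endpoint: str, params: dict) -> str:
    """Append query parameters to an endpoint URL."""
    return f"{endpoint}?{urlencode(params)}" if params else endpoint


def describe_status(code: int) -> str:
    """Look up a human-readable meaning for a status code."""
    return STATUS_MEANINGS.get(code, "Unlisted status code")


def fetch_json(endpoint: str, params=None, timeout: float = 10.0):
    """GET an endpoint and decode its JSON body.

    Note: urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    instead of returning them, unlike requests.get.
    """
    with urlopen(build_url(endpoint, params or {}), timeout=timeout) as response:
        return response.status, json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical endpoint; no network call is made here.
    print(build_url("https://api.example.com/items", {"lat": 40.71, "lon": -74}))
    print(describe_status(301))
    # json.loads turns JSON text into Python objects; json.dumps goes back.
    payload = json.loads('{"people": [{"name": "Ada"}], "count": 1}')
    print(json.dumps(payload, indent=2, sort_keys=True))
```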

    Michael #6: Memory management in Python

    • This article describes memory management in Python 3.6.
    • Everything in Python is an object. Some objects can hold other objects, such as lists, tuples, dicts, classes, etc.
    • Such an approach requires a lot of small memory allocations.
    • To speed up memory operations and reduce fragmentation, Python uses a special manager on top of the general-purpose allocator, called PyMalloc.
    • Layered managers
      • RAM
      • OS VMM
      • C-malloc
      • PyMem
      • Python Object allocator
      • Object memory
    • Three levels of organization
      • To reduce overhead for small objects (512 bytes or smaller), Python sub-allocates big blocks of memory.
      • Larger objects are routed to the standard C allocator.
      • There are three levels of abstraction: arena, pool, and block.
      • A block is a chunk of memory of a certain size. Each block holds only one Python object of a fixed size. Block sizes range from 8 to 512 bytes and must be a multiple of eight.
      • A collection of blocks of the same size is called a pool. Normally, the size of a pool equals the size of a memory page, i.e., 4 KB.
      • An arena is a 256 KB chunk of memory allocated on the heap, which provides memory for 64 pools.
    • Python's small object manager rarely returns memory to the operating system.
    • An arena gets fully released if and only if all the pools in it are empty.
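
The arithmetic behind blocks, pools, and arenas can be sketched in a few lines. This is a simplified model built only from the numbers quoted above (Python 3.6 on a typical build); the function and constant names are illustrative, it ignores pool headers, and it does not call into the real allocator.

```python
import sys

ALIGNMENT = 8                   # block sizes are multiples of eight
SMALL_REQUEST_THRESHOLD = 512   # larger requests go to the C allocator
POOL_SIZE = 4 * 1024            # one memory page, 4 KB
ARENA_SIZE = 256 * 1024         # 256 KB


def block_size_for(request_bytes: int):
    """Return the block size serving a request, or None for the C allocator.

    Simplification: non-positive sizes are treated as outside the
    small-object path.
    """
    if request_bytes <= 0 or request_bytes > SMALL_REQUEST_THRESHOLD:
        return None
    # Round up to the next multiple of ALIGNMENT.
    return (request_bytes + ALIGNMENT - 1) // ALIGNMENT * ALIGNMENT


def pools_per_arena() -> int:
    """How many pools fit in one arena."""
    return ARENA_SIZE // POOL_SIZE


def max_blocks_per_pool(block_size: int) -> int:
    """Upper bound on blocks in a pool; the real pool also stores a header."""
    return POOL_SIZE // block_size


if __name__ == "__main__":
    print("pools per arena:", pools_per_arena())
    for obj in (1, "abc", (1, 2), [0] * 1000):
        # sys.getsizeof is only a rough proxy for the allocation request.
        size = sys.getsizeof(obj)
        print(type(obj).__name__, size, block_size_for(size))
```

In this model a request of 9 bytes lands in a 16-byte block, and anything over 512 bytes bypasses the small object allocator entirely.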



    • Tuesday, Oct 6, Python PDX West,
    • Thursday, Sept 26, I’ll be speaking at PDX Python, downtown.
    • At both events, I’ll mostly be working on new programming jokes unless I come up with something better. :)


    Jokes: A few I liked from the dad joke list.

    • What do you call a 3.14 foot long snake? A π-thon
    • What if it’s 3.14 inches, instead of feet? A μ-π-thon
    • Why doesn't Hollywood make more Big Data movies? NoSQL.
    • Why didn't the div get invited to the dinner party? Because it had no class.