Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up open().read() pattern by reducing the number of system calls #120754

Open
cmaloney opened this issue Jun 19, 2024 · 5 comments · Fixed by #120755
Open

Speed up open().read() pattern by reducing the number of system calls #120754

cmaloney opened this issue Jun 19, 2024 · 5 comments · Fixed by #120755
Labels
performance Performance or resource usage type-feature A feature request or enhancement

Comments

@cmaloney
Copy link
Contributor

cmaloney commented Jun 19, 2024

Feature or enhancement

Proposal:

I came across some seemingly redundant fstat() and lseek() calls when working on a tool that scanned a directory of lots of small YAML files and loaded their contents as config. In tracing I found most execution time wasn't in the python interpreter but system calls (on top of NFS in that case, which made some I/O calls particularly slow).

I've been experimenting with a program that reads all .rst files in the python Docs directory to try and remove some of those redundant system calls..

Test Program

from pathlib import Path

nlines = []
for filename in Path("cpython/Doc").glob("**/*.rst"):
    nlines.append(len(filename.read_text()))

In my experimentation, with some tweaks to fileio can remove over 10% of the system calls the test program makes when scanning the whole Doc folders for .rst files on both macOS and Linux (don't have a Windows machine to measure on).

Current State (9 system calls)

Currently on my Linux machine to read a whole .rst file with the above code there is this series of system calls:

openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
ioctl(3, TCGETS, 0x7ffe52525930)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
lseek(3, 0, SEEK_CUR)                   = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1)                          = 0
close(3)                                = 0

Target State (7 5 system calls)

It would be nice to get it down to (for small files, large file caveat in PR / get an additional seek):

# Open the file
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3
# Check if the open fd is a file or directory and early-exit on directories with a specialized error.
# With my changes we also stash the size information from this for later use as an estimate.
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
# Read the data directly into a PyBytes
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
# Read the EOF marker
read(3, "", 1)                          = 0
# Close the file
close(3)                                = 0

In a number of cases (ex. importing modules) there is often a fstat followed immediately by an open / read the file (which does another fstat typically), but that is an extension point and I want to keep that out of scope for now.

Questions rattling around in my head around this

Some of these are likely better for Discourse / longer form discussion, happy to start threads there as appropriate.

  1. Is there a way to add a test for certain system calls happening with certain arguments and/or a certain amount of time? (I don't currently see a great way to write a test to make sure the number of system calls doesn't change unintentionally)
  2. Running a simple python script (python simple.py that contains print("Hello, World!")) currently reads simple.py in full at least 4 times and does over 5 seeks. I have been pulling on that thread but it interacts with importlib as well as how the python compiler currently works, still trying to get my head around. Would removing more of those overheads be something of interest / should I keep working to get my head around it?
  3. We could potentially save more
    1. with readv (one readv call, two iovecs). I avoided this for now because _Py_read does quite a bit.
    2. dispatching multiple calls in parallel using asynchronous I/O APIs to meet the python API guarantees; I am experimenting with this (backed by relatively new Linux I/O APIs but possibly for kqueue and epoll), but it's very experimental and feeling a lot like "has to be a interpreter primitive" to me to work effectively which is complex to plumb through. Very early days though, many thoughts, not much prototype code.
  4. The _blksize member of fileio was added in bpo-21679. It is not used much as far as I can tell as its reflection _blksize in python or in the code. The only usage I can find is https://github.com/python/cpython/blob/main/Modules/_io/_iomodule.c#L365-L374, where we could just query for it when needed in that case to save some storage on all fileio objects. The behavior of using the stat returned st_blksize is part of the docs, so doesn't feel like we can fully remove it.

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response

Linked PRs

@cmaloney cmaloney added the type-feature A feature request or enhancement label Jun 19, 2024
@cmaloney cmaloney changed the title Speed up open().readall() pattern by reducing the number of system calls Jun 19, 2024
@mdboom mdboom added the performance Performance or resource usage label Jun 20, 2024
@mdboom
Copy link
Contributor

mdboom commented Jun 20, 2024

Running a simple python script (python simple.py that contains print("Hello, World!")) currently reads simple.py in full at least 4 times and does over 5 seeks. I have been pulling on that thread but it interacts with importlib as well as how the python compiler currently works, still trying to get my head around. Would removing more of those overheads be something of interest / should I keep working to get my head around it?

Startup time is always an important thing to optimize for.

@cmaloney
Copy link
Contributor Author

I found the source of the remaining lseek(3, 0, SEEK_CUR) by using lldb to set a break point and get a backtrace. In _io.open() (_io_open_impl), by default, buffering is on, and that results in a BufferedIO* object being constructed. All of those call _buffered_init which calls _buffered_raw_tell to figure out the position so the buffered IO can track positions itself.

In the read whole file case that seems to be a lot of unnecessary work (Allocate a additional buffer and lock, do the seek, a list of "chunks" for read that get joined w/ special fast cases for "only one chunk", etc) which gets discarded shortly after. While it is likely possible to rewrite read_bytes() and read_text() to do less work, open().read() is common enough in code I'm planning to try and make that do less work overall, trying to avoid special-casing the operations.

@cmaloney
Copy link
Contributor Author

cmaloney commented Jun 27, 2024

Summary

# main
read_file_large: Mean +- std dev: 24.0 us +- 0.4 us

# readall_faster, removes a stat (#120755): 
read_file_large: Mean +- std dev: 21.2 us +- 0.6 us

# No isatty call (no pr yet)
read_file_large: Mean +- std dev: 20.2 us +- 0.3 us

No patch that removes the BufferedIO/TextIO seek passes tests yet. Planning to see what happens if stash whole stat result rather than lots of individual members.

Details

It touches a lot of code to remove the remaining seek. BufferedIO has implementation for the "absolute position" abs_pos to be optional / lazily filled. Once that is removed however, TextIO checks for "is this seekable" on construction, which is a boolean member that must be filled at construction currently. Still trying for a well contained way to do that.

The isatty call can in theory have a shortcut added around it using the fstat st_mode information. Just adding a "if this is a regular file" check before it + member accessible from fileio I get a speedup on my Mac :

main

python ../benchmark.py
.....................
read_file_small: Mean +- std dev: 8.43 us +- 0.12 us
.....................
read_file_large: Mean +- std dev: 24.0 us +- 0.4 us

cmaloney/readall_faster (#120755)

.....................
read_file_small: Mean +- std dev: 7.92 us +- 0.07 us
.....................
read_file_large: Mean +- std dev: 21.2 us +- 0.6 us

No isatty (local only currently):

.....................
read_file_small: Mean +- std dev: 7.32 us +- 0.09 us
.....................
read_file_large: Mean +- std dev: 20.2 us +- 0.3 us

That reduces the time for reading a single large .rst file in the cpython repo ~15% so far.

I built a little larger benchmark to more match code I ran into this with: "read all the .rst and .py in the cpython repo"

import pyperf
from pathlib import Path


def read_all(all_paths):
    for p in all_paths:
        p.read_bytes()


def read_file(path_obj):
    path_obj.read_text()


all_rst = list(Path("Doc").glob("**/*.rst"))
all_py = list(Path(".").glob("**/*.py"))
assert all_rst, "Should have found .rst files"
assert all_py, "Should have found .py source files"


runner = pyperf.Runner()
runner.bench_func("read_file_small", read_file, Path("Doc/howto/clinic.rst"))
runner.bench_func("read_file_large", read_file, Path("Doc/c-api/typeobj.rst"))

runner.bench_func("read_all_rst", read_all, all_rst)
runner.bench_func("read_all_py", read_all, all_py)

And on my Linux box read_all_py is ~9% faster with both changes at the moment.

I'm iterating on top of my existing PR code. Both removing the extra fstat and isatty require getting and stashing more data out of the initial stat result, so planning to see what the code looks like to just stash the whole stat result (although it is fairly big, so possibly with a hatch to unset it on write/seek or manually). This would also be a way to remove the _blksize member which is also only used at open() time (to figure out buffer size to pass to BufferedIO*).

vstinner added a commit that referenced this issue Jul 4, 2024
…20755)

This reduces the system call count of a simple program[0] that reads all
the `.rst` files in Doc by over 10% (5706 -> 4734 system calls on my
linux system, 5813 -> 4875 on my macOS)

This reduces the number of `fstat()` calls always and seek calls most
the time. Stat was always called twice, once at open (to error early on
directories), and a second time to get the size of the file to be able
to read the whole file in one read. Now the size is cached with the
first call.

The code keeps an optimization that if the user had previously read a
lot of data, the current position is subtracted from the number of bytes
to read. That is somewhat expensive so only do it on larger files,
otherwise just try and read the extra bytes and resize the PyBytes as
needeed.

I built a little test program to validate the behavior + assumptions
around relative costs and then ran it under `strace` to get a log of the
system calls. Full samples below[1].

After the changes, this is everything in one `filename.read_text()`:

```python3
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3`
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0`
ioctl(3, TCGETS, 0x7ffdfac04b40)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1)                          = 0
close(3)                                = 0
```

This does make some tradeoffs
1. If the file size changes between open() and readall(), this will
still get all the data but might have more read calls.
2. I experimented with avoiding the stat + cached result for small files
in general, but on my dev workstation at least that tended to reduce
performance compared to using the fstat().

[0]

```python3
from pathlib import Path

nlines = []
for filename in Path("cpython/Doc").glob("**/*.rst"):
    nlines.append(len(filename.read_text()))
```

[1]
Before small file:

```
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
ioctl(3, TCGETS, 0x7ffe52525930)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
lseek(3, 0, SEEK_CUR)                   = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1)                          = 0
close(3)                                = 0
```

After small file:

```
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
ioctl(3, TCGETS, 0x7ffdfac04b40)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1)                          = 0
close(3)                                = 0
```

Before large file:

```
openat(AT_FDCWD, "cpython/Doc/c-api/typeobj.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
ioctl(3, TCGETS, 0x7ffe52525930)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
lseek(3, 0, SEEK_CUR)                   = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
read(3, ".. highlight:: c\n\n.. _type-struc"..., 133105) = 133104
read(3, "", 1)                          = 0
close(3)                                = 0
```

After large file:

```
openat(AT_FDCWD, "cpython/Doc/c-api/typeobj.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
ioctl(3, TCGETS, 0x7ffdfac04b40)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
lseek(3, 0, SEEK_CUR)                   = 0
read(3, ".. highlight:: c\n\n.. _type-struc"..., 133105) = 133104
read(3, "", 1)                          = 0
close(3)                                = 0
```

Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
@cmaloney
Copy link
Contributor Author

cmaloney commented Jul 4, 2024

Could this be reopened? My plan is to keep working on trying to simplify what happens in this case, would like to attach to this issue (strace pr followup test, removing seeks, possibly removing isatty)

@vstinner vstinner reopened this Jul 4, 2024
cmaloney added a commit to cmaloney/cpython that referenced this issue Jul 4, 2024
Sometimes a large file is truncated (test_largefile). While
estimated_size is used as a estimate (the read will stil get the number
of bytes in the file), that it is much larger than the actual size of
data can result in a significant over allocation and sometimes lead to
a MemoryError / running out of memory.

This brings the C implementation to match the Python _pyio
implementation.
vstinner pushed a commit that referenced this issue Jul 4, 2024
Sometimes a large file is truncated (test_largefile). While
estimated_size is used as a estimate (the read will stil get the number
of bytes in the file), that it is much larger than the actual size of
data can result in a significant over allocation and sometimes lead to
a MemoryError / running out of memory.

This brings the C implementation to match the Python _pyio
implementation.
cmaloney added a commit to cmaloney/cpython that referenced this issue Jul 10, 2024
For POSIX, TTYs are never regular files, so if the interpreter knows the
file is regular it doesn't need to do an additional system call to check
if the file is a tty.

The `open()` API requires a `stat` call at present in order to make sure
the file being opened isn't a directory. That result includes the file
mode which tells us if it is a regular file. There's a number of
attributes from the stat which are stashed one off currently, move to
stashing the whole object rather than just individual members.

The stat object is reasonably large and currently the
`stat_result.st_size` member cannot be modified from Python, which is
needed by the `_pyio` implementation, so make the whole stat object
optional. In the `_io` implementation this makes handling a stat
failure simpler. At present there is no explicit user call to clear it,
but if one is needed (ex. a program which has a lot of open FileIO
objects and the memory becomes a problem) it would be straightforward to
add. Ideally would be able to automatically clear (the values are
generally used during I/O object initialization and not after. After a
`write` they are no longer useful in current cases).

It is fairly common pattern to scan a directory, look at the `stat`
results (ex. is this file changed), and then open/read the file. In this
PR I didn't update open's API to allow passing in a stat result to use,
but that could be beneficial for some cases (ex. `importlib`).

With this change on my Linux machine reading a small plain text file is
down to 6 system calls.

```
openat(AT_FDCWD, "read_one.py", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=87, ...}) = 0
lseek(3, 0, SEEK_CUR)                   = 0
read(3, "from pathlib import Path\n\npath ="..., 88) = 87
read(3, "", 1)                          = 0
close(3)                                = 0
```
cmaloney added a commit to cmaloney/cpython that referenced this issue Jul 10, 2024
For POSIX, TTYs are never regular files, so if the interpreter knows the
file is regular it doesn't need to do an additional system call to check
if the file is a tty.

The `open()` API requires a `stat` call at present in order to make sure
the file being opened isn't a directory. That result includes the file
mode which tells us if it is a regular file. There's a number of
attributes from the stat which are stashed one off currently, move to
stashing the whole object rather than just individual members.

The stat object is reasonably large and currently the
`stat_result.st_size` member cannot be modified from Python, which is
needed by the `_pyio` implementation, so make the whole stat object
optional. In the `_io` implementation this makes handling a stat
failure simpler. At present there is no explicit user call to clear it,
but if one is needed (ex. a program which has a lot of open FileIO
objects and the memory becomes a problem) it would be straightforward to
add. Ideally would be able to automatically clear (the values are
generally used during I/O object initialization and not after. After a
`write` they are no longer useful in current cases).

It is fairly common pattern to scan a directory, look at the `stat`
results (ex. is this file changed), and then open/read the file. In this
PR I didn't update open's API to allow passing in a stat result to use,
but that could be beneficial for some cases (ex. `importlib`).

With this change on my Linux machine reading a small plain text file is
down to 6 system calls.

```
openat(AT_FDCWD, "read_one.py", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=87, ...}) = 0
lseek(3, 0, SEEK_CUR)                   = 0
read(3, "from pathlib import Path\n\npath ="..., 88) = 87
read(3, "", 1)                          = 0
close(3)                                = 0
```
cmaloney added a commit to cmaloney/cpython that referenced this issue Jul 10, 2024
For POSIX, TTYs are never regular files, so if the interpreter knows the
file is regular it doesn't need to do an additional system call to check
if the file is a tty.

The `open()` API requires a `stat` call at present in order to make sure
the file being opened isn't a directory. That result includes the file
mode which tells us if it is a regular file. There's a number of
attributes from the stat which are stashed one off currently, move to
stashing the whole object rather than just individual members.

The stat object is reasonably large and currently the
`stat_result.st_size` member cannot be modified from Python, which is
needed by the `_pyio` implementation, so make the whole stat object
optional. In the `_io` implementation this makes handling a stat
failure simpler. At present there is no explicit user call to clear it,
but if one is needed (ex. a program which has a lot of open FileIO
objects and the memory becomes a problem) it would be straightforward to
add. Ideally would be able to automatically clear (the values are
generally used during I/O object initialization and not after. After a
`write` they are no longer useful in current cases).

It is fairly common pattern to scan a directory, look at the `stat`
results (ex. is this file changed), and then open/read the file. In this
PR I didn't update open's API to allow passing in a stat result to use,
but that could be beneficial for some cases (ex. `importlib`).

With this change on my Linux machine reading a small plain text file is
down to 6 system calls.

```python
openat(AT_FDCWD, "read_one.py", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=87, ...}) = 0
lseek(3, 0, SEEK_CUR)                   = 0
read(3, "from pathlib import Path\n\npath ="..., 88) = 87
read(3, "", 1)                          = 0
close(3)                                = 0
```
cmaloney added a commit to cmaloney/cpython that referenced this issue Jul 10, 2024
For POSIX, TTYs are never regular files, so if the interpreter knows the
file is regular it doesn't need to do an additional system call to check
if the file is a tty.

The `open()` API requires a `stat` call at present in order to make sure
the file being opened isn't a directory. That result includes the file
mode which tells us if it is a regular file. There's a number of
attributes from the stat which are stashed one off currently, move to
stashing the whole object rather than just individual members.

The stat object is reasonably large and currently the
`stat_result.st_size` member cannot be modified from Python, which is
needed by the `_pyio` implementation, so make the whole stat object
optional. In the `_io` implementation this makes handling a stat
failure simpler. At present there is no explicit user call to clear it,
but if one is needed (ex. a program which has a lot of open FileIO
objects and the memory becomes a problem) it would be straightforward to
add. Ideally would be able to automatically clear (the values are
generally used during I/O object initialization and not after. After a
`write` they are no longer useful in current cases).

It is fairly common pattern to scan a directory, look at the `stat`
results (ex. is this file changed), and then open/read the file. In this
PR I didn't update open's API to allow passing in a stat result to use,
but that could be beneficial for some cases (ex. `importlib`).

With this change on my Linux machine reading a small plain text file is
down to 6 system calls.

```python
openat(AT_FDCWD, "read_one.py", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=87, ...}) = 0
lseek(3, 0, SEEK_CUR)                   = 0
read(3, "from pathlib import Path\n\npath ="..., 88) = 87
read(3, "", 1)                          = 0
close(3)                                = 0
```
cmaloney added a commit to cmaloney/cpython that referenced this issue Jul 10, 2024
For POSIX, TTYs are never regular files, so if the interpreter knows the
file is regular it doesn't need to do an additional system call to check
if the file is a tty.

The `open()` API requires a `stat` call at present in order to make sure
the file being opened isn't a directory. That result includes the file
mode which tells us if it is a regular file. There's a number of
attributes from the stat which are stashed one off currently, move to
stashing the whole object rather than just individual members.

The stat object is reasonably large and currently the
`stat_result.st_size` member cannot be modified from Python, which is
needed by the `_pyio` implementation, so make the whole stat object
optional. In the `_io` implementation this makes handling a stat
failure simpler. At present there is no explicit user call to clear it,
but if one is needed (ex. a program which has a lot of open FileIO
objects and the memory becomes a problem) it would be straightforward to
add. Ideally would be able to automatically clear (the values are
generally used during I/O object initialization and not after. After a
`write` they are no longer useful in current cases).

It is fairly common pattern to scan a directory, look at the `stat`
results (ex. is this file changed), and then open/read the file. In this
PR I didn't update open's API to allow passing in a stat result to use,
but that could be beneficial for some cases (ex. `importlib`).

With this change on my Linux machine reading a small plain text file is
down to 6 system calls.

```python
openat(AT_FDCWD, "read_one.py", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=87, ...}) = 0
lseek(3, 0, SEEK_CUR)                   = 0
read(3, "from pathlib import Path\n\npath ="..., 88) = 87
read(3, "", 1)                          = 0
close(3)                                = 0
```
cmaloney added a commit to cmaloney/cpython that referenced this issue Jul 10, 2024
For POSIX, TTYs are never regular files, so if the interpreter knows the
file is regular it doesn't need to do an additional system call to check
if the file is a tty.

The `open()` API requires a `stat` call at present in order to make sure
the file being opened isn't a directory. That result includes the file
mode which tells us if it is a regular file. There's a number of
attributes from the stat which are stashed one off currently, move to
stashing the whole object rather than just individual members.

The stat object is reasonably large and currently the
`stat_result.st_size` member cannot be modified from Python, which is
needed by the `_pyio` implementation, so make the whole stat object
optional. In the `_io` implementation this makes handling a stat
failure simpler. At present there is no explicit user call to clear it,
but if one is needed (ex. a program which has a lot of open FileIO
objects and the memory becomes a problem) it would be straightforward to
add. Ideally would be able to automatically clear (the values are
generally used during I/O object initialization and not after. After a
`write` they are no longer useful in current cases).

It is fairly common pattern to scan a directory, look at the `stat`
results (ex. is this file changed), and then open/read the file. In this
PR I didn't update open's API to allow passing in a stat result to use,
but that could be beneficial for some cases (ex. `importlib`).

With this change on my Linux machine reading a small plain text file is
down to 6 system calls.

```python
openat(AT_FDCWD, "read_one.py", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=87, ...}) = 0
lseek(3, 0, SEEK_CUR)                   = 0
read(3, "from pathlib import Path\n\npath ="..., 88) = 87
read(3, "", 1)                          = 0
close(3)                                = 0
```
cmaloney added a commit to cmaloney/cpython that referenced this issue Jul 10, 2024
For POSIX, TTYs are never regular files, so if the interpreter knows the
file is regular it doesn't need to do an additional system call to check
if the file is a tty.

The `open()` API requires a `stat` call at present in order to make sure
the file being opened isn't a directory. That result includes the file
mode which tells us if it is a regular file. There's a number of
attributes from the stat which are stashed one off currently, move to
stashing the whole object rather than just individual members.

The stat object is reasonably large and currently the
`stat_result.st_size` member cannot be modified from Python, which is
needed by the `_pyio` implementation, so make the whole stat object
optional. In the `_io` implementation this makes handling a stat
failure simpler. At present there is no explicit user call to clear it,
but if one is needed (ex. a program which has a lot of open FileIO
objects and the memory becomes a problem) it would be straightforward to
add. Ideally would be able to automatically clear (the values are
generally used during I/O object initialization and not after. After a
`write` they are no longer useful in current cases).

It is fairly common pattern to scan a directory, look at the `stat`
results (ex. is this file changed), and then open/read the file. In this
PR I didn't update open's API to allow passing in a stat result to use,
but that could be beneficial for some cases (ex. `importlib`).

With this change on my Linux machine reading a small plain text file is
down to 6 system calls.

```python
openat(AT_FDCWD, "read_one.py", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=87, ...}) = 0
lseek(3, 0, SEEK_CUR)                   = 0
read(3, "from pathlib import Path\n\npath ="..., 88) = 87
read(3, "", 1)                          = 0
close(3)                                = 0
```
cmaloney added a commit to cmaloney/cpython that referenced this issue Jul 10, 2024
For POSIX, TTYs are never regular files, so if the interpreter knows the
file is regular it doesn't need to do an additional system call to check
if the file is a tty.

The `open()` API requires a `stat` call at present in order to make sure
the file being opened isn't a directory. That result includes the file
mode which tells us if it is a regular file. There's a number of
attributes from the stat which are stashed one off currently, move to
stashing the whole object rather than just individual members.

The stat object is reasonably large and currently the
`stat_result.st_size` member cannot be modified from Python, which is
needed by the `_pyio` implementation, so make the whole stat object
optional. In the `_io` implementation this makes handling a stat
failure simpler. At present there is no explicit user call to clear it,
but if one is needed (ex. a program which has a lot of open FileIO
objects and the memory becomes a problem) it would be straightforward to
add. Ideally would be able to automatically clear (the values are
generally used during I/O object initialization and not after. After a
`write` they are no longer useful in current cases).

It is fairly common pattern to scan a directory, look at the `stat`
results (ex. is this file changed), and then open/read the file. In this
PR I didn't update open's API to allow passing in a stat result to use,
but that could be beneficial for some cases (ex. `importlib`).

With this change on my Linux machine reading a small plain text file is
down to 6 system calls.

```python
openat(AT_FDCWD, "read_one.py", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=87, ...}) = 0
lseek(3, 0, SEEK_CUR)                   = 0
read(3, "from pathlib import Path\n\npath ="..., 88) = 87
read(3, "", 1)                          = 0
close(3)                                = 0
```
cmaloney added a commit to cmaloney/cpython that referenced this issue Jul 10, 2024
For POSIX, TTYs are never regular files, so if the interpreter knows the
file is regular it doesn't need to do an additional system call to check
if the file is a TTY.

The `open()` Python builtin requires a `stat` call at present in order
to ensure the file being opened isn't a directory. That result includes
the file mode which tells us if it is a regular file. There are a number
of attributes from the stat which are stashed one off currently, move to
stashing the whole object rather than just individual members.

The stat object is reasonably large and currently the
`stat_result.st_size` member cannot be modified from Python, which is
needed by the `_pyio` implementation, so make the whole stat object
optional. In the `_io` implementation this makes handling a stat
failure simpler. At present there is no explicit user call to clear it,
but if one is needed (ex. a program which has a lot of open FileIO
objects and the memory becomes a problem) it would be straightforward to
add. Ideally would be able to automatically clear (the values are
generally used during I/O object initialization and not after. After a
`write` they are no longer useful in current cases).

It is fairly common pattern to scan a directory, look at the `stat`
results (ex. is this file changed), and then open/read the file. In this
PR I didn't update open's API to allow passing in a stat result to use,
but that could be beneficial for some cases (ex. `importlib`).

With this change on my Linux machine reading a small plain text file is
down to 6 system calls.

```python
openat(AT_FDCWD, "read_one.py", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=87, ...}) = 0
lseek(3, 0, SEEK_CUR)                   = 0
read(3, "from pathlib import Path\n\npath ="..., 88) = 87
read(3, "", 1)                          = 0
close(3)                                = 0
```
cmaloney added a commit to cmaloney/cpython that referenced this issue Jul 10, 2024
For POSIX, TTYs are never regular files, so if the interpreter knows the
file is regular it doesn't need to do an additional system call to check
if the file is a TTY.

The `open()` Python builtin requires a `stat` call at present in order
to ensure the file being opened isn't a directory. That result includes
the file mode which tells us if it is a regular file. There are a number
of attributes from the stat which are stashed one off currently, move to
stashing the whole object rather than just individual members.

The stat object is reasonably large and currently the
`stat_result.st_size` member cannot be modified from Python, which is
needed by the `_pyio` implementation, so make the whole stat object
optional. In the `_io` implementation this makes handling a stat
failure simpler. At present there is no explicit user call to clear it,
but if one is needed (ex. a program which has a lot of open FileIO
objects and the memory becomes a problem) it would be straightforward to
add. Ideally would be able to automatically clear (the values are
generally used during I/O object initialization and not after. After a
`write` they are no longer useful in current cases).

It is fairly common pattern to scan a directory, look at the `stat`
results (ex. is this file changed), and then open/read the file. In this
PR I didn't update open's API to allow passing in a stat result to use,
but that could be beneficial for some cases (ex. `importlib`).

With this change on my Linux machine reading a small plain text file is
down to 6 system calls.

```python
openat(AT_FDCWD, "read_one.py", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=87, ...}) = 0
lseek(3, 0, SEEK_CUR)                   = 0
read(3, "from pathlib import Path\n\npath ="..., 88) = 87
read(3, "", 1)                          = 0
close(3)                                = 0
```
@cmaloney
Copy link
Contributor Author

See also: gh-90102

noahbkim pushed a commit to hudson-trading/cpython that referenced this issue Jul 11, 2024
…se (python#120755)

This reduces the system call count of a simple program[0] that reads all
the `.rst` files in Doc by over 10% (5706 -> 4734 system calls on my
linux system, 5813 -> 4875 on my macOS)

This reduces the number of `fstat()` calls always and seek calls most
the time. Stat was always called twice, once at open (to error early on
directories), and a second time to get the size of the file to be able
to read the whole file in one read. Now the size is cached with the
first call.

The code keeps an optimization that if the user had previously read a
lot of data, the current position is subtracted from the number of bytes
to read. That is somewhat expensive so only do it on larger files,
otherwise just try and read the extra bytes and resize the PyBytes as
needeed.

I built a little test program to validate the behavior + assumptions
around relative costs and then ran it under `strace` to get a log of the
system calls. Full samples below[1].

After the changes, this is everything in one `filename.read_text()`:

```python3
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3`
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0`
ioctl(3, TCGETS, 0x7ffdfac04b40)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1)                          = 0
close(3)                                = 0
```

This does make some tradeoffs
1. If the file size changes between open() and readall(), this will
still get all the data but might have more read calls.
2. I experimented with avoiding the stat + cached result for small files
in general, but on my dev workstation at least that tended to reduce
performance compared to using the fstat().

[0]

```python3
from pathlib import Path

nlines = []
for filename in Path("cpython/Doc").glob("**/*.rst"):
    nlines.append(len(filename.read_text()))
```

[1]
Before small file:

```
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
ioctl(3, TCGETS, 0x7ffe52525930)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
lseek(3, 0, SEEK_CUR)                   = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1)                          = 0
close(3)                                = 0
```

After small file:

```
openat(AT_FDCWD, "cpython/Doc/howto/clinic.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=343, ...}) = 0
ioctl(3, TCGETS, 0x7ffdfac04b40)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
read(3, ":orphan:\n\n.. This page is retain"..., 344) = 343
read(3, "", 1)                          = 0
close(3)                                = 0
```

Before large file:

```
openat(AT_FDCWD, "cpython/Doc/c-api/typeobj.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
ioctl(3, TCGETS, 0x7ffe52525930)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
lseek(3, 0, SEEK_CUR)                   = 0
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
read(3, ".. highlight:: c\n\n.. _type-struc"..., 133105) = 133104
read(3, "", 1)                          = 0
close(3)                                = 0
```

After large file:

```
openat(AT_FDCWD, "cpython/Doc/c-api/typeobj.rst", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=133104, ...}) = 0
ioctl(3, TCGETS, 0x7ffdfac04b40)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
lseek(3, 0, SEEK_CUR)                   = 0
read(3, ".. highlight:: c\n\n.. _type-struc"..., 133105) = 133104
read(3, "", 1)                          = 0
close(3)                                = 0
```

Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
Co-authored-by: Victor Stinner <vstinner@python.org>
noahbkim pushed a commit to hudson-trading/cpython that referenced this issue Jul 11, 2024
Sometimes a large file is truncated (test_largefile). While
estimated_size is used as a estimate (the read will stil get the number
of bytes in the file), that it is much larger than the actual size of
data can result in a significant over allocation and sometimes lead to
a MemoryError / running out of memory.

This brings the C implementation to match the Python _pyio
implementation.
cmaloney added a commit to cmaloney/cpython that referenced this issue Jul 11, 2024
Currently if code tries to do a os.read larger than the max bytes object
length, the size to read gets capped to `_PY_READ_MAX`, then the code
tries to allocate a PyBytes which fails with an OverflowError as the
size is larger than the max py bytes object.

Since os.read is capping the max size anyways, cap it to a size which is
always allocatable as a PyBytes.

This changes behavior from bpo-21932 and enables the large file os.read
test on 32 bit platforms, as it should cap the read to a platform
acceptable size.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage type-feature A feature request or enhancement
3 participants