Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docs and docstrings for io.*.seek are inconsistent #107801

Open
MatthiasWiesmann opened this issue Aug 9, 2023 · 15 comments
Open

Docs and docstrings for io.*.seek are inconsistent #107801

MatthiasWiesmann opened this issue Aug 9, 2023 · 15 comments
Assignees
Labels
3.11 only security fixes 3.12 bugs and security fixes 3.13 bugs and security fixes docs Documentation in the Doc dir pending The issue will be closed if no feedback is provided topic-IO type-bug An unexpected behavior, bug, or error

Comments

@MatthiasWiesmann
Copy link

MatthiasWiesmann commented Aug 9, 2023

Documentation

The documentation on text handles is misleading, in particular for the seek method.

Steps to reproduce

handle = open('/tmp/lines.txt')
help(handle.seek)

This produces the following documentation:

Help on built-in function seek:

seek(cookie, whence=0, /) method of _io.TextIOWrapper instance
    Change stream position.
    
    Change the stream position to the given byte offset. The offset is
    interpreted relative to the position indicated by whence.  Values
    for whence are:
    
    * 0 -- start of stream (the default); offset should be zero or positive
    * 1 -- current stream position; offset may be negative
    * 2 -- end of stream; offset is usually negative
    
    Return the new absolute position.

Issues

  • The documentation talks about a byte offset, but the interface mentions a cookie.
  • The constants defined in os module, like os.SEEK_END are not mentioned.
  • whence=2 with negative offset fails.

Generally, the behaviour is very inconsistent, seek relative to the end fail, seek relative to the start works, but might yield a situation when read fails.

Full investigation of the issue

Linked PRs

@MatthiasWiesmann MatthiasWiesmann added the docs Documentation in the Doc dir label Aug 9, 2023
@sunmy2019
Copy link
Member

Note: If anything is changed here, the relevant doc should also be changed.
https://docs.python.org/3/library/io.html#io.IOBase.seek

@AA-Turner AA-Turner added type-bug An unexpected behavior, bug, or error 3.11 only security fixes topic-IO 3.12 bugs and security fixes 3.13 bugs and security fixes labels Aug 12, 2023
@AA-Turner
Copy link
Member

@MatthiasWiesmann would you have time to write a PR with suggestied revisions of the help text?

To note, Python 3.12 will include the constants in os (see #104418), though there is quite a bit of inconsistency between the various seek() methods' docstrings:

  • /*[clinic input]
    _io._IOBase.seek
    cls: defining_class
    offset: int(unused=True)
    whence: int(unused=True, c_default='0') = os.SEEK_SET
    /
    Change the stream position to the given byte offset.
    The offset is interpreted relative to the position indicated by whence.
    Values for whence are:
    * os.SEEK_SET or 0 -- start of stream (the default); offset should be zero or positive
    * os.SEEK_CUR or 1 -- current stream position; offset may be negative
    * os.SEEK_END or 2 -- end of stream; offset is usually negative
    Return the new absolute position.
    [clinic start generated code]*/
  • /*[clinic input]
    _io._Buffered.seek
    target as targetobj: object
    whence: int = 0
    /
    [clinic start generated code]*/
  • /*[clinic input]
    _io.BytesIO.seek
    pos: Py_ssize_t
    whence: int = 0
    /
    Change stream position.
    Seek to byte offset pos relative to position indicated by whence:
    0 Start of stream (the default). pos should be >= 0;
    1 Current position - pos may be negative;
    2 End of stream - pos usually negative.
    Returns the new absolute position.
    [clinic start generated code]*/
  • /*[clinic input]
    _io.FileIO.seek
    pos: object
    whence: int = 0
    /
    Move to new file position and return the file position.
    Argument offset is a byte count. Optional argument whence defaults to
    SEEK_SET or 0 (offset from start of file, offset should be >= 0); other values
    are SEEK_CUR or 1 (move relative to current position, positive or negative),
    and SEEK_END or 2 (move relative to end of file, usually negative, although
    many platforms allow seeking beyond the end of a file).
    Note that not all file objects are seekable.
    [clinic start generated code]*/
  • /*[clinic input]
    _io.StringIO.seek
    pos: Py_ssize_t
    whence: int = 0
    /
    Change stream position.
    Seek to character offset pos relative to position indicated by whence:
    0 Start of stream (the default). pos should be >= 0;
    1 Current position - pos must be 0;
    2 End of stream - pos must be 0.
    Returns the new absolute position.
    [clinic start generated code]*/
  • cpython/Modules/_io/textio.c

    Lines 2428 to 2433 in 2e27da1

    /*[clinic input]
    _io.TextIOWrapper.seek
    cookie as cookieObj: object
    whence: int = 0
    /
    [clinic start generated code]*/

cc @erlend-aasland as the author of #104418 if you have any thoughts.

A

@erlend-aasland erlend-aasland self-assigned this Aug 12, 2023
erlend-aasland added a commit to erlend-aasland/cpython that referenced this issue Aug 12, 2023
- Name positional parameters consistently: seek(offset, whence, /)
- Add param docstrings to _io._IOBase.seek, so the various
  implementations interit them.
- Use io.SEEK_*, not os.SEEK_*
- Override docstrings for subclasses where 'offset' must be zero for
  SEEK_CUR and SEEK_END.
@erlend-aasland
Copy link
Contributor

I agree that it is confusing that the parameter names for the low-level API seek() aren't used consistently throughout the docs/docstrings. Luckily, these are all positional-only, so we can easily normalise them without backwards compatibility concerns.

Suggested improvements:

  • Consistently name the seek() params
  • Consistently use io.SEEK_* constants instead of os.SEEK_*
  • Document io.SEEK_ constants
@serhiy-storchaka
Copy link
Member

  1. The confusion is caused by the fact that TextIOWrapper.seek() has no docstring, so pydoc shows a docstring from a parent class, which is inadequate due to significant difference in semantic.
  2. The first argument of TextIOWrapper.seek() is not a byte offset, and not an offset at all. It is a "cookie", i.e. zero or an opaque number returned by tell(). Passing anything other than zero and the value returned by tell() when whence=SEEK_SET has undefined behavior.
  3. SEEK_CUR and SEEK_END only support zero as the first argument.

Neither forward seeking nor backward seeking are supported. The only supported are rewinding to the start and restoring the previous position saved by tell(). Confusion could be avoided if TextIOWrapper.tell() returned a special type instead of integer, but for now it would be a large breaking change (because the cookie can be saved in a file as an integer and used in other process).

@erlend-aasland
Copy link
Contributor

As a data point: on macOS, Linux, and FreeBSD, the lseek and fseek manpages uses offset and whence as parameter names.

@serhiy-storchaka
Copy link
Member

Because they only work with binary files. They do not work with TextIOWrapper which support multibyte stateful codecs.

@erlend-aasland
Copy link
Contributor

erlend-aasland commented Aug 14, 2023

There are a lot of tasks here. Suggesting to proceed in this order:

  1. Use whence instead of how as the name of the third position-only parameter of os.lseek. Rationale: the SEEK_* constants are also used for this function. If the SEEK_* constants are to talk about the whence parameter, we should use it for os.lseek as well.
  2. Improve the documentation of os.SEEK_SET, os.SEEK_CUR and os.SEEK_END; explain that they are used for the whence parameter of os.lseek and of the seek() method of file objects; add a very short description to each constant.
  3. Document os.SEEK_HOLE and os.SEEK_DATA (exposed in os though posix)
  4. Consistently name _io.*.seek() params where it makes sense.
  5. Improve the docs of the seek() methods (Doc/library/io.rst)
  6. Improve the docstrings of the seek() methods
@erlend-aasland
Copy link
Contributor

Because they only work with binary files. They do not work with TextIOWrapper which support multibyte stateful codecs.

Yes, that is fine; TextIOWrapper is a special case. For the other cases, it may make sense to use a consistent naming.

@erlend-aasland
Copy link
Contributor

__text_signature__ for _io types with seek:

3.11
BufferedRandom.__text_signature__: ($self, target, whence=0, /)
BufferedReader.__text_signature__: ($self, target, whence=0, /)
BufferedWriter.__text_signature__: ($self, target, whence=0, /)
BytesIO.__text_signature__: ($self, pos, whence=0, /)
FileIO.__text_signature__: ($self, pos, whence=0, /)
StringIO.__text_signature__: ($self, pos, whence=0, /)
TextIOWrapper.__text_signature__: ($self, cookie, whence=0, /)
3.12
BufferedRWPair.__text_signature__: ($self, offset, whence=os.SEEK_SET, /)
BufferedRandom.__text_signature__: ($self, target, whence=0, /)
BufferedReader.__text_signature__: ($self, target, whence=0, /)
BufferedWriter.__text_signature__: ($self, target, whence=0, /)
BytesIO.__text_signature__: ($self, pos, whence=0, /)
FileIO.__text_signature__: ($self, pos, whence=0, /)
StringIO.__text_signature__: ($self, pos, whence=0, /)
TextIOWrapper.__text_signature__: ($self, cookie, whence=0, /)
_BufferedIOBase.__text_signature__: ($self, offset, whence=os.SEEK_SET, /)
_IOBase.__text_signature__: ($self, offset, whence=os.SEEK_SET, /)
_RawIOBase.__text_signature__: ($self, offset, whence=os.SEEK_SET, /)
_TextIOBase.__text_signature__: ($self, offset, whence=os.SEEK_SET, /)
3.13
BufferedRWPair.__text_signature__: ($self, offset, whence=os.SEEK_SET, /)
BufferedRandom.__text_signature__: ($self, target, whence=0, /)
BufferedReader.__text_signature__: ($self, target, whence=0, /)
BufferedWriter.__text_signature__: ($self, target, whence=0, /)
BytesIO.__text_signature__: ($self, pos, whence=0, /)
FileIO.__text_signature__: ($self, pos, whence=0, /)
StringIO.__text_signature__: ($self, pos, whence=0, /)
TextIOWrapper.__text_signature__: ($self, cookie, whence=0, /)
_BufferedIOBase.__text_signature__: ($self, offset, whence=os.SEEK_SET, /)
_IOBase.__text_signature__: ($self, offset, whence=os.SEEK_SET, /)
_RawIOBase.__text_signature__: ($self, offset, whence=os.SEEK_SET, /)
_TextIOBase.__text_signature__: ($self, offset, whence=os.SEEK_SET, /)
@serhiy-storchaka
Copy link
Member

Even in C, fseek/ftell and lseek and not interoperable. You will have issues when pass the result of lseek to fseek on Windows (because newline translation and buffering). There were such issues in the CPython interpreter. So "offset" in fseek/ftell is not just an offset.

fgetpos and fsetpos are more modern analogs of fseek/ftell, and they work with position which is not a number.

@erlend-aasland
Copy link
Contributor

Even in C, fseek/ftell and lseek and not interoperable. You will have issues when pass the result of lseek to fseek on Windows (because newline translation and buffering). There were such issues in the CPython interpreter. So "offset" in fseek/ftell is not just an offset.

fgetpos and fsetpos are more modern analogs of fseek/ftell, and they work with position which is not a number.

Are you responding to a particular suggested change of #107801 (comment)? Are you suggesting another change? I'm not sure how to interpret your post.

@serhiy-storchaka
Copy link
Member

No, it was response to #107801 (comment).

As for your proposition, it all looks reasonable. The main entries are 6 and 5. Perhaps there should be only two separate docstrings: for binary and text streams, all other classes perhaps can inherit them from parent. The module documentation can be more verbose and provide more specific details. Other entries can go to other issues.

@erlend-aasland
Copy link
Contributor

No, it was response to #107801 (comment).

Thanks for clarifying.

As for your proposition, it all looks reasonable. The main entries are 6 and 5. Perhaps there should be only two separate docstrings: for binary and text streams, all other classes perhaps can inherit them from parent. The module documentation can be more verbose and provide more specific details. Other entries can go to other issues.

That sounds reasonable. And of course, the TextIOWrapper docstring is a special case.

@erlend-aasland erlend-aasland changed the title Help on TextIOWrapper.seek is misleading Aug 14, 2023
erlend-aasland added a commit to erlend-aasland/cpython that referenced this issue Aug 14, 2023
Clearly document the supported seek() operations:

- Rewind to the start of the stream
- Restore a previous strema position (given by tell())
- Fast-forward to the end of the stream
erlend-aasland added a commit to erlend-aasland/cpython that referenced this issue Aug 14, 2023
- name the last parameter *whence*, like it is for seek() methods on
  file objects
- add param docstrings
- structure the valid *whence* values as a list
erlend-aasland added a commit to erlend-aasland/cpython that referenced this issue Aug 14, 2023
Yhg1s pushed a commit that referenced this issue Aug 19, 2023
…08136)

- name the last parameter *whence*, like it is for seek() methods on
  file objects
- add param docstrings
- structure the valid *whence* params

(cherry picked from commit dd4442c)

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
erlend-aasland added a commit that referenced this issue Aug 22, 2023
Clearly document the supported seek() operations:

- Rewind to the start of the stream
- Restore a previous stream position (given by tell())
- Fast-forward to the end of the stream
erlend-aasland added a commit to erlend-aasland/cpython that referenced this issue Aug 22, 2023
… docs (python#107933)

(cherry picked from commit 7f87ebb)

Clearly document the supported seek() operations:

- Rewind to the start of the stream
- Restore a previous stream position (given by tell())
- Fast-forward to the end of the stream
erlend-aasland added a commit to erlend-aasland/cpython that referenced this issue Aug 22, 2023
… docs (python#107933)

(cherry picked from commit 7f87ebb)

Clearly document the supported seek() operations:

- Rewind to the start of the stream
- Restore a previous stream position (given by tell())
- Fast-forward to the end of the stream
erlend-aasland added a commit to erlend-aasland/cpython that referenced this issue Aug 22, 2023
erlend-aasland added a commit that referenced this issue Aug 22, 2023
…107933) (#108264)

(cherry picked from commit 7f87ebb)

Clearly document the supported seek() operations:

- Rewind to the start of the stream
- Restore a previous stream position (given by tell())
- Fast-forward to the end of the stream
erlend-aasland added a commit to erlend-aasland/cpython that referenced this issue Aug 22, 2023
- Add param docstrings
- Link to os.SEEK_* constants
- Mention the return value in the initial paragraph
Yhg1s pushed a commit that referenced this issue Aug 22, 2023
…107933) (#108262)

(cherry picked from commit 7f87ebb)

Clearly document the supported seek() operations:

- Rewind to the start of the stream
- Restore a previous stream position (given by tell())
- Fast-forward to the end of the stream
erlend-aasland added a commit to erlend-aasland/cpython that referenced this issue Aug 27, 2023
erlend-aasland added a commit to erlend-aasland/cpython that referenced this issue Aug 27, 2023
erlend-aasland added a commit that referenced this issue Aug 27, 2023
Yhg1s pushed a commit that referenced this issue Aug 27, 2023
erlend-aasland added a commit that referenced this issue Aug 29, 2023
- Add param docstrings
- Link to os.SEEK_* constants
- Mention the return value in the initial paragraph

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
erlend-aasland added a commit to erlend-aasland/cpython that referenced this issue Aug 29, 2023
…ython#108268)

(cherry picked from commit 8178a88)

- Add param docstrings
- Link to os.SEEK_* constants
- Mention the return value in the initial paragraph

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
erlend-aasland added a commit to erlend-aasland/cpython that referenced this issue Aug 29, 2023
…ython#108268)

(cherry picked from commit 8178a88)

- Add param docstrings
- Link to os.SEEK_* constants
- Mention the return value in the initial paragraph

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
erlend-aasland added a commit that referenced this issue Aug 29, 2023
… (#108656)

(cherry picked from commit 8178a88)

- Add param docstrings
- Link to os.SEEK_* constants
- Mention the return value in the initial paragraph

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Yhg1s pushed a commit that referenced this issue Aug 29, 2023
… (#108655)

(cherry picked from commit 8178a88)

- Add param docstrings
- Link to os.SEEK_* constants
- Mention the return value in the initial paragraph

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
carljm added a commit to carljm/cpython that referenced this issue Aug 30, 2023
* main:
  pythongh-108520: Fix bad fork detection in nested multiprocessing use case (python#108568)
  pythongh-108590: Revert pythongh-108657 (commit 400a1ce) (python#108686)
  pythongh-108494: Argument Clinic: Document how to generate code that uses the limited C API (python#108584)
  Document Python build requirements (python#108646)
  pythongh-101100: Fix Sphinx warnings in the Logging Cookbook (python#108678)
  Fix typo in multiprocessing docs (python#108666)
  pythongh-108669: unittest: Fix documentation for TestResult.collectedDurations (python#108670)
  pythongh-108590: Fix sqlite3.iterdump for invalid Unicode in TEXT columns (python#108657)
  Revert "pythongh-103224: Use the realpath of the Python executable in `test_venv` (pythonGH-103243)" (pythonGH-108667)
  pythongh-106320: Remove private _Py_ForgetReference() (python#108664)
  Mention Ellipsis pickling in the docs (python#103660)
  Revert "Use non alternate name for Kyiv (pythonGH-108533)" (pythonGH-108649)
  pythongh-108278: Deprecate passing the first param of sqlite3.Connection callback APIs by keyword (python#108632)
  pythongh-108455: peg_generator: install two stubs packages before running mypy (python#108637)
  pythongh-107801: Improve the accuracy of io.IOBase.seek docs (python#108268)
@hugovk
Copy link
Member

hugovk commented Nov 9, 2023

Lots of merged PRs here. 🚀

Anything still to do, or can the issue be closed?

@vadmium
Copy link
Member

vadmium commented Jul 26, 2024

Issue #82969 is open about IOBase.seek/tell vs TextIOWrapper and the offset/cookie parameter
Issue #122299 is about the byte vs character seek offsets

@hugovk hugovk added the pending The issue will be closed if no feedback is provided label Jul 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.11 only security fixes 3.12 bugs and security fixes 3.13 bugs and security fixes docs Documentation in the Doc dir pending The issue will be closed if no feedback is provided topic-IO type-bug An unexpected behavior, bug, or error
7 participants