fix: document truncation and loss in notion document sync #5631

Aurelius-Huang · 2024-06-26T09:08:40Z

Description

The Notion extractor retrieves only the first page of blocks from a Notion Page; all subsequent blocks are lost.

According to the Pagination section of the Notion Developers documentation, when a Notion Page contains more than 100 Blocks, they must be fetched page by page to obtain the complete content of the Notion Page.

However, the retrieval logic in notion_extractor.py only ever obtains the first page of blocks of the Notion Page (up to 100). The Notion Developers documentation makes the cause clear: when calling https://api.notion.com/v1/blocks/{block_id}/children, the start_cursor of the next page is mistakenly passed as block_id, whereas start_cursor should be passed through the Query Params of the GET request.

In addition, the Query Params of the GET request are passed through the wrong argument (json instead of params).
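
The corrected pagination flow can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `iter_block_children`, the `fetch` callable, and `make_fake_fetch` are hypothetical names introduced here, and the fake fetch stands in for the real HTTP call so the sketch runs without network access. It relies only on the `results`, `has_more`, and `next_cursor` fields of the Notion paginated response.

```python
from typing import Any, Callable, Dict, Iterator, Optional

BLOCK_CHILDREN_URL = "https://api.notion.com/v1/blocks/{block_id}/children"

# Hypothetical signature: fetch(url, params) -> decoded JSON response body.
Fetch = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def iter_block_children(block_id: str, fetch: Fetch) -> Iterator[Dict[str, Any]]:
    """Yield every child block of block_id, following next_cursor."""
    # The URL is built once: block_id stays the same for every page.
    url = BLOCK_CHILDREN_URL.format(block_id=block_id)
    start_cursor: Optional[str] = None
    while True:
        # The cursor travels in the query params, never in the URL path.
        params: Dict[str, Any] = {"page_size": 100}
        if start_cursor is not None:
            params["start_cursor"] = start_cursor
        data = fetch(url, params)
        yield from data.get("results", [])
        if not data.get("has_more"):
            break
        start_cursor = data.get("next_cursor")
        if start_cursor is None:
            break


def make_fake_fetch(total: int, page_size: int = 100) -> Fetch:
    """Build an in-memory stand-in for the API (assumption, for demonstration)."""
    blocks = [{"id": f"block-{i}"} for i in range(total)]

    def fetch(url: str, params: Dict[str, Any]) -> Dict[str, Any]:
        start = int(params.get("start_cursor", 0))
        end = start + page_size
        has_more = end < total
        return {
            "results": blocks[start:end],
            "has_more": has_more,
            "next_cursor": str(end) if has_more else None,
        }

    return fetch


all_blocks = list(iter_block_children("page-id", make_fake_fetch(250)))
```

With the `requests` library, `fetch` could be something like `lambda url, params: requests.get(url, headers=headers, params=params).json()`, where `headers` is assumed to carry the Notion authorization and version headers. Passing the dict as `params=` rather than `json=` is what places `start_cursor` in the query string.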

Fixes # (issue)

Type of Change

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Find a long Notion Page (more than 100 Blocks) and run the Sync from Notion operation in Knowledge. With this PR, the complete Notion Page content is synchronized; the previous version obtains only the first 100 Blocks, and the remaining content is lost.

TODO

Suggested Checklist:

I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
My changes generate no new warnings
I ran dev/reformat(backend) and cd web && npx lint-staged(frontend) to appease the lint gods
optional I have made corresponding changes to the documentation
optional I have added tests that prove my fix is effective or that my feature works
optional New and existing unit tests pass locally with my changes

Aurelius-Huang · 2024-07-03T08:19:58Z

@JohnJyong Could you please review this PR? Is the description not clear enough to reproduce the problem?

JohnJyong · 2024-07-05T03:48:04Z

LGTM, thanks for your contribution @Aurelius-Huang

fix: notion extractor only retrieves the first page of many blocks, a…

45cc5c5

…nd the subsequent blocks are lost.

dosubot bot added the size:M (This PR changes 30-99 lines, ignoring generated files.) and 🐞 bug (Something isn't working) labels Jun 26, 2024

crazywoola requested a review from JohnJyong June 26, 2024 09:20

JohnJyong approved these changes Jul 5, 2024

dosubot bot added the lgtm (This PR has been approved by a maintainer) label Jul 5, 2024

JohnJyong merged commit f546db5 into langgenius:main Jul 5, 2024
5 checks passed