Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

maintenance(docs): Add retext-spell to check spelling #24372

Merged
merged 67 commits into from
Jun 24, 2020
Merged

Conversation

tesseralis
Copy link
Contributor

@tesseralis tesseralis commented May 23, 2020

Description

Add retext-spell in order to spell-check our docs and fix a couple of spelling errors. Also add an update-dictionary script to make adding new words to the dictionary easier.

Motivation

Being able to auto-check words means that we can keep the Gatsby Style easier.

Method

I created dictionary.txt by running retext-spell once and then extracting/sorting the "mispelled" words. I looked over the file by hand to find any obvious misspellings, brand name capitalizations, and locale-specific spellings and applied them. There's more stuff that we can improve/regularize, such as putting filenames and code in backticks, but I can leave it to follow-up PRs and community maintainers.

TODO

  • Make sure that emoji are spell-checked correctly.
  • Script to update dictionary if new words are used.

Caveats

  • If "new words" are used, they have to be added to the dictionary. I'll write a script to enable that but it may get cumbersome having to constantly update the dictionary with new words (or it might catch some misspellings more, who knows!)

Follow-up PRs

  • Backticks on function names like useStaticQuery
  • Backticks on package names (e.g. gatsby-source-filesystem)
  • Standardize more brands and term names:
    • Gatsby/GatsbyJS
    • GraphQL
    • "CMS" (we have like five different ways we pluralize CMS)
@tesseralis tesseralis requested review from a team as code owners May 23, 2020 01:39
@gatsbot gatsbot bot added the status: triage needed Issue or pull request that need to be triaged and assigned to a reviewer label May 23, 2020
@muescha
Copy link
Contributor

muescha commented Jun 14, 2020

@muescha

maybe we split up the dictionary into "normal" words and brand names? that makes more easy to contribute to some parts - and for each part is better to check.

I was thinking this too and would probably do it in a follow-up PR. One thing we have to consider is the workflow for adding new words to the dictionary. Right now the update-dictionary script adds all new words to dictionary.txt, and it would take more manual effort to move people names/brands/etc. to separate files.

if it splitted up, then the script can save new found words in a file: new-misspellings.txt which can manually spread over the different files. so the script would not touch existing dictionary files

Titus (@wooorm) sayed the retext-spell implemented the hunspell dictionaryformat, where some things can be better fine tuned?

We can definitely fine-tune some of these names. I tried to standardize as much as I could, left some things out because there were many inconsistencies and they would be better put in another PR (for example, the many spellings of Gatsby/GatsbyJS and GraphQL). Since you've been so helpful so far with making things consistent, I was hoping you would be able to help us out with those :)

yes - i like to help out :)

if the dictionary not helps that much we can also catch the some hard to defined word combinations with the remark-lint-prohibited-strings for example i see here the "always lovercase npm / webpack"?

@muescha
Copy link
Contributor

muescha commented Jun 14, 2020

is there a rule to ignore twitter handles like @iam_timsmith?

@wooorm
Copy link

wooorm commented Jun 14, 2020

@muescha there’s retext-syntax-mentions, see: remarkjs/remark-retext#13 (comment)

# Conflicts:
#	docs/blog/2020-06-16-Gatsby-slideshow-with-Posenet/index.md
@tesseralis
Copy link
Contributor Author

if the dictionary not helps that much we can also catch the some hard to defined word combinations with the remark-lint-prohibited-strings for example i see here the "always lovercase npm / webpack"?

I think if you use hunspell's syntax you can make sure npm and webpack are lowercase:

webpack
*Webpack
npm
*NPM

is there a rule to ignore twitter handles like @iam_timsmith?

We have retext-syntax-mentions installed and configured but for some reason it's not picking up @iam_timsmith as a mention?

@muescha
Copy link
Contributor

muescha commented Jun 17, 2020

We have retext-syntax-mentions installed and configured but for some reason it's not picking up @iam_timsmith as a mention?

thats a twitter handle which allow underscore which is not catched with the github setting of retext-syntax-mentions

since this plugin not handle multiple styles (a feature request it is not possible to add multiple configs

multiple config are not possible

but this "hacky hack" works on my machine:

  .use(() => require("retext-syntax-mentions")({ style: 'twitter'}))
  .use(() => require("retext-syntax-mentions")({ style: 'github'}))

but this hack is not recommended by the unify team

@muescha
Copy link
Contributor

muescha commented Jun 17, 2020

btw i find out that many of the mentions are triggered by mdx import statements with adding a

vFile.info(`formatting "${toString(slice)}" as mention`, newPosition, 'retext-syntax-mentions');

right after:

https://github.com/retextjs/retext-syntax-mentions/blob/66dc99aa0af3876429d69b6c181a0b0c8b26525e/index.js#L61-L68

but the team recommend for MDX files the remark-mdx plugin ( retextjs/retext-syntax-mentions#7 (comment)):

Well, @component/something-different is a valid GitHub mention: it mentions the something-different team in the component organization, so if that behavior is not what you want, then I’d suggest using twitter or a custom regex.

If you want to check MDX files: those are fundamentally different than Markdown, and becoming more so in MDX@2. You may want use remark-mdx to support the MDX-only/non-Markdown syntax as syntax. But you may need to use separate processors: one for regular Markdown, that doesn‘t support imports, and one for MDX!

@muescha
Copy link
Contributor

muescha commented Jun 17, 2020

BTW: is it possible to deactivate via label or something the Typo CI / TypoCheck plugin for each commit? sometimes it makes to much noise

@tesseralis
Copy link
Contributor Author

@muescha I definitely want to add remark-mdx but right now it's causing some linting because the MDX parser doesn't recognize <!-- lint-ignore comments. I want to get to the bottom of it and include the MDX parser in a different PR.

As for Typo CI, I was hoping we could get rid of it once this PR merges. Spell checks are most useful on text and documentation, not package names, so it would make sense to use retext-spell instead.

@muescha
Copy link
Contributor

muescha commented Jun 18, 2020

BTW: this is a better setting for mentions:

.use(require("retext-syntax-mentions"), {style: /^@[\w-]{1,40}$/})

i have written a small tracer which checks the tree diffs - and there i see that the github setting are mostly catching mdx imports like @compontents/nav-bar

@muescha muescha changed the title mainenance(docs): Add retext-spell to check spelling Jun 23, 2020
@muescha
Copy link
Contributor

muescha commented Jun 23, 2020

@muescha I definitely want to add remark-mdx but right now it's causing some linting because the MDX parser doesn't recognize <!-- lint-ignore comments. I want to get to the bottom of it and include the MDX parser in a different PR.

i am not sure, but maybe this util helps with the lint-ignore messages?

https://github.com/unifiedjs/unified-message-control

and this based on the previous util:

https://github.com/remarkjs/remark-message-control

//in index.js
var test = [
  'html', // Comments are `html` nodes in mdast.
  'comment' // In MDX, comments have their own node.
]
@muescha
Copy link
Contributor

muescha commented Jun 23, 2020

As for Typo CI, I was hoping we could get rid of it once this PR merges. Spell checks are most useful on text and documentation, not package names, so it would make sense to use retext-spell instead.

i was looking for an option to only have changed lines marked, but there is not this option.

in retext on CI we can use this tool for checking only changed lines: https://github.com/unifiedjs/unified-diff

Copy link

@AishaBlake AishaBlake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good! Seems like this allows us to remove the spelling bot as well? That can be a "help wanted" issue as far as I'm concerned.

@AishaBlake AishaBlake merged commit 9b3faa2 into master Jun 24, 2020
@delete-merged-branch delete-merged-branch bot deleted the retext-spell branch June 24, 2020 19:53
@tesseralis
Copy link
Contributor Author

@muescha this PR is merged in so feel free to start working on other PRs to standardize names and stuff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: documentation An issue or pull request for improving or updating Gatsby's documentation type: maintenance An issue or pull request describing a change that isn't a bug, feature or documentation change
5 participants