Skip to content
This repository has been archived by the owner on Jul 9, 2023. It is now read-only.

Wrapper to strip pseudo-JSON comments from an io::Read input stream #24

Closed
dtolnay opened this issue Jan 24, 2019 · 10 comments · Fixed by #36
Closed

Wrapper to strip pseudo-JSON comments from an io::Read input stream #24

dtolnay opened this issue Jan 24, 2019 · 10 comments · Fixed by #36

Comments

@dtolnay
Copy link
Owner

dtolnay commented Jan 24, 2019

Occasionally people ask for JSON deserialization with comments, without wanting to go all the way to JSON5 or Hjson.

This doesn't belong in serde_json which is intended strictly for JSON as described by https://json.org/.

I would like to be able to write:

use std::error::Error;
use std::fs::File;
use std::io::BufReader;

use serde::Deserialize;
use strip_json::StripComments;

fn main() -> Result<(), Box<Error>> {
    let input = File::open("path/to/input.json")?;
    let reader = BufReader::new(input);
    let clean = StripComments::new(reader);
    let mut de = serde_json::Deserializer::from_reader(clean);

    println!("{:#?}", serde_json::Value::deserialize(&mut de));

    Ok(())
}

Roughly the API would be:

pub struct StripComments<R>;

impl<R> StripComments<R> {
    pub fn new(stream: R) -> Self;
}

impl<R: Read> Read for StripComments<R>;
@bolinfest
Copy link

Although I appreciate the proposed separation of concerns, you basically have to parse the JSON file twice, right? For example, to implement StripComments, you can't just look at each token in isolation because you might be inside of a string literal where // should be kept rather than discarded.

If StripComments has to do all the work of a JSON parser, then why not just make it a lenient JSON parser in its own right rather than a preprocessor of serde_json?

@dtolnay
Copy link
Owner Author

dtolnay commented Jan 24, 2019

you basically have to parse the JSON file twice, right?

I don't think this is the case. The only relevant piece of state for the wrapper is whether it is inside of a string literal or not. There doesn't need to be any parsing of { } [ ], or numbers, or not even escape sequences beyond \\ and \".

I think the amount of parsing dedicated to " and \ will be less than the amount required to correctly handle // and \n and /* */.

@bolinfest
Copy link

The only relevant piece of state for the wrapper is whether it is inside of a string literal or not.

I guess that's true if the JSON is well-formed. If it isn't, I suspect yielding a proper parse error becomes trickier.

But also: what about the next logical feature, which is trailing commas? I consider it strongly tied to comments because if you don't support trailing commas and you have this:

[
  "foo",
  "bar",
  "baz"
]

and you comment a single line:

// This doesn't parse anymore!
[
  "foo",
  "bar",
//  "baz"
]

For JSON files that may be edited by hand where comments are supported, it's pretty common to see people prefer trailing commas (even though they're optional) to make this sort of thing easier.

@dtolnay
Copy link
Owner Author

dtolnay commented Jan 24, 2019

I would leave the error messages to serde_json rather than diagnosing syntax errors in the wrapper.

Implementation note: we should sub out the comments with spaces, rather than removing them entirely, so that serde_json's error reporting of the line and column position will accurately apply to the input data.

For trailing commas, it still only needs the "am I in a string" amount of state. If not inside of a string, then , + whitespace + ] is emitted as ], , + whitespace + } is emitted as }. This amount of parsing is inherent to handling trailing commas so you are not reproducing any work from serde_json.

@bolinfest
Copy link

Sort of. For example: I would say that [,] should not be considered valid "JSON with comments." I think the heuristics you are proposing would rewrite that as [ ], which would parse.

@dtolnay
Copy link
Owner Author

dtolnay commented Jan 24, 2019

Ah, good call. If it's simpler to fork serde_json and integrate these changes into the existing parsing code, I would go ahead with that.

@bolinfest
Copy link

@dtolnay That sounds fair, thanks!

I discovered that there is also https://crates.io/crates/json5, but it depends on https://crates.io/crates/pest so it can use a PEG. I was trying to avoid additional dependencies, but perhaps I should give that a shot.

Also, https://crates.io/crates/json5 offers from_str, but not from_reader, which is one thing I really like about serde_json! Nice work with your IoRead to wrap io::Read to facilitate that!

@tmccombs
Copy link

I put together a simple crate for this: https://crates.io/crates/json_comments

@dtolnay
Copy link
Owner Author

dtolnay commented Jun 30, 2019

Thanks @tmccombs, this is terrific!

I filed several issues to follow up on the API and documentation.

@bolinfest
Copy link

Oh, perhaps I should have mentioned that I did end up making that fork:

https://github.com/bolinfest/serde-jsonrc

As explained in the README, it supports // and /* style comments as well as trailing commas.

Repository owner locked and limited conversation to collaborators Dec 16, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
3 participants