5

I have this which is really pleasant to the eye, but I'm concerned about its implications:

use std::net::{Ipv4Addr, Ipv6Addr};

#[derive(Eq, PartialEq, Debug)]
pub enum SmtpHost {
    DOMAIN(String),
    IPV4(Ipv4Addr),
    IPV6(Ipv6Addr),
    UNKNOWN { label: String, literal: String },
}

I'm filling this in from a PEG grammar, which gives me &str, so all the stringy calls look like this: SmtpHost::DOMAIN(s.to_string())

I would like these enums to be the outcome of the parser, like smtp_parser::host< 'input >(s: 'input & str) -> SmtpHost

I have also tried the ref approach, but that starts getting clumsy rather soon:

#[derive(Eq, PartialEq, Debug)]
pub enum SmtpHost<'a> {
    DOMAIN(&'a str),
    IPV4(Ipv4Addr),
    IPV6(Ipv6Addr),
    UNKNOWN { label: &'a str, literal: &'a str },
}

So I'm like either / or ... but you know better. Tell me :o)

My study project for reference

8
  • 5
    Side note: Enum variants in rust are very typically written in CamelCase, not FULLCAPS. You might have a specific reason to do this, in which case I apologize. But if not, just stick with the convention :)
    – Kroltan
    Commented May 18, 2017 at 18:14
  • 1
    &str is not owned, so if you want to be able to keep your tokens around after the parser finishes, you probably have to use String
    – zstewart
    Commented May 18, 2017 at 18:58
  • Thanks @Kroltan, will do Commented May 18, 2017 at 19:39
  • @zstewart, do I take it for an answer? I would like these enums to be the outcome of the parser, like smtp_parser::host< 'input >(s: 'input & str) -> SmtpHost Commented May 18, 2017 at 19:48
  • @RobertCutajar-Robajz if you return SmtpHost using &str with that lifetime parameter, then the returned SmtpHost would have to have a lifetime which is <= 'input, since the signature would have to be: smtp_parser::host<'input>(s: &'input str) -> SmtpHost<'input>. Just use String unless you have a specific reason that these values should be borrowing part of a different string; it looks to me like they should own the matched values.
    – zstewart
    Commented May 18, 2017 at 20:22

2 Answers

5

The critical difference between &str and String is ownership. String is owned, but &str is borrowed. If you store a &str value, the container's lifetime will be limited to the lifetime of the borrowed string.

If your parser generator produces a parse function with a signature like this:

smtp_parser::host<'a>(&'a str) -> SmtpHost<'a>

then when it passes you a &str to use to construct your parse tree or parsed value, it most likely gives you a substring of the input. This means that the &str you store in your SmtpHost enum cannot outlive the original input string. And indeed, you can see this in the signature: both the input string and the output SmtpHost have the lifetime parameter 'a.

This means that your resulting SmtpHost cannot outlive the input used to generate it. If the input is a string constant, &'static str, that might be fine, but if you get the input from standard input or by reading a file, you won't be able to return the SmtpHost out of the scope that owns the input string.

For example, suppose that you wanted to declare a function that parsed an SmtpHost from standard input:

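use std::io::{self, BufRead};

// Does not compile: the returned SmtpHost<'a> would borrow from `line`,
// which is dropped when this function returns.
fn read_host<'a>() -> SmtpHost<'a> {
    let mut line = String::new();
    let stdin = io::stdin();
    stdin.lock().read_line(&mut line).expect("Could not read line");
    smtp_parser::host(&line)
}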

You'll get an error saying something like "line does not live long enough" (newer compilers phrase it as "cannot return value referencing local variable `line`"). Here's a trivial example in Rust playground.

So you should use &str when you are just borrowing a value from somewhere else which does not need to outlive the source. You should use String when you need to have ownership of the value.
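As a rough sketch of the owned approach: with String fields, the value can be returned from a function even though the input buffer is local to it. The parse_host function below is a hypothetical stand-in for the generated parser (it is not a real SMTP grammar and never produces UNKNOWN), and host_from_temporary_buffer is an illustrative name only.

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

/// The owned enum from the question: every string is a `String`.
#[derive(Eq, PartialEq, Debug)]
pub enum SmtpHost {
    DOMAIN(String),
    IPV4(Ipv4Addr),
    IPV6(Ipv6Addr),
    UNKNOWN { label: String, literal: String },
}

/// Hypothetical stand-in for the generated parser (assumption: anything
/// that is not an IP address is treated as a domain).
pub fn parse_host(s: &str) -> SmtpHost {
    let s = s.trim();
    if let Ok(addr) = s.parse::<Ipv4Addr>() {
        SmtpHost::IPV4(addr)
    } else if let Ok(addr) = s.parse::<Ipv6Addr>() {
        SmtpHost::IPV6(addr)
    } else {
        // The copy happens here; afterwards the result owns its data.
        SmtpHost::DOMAIN(s.to_string())
    }
}

/// The buffer `line` is local, yet the result can be returned because
/// it does not borrow from `line`.
pub fn host_from_temporary_buffer(raw: &[u8]) -> SmtpHost {
    let line = String::from_utf8_lossy(raw).into_owned();
    parse_host(&line)
}
```

The cost is one allocation per stored string; in exchange the signature needs no lifetime parameter at all.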

For more complex situations, where you need an owned value but want to use it in multiple places without keeping many copies of it, there's Rc<T> and Rc<RefCell<T>>. But in your case, it sounds like SmtpHost should just have ownership of the string it stores.
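To illustrate the shared-ownership case, here is a minimal sketch using Rc<str>: the text is copied once, and each additional handle only bumps a reference count. The function name share_host is made up for this example and is not part of the question's code.

```rust
use std::rc::Rc;

/// Copies `domain` into a single reference-counted allocation and hands
/// out `n` handles that all point at that same allocation.
pub fn share_host(domain: &str, n: usize) -> Vec<Rc<str>> {
    let shared: Rc<str> = Rc::from(domain);
    // Rc::clone copies the pointer and increments the count; the string
    // data itself is not duplicated.
    (0..n).map(|_| Rc::clone(&shared)).collect()
}
```

Rc is single-threaded; the thread-safe counterpart in the standard library is Arc.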

2
  • Thanks @zstewart, side note, the rust playground link won't pass my browser, try shrinking it, but I get the point. Commented May 20, 2017 at 7:22
  • @RobertCutajar-Robajz hm. Odd, it seems to work for me. I initially tried to use the link shortener from Rust playground, but StackOverflow apparently doesn't like link shorteners. Unfortunately the full URL includes the complete code of the example... perhaps it's longer than your browser is happy with.
    – zstewart
    Commented May 20, 2017 at 11:59
3

If you want to parse without copying, then the signature you'd want is:

// Notice that the 'input goes after the &. Syntax.
fn smtp_parser::host<'input>(s: &'input str) -> SmtpHost<'input>;

Then you could define your enum like this:

#[derive(Eq, PartialEq, Debug)]
pub enum SmtpHost<'input> {
    DOMAIN(&'input str),
    IPV4(Ipv4Addr),
    IPV6(Ipv6Addr),
    UNKNOWN { label: &'input str, literal: &'input str },
}
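A small sketch of what zero-copy parsing into such an enum might look like. The host function here is a hand-written stand-in for the generated parser, and its "label:literal" rule for UNKNOWN is an assumption made purely for illustration.

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

#[derive(Eq, PartialEq, Debug)]
pub enum SmtpHost<'input> {
    DOMAIN(&'input str),
    IPV4(Ipv4Addr),
    IPV6(Ipv6Addr),
    UNKNOWN { label: &'input str, literal: &'input str },
}

/// Hypothetical stand-in for `smtp_parser::host`. Every `&str` in the
/// result is a slice of `s`, so no string data is copied.
pub fn host<'input>(s: &'input str) -> SmtpHost<'input> {
    let s = s.trim();
    if let Ok(addr) = s.parse::<Ipv4Addr>() {
        return SmtpHost::IPV4(addr);
    }
    if let Ok(addr) = s.parse::<Ipv6Addr>() {
        return SmtpHost::IPV6(addr);
    }
    // Assumed rule: "label:literal" becomes UNKNOWN, anything else a domain.
    match s.split_once(':') {
        Some((label, literal)) => SmtpHost::UNKNOWN { label, literal },
        None => SmtpHost::DOMAIN(s),
    }
}
```

The returned value is only usable while the input string is alive, which is exactly what the shared 'input lifetime expresses.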

On the other hand, if this is too awkward in some cases, you can sort of do both using the Cow (clone-on-write) type:

use std::borrow::Cow;
#[derive(Eq, PartialEq, Debug)]
pub enum SmtpHost<'input> {
    DOMAIN(Cow<'input, str>),
    IPV4(Ipv4Addr),
    IPV6(Ipv6Addr),
    UNKNOWN { label: Cow<'input, str>, literal: Cow<'input, str> },
}

This is what you want to do if the host parts can sometimes be used directly out of the input, but sometimes need to be changed before they're usable.
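A sketch of how that decision might look in practice. The lowercasing rule and the name normalize_domain are assumptions for illustration; the point is only that the borrowed case costs nothing and the owned case allocates when a change is needed.

```rust
use std::borrow::Cow;

/// Returns the input slice untouched when it is already lowercase, and
/// allocates a new lowercased `String` only when it is not.
pub fn normalize_domain(s: &str) -> Cow<'_, str> {
    if s.bytes().any(|b| b.is_ascii_uppercase()) {
        // Needs changing: produce an owned, modified copy.
        Cow::Owned(s.to_ascii_lowercase())
    } else {
        // Usable as-is: keep borrowing from the input.
        Cow::Borrowed(s)
    }
}
```

The result could then be stored as SmtpHost::DOMAIN(normalize_domain(s)). Calling into_owned() on a Cow yields a String when the value has to outlive the input.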

1
  • Thanks for the Cow @notriddle Commented May 20, 2017 at 7:17
