The string "unsafe" comes from a contenteditable="true" div, into which an image was pasted from the clipboard.

// needs to be escaped; it is valid HTML5
String unsafe = "<img src=\"\" alt=\"\">";


org.jsoup.safety.Whitelist whitelist = Whitelist.relaxed();   

whitelist.addEnforcedAttribute("a", "rel", "nofollow"); 

String safe = Jsoup.clean(unsafe, whitelist);

// and safe becomes: "<img alt="">"
// the entire src is lost!?

Note: random surrounding HTML has no effect. The src is lost in any case.

2 Answers


The basic problem is that a quick look at the documentation for relaxed (http://jsoup.org/apidocs/org/jsoup/safety/Whitelist.html#relaxed) suggests that only tags are allowed, without attributes. I did not look into the source, but another question, "How to make a Jsoup whitelist to accept certain attribute content", claims that some attributes are allowed as well. The img tag is already included, and so is its src attribute.

The setting that makes my src disappear is

preserveRelativeLinks

which is set to false for relaxed, hidden somewhere in the jsoup code (see https://github.com/jhy/jsoup/issues/333).

It should be set to true, and a base URI should be passed so that the relative link can be resolved:

System.out.println(Jsoup.clean("<img src='imgFile.png' />","http://www.somedomain.com", Whitelist.relaxed().preserveRelativeLinks(true)));
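To illustrate why a src value can vanish, here is a minimal sketch in plain Java of the kind of check involved: the value is resolved against the base URI, and the attribute survives only if the result starts with an allowed protocol. This is a simplified model written for this explanation, not jsoup's actual source; the class SrcProtocolSketch, its methods, and the sample data URI are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified model of a protocol check on a URL attribute.
// It is NOT jsoup's implementation; it only mirrors the documented behaviour.
class SrcProtocolSketch {

    // Resolves value against baseUri; returns "" if no absolute URL can be formed.
    static String absUrl(String baseUri, String value) {
        try {
            URI resolved = baseUri.isEmpty()
                    ? new URI(value)
                    : new URI(baseUri).resolve(value);
            return resolved.isAbsolute() ? resolved.toString() : "";
        } catch (URISyntaxException | IllegalArgumentException e) {
            return "";
        }
    }

    // The attribute is kept only if the (resolved) value starts with an allowed protocol.
    static boolean isAllowed(String value, String baseUri, List<String> protocols) {
        String abs = absUrl(baseUri, value);
        String candidate = abs.isEmpty() ? value : abs;
        String lower = candidate.toLowerCase(Locale.ROOT);
        for (String protocol : protocols) {
            if (lower.startsWith(protocol + ":")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> httpOnly = Arrays.asList("http", "https");
        List<String> withData = Arrays.asList("http", "https", "data");

        // Relative link without a base URI: no protocol can be confirmed.
        System.out.println(isAllowed("imgFile.png", "", httpOnly));                            // false
        // Same link with a base URI: resolves to an http URL.
        System.out.println(isAllowed("imgFile.png", "http://www.somedomain.com/", httpOnly)); // true
        // Inline image: rejected unless "data" is an allowed protocol.
        System.out.println(isAllowed("data:image/png;base64,AAAA", "", httpOnly));             // false
        System.out.println(isAllowed("data:image/png;base64,AAAA", "", withData));             // true
    }
}
```

Under this model, a relative src needs a base URI to pass the check, while a pasted inline image needs the data protocol to be allowed for img src.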
  • whitelist.addProtocols("img", "src", "http", "https", "data", "cid"); should be added as well.
    – Andremoniy
    Commented Nov 17, 2016 at 11:36

This is how to allow basic text with inline images like src="data:image/png;base64,...":

String safe = Jsoup.clean(unsafe, Whitelist.basic()
        .addTags("img")
        .addAttributes("img", "height", "src", "width")
        .addProtocols("img", "src", "http", "https", "data"));
  • This worked for me after changing Whitelist to Safelist.
    Commented Mar 24, 2023 at 9:27
