19 events
when what by license comment
Apr 9, 2019 at 19:01 comment added axolotl Let us continue this discussion in chat.
Apr 9, 2019 at 19:01 comment added Putvi No, that isn't true. You can do whatever your program does to the links beforehand.
Apr 9, 2019 at 19:00 comment added axolotl Now imagine there are 2 million such links. It would take you weeks to retrieve all of them, and you cannot begin any research work until you have all of them downloaded.
Apr 9, 2019 at 18:59 comment added Putvi Yes, but if your software pre-vetted the link, that is just clicking a button.
Apr 9, 2019 at 18:59 comment added axolotl Suppose I gave you that link. You would still need to retrieve the plain text from wherever that link points, through an HTTP request.
Apr 9, 2019 at 18:57 comment added Putvi No, you don't understand what I am saying :). YOU do the crawling and keep a database of what it says. You don't present that to the user though. It is only used to decide what to present. You then show a link instead of the text. No extra crawling is needed.
Apr 9, 2019 at 18:57 comment added axolotl Another option would be to make anyone who downloads the data from the research team agree to fair use terms. Do you think that would be in accordance with fair use law?
Apr 9, 2019 at 18:56 comment added axolotl I understand what you're saying. However, the important part is that crawling takes a really long time and a lot of resources, more than you would assume. Just providing a link would mean anybody else would have to put exactly the same amount of time and effort into retrieving the same data, so it is preferable to just hand the data out in plain text, or en clair, to whoever would like to make fair use of it.
Apr 9, 2019 at 18:53 comment added Putvi I meant that you can crawl it yourself and then link to the relevant parts; then it is on Google Books and not on you. Wherever your program would otherwise display text, you show a link in place of that text.
Apr 9, 2019 at 18:51 comment added axolotl The issue with that option is that crawling the data is computationally expensive, and not everyone would have the equipment or time to do that.
Apr 9, 2019 at 18:50 comment added Putvi Or just link to the data.
Apr 9, 2019 at 18:49 comment added axolotl However, a middle ground could be to publish only parts of the data so as not to expose the original works in full.
Apr 9, 2019 at 18:43 comment added axolotl Ideally, the research team would want to make the data set public so that everyone in the field working on this research would benefit from it.
Apr 9, 2019 at 18:32 comment added Putvi Is the data set private since it is used for research?
Apr 9, 2019 at 18:19 history edited David Siegel edited tags
Apr 9, 2019 at 18:19 answer added David Siegel timeline score: 3
Apr 9, 2019 at 17:24 answer added BlueDogRanch timeline score: 1
Apr 9, 2019 at 17:05 review First posts
Apr 9, 2019 at 19:19
Apr 9, 2019 at 17:04 history asked axolotl CC BY-SA 4.0