What proportion of this text would they be able to release as a preview
or demonstration of the material without it being considered copyright
infringement?
How much of a work's full text to use should be decided after consulting Fair Use | U.S. Copyright Office. It will be the research team's judgement call how much they can use and still fall under one of the types of use covered by Fair Use.
Fair Use is a legal doctrine that promotes freedom of expression by
permitting the unlicensed use of copyright-protected works in certain
circumstances. Section 107 of the Copyright Act provides the statutory
framework for determining whether something is a fair use and
identifies certain types of uses—such as criticism, comment, news
reporting, teaching, scholarship, and research—as examples of
activities that may qualify as fair use. Section 107 calls for
consideration of the following four factors in evaluating a question
of fair use:
Purpose and character of the use, including whether the use is of a
commercial nature or is for nonprofit educational purposes....
Nature of the copyrighted work: This factor analyzes the degree to
which the work that was used relates to copyright’s purpose of
encouraging creative expression....
Amount and substantiality of the portion used in relation to the
copyrighted work as a whole: Under this factor, courts look at both
the quantity and quality of the copyrighted material that was used....
Effect of the use upon the potential market for or value of the
copyrighted work: Here, courts review whether, and to what extent,
the unlicensed use harms the existing or future market for the
copyright owner’s original work....
If the copyright holder takes legal action against you, claiming infringement because they feel you used more than Fair Use allows, you will no doubt plead some aspect of fair use as your defense. Disagreements between the parties over the amount and significance of the material used are ultimately decided by a court.
One strategy to avoid legal problems is to contact each copyright holder and get explicit permission to use a specified amount of text. That spares you from deciding for yourself what might be Fair Use, and from the legal liability that comes with that decision.
As for Google, they rely on legal precedent to scan and use copyrighted material for their service, though not everyone is happy about it, which is why they were in court for years; see Google Books just won a decade-long copyright fight - The Washington Post.