8

From time to time on Space SE, a question or answer about a calculation needs to share a significant amount of tabular data, say 100 or 1000 lines. The data needs to stay available to future readers, and the simplest (and perhaps only) way to permanently attach it to the post is to use the code block feature, which limits the displayed size so the data doesn't dominate the post.

While there's an effort to convert some smaller tables to the new markdown tables (New Feature: Table Support), it would be helpful to find the posts that contain large code blocks and make a note of them so that they don't inadvertently get converted to markdown tables that are several meters tall.

I went to the Stack Exchange Data Explorer and did a quick search for "large code blocks" to no avail.

Question: How can we find posts with particularly large code blocks in a non-code-based site?

I suppose it would be handy if we could enter a minimum length in a little box; if it has to be fixed, perhaps 50 lines of continuous code block would be enough for us. Actual code on the site is usually just short snippets.

4
  • 2
    "make a note of them so that they don't inadvertently get converted to markdown tables that are several meters tall". Why not though? By keeping them as code you're making it inconvenient or even impossible for some people to read (and navigate through) the information, especially if there's a lot of it.
    – Laurel
    Commented May 26, 2022 at 11:39
  • 1
    @Laurel why don't you review the top ten results from the query in the answer? You will see that most of these are quite large and would make ridiculously long and useless markdown tables, and nobody is going to read hundreds of lines of numbers one after another. Instead they'll copy/paste them into a local file (exactly as one would for code) for computer analysis. Further, most of these are formatted for a monospace font; lose that and you destroy the readability. The idea that markdown tables are always more readable for anything in the universe is a myth. Take a look!
    – uhoh
    Commented May 26, 2022 at 13:20
  • @Laurel - you are correct. On Space.SE there are a number of posts which incorrectly use code blocks and it breaks things completely for those with accessibility issues. Especially for those larger tables.
    – Rory Alsop
    Commented May 31, 2022 at 14:02
  • @RoryAlsop is there an existence proof of a better way to display those large tables for those with accessibility issues?
    – uhoh
    Commented May 31, 2022 at 18:25

2 Answers

11

Here is a simple query which shows the length of the first code block (in characters). It works fine on a site like Space Stack Exchange; I'm not sure about larger sites.

enter image description here

I have no doubt @rene can come up with something that analyses all code blocks in a post. It's a pity that STRING_SPLIT only supports single characters as separator...

10

I was challenged to provide a query that would analyze all code blocks, so here it is.

I started with the work already done by Glorfindel and then added a Recursive CTE to find the next block. That turns out to be a tad more challenging because PATINDEX lacks a start parameter. So it is down the substring rabbit hole from there. As with any recursion: you have to know when you have to stop. So I did.

declare @opentag nvarchar(20) = '<pre><code>'
declare @opentaglike nvarchar(20) = concat('%', @opentag, '%')
declare @closetag nvarchar(20) = '</code></pre>'
declare @closetaglike nvarchar(20) = concat('%', @closetag, '%')


; with CodeBlocksPerPost(id, opentagpos, closetagpos) as 
(
  select id
       , patindex(@opentaglike, Body) + len(@opentag) [opentagpos]
       , patindex(@closetaglike, Body) [closetagpos]
  from posts
  where body like @opentaglike
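  union all
  -- Recursive step: look for the next block in the text after the previous close tag.
  -- patindex is relative to that substring, which starts at
  -- closetagpos + len(@closetag), so the absolute position in body is
  -- closetagpos + len(@closetag) - 1 + patindex(...).
  select cbpp.id
       , cbpp.closetagpos + len(@closetag) - 1
         + patindex(
               @opentaglike
             , substring(body, closetagpos + len(@closetag), len(body))
           ) + len(@opentag) [opentagpos]
       , cbpp.closetagpos + len(@closetag) - 1
         + patindex(
               @closetaglike
             , substring(body, closetagpos + len(@closetag), len(body))
            ) [closetagpos]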
  from CodeBlocksPerPost cbpp 
  inner join posts on posts.id = cbpp.id
  where patindex(@closetaglike
          , substring(body
               , closetagpos + len(@closetag)
               , len(body))
         ) > 0
),
CodeLinesPerBlock as (
  select codeblocksperpost.id
       , opentagpos
       , closetagpos
       , ( closetagpos 
         - opentagpos 
         - len(replace(
              substring(
                  body
                , opentagpos
                , closetagpos - opentagpos)
            , nchar(10)
            , ''))
         ) [lines]
  from CodeBlocksPerPost
  inner join posts on posts.id = CodeBlocksPerPost.id
)


select top 1000 
       id [Post Link]
     , count(*) [# of blocks]
     , max(lines) [Max # of lines] 
     , max(closetagpos - opentagpos) [Max length]
     , min(closetagpos - opentagpos) [Min length]
     , sum(closetagpos - opentagpos) [Total length]
from CodeLinesPerBlock
group by id
order by max(closetagpos - opentagpos) desc

As we now have potentially more than one row per post we can max, min and sum the length of each code block. That renders this result today:

image of table with post links, count, max # of lines, max, min, total length

Keep in mind SEDE is updated once a week on Sunday.
Use the awesome SEDE Tutorial written by the unforgettable Monica Cellio.
Say "Hi" in SEDE chat.

3
  • Thank you for your query and advice, and I will do all three! :-) I'm frightened of SEDE because it doesn't look like Python, but perhaps I can summon the courage to try now. Since it determines the vertical size of the markdown table which no longer benefits from the scroll bar, I wonder if the number of lines can be displayed as well, perhaps by counting the number of end-of-line or <CR> characters in the block?
    – uhoh
    Commented May 26, 2022 at 23:18
  • 2
    Oh Monica's tutorial is incredible! Even I can understand it.
    – uhoh
    Commented May 27, 2022 at 0:00
  • 2
    @uhoh I've revisited the query and now also added the number of lines in a code block
    – rene
    Commented May 27, 2022 at 9:58
