I've got a CSS stylesheet on which I want to perform some analysis, and it seemed like a good idea to use regex inside Notepad++. Now I find I can't write the regex, and maybe it wasn't a good idea, but, bad idea or not, I want to know how to do it.
I have an automatically generated set of styles, labelled (mostly) block_1 to block_149. What I want to do first is to extract just the information about which margin settings each style specifies, since this appears to be one of the major differences. Some are plausible, especially the early ones for headings etc., but later ones appear to reflect the complex calculations of the original Word document. You can see both in the samples below:
[Note: I've added 2 spaces at the end of every line in order to get them to display properly here - those spaces do not exist in the original code. However, the original code (imported from Sigil) does have additional spacing at the start of every line - I'm not sure whether this will come out as spaces or as a tab character - I have been trying to use the whitespace indicator to cover all the options.]
.block_8 {
background-color: #FFF;
display: block;
font-family: "Calibri", sans-serif;
font-size: 1.125em;
font-weight: bold;
line-height: 1.2;
page-break-after: avoid;
text-align: center;
padding: 0;
margin: 0 2.25pt 0 0
}
.block_9 {
border-bottom: 0;
border-top: 0;
display: block;
line-height: 1.2;
text-indent: 1.5em;
padding: 0;
margin: 0.3em 0
}
.block_10 {
background-color: #FFF;
border-bottom: 0;
border-top: 0;
display: block;
font-family: serif;
font-size: 0.75em;
line-height: 12.2pt;
text-indent: 1.5em;
padding: 0;
margin: 0.3em 0
}
...
.block_113 {
background-color: #FFF;
border-bottom: 0;
border-top: 0;
display: block;
letter-spacing: -0.1pt;
line-height: 1.2;
text-indent: 1.5em;
padding: 0;
margin: 0.3em 0 0.3em 16.1pt
}
.block_114 {
background-color: #FFF;
border-bottom: 0;
border-top: 0;
display: block;
font-family: serif;
font-size: 0.75em;
text-indent: 1.5em;
padding: 0;
margin: 0.3em 0.5pt 0.3em 0.7pt
}
There are other differences and even the later ones, just for body text, have different numbers of entries.
What I would like to do is to have a regex which I could use in the first instance to reduce each of these entries just to: Block_(number) margin: (settings)
I had thought of extracting the different margin settings (T,R,B,L), but since the source can include 1,2,3 or 4 settings, sorting out those rules by regex is beyond my ambition. I have been using regex101.com to try extending from very simple recognition using just the margin settings, but managing to include all the (variable number of) extra lines between the block number and the margin settings has stumped me. Ideally I would like to be able to use a similar regex technique to extract other settings later on. I would also like to be able to cope with variable numbers of spaces and/or tabs in the layout.
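On the T, R, B, L point: once the margin value has been pulled out, splitting it is easier done outside regex. Here is a minimal Python sketch, assuming the value is a plain whitespace-separated list of 1 to 4 lengths (as in the samples above). The function name expand_margin is made up for illustration; the expansion follows the standard CSS shorthand rules.

```python
def expand_margin(value):
    """Expand a CSS margin shorthand into its four sides.

    Assumes a whitespace-separated list of 1 to 4 lengths, e.g. "0.3em 0".
    """
    parts = value.split()
    if len(parts) == 1:
        # one value: applies to all four sides
        top = right = bottom = left = parts[0]
    elif len(parts) == 2:
        # two values: vertical, then horizontal
        top = bottom = parts[0]
        right = left = parts[1]
    elif len(parts) == 3:
        # three values: top, horizontal, bottom
        top, right, bottom = parts
        left = right
    elif len(parts) == 4:
        # four values: top, right, bottom, left (clockwise)
        top, right, bottom, left = parts
    else:
        raise ValueError(f"expected 1 to 4 values, got {len(parts)}")
    return {"top": top, "right": right, "bottom": bottom, "left": left}
```

For example, expand_margin("0.3em 0") gives 0.3em for top and bottom and 0 for right and left.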
Can anyone tell me how to do this? It's got to the stage where I can almost certainly do basic cut and paste more quickly, but now I want to know how to do the regex against the time when I may need it for another project.
EtA: I now have code which will do what I asked and now want more! The settings I wanted just happened to be the last ones in the block - suppose I wanted to select the line-height settings and isolate them by a similar process - as an alternative to the margin settings?
(\..*)\{(.*)(margin:.*)\}
for the find, and \1\3
for the replace expression. Since your attributes are line-delimited, you have to check the box for ". matches newline".
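If you ever want to do the same thing outside Notepad++, here is a rough Python sketch of the idea. It is not the same expression as above: it assumes each rule is a single class selector followed by a brace-delimited body with no nested braces, and it uses [^{}]* so that a match cannot run past the end of a rule. The names RULE, extract_property and SAMPLE are made up for illustration. Because the property name is a parameter, the same function covers the line-height case from the edit to the question.

```python
import re

# One rule: ".name { body }". [^{}]* cannot cross a brace,
# so every match stays inside a single rule.
RULE = re.compile(r"(\.[\w-]+)\s*\{([^{}]*)\}")


def extract_property(css, prop):
    """Return 'selector prop: value' for every rule that sets prop."""
    # The property must start a declaration: either it is the first thing
    # in the body or it follows a semicolon. \s* absorbs spaces, tabs and
    # newlines, so the indentation style does not matter.
    decl = re.compile(r"(?:^|;)\s*" + re.escape(prop) + r"\s*:\s*([^;]+)")
    results = []
    for selector, body in RULE.findall(css):
        found = decl.search(body)
        if found:
            # collapse internal whitespace and drop the trailing newline
            value = " ".join(found.group(1).split())
            results.append(f"{selector} {prop}: {value}")
    return results


# Shortened versions of the blocks in the question (one indented with a tab).
SAMPLE = """
    .block_8 {
        font-family: "Calibri", sans-serif;
        line-height: 1.2;
        padding: 0;
        margin: 0 2.25pt 0 0
    }
    .block_9 {
    \tline-height: 1.2;
    \ttext-indent: 1.5em;
    \tmargin: 0.3em 0
    }
    .block_10 {
        font-size: 0.75em;
        line-height: 12.2pt;
        margin: 0.3em 0
    }
"""

for line in extract_property(SAMPLE, "margin"):
    print(line)
```

Calling extract_property(SAMPLE, "line-height") instead returns the line-height of each block, whether or not it is the last declaration in the rule.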
Because the pattern uses a greedy .* together with ". matches newline", it will match EVERYTHING up to the end of the document. The first match will therefore run all the way to the last block, and it will be the LAST and ONLY match. The reason you find all the blocks, in reverse order, is presumably that you have selected the "Wrap around" option. There is a more efficient way to do this search, which (I hate to hijack, but) I'll provide as an answer.