2

I'm having trouble structuring the output of a Netezza query. I'm working with web data and need to be able to count the distinct URL's that are receiving traffic. The issue is that most of the rows contain URL parameters that need to be stripped before counting.

This is what my URL's look like:

  • www.site.com/page?utm_source=xxx&utm_campaign=yyy
  • www.site.com/another?cmp_code=zzz&cmp_vendor=aaa
  • www.site.com/page?cmp_code=nnn&cmp_vendor=bbb
  • www.site.com/another
  • www.site.com/something

What I'd like to return is something like this:

  • www.site.com/page
  • www.site.com/another
  • www.site.com/page
  • www.site.com/another
  • www.site.com/something

My query is below. It runs without error, but its not producing the results that I'd like.

SELECT UID, TIMESTAMP, SUBSTR(FULL_URL, 1, INSTR(FULL_URL, '?', -1) AS MY_URL
FROM MY.TABLE
LIMIT 100;

This is the result

  • www.site.com/page
  • www.site.com/another
  • www.site.com/page
  • NULL
  • NULL

You can see here that if my URL contains a question mark, I'm getting the exact result that I want. If it does not contain URL parameters then it returns nothing at all.

I've also tried a regex approach, but this is currently giving me a Count field incorrect error.

SELECT UID, TIMESTAMP, REGEXP_REPACE(FULL_URL, (\.\+)(\/?)(\.\*), $1) AS MY_URL
FROM MY.TABLE
LIMIT 100;

I'm still new to Netezza and I'm not seeing the logic or operators that I need to accomplish this. Can anyone point me in the right direction?

2
  • 1
    try regexp_extract(full_url,'^[^?]+') Commented Dec 8, 2016 at 22:50
  • This does exactly what I need. Thanks for the assist!
    – Lenwood
    Commented Dec 8, 2016 at 23:02

1 Answer 1

2

This should work:

create temp table myTable ( Full_URL varchar(5000));
insert into myTable values('www.site.com/page?utm_source=xxx&utm_campaign=yyy');
insert into myTable values('www.site.com/another?cmp_code=zzz&cmp_vendor=aaa');
insert into myTable values('www.site.com/page?cmp_code=nnn&cmp_vendor=bbb');
insert into myTable values('www.site.com/another');
insert into myTable values('www.site.com/something');  

select regexp_replace(Full_URL,'\?.*','')
from myTable
0

Not the answer you're looking for? Browse other questions tagged or ask your own question.