
I'm looking to download the URLs of the images listed on the Media tab of Firefox's View Page Info dialog, preferably using PowerShell. I'm not sure whether this is possible or whether there is a better way to do it.


2 Answers


Some websites, or parts of them, block automation, and there is nothing you can do about that.

By the way, you don't need a browser to download website data. This is known as web scraping, and in PowerShell it is done with the web cmdlets, specifically Invoke-WebRequest:

# Get specifics for a module, cmdlet, or function
(Get-Command -Name Invoke-WebRequest).Parameters
(Get-Command -Name Invoke-WebRequest).Parameters.Keys
<#
# Results

UseBasicParsing
Uri
WebSession
SessionVariable
Credential
UseDefaultCredentials
CertificateThumbprint
Certificate
UserAgent
DisableKeepAlive
TimeoutSec
Headers
MaximumRedirection
Method
Proxy
ProxyCredential
ProxyUseDefaultCredentials
Body
ContentType
TransferEncoding
InFile
OutFile
PassThru
Verbose
Debug
ErrorAction
WarningAction
InformationAction
ErrorVariable
WarningVariable
InformationVariable
OutVariable
OutBuffer
PipelineVariable
#>
Get-help -Name Invoke-WebRequest -Examples
<#
# Results

$R = Invoke-WebRequest -URI 
$R.AllElements | where {$_.innerhtml -like "*=*"} | Sort { 
values. Sorting by the shortest HTML value often helps you find the     
$R=Invoke-WebRequest http://www.facebook.com/login.php
$FB
$Form = $R.Forms[0]
$Form | Format-List
$Form.fields
$Form.Fields["email"]="User01@Fabrikam.com"
$R=Invoke-WebRequest -Uri ("https://www.facebook.com" +
# Sends a sign-in request by running the Invoke-WebRequest 
$R.StatusDescription
(Invoke-WebRequest -Uri "http://msdn.microsoft.com/en-us/library 
#>
Get-help -Name Invoke-WebRequest -Full
Get-help -Name Invoke-WebRequest -Online
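If you would rather work from raw HTML (for example the Content property of an Invoke-WebRequest result, or a page you saved from the browser), one rough alternative is to pull the src attributes out with a regular expression. This is a sketch, not a parser: Get-ImgSrcFromHtml is a hypothetical helper name, and a regex like this will miss images that JavaScript adds after the page loads, srcset candidates, and CSS background images, some of which Firefox's Media tab may list.

```powershell
# Hypothetical helper: extract the src attribute of every <img> tag from an HTML string.
# Regex-based, so it is approximate and not a full HTML parser.
function Get-ImgSrcFromHtml {
    param([Parameter(Mandatory)] [string] $Html)

    # Matches <img ... src="value"> or <img ... src='value'>.
    # \s before src keeps attributes such as data-src from matching,
    # and src\s*= keeps srcset from matching.
    $pattern = '<img\b[^>]*?\ssrc\s*=\s*(["''])(.*?)\1'

    foreach ($m in [regex]::Matches($Html, $pattern, 'IgnoreCase')) {
        # Group 2 holds the attribute value; decode entities such as &amp;
        [System.Net.WebUtility]::HtmlDecode($m.Groups[2].Value)
    }
}
```

For example: Get-ImgSrcFromHtml -Html (Invoke-WebRequest -Uri 'https://www.instantcart.com' -UseBasicParsing).Content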

So, for the URL you say you are trying to hit, note what you get back:

# Download website main page
($InstacartHomeData = Invoke-WebRequest -Uri 'https://www.instantcart.com')

<#
# Results

StatusCode        : 200
StatusDescription : OK
Content           : <!DOCTYPE html><html lang="en" class="no-js"><head><link rel="alternate"
                    href="https://www.instantcart.com/" hreflang="en-gb" /><link rel="alternate"
                    href="https://www.instantcart.com/" hreflang="en" ...
RawContent        : HTTP/1.1 200 OK
                    Pragma: no-cache
                    Vary: Accept-Encoding
                    Connection: close
                    Transfer-Encoding: chunked
                    Cache-Control: private, no-cache, no-store, proxy-revalidate, no-transform
                    Content-Type: text/...
Forms             : {}
Headers           : {[Pragma, no-cache], [Vary, Accept-Encoding], [Connection, close], [Transfer-Encoding, chunked],
                    [Cache-Control, private, no-cache, no-store, proxy-revalidate, no-transform], [Content-Type,
                    text/html], [Date, Thu, 28 May 2020 04:23:28 GMT], [Expires, Thu, 19 Nov 1981 08:52:00 GMT],
                    [Set-Cookie, sid=b806f71e100b9f2d4d1037561b53ff65; path=/; domain=www.instantcart.com], [Server,
                    Apache], [X-Powered-By, PHP/5.5.38]}
Images            : {@{innerHTML=; innerText=; outerHTML=<img width="160" class="img-responsive"
...
#>

# Get only images data
$InstacartHomeData.Images | Select-Object alt, src

<#
# Results

alt                                       src
---                                       ---
                                          /pics/logo.png
Abode Home Products                       /images/home/clients/abode-home-products.png
Avanta UK                                 /images/home/clients/avanta-uk.png
Q-Park                                    /images/home/clients/qpark.png
...
#> 
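The src values above are relative paths such as /pics/logo.png, while Firefox's Media tab shows full addresses. Here is a minimal sketch for resolving them against the page URL with System.Uri. Resolve-ImageUrl is a hypothetical helper name, and the sketch assumes the page has no base href tag that would change the base URL.

```powershell
# Hypothetical helper: turn (possibly relative) src values into absolute URLs.
# Assumes the page has no <base href> tag overriding the page URL.
function Resolve-ImageUrl {
    param(
        [Parameter(Mandatory)] [string] $PageUrl,   # URL of the downloaded page
        [string[]] $Src = @()                       # src values, e.g. $response.Images.src
    )
    $base = [System.Uri]::new($PageUrl)

    foreach ($s in $Src) {
        if ([string]::IsNullOrWhiteSpace($s)) { continue }     # <img> without a src

        if ($s -match '^[a-zA-Z][a-zA-Z0-9+.-]*:') {
            $s                                                  # already absolute (https:, data:, ...)
        }
        elseif ($s.StartsWith('//')) {
            "$($base.Scheme):$s"                                # protocol-relative: reuse the page scheme
        }
        else {
            # Treat it explicitly as relative, then combine it with the page URL
            $relative = [System.Uri]::new($s, [System.UriKind]::Relative)
            [System.Uri]::new($base, $relative).AbsoluteUri
        }
    }
}
```

With the data downloaded above, something like Resolve-ImageUrl -PageUrl 'https://www.instantcart.com' -Src $InstacartHomeData.Images.src | Set-Content -Path 'C:\test\images.txt' would write one absolute URL per line.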

Now, make the same attempt for your target page.

# Download the specific product page
($InstacartProductPageData = Invoke-WebRequest -Uri 'https://www.instacart.com/products/98954-poland-spring-natural-spring-water-2-5-gal')
<#
# Results

# Cookies are used to get this

StatusCode        : 200
StatusDescription : OK
Content           : <!DOCTYPE html>
                    <html lang='en'>
                    <head>
                    <title>
                    Poland Spring Natural Spring Water (2.5 gal) - Instacart
                    </title>
                    <meta content='Buy Poland Spring Natural Spring Water (2.5 gal) online and have it de...
RawContent        : HTTP/1.1 200 OK
                    Transfer-Encoding: chunked
                    Connection: keep-alive
                    X-Frame-Options: SAMEORIGIN
                    X-XSS-Protection: 1; mode=block
                    X-Content-Type-Options: nosniff
                    X-Download-Options: noopen
                    X-Permit...
Forms             : {}
Headers           : {[Transfer-Encoding, chunked], [Connection, keep-alive], [X-Frame-Options, SAMEORIGIN],
                    [X-XSS-Protection, 1; mode=block]...}
Images            : {@{innerHTML=; innerText=; outerHTML=<img class="rmq-569a8dd6" style="background: rgb(255, 255,

...
                    Poland Spring 100% Natural Spring Water

                    2.5 gal; outerHTML=<a style="text-decoration: none;"
                    href="/products/16965376-poland-spring-100-natural-spring-water-2-5-gal" data-radium="true"><div
                    class="rmq-cd8b1370 rmq-5e34cd3" style="padding: 0px 16px; width: 208px; height: 100%; text-align:
                    left; line-height: 1.29; font-size: 14px; display: flex; position: relative; opacity: 1;
                    flex-direction: column;" data-radium="true"><div class="rmq-24058c4e" style="width: 176px; height:
                    176px;" data-radium="true"><img style="width: 100%; display: block;" alt="" src="https://d2d8wwwkmh
                    fcva.cloudfront.net/352x/d1s8987jlndkbs.cloudfront.net/assets/missing-item-4bbe82b8555e4d1c12626fd4
                    82cb2409713e8e30835645ff3650ef66a725d03c.png" data-radium="true"></div><div style="padding-bottom:
                    8px; margin-top: auto;" data-radium="true"><div class="rmq-50e196af" style="color: rgb(66, 66,
                    66); overflow: hidden; margin-top: 20px; -ms-text-overflow: ellipsis; max-height: 55px;"
                    data-radium="true">Poland Spring 100% Natural Spring Water</div><div style="color: rgb(117, 117,
                    117);" data-radium="true"><span>2.5 gal</span></div></div></div></a>; outerText=
...    

#>

# Get only images data
$InstacartProductPageData.Images | Select-Object alt, src
<#
# Results

alt                                        src
---                                        ---
Instacart logo                             https://d2guulkeunn7d8.cloudfront.net/assets/beetstrap/brand/carrotlogo-p...
Poland Spring Natural Spring Water         https://d2lnr5mha7bycj.cloudfront.net/product-image/file/large_f44f2f09-b...
Gala Fresh logo                            https://d2lnr5mha7bycj.cloudfront.net/warehouse/logo/162/0f5c96be-4126-45...
...
#>
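If you want the image files themselves and not just their addresses, each URL can be passed back to Invoke-WebRequest with -OutFile. This is a minimal sketch, assuming the Images entries expose a src property as in the output above. Get-ImageFileName and Save-PageImage are hypothetical helper names. The sketch uses -UseBasicParsing, which in Windows PowerShell 5.1 avoids the Internet Explorer-based parsing (PowerShell 6 and later only do basic parsing), and it skips relative src values.

```powershell
# Hypothetical helper: derive a local file name from an image URL,
# e.g. 'https://host/a/b/logo.png?v=2' -> 'logo.png' (query string dropped).
function Get-ImageFileName {
    param([Parameter(Mandatory)] [string] $Url)

    $name = ([System.Uri]::new($Url)).Segments[-1].Trim('/')   # last path segment
    $name = [System.Uri]::UnescapeDataString($name)            # %20 -> space, etc.
    $name = $name -replace '[\\/:*?"<>|]', '_'                 # characters Windows rejects
    if ([string]::IsNullOrWhiteSpace($name)) { $name = 'image' }
    $name
}

# Hypothetical helper: download every absolute image URL found on a page into a folder.
function Save-PageImage {
    param(
        [Parameter(Mandatory)] [string] $PageUrl,
        [Parameter(Mandatory)] [string] $Destination
    )
    New-Item -ItemType Directory -Path $Destination -Force | Out-Null

    $page = Invoke-WebRequest -Uri $PageUrl -UseBasicParsing

    # Keep only absolute http(s) URLs; relative src values are skipped in this sketch
    $urls = $page.Images.src | Where-Object { $_ -match '^https?://' } | Sort-Object -Unique

    foreach ($url in $urls) {
        $file = Join-Path $Destination (Get-ImageFileName -Url $url)
        Invoke-WebRequest -Uri $url -OutFile $file -UseBasicParsing
        $file   # emit the path of each saved file
    }
}
```

Two images with the same file name will overwrite each other, and, as noted above, a site may still refuse or alter automated requests.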

The script below uses Internet Explorer to render the page; the image locations are then available through its Document property.

Adjust the output file path and the web page to what you need.

I have not tested whether the results match what Firefox lists, but they are very likely to be the same.

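$OutputFile = "C:\test\images.txt"          # change this to the output folder and file name; it should end with .txt
$Webpage    = "https://www.somewebsite.com" # change this to the web page you want

# Start a hidden Internet Explorer instance (Windows only, via COM)
$ieObject = New-Object -ComObject 'InternetExplorer.Application'
$ieObject.Visible = $false
$ieObject.Navigate($Webpage)

# Wait until the page has finished loading (ReadyState 4 = complete)
while ($ieObject.Busy -or $ieObject.ReadyState -ne 4) { Start-Sleep -Milliseconds 100 }

# Collect the src of every <img> element in the rendered document and save the list
$images = $ieObject.Document.images | ForEach-Object { $_.src }
$images | Out-File -FilePath $OutputFile
$ieObject.Quit()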
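The while loop in the script above polls until ReadyState reaches 4 (complete) and has no upper limit, so it waits indefinitely on a page that never gets there. Here is a hedged sketch of a polling helper with a timeout. Wait-Condition is a hypothetical name, not a built-in cmdlet. It will not by itself help when a page inserts its images with JavaScript after the document reports complete.

```powershell
# Hypothetical helper: poll a condition until it is true or a timeout expires.
# Returns $true if the condition became true, $false on timeout.
function Wait-Condition {
    param(
        [Parameter(Mandatory)] [scriptblock] $Condition,
        [int] $TimeoutSec = 30,
        [int] $IntervalMs = 100
    )
    $timer = [System.Diagnostics.Stopwatch]::StartNew()

    while (-not (& $Condition)) {
        if ($timer.Elapsed.TotalSeconds -ge $TimeoutSec) { return $false }
        Start-Sleep -Milliseconds $IntervalMs
    }
    return $true
}
```

In the script, the wait loop could then be replaced with: if (-not (Wait-Condition -Condition { -not $ieObject.Busy -and $ieObject.ReadyState -eq 4 } -TimeoutSec 30)) { Write-Warning 'Page did not finish loading in time' }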
  • This works very well on most sites; however, I am having issues with it on this site: instacart.com/products/… I'm not sure what would cause it not to work. Commented May 27, 2020 at 18:26
