from bs4 import BeautifulSoup
import requests
ua = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
url = 'http://i984.photobucket.com/albums/ae321/isaacscr/Misc/HPIM5242.jpg'
headers = {'Upgrade-Insecure-Requests': '1',
           'User-Agent': ua,
           'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
           'DNT': '1'}
r = requests.get(url, headers=headers)  # requests follows redirects; r.url is where we ended up
IT WORKS?!?!? Before I put in all those headers, it just gave me that blasted "give us your money" image. This is the first step: making it redirect to the actual PhotoBucket photo page. Then, hopefully, I can somehow pull the image itself. After that I'll build a dictionary mapping each old image link to its new one, so that all the PB links can be replaced.
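A minimal sketch of that last step, assuming the old-to-new dictionary already exists: swap every known PhotoBucket URL in a post's text for its replacement in a single regex pass. `replace_links` and the `img.example.org` host are hypothetical placeholders, not anything PhotoBucket or a forum provides.

```python
import re


def replace_links(text, mapping):
    """Return text with every key of mapping (an old URL) replaced by its value (the new URL)."""
    if not mapping:
        return text
    # Longest URLs first, so a URL that is a prefix of another can't steal the match.
    alternatives = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(old) for old in alternatives))
    # One pass: replacement values are never themselves re-scanned for matches.
    return pattern.sub(lambda m: mapping[m.group(0)], text)
```

For example, `replace_links("[img]http://i984.photobucket.com/albums/ae321/isaacscr/Misc/HPIM5242.jpg[/img]", {"http://i984.photobucket.com/albums/ae321/isaacscr/Misc/HPIM5242.jpg": "https://img.example.org/HPIM5242.jpg"})` gives back the same BBCode pointing at the new (made-up) host.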
I think I just need to get the cookie it gives me and send it back while requesting the same thing again.
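That cookie round trip is what a cookie jar automates. Here's a stdlib-only sketch of the idea (it swaps in `urllib.request` with `http.cookiejar` instead of requests): the opener's `HTTPCookieProcessor` stores whatever cookies the first response sets and sends them back on the next request. `make_cookie_opener` and `fetch_twice` are hypothetical names; a `requests.Session` does the same cookie bookkeeping for you.

```python
import http.cookiejar
import urllib.request


def make_cookie_opener(user_agent):
    """Build an opener whose cookies persist across requests in a shared CookieJar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", user_agent)]  # replaces urllib's default UA
    return opener, jar


def fetch_twice(opener, url, timeout=30.0):
    """Request url once to collect cookies, then again sending them back; return the 2nd body."""
    # First request: any Set-Cookie headers (including ones set during redirects)
    # are stored in the opener's jar by HTTPCookieProcessor.
    with opener.open(url, timeout=timeout) as first:
        first.read()
        landing_url = first.geturl()  # final URL after redirects
    # Second request for the same URL: the jar's cookies are attached automatically,
    # and the Referer points at the page the first request landed on.
    second_req = urllib.request.Request(url, headers={"Referer": landing_url})
    with opener.open(second_req, timeout=timeout) as second:
        return second.read()
```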
I... started making a spine tester. Haven't been very productive; the garage isn't a very pleasant place. It's a big board, and the supports for the arrow are pieces of wood glued on with TB3.
I guess I should taper one of the Sitka spruce shafts from Wayne, and put a wooden blunt head on it for short range practice. All it takes is one arrow with no fletchings to be able to practice well...
edit: This saves the proper image from a given PhotoBucket URL. BeautifulSoup isn't even being used yet. It works... Next, I just need to write some code to extract all the PhotoBucket paths from a forum thread (or wherever) and repeat this process for each one. I should probably add some diagnostic error checking as well. This would probably be easier with a browser macro, but meh. Slower.
from bs4 import BeautifulSoup  # not used yet
import requests
from urllib.parse import urlparse
from os.path import basename

s = requests.Session()  # a Session keeps any cookies PhotoBucket sets between requests
ua = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"
url = 'http://i984.photobucket.com/albums/ae321/isaacscr/Misc/HPIM5242.jpg'
# haha i'm totally Chrome
headers = {'Upgrade-Insecure-Requests': '1',
           'User-Agent': ua,
           'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
           'DNT': '1'}
s.headers.update(headers)

# First request: follow PhotoBucket's redirect; cookies it sets land in s.cookies.
req1 = s.get(url, timeout=30)
req1.raise_for_status()
# Second request: same image over HTTPS, sending the cookies back along with a
# Referer pointing at the page the first request ended up on.
s.headers.update({'Referer': req1.url})
img_req = s.get(url.replace('http://', 'https://', 1), timeout=30)
img_req.raise_for_status()

# Name the file after the last path component of the final URL, e.g. HPIM5242.jpg
img_filename = basename(urlparse(img_req.url).path)
with open(img_filename, 'wb') as out:
    out.write(img_req.content)
...
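For the "extract all the PhotoBucket paths from a forum thread" step, here's a rough sketch. It swaps in the standard library's `html.parser` instead of BeautifulSoup so it runs with no extra installs (with bs4, the rough equivalent is looping over `soup.find_all('img', src=True)`). It assumes the thread is ordinary HTML where images show up as `<img src>` and links as `<a href>`; `PhotobucketLinkParser` and `find_photobucket_links` are made-up names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Which attribute holds the URL for each tag we care about.
URL_ATTRS = {"img": "src", "a": "href"}


def is_photobucket(url):
    """True if url's host is photobucket.com or any subdomain of it."""
    host = urlparse(url).hostname or ""
    return host == "photobucket.com" or host.endswith(".photobucket.com")


class PhotobucketLinkParser(HTMLParser):
    """Collects photobucket.com URLs from <img src> and <a href>, in page order, without duplicates."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._seen = set()

    def handle_starttag(self, tag, attrs):
        wanted = URL_ATTRS.get(tag)  # tag names arrive lowercased
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                full = urljoin(self.base_url, value)  # resolve relative links
                if is_photobucket(full) and full not in self._seen:
                    self._seen.add(full)
                    self.links.append(full)


def find_photobucket_links(html, base_url=""):
    parser = PhotobucketLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Each URL it returns could then go through the two-request download above, and the results feed the old-to-new link dictionary.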
photobucket is a slow piece of trash, so it times out sometimes :\
never liked photobucket...
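Since it times out, one option is a small retry-with-backoff wrapper around each request. This is a sketch, and `with_retries` is a hypothetical helper. It only helps if you pass `timeout=` to requests, because by default requests waits forever and never raises a timeout. requests' exceptions subclass `OSError` (through `IOError`), so the default `retry_on=(OSError,)` catches `requests.exceptions.Timeout` and `ConnectionError` as well as plain socket errors.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on a retryable error wait base_delay * 2**n seconds and try again."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the real error
            sleep(base_delay * 2 ** attempt)  # 2 s, 4 s, 8 s, ... with the defaults
    raise ValueError("attempts must be at least 1")
```

Usage would look like `img_req = with_retries(lambda: s.get(url.replace('http://', 'https://', 1), timeout=30))`. One design caveat: `HTTPError` from `raise_for_status()` is also an `OSError`, so if the lambda calls it, 404s get retried too. Narrow `retry_on` to `(requests.exceptions.Timeout, requests.exceptions.ConnectionError)` if that's not wanted.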