Plundering Facebook Photo Albums

Submitted on September 30, 2021 – 11:42 am

Let’s imagine you need to download all the photos in a Facebook photo album. It can be a public album, a friend’s, or even your own. Sure, you can do this manually, but you probably don’t want to. And so I came up with a little bit of automation.

For any of the stuff below to work, you need to have a Facebook account with access to the photo album in question.

The excellent nixCraft blog by Vivek Gite has an equally wonderful Facebook page with lots of techie memes. I like memes. I like to steal memes. And save them for the time when our civilization destroys itself, and I will become the king of memes.

The only problem is that Mr. Gite has accumulated over four thousand of them in his Timeline Photos album. Contrary to popular belief, I don’t have the kind of time it would take to right-click on every picture in that album and save it.

Making the list

So the first step is to get a list of URLs for the photos you want to download. The simplest option is to go to the album, scroll to the bottom of it by hitting the “End” key like a maniac, and then use one of the URL clipper extensions. I use Firefox (and so should you), and my favorite URL picker is Link Gopher (also available for Chrome).

An alternative to scrolling to the bottom of the photo album yourself is to use an extension like FoxScroller. This URL-gathering step could also be entirely automated using the PhantomJS process described below, but I had no time to play with this.

The list of links to the photos in a Facebook photo album would look something like this:

https://www.facebook.com/${user_id}/photos/a.${album_id}/${photo_id}/

# Example:

https://www.facebook.com/nixcraft/photos/a.431194973560553/3411636868849667/
https://www.facebook.com/nixcraft/photos/a.431194973560553/3411657742180913/
https://www.facebook.com/nixcraft/photos/a.431194973560553/3411663615513659/
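
Link Gopher grabs every link on the page, not just the photos, so it helps to trim its output down to the photo links and drop any duplicates. Something along these lines (assuming you pasted the clipped links into urls.txt; both file names here are just my examples):

# Keep only the album photo links, deduplicated
grep -E 'facebook\.com/.+/photos/a\.' urls.txt | sort -u > photo_urls.txt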

The problem with these links is that they don’t point to the photos themselves. Each one leads to a JavaScript-infested page that requires authentication before it generates a dynamic link to the actual image. And that link will only work for you and your current login session. And it will expire soon too.

Because all of this relies on JavaScript, you can’t just use wget or curl. What you need is a scriptable headless browser that supports JavaScript. PhantomJS is one such browser, and that’s what I’ll use.

Getting the cookies

The first step is Facebook authentication. The basic syntax is this:

phantomjs --load-images=true --local-storage-path=/tmp \
--disk-cache=true --disk-cache-path=/tmp \
--cookies-file=${cookies} --ignore-ssl-errors=true \
--ssl-protocol=any --web-security=true \
${basedir}/scripts/facebook_login.js "${ua}" "${login_url}"

# where
login_url="https://www.facebook.com/login/"
ua="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36"

The key element here is the facebook_login.js script that you can get here. You will need to edit this file to insert your Facebook login credentials.
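
Before moving on, it is worth checking that the login actually worked. A rough sanity check (this assumes cookie names end up in readable form inside the PhantomJS cookies file):

# The cookies file should be non-empty and contain Facebook's
# c_user cookie, which is only set for a logged-in session
if [ -s "${cookies}" ] && grep -q "c_user" "${cookies}"; then
    echo "Facebook login looks good"
else
    echo "Login failed; check your credentials in facebook_login.js" >&2
fi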

Grabbing the photos

The next move is to loop through the list of URLs you collected earlier, open each one in PhantomJS, extract the dynamically generated image link, and download the image.

Here’s the basic syntax:

phantomjs --load-images=true --local-storage-path=/tmp \
--disk-cache=true --disk-cache-path=/tmp --cookies-file=${cookies} \
--ignore-ssl-errors=true --ssl-protocol=any --web-security=true \
${basedir}/scripts/phantomjs_render.js "${ua}" "${url}" \
${basedir}/tmp/${project}.png

The important component here is the phantomjs_render.js script that you can grab here. In addition to rendering the Web page, this script takes a screenshot of it. The screenshot is not strictly necessary in our case; I was just too lazy to edit out this feature. What matters is the rendered HTML, which the commands below expect to find in ${basedir}/tmp/temp.html.

Finally, we need to extract the correct image URL from the dynamically-generated Facebook HTML diarrhea. This part could use a bit more polish, but it works for the most part.

# Get the photo file extension (the bit between the last dot
# and the "?_nc_cat" query parameter in the CDN URL)
ext="$(grep -oP "(?<=\.)[a-z]{3,4}(?=\?_nc_cat)" \
${basedir}/tmp/temp.html | head -1)"

# Extract the photo URL and wget it (the sed strips the
# backslashes escaping the slashes in the embedded JSON)
wget -q "$(grep -oP "(?<=\"image\":\{\"uri\":\").*(?=\",\"width\")" \
${basedir}/tmp/temp.html | sed 's@\\@@g')" -O \
${basedir}/data/${project}/${project}_$(shuf -i 100000-999999 -n 1).${ext}

To make things a bit more user-friendly, I put together this little script that should do all this stuff. Hopefully.
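
If you would rather roll your own, here is a minimal sketch of the main loop (photo_urls.txt and the sleep are my additions; ${basedir}, ${project}, ${ua}, and ${cookies} are assumed to be set as in the examples above):

#!/bin/bash
# Render each photo page with PhantomJS, then pull the real
# image link out of the rendered HTML and download it
while read -r url; do
    phantomjs --load-images=true --local-storage-path=/tmp \
    --disk-cache=true --disk-cache-path=/tmp --cookies-file=${cookies} \
    --ignore-ssl-errors=true --ssl-protocol=any --web-security=true \
    ${basedir}/scripts/phantomjs_render.js "${ua}" "${url}" \
    ${basedir}/tmp/${project}.png

    # Same extraction as above: file extension first, then the image URL
    ext="$(grep -oP "(?<=\.)[a-z]{3,4}(?=\?_nc_cat)" \
    ${basedir}/tmp/temp.html | head -1)"
    wget -q "$(grep -oP "(?<=\"image\":\{\"uri\":\").*(?=\",\"width\")" \
    ${basedir}/tmp/temp.html | sed 's@\\@@g')" -O \
    ${basedir}/data/${project}/${project}_$(shuf -i 100000-999999 -n 1).${ext}

    # Be nice to Facebook: pause for a few seconds between photos
    sleep $(shuf -i 3-9 -n 1)
done < photo_urls.txt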
