finding most likes on a tag on instagram

Finally, a technical post. I’m on the NYC FRC (FIRST Robotics Competition) planning committee. Even though the competition was cancelled, the Instagram photo contest was not. The idea was that students post pictures to a specific hashtag and the ones with a lot of likes win.

On a planning call this week, someone noted that it was no longer easy to see the number of likes, making it a pain to find the posts with the most likes. That sounded like something a computer would be good at, so I volunteered to take care of it.

There were only 86 submissions, so mousing over each by hand and keeping a list wouldn’t have been terrible. And I probably could have gotten it done faster that way than by automating it. But where’s the fun in that?

Attempt #1 (failed) – API

There is a documented API to get posts by hashtag. It requires you to have a business or creator account to use. I have neither. This page says anyone can get a creator page. I don’t see that option. Possibly because my account is new or private. And I don’t want to make it public, so I’m not going down this road.

Attempt #2 (failed) – screenscraping

I know the URL of the hashtag. And it is available without a login. Great. I can just use Selenium to scrape the data. Well, I couldn’t get this working. The page uses progressive rendering. I did try using code from StackOverflow to page down. I used the ChromeDriver so I could confirm it really was scrolling. It did. But I still didn’t get all the images available to the Selenium driver. So I had to abandon that approach.

private void scrollToBottomOfPage() {
  JavascriptExecutor js = (JavascriptExecutor) driver;
  try {
    long lastHeight = ((Number) js.executeScript("return document.body.scrollHeight")).longValue();
    while (true) {
      js.executeScript("window.scrollTo(0, document.body.scrollHeight);");
      // give the page time to load the next batch of images
      Thread.sleep(2000);
      long newHeight = ((Number) js.executeScript("return document.body.scrollHeight")).longValue();
      if (newHeight == lastHeight) {
        break;
      }
      lastHeight = newHeight;
    }
  } catch (InterruptedException e) {
    // restore the interrupt flag so callers can see the interruption
    Thread.currentThread().interrupt();
  }
}
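
The scroll loop above keeps going until the page height stops changing, with no upper bound on the number of rounds. Here is a minimal sketch of the same idea with a cap, using only the standard library so it can run without a browser. The class name, method name and the maxRounds parameter are all made up for illustration. With Selenium, readHeight would presumably wrap the scrollHeight script call and scrollStep would do the scroll plus the pause.

```java
import java.util.function.LongSupplier;

// Hypothetical helper; the names here are illustrative and not from the original code.
class ScrollUntilStable {

    /**
     * Runs scrollStep repeatedly until the height reported by readHeight
     * stops changing or maxRounds is reached.
     * Returns the number of scroll steps performed.
     */
    static int scrollUntilStable(LongSupplier readHeight, Runnable scrollStep, int maxRounds) {
        long lastHeight = readHeight.getAsLong();
        int rounds = 0;
        while (rounds < maxRounds) {
            scrollStep.run();
            rounds++;
            long newHeight = readHeight.getAsLong();
            if (newHeight == lastHeight) {
                break; // nothing new was loaded by the last scroll
            }
            lastHeight = newHeight;
        }
        return rounds;
    }

    public static void main(String[] args) {
        // Fake page that grows by 1000px per scroll until it reaches 3000px.
        long[] height = {1000};
        int rounds = scrollUntilStable(
                () -> height[0],
                () -> height[0] = Math.min(height[0] + 1000, 3000),
                10);
        System.out.println(rounds + " scrolls, final height " + height[0]);
    }
}
```

The cap doesn't fix the missing images. It only guarantees the loop ends if the page keeps growing.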

Attempt #3 (failed) – logging in

When watching it in ChromeDriver, I noticed that there was a prompt about logging in. So I thought maybe that was the problem. I wrote some sloppy Selenium code to log in and saw the same behavior. It did log in, but I still only got a subset of images. (I would have refactored the timeout, hard-coded credentials and loop if it had helped.)

driver.get("https://www.instagram.com/");
System.out.println(driver.getPageSource());

List<WebElement> x = driver.findElements(By.tagName("input"));
System.out.println(x);

// TODO change timeout to a wait until
try {
   Thread.sleep(5000);
} catch (InterruptedException e) {
   // TODO Auto-generated catch block
   e.printStackTrace();
}

WebElement username = driver.findElement(By.name("username"));
WebElement password = driver.findElement(By.name("password"));

// TODO don't hard code
username.sendKeys("xxx");
password.sendKeys("xxx");

// TODO rewrite
List<WebElement> l = driver.findElements(By.tagName("button"));
System.out.println(l);
for (WebElement webElement : l) {
   System.out.println(webElement.getAttribute("type"));
   if ("submit".equals(webElement.getAttribute("type"))) {
     webElement.click();
     // stop here: the click navigates away, so the remaining elements may be stale
     break;
   }
}
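
The "change timeout to a wait until" TODO above is about replacing the fixed five second sleep with polling: check a condition repeatedly and move on as soon as it holds. Selenium ships WebDriverWait for this. The sketch below only illustrates the polling idea with the standard library. The class and method names are hypothetical, and the condition in main is a stand-in for something like "the username field is present".

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper; the class and method names are illustrative only.
class WaitUntil {

    /**
     * Polls condition until it returns true or the timeout elapses.
     * Returns true if the condition was met, false on timeout or interruption.
     */
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long start = System.nanoTime();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - start >= timeout.toNanos()) {
                return false;
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                // restore the interrupt flag and give up waiting
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in condition: becomes true on the third check.
        int[] checks = {0};
        boolean ready = waitUntil(() -> ++checks[0] >= 3,
                Duration.ofSeconds(5), Duration.ofMillis(100));
        System.out.println("ready=" + ready + " after " + checks[0] + " checks");
    }
}
```

Compared with a fixed sleep, this returns as soon as the condition is met and still gives up after the timeout.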

Attempt #4 (failed) – save file

At this point, I decided to stop messing with Selenium and just download the data myself. I opened the web page and scrolled to the bottom to get all the images. I then saved the page in Chrome to get all the files. And… still didn’t have everything. This suggests the page is set up to not store everything and no amount of fiddling with Selenium was going to work.

Attempt #5 (failed) – network traffic

The files are all downloaded in my browser at some point. So I used Chrome’s network traffic monitor (in developer tools). Unfortunately, you can’t get the actual Instagram URL from the image link used for the CDN (content delivery network).

Attempt #6 (success kinda) – JSON

The “kinda” is because I don’t have paging working and the __a API is deprecated.

Then I found this post which tells me I can use https://www.instagram.com/explore/tags/frcnyc2020/?__a=1 to get the results as JSON. Whoo hoo! This returns the data. Then it was just a matter of parsing it and creating the report.

That worked. The completed code is below and on GitHub.

package com.jeanneboyarsky.instagram;

import java.util.*;
import java.util.Map.*;
import java.util.stream.*;

import org.junit.jupiter.api.*;
import org.openqa.selenium.*;
import org.openqa.selenium.htmlunit.*;

import com.fasterxml.jackson.databind.*;

public class CountLikesIT {

  private static final String TAG = "frcnyc2020";

  private WebDriver driver;

  @BeforeEach
  void setup() {
    driver = new HtmlUnitDriver();
  }

  @AfterEach
  void tearDown() {
    if (driver != null) {
      // quit() ends the whole driver session; close() only closes the current window
      driver.quit();
    }
  }

  @Test
  void graphQlJson() throws Exception {
    // https://stackoverflow.com/questions/43655098/how-to-get-all-instagram-posts-by-hashtag-with-the-api-not-only-the-posts-of-my
    // "count" shows up 258 times (this is three times per image)
    // 1) edge_media_to_comment
    // 2) edge_liked_by
    // 3) edge_media_preview_like - looks same as #2
    String json = getJson();

    ObjectMapper objectMapper = new ObjectMapper();
    JsonNode rootNode = objectMapper.readTree(json);
    List<JsonNode> nodes = rootNode.findValues("node");
		
    Map<String, Integer> result = nodes.stream()
        // node occurs at multiple levels; we only want the ones that go with posts
        .filter(this::isPost)
        .collect(Collectors.toMap(this::getUrl, this::getNumLikes,
            // ignore duplicates by choosing either
            (first, second) -> second));

    printDescendingByLikes(result);
  }
	
  private String getUrl(JsonNode node) {
    JsonNode shortCodeNode = node.findValue("shortcode");
    return "https://instagram.com/p/" + shortCodeNode.asText();
  }
	
  private int getNumLikes(JsonNode node) {
    JsonNode likeNode = node.get("edge_liked_by");
    return likeNode.get("count").asInt();
  }
	
  private boolean isPost(JsonNode node) {
    return node.findValue("display_url") != null;
  }

  private String getJson() {
    driver.get("https://www.instagram.com/explore/tags/" + TAG + "/?__a=1");
    String pageSource = driver.getPageSource();
    return removeHtmlTagsSinceReturnedAsWebPage(pageSource);
  }

  private String removeHtmlTagsSinceReturnedAsWebPage(String pageSource) {
    String openTag = "<";
    String closeTag = ">";
    String anyCharactersInTag = "[^>]*";
		
    String regex = openTag + anyCharactersInTag + closeTag;
    return pageSource.replaceAll(regex, "");
  }

  private void printDescendingByLikes(Map<String, Integer> result) {
    Comparator<Entry<String, Integer>> comparator =
        Comparator.comparing((Entry<String, Integer> e) -> e.getValue())
            .reversed();

    result.entrySet().stream()
        .sorted(comparator)
        .map(e -> e.getValue() + "\t" + e.getKey())
        .forEach(System.out::println);
  }
}
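
The reporting step in the code above (collect likes per URL, then print in descending order) can be tried without Selenium or Jackson. The following is a minimal sketch using made-up data: the shortcodes AAA, BBB and CCC and their like counts are hypothetical, as is the LikesReport class. It assumes Java 9 or later for Map.entry and List.of. It also swaps in Integer::max as the merge function, so a duplicate URL keeps its higher count, which differs from the original's "choose either".

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-alone version of the report step; all data below is made up.
class LikesReport {

    /** Merges duplicate URLs (keeping the higher count) and formats lines sorted by likes, descending. */
    static List<String> report(List<Map.Entry<String, Integer>> posts) {
        Map<String, Integer> likesByUrl = posts.stream()
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue, Integer::max));

        // Sort by likes descending; break ties by URL so the output order is deterministic.
        Comparator<Map.Entry<String, Integer>> byLikesDescending =
                Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey());

        return likesByUrl.entrySet().stream()
                .sorted(byLikesDescending)
                .map(e -> e.getValue() + "\t" + e.getKey())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> posts = List.of(
                Map.entry("https://instagram.com/p/AAA", 12),
                Map.entry("https://instagram.com/p/BBB", 40),
                Map.entry("https://instagram.com/p/AAA", 15), // same post seen twice
                Map.entry("https://instagram.com/p/CCC", 7));
        report(posts).forEach(System.out::println);
    }
}
```

Map.Entry.comparingByValue() does the same job as the explicit lambda comparator in the original. The tie-breaker on URL is an addition so that posts with equal likes always print in the same order.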

sharing about feelings

Two weeks ago I wrote about how it was hard being at home. While I do appreciate that I am able to (mostly) stay at home, am not sick, don’t have to worry about paying the bills, etc, it is still hard.

In my last blog post, I wrote that I wasn’t able to share on a call with 22 people. I did manage to write the blog post and put it on the internet. I didn’t publicize it at all and it got 34 views.

So I’m now more comfortable speaking about how I’m feeling on calls, right? Nope. I also wasn’t able to say anything about how I was feeling on a call with 4 people on it. Ironically, all of those people know how I am doing. I think there’s an aspect of being scared. If I lose control of my voice/emotions on a call, it is scarier than writing about it.

Last night, I was on a call with some programmers on FIRST Robotics Competition team 694. The call started by talking about what people have been doing and ideas for the team. It was a small call (maybe 8 people). The first few students who went provided a balanced view of how being at home was.

I think that was a key distinction between that call and the first one at work. After a few people said positive things about working from home in this situation, I felt like something was wrong with me. On the robotics call, there were both positive and negative statements so it felt ok to not be doing well. And that’s why I’m blogging again. It’s ok. And I think it’s important to hear that others are having trouble too. We are not alone.

Maybe my next blog post can be technical…

over a week of being at home

I’ve been working from home full time for a week and a half. And I’m almost done with my first weekend of being at home. It’s been a tough week. And something that made it worse was that most people seemed to be ok. Or at least far better than I was.

Then yesterday (Saturday), I saw this tweet! It really helped because it was the first thing I had seen that made me not feel like an overwhelmed crybaby. And I was able to “admit” on Twitter that I cried too.

On Friday, I made a comment that I didn’t say anything on a call because I didn’t want to “admit” I wasn’t fine on a call with 22 people. I’m realizing that is contributing to the problem. If anyone else on the call wasn’t fine, I contributed to the illusion that everyone is fine. Posting the tweet was scary. So is posting this blog entry. The internet has more than 22 people on it!

What’s been on my mind the past week

  • Having trouble concentrating/focusing
  • Frustrated that I was having trouble concentrating
  • Distractions (I normally get distracted at home. When I’m only telecommuting for a day that’s not a problem because I would just get more done the next day. That approach doesn’t work here)
  • My work setup is less than ideal. It’s better than when I ad-hoc telecommute as I’ve sacrificed living space to build a work area. Some of that came from getting rid of space I need when I have friends over and the like. All stuff I can eventually unroll.
  • Fear of leading a four hour meeting when I couldn’t even focus for short periods of time.
  • Fear of how I was going to be ok without seeing my friends for a long time
  • Fear of how I was going to be ok with being home almost all the time
  • Fear that we are going to lose the ability to go get takeout and/or go for a walk (like parts of Italy)
  • Worry for my immediate family (out of state so limited ability to help)
  • Plus the normal fear of the future/disease that everyone has, friends and family, etc

Triggers

Part of the problem was that I had pretty much no emotional resiliency. So things that normally wouldn’t have set me off turned me into a puddle. And just thinking about the things I was afraid of did the same.

It also meant I didn’t have the ability to mentally process. Normally, I’d be able to think about something rationally and determine if it was a problem and/or how big before panicking. Last week was not that week. Instead, I turned into a puddle.

I cried 7.5 times during work hours in the 8 days I’ve worked at home so far. The half a time was the one I was able to hold in. (I didn’t count the # of times outside of work hours.) I think that metric will be lower next week.

Why I feel better now than I did last week

  • I ran a one hour Toastmasters meeting on Thursday. I was one of the people on the call who was more comfortable with remote meetings, so I was both teaching and being a good role model. I also felt like I was in “leader mode” so I was fine. Kind of like how I am our floor fire warden and know I can stay calm in an actual emergency.
  • The four hour sprint planning meeting went fine. Two of my teammates brought their cats to the meeting which definitely helped!
  • On the weekend, I have had three video chats, a long call with my best friend and some emails/texts with other friends. I also got two offers to video chat whenever I need. So I don’t feel freaked out about being alone anymore. (Not the same as actual human contact, but seems like enough to keep me grounded)
  • Two of the three video chats involved my friends’ kids and pets. Also helped with virtual human contact.
  • NYC doesn’t appear to be moving towards banning the ability to go for a walk. I’ve always needed to walk for stress relief. The times I’ve hurt my leg/ankle have been really stressful without that release!
  • Friday was in the 70s after work. Walking was more enjoyable. And I was able to sit on my balcony and read for an hour. I like sitting outside. And since it is private space, it is allowed. (Conveniently my building has separate balconies that are more than 6 feet apart)
  • A friend sent me flowers to make me smile. I put them on the balcony where they should last longer and preserve that message.
  • I read about putting up a post-it happy face to give the neighbors something to smile about. Mine makes a cool shadow in the afternoon which makes me smile!

The coming week

There are still plenty of things I’m worried about. I’m sure that’s true for everyone. But I think it is now below the level where it is impeding my ability to think. My hope is for a mildly frustrating week. That would be a huge upgrade from last week!