csrf – jforum cleanup and problems

See part 1 for how we got here and part 2 for how we changed the OWASP filter.

Code cleanup and problems

JForum contains some poorly written code that stops working once CSRF protection is enabled.  In these cases, I needed to clean up our code.  For example:

  1. Links/anchors shouldn’t be used to update state.  They should only be used for GETs.  The OWASP filter is realistic enough to recognize that links are in fact used for updates in the real world, and it accommodates them.  It doesn’t accommodate code like
    onclick="document.location = '${contextPath}/${jforumMainServletMapping}/${moduleName}/...';" />

    Where this was used for actions that required CSRF protection, I needed to refactor to use a form.  In a few cases, I created a dedicated one-to-one form.  In others, where there were a lot of actions or a dedicated form wasn’t practical, I used a common form:

    <!-- CSRF token messes up submission if forum id isn't in URL -->
    <script>
    function submitActionForForum(actionVerb, forumId) {
      // fill the ACTION and FORUM_ID placeholders in the form's action URL, then submit
      var action = document.actionForm.action;
      action = action.replace("ACTION", actionVerb);
      action = action.replace("FORUM_ID", forumId);
      document.actionForm.action = action;
      document.actionForm.submit();
    }
    </script>

    <#-- general purpose form for the submit buttons on the screen; see JS for ACTION and FORUM_ID values -->
    <form method="post" name="actionForm" action="${contextPath}/${jforumMainServletMapping}/${moduleName}/ACTION/FORUM_ID">
    </form>

    ...

    onClick="submitActionForForum('up', ${forum.id})"
    
  2. Some pages (like the first one I happened to test) didn’t have </head> so dynamically adding the JavaScript which sets the CSRF token wasn’t working.  I wrote a unit test to identify such files and ensure we don’t create any more.  We also added the missing HTML headers.  Most of them were on the admin pages.
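
A check like the one described in item 2 can be sketched in plain Java.  This is a minimal sketch, not the actual JForum test: the class name MissingHeadTagFinder, the method name, and the assumption that templates are .htm files under one directory are all hypothetical.  It walks a directory and reports every template that has no closing </head> tag.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Stream;

/** Hypothetical helper: lists templates that have no closing head tag. */
final class MissingHeadTagFinder {

    // Assumption: templates are .htm files (post_form.htm suggests this naming).
    static List<Path> findTemplatesWithoutHeadClose(Path templateRoot) throws IOException {
        List<Path> offenders = new ArrayList<>();
        try (Stream<Path> paths = Files.walk(templateRoot)) {
            for (Path path : (Iterable<Path>) paths::iterator) {
                // skip directories and anything that is not a template
                if (!Files.isRegularFile(path) || !path.getFileName().toString().endsWith(".htm")) {
                    continue;
                }
                String html = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
                // compare case-insensitively so </HEAD> also counts
                if (!html.toLowerCase(Locale.ROOT).contains("</head>")) {
                    offenders.add(path);
                }
            }
        }
        return offenders;
    }
}
```

A unit test could then assert that the returned list is empty.  Templates that are only ever included into other pages have no <head> of their own, so a real check would probably need an exclusion list.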

JForum changes

JForum has some features that don’t play well with OWASP CSRF Guard:
  1. JForum’s controller framework is heavily dependent on the number of query parameters.  Adding a CSRF token broke all sorts of things.  I solved this by adding a check to the loops in WebRequestContext’s constructor, parseFriendlyURL() and handleMultipart() that skips the parameter when it is the CSRF token.  The check in handleMultipart() is unnecessary and is there just for consistency.  I also have the constructor call a method to see if the query string is empty instead of using a one-liner:
    private boolean isQueryStringEmpty(HttpServletRequest superRequest) {
      String queryString = superRequest.getQueryString();
      if (queryString == null || queryString.length() == 0) {
        return true;
      }
      // ignore OWASP token (so if only OWASP present, ignore it)
      return queryString.matches("^&?" + CsrfFilter.OWASP_CSRF_TOKEN_NAME + "=[-0-9a-zA-Z]+$");
    }
  2. I then learned that parsing a multipart request reads the input stream, so it can’t be done twice.  One approach would be to copy the stream.  I didn’t go that route.  The only multipart requests are posts and PMs, both of which need CSRF protection.  Instead, I have the filter assume that all multipart requests need CSRF protection.
  3. Added <noscript> on post_form.htm to explain that JavaScript is now required to post.
  4. A couple of places were missing <form> around input fields.
  5. In one place, a jQuery form needed the token set explicitly since it was choosing the parameters for jQuery to send on.  (I think this was only in our JForum fork.)
  6. The CSRF filter requires AJAX to pass the token as a header, not a parameter.  To add it to a jQuery.ajax block:
    headers: {'OWASP_CSRFTOKEN': '${OWASP_CSRFTOKEN!""}'},

    This syntax uses Freemarker’s default value operator to substitute an empty string instead of blowing up if there is no token.
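
The parameter check from item 1 can be illustrated with a small, self-contained sketch.  This is not the WebRequestContext code: the class and method names are hypothetical, and the token parameter name is assumed to match the OWASP_CSRFTOKEN name used in the AJAX header above.  The idea is simply that the token is left out when collecting parameter names, so code that depends on the number of parameters sees the same count as before.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the actual WebRequestContext code. */
final class CsrfTokenParams {

    // Assumption: the request parameter uses the same name as the AJAX header.
    static final String TOKEN_NAME = "OWASP_CSRFTOKEN";

    /** Returns the parameter names in a query string, leaving out the CSRF token. */
    static List<String> namesWithoutToken(String queryString) {
        List<String> names = new ArrayList<>();
        if (queryString == null || queryString.isEmpty()) {
            return names;
        }
        for (String pair : queryString.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate a leading or doubled '&'
            }
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            if (!TOKEN_NAME.equals(name)) {
                names.add(name);
            }
        }
        return names;
    }
}
```

With this in place, a request whose only parameter is the token looks like a request with no parameters at all, which is the same outcome isQueryStringEmpty() aims for.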

Note to users

I was worried about users seeing the CSRF screen.  Some users won’t know what CSRF is.  And most wouldn’t know what to do.  Feel free to read our error page which includes things to try and who to email.

Enhancements/Longer term goals

We started these but it will take a while:
  1. Don’t use the token for anonymous users.
  2. Get rid of the CSRF token in URLs (use forms consistently for posts).
  3. Roll back any hacks we made to fix things/clean up code.
  4. Reconsider using JavaScript to add the tokens.

Note: all code is sanitized to remove references to javaranch/coderanch.

 

The story continues in part 4 – deciding not to use JavaScript for the solution.

csrf – extending the owasp solution and “interesting” IE javascript bugs (part 2)

While implementing CSRF protection for JForum, I needed to extend the OWASP solution.  Let me tell you, they don’t make it easy to extend.  Lots of final classes.  Here’s what I did, with links to the code on GitHub.

To read about the original problem or why I chose the OWASP filter, see part 1.

Extending the OWASP solution

  1. CsrfFilter.java – The OWASP filter is final, so I had to copy its logic into my own filter.  I added logic to get the action method name.
  2. CsrfHttpServletRequestWrapper.java – Since I’m using actions instead of URLs, I need to make the request look as if the actions are URLs.  A short, simple class.
  3. CsrfListener.java – OWASP assumes you are using URLs and can enumerate them in the property file.  I have action names, and there are a lot of pages to allow as unprotected.  Without wildcard patterns to help, this is unwieldy.  The OWASP listener isn’t final, but the logic I needed to extend was in the middle of a method.  So I copied their listener, adding a method that reads csrf.properties and creates “org.owasp.csrfguard.unprotected.” properties for all lines that weren’t set to “AddToken”.
  4. Redirect.java – The OWASP Redirect class works just fine if you always have the same context root.  We have a different one when testing locally than on the server.  And since the path is in a property file, it isn’t easy to change.  Of course the OWASP class was final, so I had to copy the logic rather than extend it.  My Redirect class gets the server and context root from the request and prepends them to the relative path in the OWASP property file.
  5. AddJavaScriptServletFilter.java – Adds the OWASP JavaScript to each page after </head> unless it is a download or the user isn’t logged in.  Since users who aren’t logged in can’t do anything CSRF exploitable, they don’t need the token.  This also makes Google happier because googlebot doesn’t see the extra parameter confusing the URLs.
  6. AjaxHelper.java – The OWASP filter adds “OWASP CSRFGuard Project” to the X-Requested-With header for AJAX calls.  You can customize it to not include that, but I wasn’t sure what would happen if I left it blank.  It was easy enough to change JForum to use startsWith() instead of equals() to make it more flexible.
  7. Owasp.CsrfGuard.js (attempt 1)
    1. The JavaScript provided by OWASP has a bug.  It goes through the elements in document.all from top to bottom.  As it finds forms, it adds hidden input fields.  So far so good.  The problem is that when you have a lot of forms, some of them get bumped past the original end of the document.  So the loop never sees them and the CSRF token is never added.  The solution is to go through the document from bottom to top.  I’ve submitted a fix to OWASP for this.  What did the trick for understanding the problem was adding console.log("length: " + document.all.length); at the beginning and end.  It grows by roughly 100.  The bug fix was changing the loop in injectTokens() from
      for(var i=0; i<len; i++) {

      to

      // BUG FIX: if go through in ascending order, the indexes change as you add hidden elements
      // when have 100 forms, this moves some of the forms past the original document.all size
      for(var i=len -1; i>=0; i--) {
    2. IE doesn’t like
      var action = form.getAttribute("action");

      when you have a form field named action.  (If at all possible, DON’T DO THAT!!!)  Since JForum has hundreds of such references, changing it at this point isn’t an option.  IE helpfully returns some object.  It’s not an array, map or list of any kind that I can tell.  After getting hopelessly stuck on this, I asked Bear Bibeault for help.  He noted the answer was in “Secrets of the JavaScript Ninja”, a book he wrote that I had recently read.  And it was.  The solution:

      // hack to test if action is a string since IE returns [object] when action in form and as hidden field
      // if not a string, assume it is our action and add token for now
      var action = form.getAttributeNode("action").nodeValue;
    3. The OWASP code contains this innocent looking code.
      var hidden = document.createElement("input");
      hidden.setAttribute("type", "hidden");
      hidden.setAttribute("name", tokenName);
      hidden.setAttribute("value", (pageTokens[uri] != null ? pageTokens[uri] : tokenValue));

      It works fine in all browsers except IE.  Want to know what IE does?  That’s right.  It generates a hidden field with the correct token value and *no* token name.  I found a solution to this problem online and changed the code to

      var hidden;
      try {
        // old IE accepts a full HTML string here, which lets it set the name attribute
        hidden = document.createElement('<input type="hidden" name="' + tokenName + '" />');
      } catch(e) {
        // other browsers throw on an HTML string, so create the element the standard way
        hidden = document.createElement("input");
        hidden.type = "hidden";
        hidden.name = tokenName;
      }
      hidden.value = (pageTokens[uri] != null ? pageTokens[uri] : tokenValue);
    4. My last IE change was an easy one.  Easy in that it was a problem I had seen before.
      var location = element.getAttribute(attr);

      changed into

      //var location = element.getAttribute(attr);
      // hack - getting same error as on action - don't know why but hack to move forward
      var attrNode = element.getAttributeNode(attr);
      var location = null;
      if (attrNode != null) {
        location = attrNode.nodeValue;
      }
    5. Then I encountered an issue with the event handling framework in IE.  I didn’t even look at this.  All IE debugging was done between 9pm and midnight January 28th-30th after we all thought CSRF was fine from being in production the previous Sunday.  The only computer I had access to that ran IE is from 2002 and barely runs.  I was tired of IE and didn’t want to throw any more time down that rabbit hole.
  8. Owasp.CsrfGuard.js (attempt 2) – After encountering numerous IE problems, I wound up merging the code from OWASP CSRF Guard versions 2 and 3 to make it less dependent on the browser.
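
The index-shifting bug from item 7.1 can be reproduced outside the browser.  The sketch below is a Java simulation, not the OWASP JavaScript: a growing list stands in for the live document.all collection, every element is assumed to be a form, and the class and method names are hypothetical.  Each loop captures the length up front and inserts a hidden entry right after every form it visits.

```java
import java.util.ArrayList;
import java.util.List;

/** Simulation only: a growing list stands in for the browser's live document.all. */
final class LiveListIteration {

    /** Builds a simplified "page" that contains nothing but forms. */
    static List<String> page(int formCount) {
        List<String> all = new ArrayList<>();
        for (int i = 0; i < formCount; i++) {
            all.add("form");
        }
        return all;
    }

    /** Ascending loop with the length captured up front, like the buggy version. */
    static int injectAscending(List<String> all) {
        int len = all.size();
        int injected = 0;
        for (int i = 0; i < len; i++) {
            if (all.get(i).equals("form")) {
                all.add(i + 1, "hidden"); // shifts every later form one index up
                injected++;
            }
        }
        return injected;
    }

    /** Descending loop: inserts only shift entries that were already visited. */
    static int injectDescending(List<String> all) {
        int len = all.size();
        int injected = 0;
        for (int i = len - 1; i >= 0; i--) {
            if (all.get(i).equals("form")) {
                all.add(i + 1, "hidden");
                injected++;
            }
        }
        return injected;
    }
}
```

With four adjacent forms, injectAscending() reaches only two of them because the other two have been pushed past the captured length, while injectDescending() reaches all four.  A real page has other elements between its forms, so the number missed there will differ.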

Finally, we get to look at the JForum specific parts in part 3.

production problems across time zones

A couple of days ago, I blogged about the technical details of a production problem (not caused by me) at coderanch.  Now that the problem is resolved, it is an interesting time to reflect on how time zones helped us.

Peak volume at the ranch

While we have users from 219 countries, roughly half our volume is from the US and India combined.  (Source: Google Analytics.)  I also learned that our “peak time” is midnight to 6am Mountain Standard Time, followed by 6am to 3pm.  This would be business hours in Asia and Europe, followed by Europe and North America.  Peak time is misleading because bots count as users for hits.

As an added bonus, peak time for search engines/bots is 5am to 7am Mountain Standard Time.  Yes, these overlap.

When the problem occurred

Lucky for us, we have a moderator in India (Jaikiran Pai) who was able to investigate the problem in real time.  This meant those of us in the United States woke up to an almost daily email saying that the site had gone down and describing an attempted fix.

Fixes for other problems

It turned out there were a couple resource leaks in the code that Jaikiran found/fixed.  One had been in the code for over a year.  One was new (due to an API being converted to JPA and the caller not adapting the open session filter.)  One was a less than desirable transaction setting.  All of these manifested because of the new, bigger problem – but were not the cause.  This is a common problem in software – finding the RIGHT problem.

Converging on the right problem

Another advantage of having someone who could look at the problem in real time was that he was able to capture the database logs as it happened.  Right before going to sleep, Jaikiran found two queries taking a long time to run.  And by a long time, I mean one was taking OVER A MINUTE under load.  He found them by running:

select current_query,now() - pg_stat_activity.query_start as duration from pg_stat_activity order by duration desc

He posted the two queries.  One took 200K explain plan units.  At this point, we had something that could be fixed without witnessing the problem firsthand, and the SQL tuning work moved back to the United States.  One thing the *right* solution had that the others didn’t was that it explained everything.  All the other fixes made sense, but relied on a “magic” step to get from the problem to the solution.

Tuning the hack

I created a hack that would limit the number of threads shown in a forum to get us through another day or two until the weekend.  It required tuning during the production problem time.  Back to India.

Conclusion

Communication across time zones only worked because of email.  (Normally, we’d have used the forums.  But the forums weren’t a very reliable place given that the problem was the forums going down.)  I’ve never been on a team at work more than 3 time zones away.  It was a great experience working with a strong developer half the world away.  And while we’ve been developing features together, it is what you do in times of difficulty that shows your process.  It was wonderful to see ours working.

And finally: GREAT JOB JAIKIRAN!!!