Making sample JDBC code easier

I’m writing a lot of small JDBC examples for our upcoming practice tests book. I got tired of writing the same code over and over, so I wrote this helper class. It can:

  • Drop tables if they already exist (because I run the same examples many times)
  • Run a varargs list of SQL statements (to create tables and insert data)
  • Spit out the contents of a table (to confirm what happened)


import java.sql.*;
public class DerbyUtil {

    public static void dropTablesIfExist(String url, String... tables) throws SQLException {
        try (var conn = DriverManager.getConnection(url)) {
            for (String table : tables) {
                try (PreparedStatement ps = conn.prepareStatement("DROP TABLE " + table)) {
                    ps.executeUpdate();
                } catch (SQLException e) {
                    // ignore exceptions on drop since table might not exist
                }
            }
        }
    }

    public static void run(String url, String... sqlStatements) throws SQLException {
        try (var conn = DriverManager.getConnection(url)) {
            for (String stmt : sqlStatements) {
                try (PreparedStatement ps = conn.prepareStatement(stmt)) {
                    ps.executeUpdate();
                }
            }
        }
    }

    public static void outputTable(String url, String tableName) throws SQLException {
        String sql = "SELECT * FROM " + tableName;
        try (var conn = DriverManager.getConnection(url);
             var ps = conn.prepareStatement(sql);
             var rs = ps.executeQuery()) {

            var metadata = rs.getMetaData();
            int numColumns = metadata.getColumnCount();

            while (rs.next()) {
                outputRow(rs, numColumns);
            }
        }
    }

    private static void outputRow(ResultSet rs, int numColumns) throws SQLException {
        for (int i = 1; i <= numColumns; i++) {
            System.out.print(rs.getString(i));
            System.out.print('\t');
        }
        System.out.println();
    }
}
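
For context, here is roughly how I call it from a book example. The in-memory Derby URL and the table/data below are just placeholders; any JDBC URL and schema work the same way.

import java.sql.SQLException;

public class DerbyUtilExample {
    public static void main(String[] args) throws SQLException {
        // placeholder: in-memory Derby database that is created on first connect
        String url = "jdbc:derby:memory:book;create=true";

        DerbyUtil.dropTablesIfExist(url, "names");
        DerbyUtil.run(url,
                "CREATE TABLE names (id INTEGER, name VARCHAR(100))",
                "INSERT INTO names VALUES (1, 'Fred')",
                "INSERT INTO names VALUES (2, 'Wilma')");
        DerbyUtil.outputTable(url, "names");
    }
}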

JavaOne – Simplified and Fast Fraud Detection

Speaker: Keith Laker

For more blog posts from JavaOne, see the table of contents


Live SQL

  • free online Oracle 12C database
  • Can save scripts
  • Google searchable
  • Each OTN (Oracle Technology Network) user sees their own copy of the data; sandboxed
  • Can download data as CSV

https://livesql.oracle.com/apex/livesql/file/index.html

There was also a Live SQL URL specific to this session.

Pattern Matching

  • types – regex, sed/awk
  • in SQL – row level regex
  • new: pattern recognition in a stream of rows – aka can match across rows and columns
  • new SQL construct MATCH_RECOGNIZE – ANSI standard; not Oracle specific

Steps

  1. Bucket and order the data
    • This makes the patterns “visible”.
    • Use order by or partition by/order by so queries are deterministic (this does not require the paid Oracle partitioning feature)
  2. Define the pattern
    • Regular expression like pattern
    • Ex: PATTERN (X+ Y+ Z+) where X/Y/Z is a boolean expression. Ex: bal < PREV(bal)
    • Common quantifiers: * + ? {n} {n,} {n,m}
    • Also have an extra ? for reluctant quantifiers – helps control what happens with overlapping matches
  3. Define measures
    • Define columns in output table
    • pattern navigation options: PREV, NEXT, FIRST, LAST
    • column
    • optional aggregates (COUNT, SUM, AVG, MAX, MIN)
    • special measures: CLASSIFIER() – which component of the pattern applied to this row and MATCH_NUMBER() – how many matches within each partition – both are good for debugging
    • Ex: MEASURES FIRST(x.tstamp) as first_x
  4. Controlling output
    • by default you get a column per measure along with the partitioning column (when using one row per match); you get more columns with all rows per match
    • how many rows come back: ONE ROW PER MATCH (default), ALL ROWS PER MATCH, or ALL ROWS PER MATCH WITH UNMATCHED ROWS (good for debugging)
    • where to start the next search: AFTER MATCH SKIP PAST LAST ROW (default); there are also options for the next row and options relative to pattern variables (see the skeleton after this list)
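
To tie the four steps together, here is a minimal skeleton in the classic stock-ticker style. The ticker table and its symbol/tstamp/price columns are made up for illustration; they are not from this session.

SELECT *
FROM ticker
MATCH_RECOGNIZE (
    PARTITION BY symbol                    -- step 1: bucket the data
    ORDER BY tstamp                        -- step 1: order within each bucket
    MEASURES                               -- step 3: columns in the output
        FIRST(down.tstamp) AS start_of_drop,
        LAST(up.tstamp)    AS end_of_rise,
        CLASSIFIER()       AS pattern_var, -- which pattern variable matched this row
        MATCH_NUMBER()     AS match_no     -- running count of matches per partition
    ALL ROWS PER MATCH                     -- step 4: how many rows come back
    AFTER MATCH SKIP PAST LAST ROW         -- step 4: where the next search starts
    PATTERN (down+ up+)                    -- step 2: regex-like pattern of variables
    DEFINE                                 -- step 2: boolean condition per variable
        down AS price < PREV(price),
        up   AS price > PREV(price)
);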

Demo

  • Find 3 or more small (< 2K) money transfers within 30 days. Then find a large transfer (>= 1M) within 10 days of the last small transfer
  • Can do it in SQL without pattern matching, but it takes a lot of code.
  • Can do it in Java, but that means copying the data out of the database…
  • Showed how to create a table for JSON data – the JSON is read into a CLOB and Oracle checks that it is valid JSON. Loaded with insert statements because Live SQL is web based and can’t access the underlying file system.
  • Can use dot notation to access JSON fields in SQL (see the sketch below)
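
Here is a hedged sketch of what the JSON part looks like; the transfers_json table, its columns, and the JSON field names are invented to mirror the demo, not copied from it.

CREATE TABLE transfers_json (
    id  NUMBER PRIMARY KEY,
    doc CLOB CONSTRAINT transfers_doc_is_json CHECK (doc IS JSON)  -- Oracle rejects invalid JSON
);

INSERT INTO transfers_json VALUES
    (1, '{"user_id": "u1", "amount": 1500, "time_id": 12}');

-- dot notation: table alias, column name, then the JSON field name
SELECT t.doc.user_id, t.doc.amount
FROM transfers_json t;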

Sample pattern matching statement:


SELECT *
FROM transfers_view
MATCH_RECOGNIZE (
    ORDER BY time_id
    MEASURES
        user_id AS user_id,
        amount AS amount
    PATTERN (X{3,} Y)
    DEFINE
        X AS (amount < 2000) AND
             LAST(time_id) - FIRST(time_id) < 30,
        Y AS (amount >= 1000000) AND
             time_id - LAST(x.time_id) < 10);

My take: This was a two hour “tutorial”, which differs from a hands-on lab. We were still able to follow along with a laptop or “large tablet.” I followed along with the demos on my Mac, which also let me play a bit. It was fun. I’ve always liked SQL :). I like that he uses QR codes for the links/blogs he wants people to go to. They are also linked in the PowerPoint when it becomes available.

It was also interesting blogging on my laptop. On my tablet, I blog in HTML because it is a pain to use the visual editor on the tablet. A laptop has no such problem. But a laptop battery doesn’t last all day so…

postgres tuning – fixing a production problem

After a new feature was implemented (not by me), coderanch started crashing almost every day in the middle of the night.  In a few days, I’ll be blogging about the troubleshooting process and how timezones helped.  This post focuses on the end game – once we knew the problem was being caused by a large query – which is when I started being involved.

The tuning process

Like all good SQL tuning, I ran explain and iterated.  As background, the new feature was a join table with half a million rows.
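
Each cost below came from running the query through explain, along these lines (the query shown is just a placeholder, not the real forum-list query):

-- prints the planner's estimated cost without executing the query
EXPLAIN
SELECT *
FROM jforum_topics_forums
LIMIT 20;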

Explain cost: 210,184
What changed: n/a
Observations: No wonder the site is crashing.  For a web page (the forum list), this is forever!  While the query plan is using an index, it is using the index to join a table with half a million rows to a table with millions of rows.

Explain cost: 40,590
What changed: Removed an unnecessary subquery.  (It was unnecessary because the column it populates isn’t used.)
Observations: The problem is that the query isn’t using the index for a where clause, so it joins very large tables only to return a small amount of data.  Another problem is that the query limits the number of rows returned to one page’s worth, but only at the very end, which prevents the database from saving work.

Explain cost: 1,807
What changed: Hack – what we really want to query is the post time from the join table.  Since it wasn’t on there and it was too much work to add during the week, I introduced a hack: I sorted by post creation (post id) and limited the query to the most recent 100 records before running the rest of the query (see the sketch after this table).
Observations: While this is much faster, it is functionally incorrect.  If an older post is still under discussion, it wouldn’t appear in the post list.  So: broken, but fast enough to get us to the weekend.

Explain cost: 288
What changed: Added the latest post time as a stored field on the join table.
Observations: Ah.  Done.
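
A rough sketch of that hack, with made-up table and column names (the real jforum schema differs); the point is that the ORDER BY/LIMIT happens in a subquery before the expensive joins:

SELECT t.topic_id, t.topic_title, recent.post_time
FROM (SELECT post_id, topic_id, post_time
      FROM jforum_posts
      ORDER BY post_id DESC      -- newest first; post id approximates post time
      LIMIT 100) recent          -- only the 100 newest posts feed the joins below
JOIN jforum_topics t ON t.topic_id = recent.topic_id
ORDER BY recent.post_time DESC;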

Learnings about postgres – locks

I ran a really simple statement to add a column to a table:

alter table jforum_topics_forums add column last_post_time TIMESTAMP without time zone not null default now();

Luckily I was on the test server because I had to kill it after 5 minutes.  At first, I thought the problem was setting the field to a value since it had to go through all the records.  That wasn’t the problem though.  The problem was that postgres was waiting on a lock.

SELECT * FROM pg_stat_activity;

SELECT * FROM pg_locks WHERE pid = 4503 AND granted = 'f';

Running the above SQL showed me postgres was waiting on an exclusive lock.  After I shut down the forum, the alter statement ran almost instantaneously.  The actual stored procedure to populate the new field (based on another table) took a few minutes.  But that makes sense, as it was a stored procedure doing half a million queries.

Testing with lots of data

Everything went fine on my machine. On the test server (which does have lots of data), I realized that I forgot to add the index that uses the new last post time column.  That was the entire point of this exercise!  And it goes to show how important it is to have production volumes of test data.
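
The fix itself was a one-liner (the index name here is made up):

CREATE INDEX idx_topics_forums_last_post_time
    ON jforum_topics_forums (last_post_time);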