When working with a large database, some
SELECT queries can take really long.
And “really long” starts at a minute or two, of course.
When designing queries, one often limits the result set because one is not yet actually interested in the final data.
However, you’ll want to have the result eventually. In the meantime, you probably want to play around further with your queries. But you don’t want to keep your current database connection (say, the GUI used for query design) busy. I happen to be in this situation quite often and have written a little script which allows me to run the queries in the background.
I had the following requirements
- Flexible parametrization
- Measure execution time
- Keep results
- Allow to reproduce the data easily
The script works as allows: One has to provide a file containing the query to be executed. The input file gets copied (for traceability/reproducibility) and executed. Execution time gets logged and one can provide further arguments like hostname or user/password.
After a couple of iterations, below is what the final script looks like. It is tied to MySQL but that doesn’t matter and can be adapted to your needs, of course.
Let me explain the different steps in more detail. The only mandatory arguments are the input query file and the output directory. The output directory will contain a copy of the input file, the log file and the results.
The output files get timestamp’ed in order to differentiate between several invocations that use the same (possibly modified) input file.
Before running the query, the input parameters are printed to the screen for visual inspection.
Then, all further output is redirected to the log file and the input file gets copied.
Finally, the query is invoked.
time invocations allow to retrospectively study the timings.
Typically, the script is then executed like so
The input get gets copied oth the output
.tsv will contain the tab-separated result data.
If any errors occurred, they can be found in the
In order to see them immediately, I usually run
I use the script regularly to execute a long-running query in a new tmux pane. Sometimes, I even forget about the query. Next day at work, I happily find all the results I need!