What objects/functions are available for custom data extractors?

Answered

From looking at the included custom data extractors (e.g. HTML-JavaScript.html.js), I can see that the objects/variables TRANSPOSED, COLUMNS, ROWS and FORMATTER are available.

Is there any documentation on these objects?
What methods are available on them?
Are there other objects that are available?

3
21 comments

Hi Vidar!

This API is experimental, it is not published yet, and it is subject to change.

But here is some overview on what there is:

Binding Type Description
 
DIALECT
com.intellij.database.dialects.DatabaseDialect  
Database dialect
 
TABLE
com.intellij.database.model.DasTable Database table metadata
 
ALL_COLUMNS
java.util.List<? extends com.intellij.database.extractors.DataColumn>  
A list of all columns in query result/currently open table editor.
com.intellij.database.extractors.DataColumn exposes
 
int columnNumber();
String name();
 
COLUMNS
java.util.List<? extends com.intellij.database.extractors.DataColumn>  
A list of selected columns in query result/currently open table editor.
 
FORMATTER
com.intellij.database.extractors.ValueFormatter

Exposes default formatting methods:
java.lang.String format(com.intellij.database.extractors.DataRow, com.intellij.database.extractors.DataColumn);
java.lang.String format(java.lang.Object, com.intellij.database.extractors.DataColumn);

OUT java.lang.Appendable Extractor output
ROWS java.lang.Iterable<com.intellij.database.extractors.DataRow>

Row data stream.
com.intellij.database.extractors.DataRow exposes:

 
int rowNumber();
boolean first
();
boolean last
();
java.util.List<java.lang.Object> data();
Object value(com.intellij.database.extractors.DataColumn);
TRANSPOSED java.lang.Boolean Indicates whether the table editor is in transposed mode.


I hope, it helps. Feel free to ask questions and share your ideas on how to improve this feature.

3

In the 2020.3 release, the documentation in the extractor header was updated to add getTypeName and isStringLiteral.

FORMATTER { format(row, col); formatValue(Object, col); getTypeName(Object, col); isStringLiteral(Object, col); }

getTypeName(object, col) will return the datatype of a column as reported from the db.  Although row and column are required here from what I can tell this is the datatype of the column not of the row.  If you're using SQLite null column with multiple datatypes the info from the first row will be returned.

I use Oracle 12c, SQL Server, and SQLite.  Below are datatypes I extracted from my db.

getTypeName: 
SQL SERVER strings[binary, char, image, nchar, ntext, nvarchar, text, varbinary, varchar]
numbers[bigint, decimal, float, int, int identity, money, numeric, real, smallint, smallmoney, tinyint]
date [datetime, smalldatetime, timestamp]
other [bit, sql_variant, uniqueidentifer]
date datatypes not supported (always returned as nvarchar) [datetime2, date, time, datetimeoffset]

oracle strings[blob, char, clob, long raw, nchar, nclob, nvarchar2, raw, rowid, varchar2]
numbers[binary_double, binary_float, number]
date [date, intervalds, intervalym, timestamp, timestamp with local time zone, timestamp with time zone]
other [bfile, sys.xmltype]

sqlite strings[blob, char, character, clob, native character, nchar, nvarchar, string, text, varchar, varying character]
numbers[bigint, decimal, double, double precision, float, int, int2, int8, integer, mediumint, numeric, real, smallint, tinyint, unsigned big integer]
date [date, datetime, time]
other [boolean, none]
dynamic datatypes (unknown) not supported - always returned as datatype of first record instead of result of typeof(<field>)

 

Here is an example of usage from my MERGE extractor:

    ALL_COLUMNS.eachWithIndex { column, idx ->
def value = dataRow.value(column)
def isStringLiteral = value != null && FORMATTER.isStringLiteral(value, column)
def stringValue = FORMATTER.format(dataRow, column)
def TypeName = FORMATTER.getTypeName(dataRow, column)
bIsOracle = DIALECT.getDbms().isOracle()
if (bIsOracle && value != null && TypeName.toUpperCase() in ["DATE", "TIMESTAMP", "TIMESTAMP WITH LOCAL TIME ZONE", "TIMESTAMP WITH TIME ZONE"] ) {
OUT.append(INDENT)
.append("TO_DATE('" + stringValue + "', 'YYYY-MM-DD HH24:MI:SS')")
.append(idx != ALL_COLUMNS.size() - 1 ? " AS " + arrColName[idx] + SEP + NEWLINE : " AS " + arrColName[idx] + NEWLINE)
} else {
OUT.append(INDENT)
.append(isStringLiteral ? (STRING_PREFIX + QUOTE) : "")
.append(stringValue ? stringValue.replace(QUOTE, QUOTE + QUOTE) : stringValue)
.append(isStringLiteral ? QUOTE : "")
.append(idx != ALL_COLUMNS.size() - 1 ? " AS " + arrColName[idx] + SEP + NEWLINE : " AS " + arrColName[idx] + NEWLINE)
}
}

isStringLiteral(object, col) will return a boolean indicating the field is a string datatype.  This came about due to false-positive when trying to identify numeric fields.  Before we had to convert to a string then check to see if that string was a number resulting in account number fields being converted.

Old Way:
value.toString().isNumber()

New Way:
def isStringLiteral = value != null && FORMATTER.isStringLiteral(value, column)

 

2

Good news!

Starting with the 1.0 Public Preview build, we're shipping sources for open API parts of the product.

They can be found in %IDE_HOME%/lib/src/src_database-openapi.zip.

Happy hacking!

1

The default converters are not suitable for large files, please create avro/parquet/orc file exporters too

1

Great, thanks!

Is the source code or API of those com.intellij.database.* classes (like com.intellij.database.model.DasTable) available somewhere?

0

Not yet, but they will become a part of public API, too.

Until then, if you need help figuring out something - just ask, and we'll try to help you.

BTW, what kind of extractor are you trying to build?

0

Currently I'm building an extractor that will provide data for our DB test runner, on the form

<TABLENAME COLUMN1="column1value" COLUMN2="column2value"/>

- which I just now succeeded in doing, thanks to TABLE.name. Thanks!

Previously I wrote an extractor that outputs the resultset in JIRA markup table format:

|| COLUMN1 || COLUMN2 ||
|  R1C1 | R1C2 |
|  R2C1 | R2C2|


I'll post the code for the JIRA extractor here, in case it can help anyone:

var NEWLINE = "\n";

function output() {
  for (var i = 0; i < arguments.length; i++) {
    OUT.append(arguments[i]);
  }
}

var headerIterator = COLUMNS.iterator();
while (headerIterator.hasNext()) {
  var col = headerIterator.next();
  output("||");
  output(col.name());
}
output("||");

var rowIterator = ROWS.iterator();
while (rowIterator.hasNext()) {
  var row = rowIterator.next();
  output(NEWLINE);
  var columnIterator = COLUMNS.iterator();
  while (columnIterator.hasNext()) {
    var column = columnIterator.next();
    output("|");
    output(FORMATTER.format(row, column));
  }
  output("|");
}


Date formatting will probably be an issue soon.

Where does FORMATTER get its date format from, when formatting an Oracle DATE value, for example?

0

For now, it doesn't do anything clever - it just uses "yyyy-MM-dd HH:mm:ss" for timestamps, "yyyy-MM-dd" for dates, and "HH:mm:ss" for times.

So you may want to use the instanceof operator to dispatch on value types and format them the way you want.

0

If anyone is interested: I've written a MySQL like output format the other day. For small datasets it makes it easy to communicate the values using fixed width font:

+-------+-----------------+---------------------+
| sapID | fulfilment_type | start_time          |
+-------+-----------------+---------------------+
| 3595  | COLLECTION      | 1970-01-05 08:00:00 |
| 3595  | COLLECTION      | 1970-01-05 10:00:00 |
| 3595  | COLLECTION      | 1970-01-05 12:00:00 |
| 3595  | COLLECTION      | 1970-01-05 14:00:00 |
| 3595  | COLLECTION      | 1970-01-05 16:00:00 |
+-------+-----------------+---------------------+
5 rows in set.

You can find it here https://github.com/aukevanleeuwen/intellij-extractors 

0

@Sergey Savenko thanks, but how to import the package?

i use

import com.intellij.database.model.DasTable

but it

0

Unfortunately, %IDE_HOME%/lib/src/src_database-openapi.zip was included in datagrip-2016.1.exe, but is missing in datagrip-2016.3.2.exe and later versions.

And com.intellij.database java sources aren't still published in any googlable place.

0

When highlighting a sql select statement and using the execute to file/SQL Inserts method, the TABLE object supplied to the extractors is null and all insert statements are using "MY_TABLE" as the table name. 

INSERT INTO "MY_TABLE"( ...
INSERT INTO "MY_TABLE"( ...
INSERT INTO "MY_TABLE"( ...
...

 

However, when executing the same statement without export, then exporting from the results pane, the table object gets supplied to the sql inserts script and the insert statements are using the correct table name.  

insert into report.product( ...
insert into report.product( ...
insert into report.product( ...
...

 

Is there a way to get the table name for an extractor when executing to file?

I also notice that when using the same export/extractor method (SQL Inserts), the sql statement case when using the method from the results pane (lower) is different from when applied directly to a sql statement.   I feel like this is a clue, but I don't know where to go from here.  Thanks!

 

 

0

Howdy, is there a way of dumping multiple tabs or the output of the entire script file to the same out file?

0

I am writing a Data Extractor to  automatically create a DataFixture for PHP based on a query, it is mostly finished, but I am having some issues with timestamps and want to know the type of the column (to detect timestamp), can you give me a clue please?

I have tried something like this:

def t = COLUMNS[idx].type().contains("timestamp")

 

0

I am trying to write an Oracle SQL multireow insert extractor - almost finished, but the last detail is the date columns. Right now I have to mannualy (outside Datagrip) convert "2020-12-16 10:21:11.876990" to "TO_DATE("2020-12-16 10:21", "YYYY-MM-DD HH24:MI:SS")".

I can make a regular expression test to check if it´s a date, but it would be much cleaner if I had the type of the column, that I believe is in the com.intellij.database.extractors.DataColumnImpl, but I can´t find any documentation on the class.

Anyone knows where I can find these docs ?

 

 

0

For ORACLE, and improving slightly on Jim Gust's contribution (thanks Jim!), I did this within a copy of SQL-Insert-Statements.sql.groovy in order to capture a different TO_DATE mask for DATE fields that have only the date and no time.

def TypeName = FORMATTER.getTypeName(dataRow, column)
bIsOracle = DIALECT.getDbms().isOracle()
if (bIsOracle && value != null && TypeName.toUpperCase() in ["DATE", "TIMESTAMP", "TIMESTAMP WITH LOCAL TIME ZONE", "TIMESTAMP WITH TIME ZONE"] ) {
if (stringValue.length() == 10) {
OUT.append("TO_DATE('" + stringValue + "', 'YYYY-MM-DD')")
.append(idx != columns.size() - 1 ? SEP : "")
}
else {
OUT.append("TO_DATE('" + stringValue + "', 'YYYY-MM-DD HH24:MI:SS')")
.append(idx != columns.size() - 1 ? SEP : "")
}
} else {
OUT.append(isStringLiteral ? (STRING_PREFIX + QUOTE) : "")
.append(isStringLiteral ? stringValue.replace(QUOTE, QUOTE + QUOTE) : stringValue)
.append(isStringLiteral ? QUOTE : "")
.append(idx != columns.size() - 1 ? SEP : "")
}


0

Hi Folks,

Is there any update on the location of the documentation for this? It seems strange that this community post is the best source of information on the API.

0

Eric,

Thank you for the update.  Based on what you did I expanded the Oracle date logic to cover all of the date data types.  Below is my SQL-Insert-Statements.sql.groovy

/*
* Available context bindings:
* COLUMNS List<DataColumn>
* ROWS Iterable<DataRow>
* OUT { append() }
* FORMATTER { format(row, col); formatValue(Object, col); getTypeName(Object, col); isStringLiteral(Object, col); }
* TRANSPOSED Boolean
* plus ALL_COLUMNS, TABLE, DIALECT
*
* where:
* DataRow { rowNumber(); first(); last(); data(): List<Object>; value(column): Object }
* DataColumn { columnNumber(), name() }
*
getTypeName:
+------------+--------------------------------------------------------------------------------------------------------------------------------------
| SQL SERVER | strings[binary, char, image, nchar, ntext, nvarchar, text, varbinary, varchar]
| | numbers[bigint, decimal, float, int, int identity, money, numeric, real, smallint, smallmoney, tinyint]
| | date [datetime, smalldatetime, timestamp]
| | other [bit, sql_variant, uniqueidentifer]
| | date datatypes not supported (always returned as nvarchar) [datetime2, date, time, datetimeoffset]
+------------+--------------------------------------------------------------------------------------------------------------------------------------
| oracle | strings[blob, char, clob, long raw, nchar, nclob, nvarchar2, raw, rowid, varchar2]
| | numbers[binary_double, binary_float, number]
| | date [date, intervalds, intervalym, timestamp, timestamp with local time zone, timestamp with time zone]
| | other [bfile, sys.xmltype]
+------------+--------------------------------------------------------------------------------------------------------------------------------------
| sqlite | strings[blob, char, character, clob, native character, nchar, nvarchar, string, text, varchar, varying character]
| | numbers[bigint, decimal, double, double precision, float, int, int2, int8, integer, mediumint, numeric, real, smallint, tinyint, unsigned big integer]
| | date [date, datetime, time]
| | other [boolean, none]
| | dynamic datatypes (unknown) not supported - always returned as datatype of first record instead of result of typeof(<field>)
+------------+--------------------------------------------------------------------------------------------------------------------------------------
*/

SEP = ", "
QUOTE = "\'"
STRING_PREFIX = DIALECT.getDbms().isMicrosoft() ? "N" : ""
bIsOracle = DIALECT.getDbms().isOracle()
NEWLINE = System.getProperty("line.separator")

KEYWORDS_LOWERCASE = com.intellij.database.util.DbSqlUtil.areKeywordsLowerCase(PROJECT)
KW_INSERT_INTO = KEYWORDS_LOWERCASE ? "insert into " : "INSERT INTO "
KW_VALUES = KEYWORDS_LOWERCASE ? ") values (" : ") VALUES ("
KW_NULL = KEYWORDS_LOWERCASE ? "null" : "NULL"

def record(columns, dataRow) {
OUT.append(KW_INSERT_INTO)
if (TABLE == null) OUT.append("MY_TABLE")
else OUT.append(TABLE.getParent().getName()).append(".").append(TABLE.getName())
OUT.append(" (")

columns.eachWithIndex { column, idx ->
OUT.append(column.name()).append(idx != columns.size() - 1 ? SEP : "")
}

OUT.append(KW_VALUES)
columns.eachWithIndex { column, idx ->
def value = dataRow.value(column)
def TypeName = FORMATTER.getTypeName(dataRow, column).toUpperCase()
def stringValue = value == null ? KW_NULL : FORMATTER.formatValue(value, column)
def isStringLiteral = value != null && FORMATTER.isStringLiteral(value, column)
if (isStringLiteral && DIALECT.getDbms().isMysql()) stringValue = stringValue.replace("\\", "\\\\")

// replace standard value with a check for Oracle date to add the proper function
if (bIsOracle && value != null && TypeName.toUpperCase() in ["DATE", "TIMESTAMP", "TIMESTAMP WITH LOCAL TIME ZONE", "TIMESTAMP WITH TIME ZONE", "INTERVALYM", "INTERVALDS"] ) {
if (TypeName.toUpperCase() == "TIMESTAMP" ) {
OUT.append("TO_TIMESTAMP('" + stringValue + "', 'YYYY-MM-DD HH24:MI:SS.FF6')")
.append(idx != columns.size() - 1 ? SEP : "")
}
else if (TypeName.toUpperCase() == "TIMESTAMP WITH LOCAL TIME ZONE" ) {
OUT.append("TO_TIMESTAMP_TZ('" + stringValue + "', 'YYYY-MM-DD HH24:MI:SS.FF6')")
.append(idx != columns.size() - 1 ? SEP : "")
}
else if (TypeName.toUpperCase() == "TIMESTAMP WITH TIME ZONE" ) {
OUT.append("TO_TIMESTAMP_TZ('" + stringValue + "', 'YYYY-MM-DD HH24:MI:SS.FF6 TZH:TZM')")
.append(idx != columns.size() - 1 ? SEP : "")
}
else if (TypeName.toUpperCase() == "INTERVALYM" ) {
OUT.append("TO_YMINTERVAL('" + stringValue + "')")
.append(idx != columns.size() - 1 ? SEP : "")
}
else if (TypeName.toUpperCase() == "INTERVALDS" ) {
OUT.append("TO_DSINTERVAL('" + stringValue + "')")
.append(idx != columns.size() - 1 ? SEP : "")
}
else if (stringValue.length() == 10) {
OUT.append("TO_DATE('" + stringValue + "', 'YYYY-MM-DD')")
.append(idx != columns.size() - 1 ? SEP : "")
}
else {
OUT.append("TO_DATE('" + stringValue + "', 'YYYY-MM-DD HH24:MI:SS')")
.append(idx != columns.size() - 1 ? SEP : "")
}
} else {
OUT.append(isStringLiteral ? (STRING_PREFIX + QUOTE) : "")
.append(isStringLiteral ? stringValue.replace(QUOTE, QUOTE + QUOTE) : stringValue)
.append(isStringLiteral ? QUOTE : "")
.append(idx != columns.size() - 1 ? SEP : "")
}
}
OUT.append(");").append(NEWLINE)
}

ROWS.each { row -> record(COLUMNS, row) }

 

Test table and insert statement generated from SQL-Insert-Statements.sql.groovy

CREATE TABLE TEST
(
ID NUMBER,
MYDATE DATE,
MYTIMESTAMP TIMESTAMP(6),
MYTIMESTAMPLTZ TIMESTAMP(6) WITH LOCAL TIME ZONE,
MYTIMESTAMPTZ TIMESTAMP(6) WITH TIME ZONE,
MYINTERVALYTM INTERVAL YEAR(2) TO MONTH,
MYINTERVALDTS INTERVAL DAY(2) TO SECOND(6)
);

INSERT INTO JGUST.TEST (ID, MYDATE, MYTIMESTAMP, MYTIMESTAMPLTZ, MYTIMESTAMPTZ, MYINTERVALYTM, MYINTERVALDTS) VALUES (2, TO_DATE('2022-01-24 12:37:41', 'YYYY-MM-DD HH24:MI:SS'), TO_TIMESTAMP('2022-01-24 12:37:41.000000', 'YYYY-MM-DD HH24:MI:SS.FF6'), TO_TIMESTAMP_TZ('2022-01-24 18:37:41.000000', 'YYYY-MM-DD HH24:MI:SS.FF6'), TO_TIMESTAMP_TZ('2022-01-24 12:37:41.000000 -06:00', 'YYYY-MM-DD HH24:MI:SS.FF6 TZH:TZM'), TO_YMINTERVAL('10-2'), TO_DSINTERVAL('11 10:9:8.555'));
commit;

 

0

Please sign in to leave a comment.