Slow MySQL CSV Import
I'm using the "Import Data from File..." functionality for the first time to import some small CSVs into several MySQL tables. The CSVs I'm importing are very small (2 columns, 2k rows, ~60kb in size), but they are each taking upwards of 2 minutes. This is significantly slower than what I'm used to in other SQL clients/when running the load as a "LOAD DATA LOCAL INFILE" via the command line.
Is there any way to make the data import procedure faster? How can I see exactly what command DataGrip is using to import the data into MySQL?
Thanks!
What about server ping? It seems the queries themselves are taking too long. Is the import the only thing that has performance problems?
I'm seeing the same issue. Importing a CSV file became very slow after I upgraded to the latest version of DataGrip; I did not have this issue with the 2017 version. I'm wondering what the problem is here.
You can check the SQL log: Help - Show SQL Log.
You can also send it to datagrip@jetbrains.com so we can also look for suspicious statements :)
I've experienced this connecting to Postgres and Redshift. It's easy to reproduce in Postgres with a 16 MB file.
1. Download and unzip the MovieLens 20M dataset at https://grouplens.org/datasets/movielens/20m/
2. Create the schema in Postgres
3. Compare Postgres's \copy (rough command below) with DataGrip's import: the \copy runs in a few seconds, whereas DataGrip takes a couple of minutes
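For step 3, the psql side is roughly this (path illustrative; tags.csv is the ~16 MB file in the dataset):

\copy tags FROM 'ml-20m/tags.csv' CSV HEADER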
DataGrip 2018.3.2
Build #DB-183.5429.24, built on January 22, 2019
Licensed to Josh Curtis
Subscription is active until August 1, 2019
JRE: 1.8.0_152-release-1343-b26 x86_64
JVM: OpenJDK 64-Bit Server VM by JetBrains s.r.o
macOS 10.12.6
Got logs!
Please tell us the database version and the ping time to the server.
PostgreSQL 9.6 and 50 ms
Thanks! We also need it from the topic starter.
Josh, can you send your file with errors? The cause may be there.
Hey there! Sorry for the long response time; I was able to reproduce the error today. How do I send you my .log file? And what other information do you need?
Please send them to datagrip@jetbrains.com
Sent!
Server ping isn't very straightforward to measure since I'm connecting via an SSH tunnel. I am only experiencing performance issues with the import.
I have another MySQL client on my Mac (Sequel Pro) that will import a 10,000-row file with two columns (IDs and email addresses) in about 30 seconds, whereas DataGrip takes almost 3 minutes. I did some testing on a table without any keys specified, and there was no performance discrepancy, so it seems the key is the issue. Is it possible to see the exact command that DataGrip is using to load data, so that I can cross-reference it against what my other SQL client is doing?
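For reference, the two table shapes I compared look roughly like this (table names, column types, and the key definition are illustrative, not my exact DDL):

-- key-less variant: no speed difference between the two clients
CREATE TABLE import_perf_test_nokey (user_id INT, email VARCHAR(255));

-- keyed variant: the one that is slow in DataGrip
CREATE TABLE import_perf_test_keyed (user_id INT, email VARCHAR(255), PRIMARY KEY (user_id));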
Again, this is all I see in the DataGrip logs:
-- [ --------------------------------------------------
-- 2019-02-12 16:00:48 [default] [Cardinal] [data import] [session id: 2115513827] [statement id: 2098908713]
use readitla_analytics
-- ] --------------------------------------------------
-- 2019-02-12 16:00:48 [default] [Cardinal] [data import] [session id: 2115513827] [statement id: 2098908713] finished - execution time: 171 ms
-- [ --------------------------------------------------
-- 2019-02-12 16:00:48 [default] [Cardinal] [data import] [session id: 2115513827] [statement id: 2098908714]
SELECT t.* FROM import_perf_test_tags_datagrip t
LIMIT 1
-- ] --------------------------------------------------
-- 2019-02-12 16:00:49 [default] [Cardinal] [data import] [session id: 2115513827] [statement id: 2098908714] results received
-- 2019-02-12 16:00:49 [default] [Cardinal] [data import] [session id: 2115513827] [statement id: 2098908714] finished - execution time: 87 ms, fetching time: 106 ms, total result sets count: 1
-- [ --------------------------------------------------
-- 2019-02-12 16:00:49 [default] [Cardinal] [data import] [session id: 2115513827] [statement id: 2098908715]
INSERT INTO import_perf_test_tags_datagrip (user_id, email) VALUES (?, ?)
-- ] --------------------------------------------------
-- 2019-02-12 16:03:39 [default] [Cardinal] [data import] [session id: 2115513827] [statement id: 2098908715] update count received - 10000
-- 2019-02-12 16:03:39 [default] [Cardinal] [data import] [session id: 2115513827] [statement id: 2098908715] finished - execution time: 2 m 50 s 437 ms, fetching time: 1 ms, total update count: 10000
Yes, it's possible. You can see the query in the SQL log above:
INSERT INTO import_perf_test_tags_datagrip (user_id, email) VALUES (?, ?)
DataGrip repeats it many times but logs it only once. How much time does the import take without the key? Does Sequel Pro import the data in 30 seconds even with the key?
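For clarity, the repeated statement effectively amounts to this, assuming one execution per CSV row (values illustrative):

INSERT INTO import_perf_test_tags_datagrip (user_id, email) VALUES (1, 'a@example.com');
INSERT INTO import_perf_test_tags_datagrip (user_id, email) VALUES (2, 'b@example.com');
-- ... one execution per row, 10,000 executions in total for this file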
I tried to import the data without the key, but it still took about 2m40s via DataGrip, so the issue must not be the key. The reason I suspected the key is that I tried the same tags.csv dataset the other commenter used, and for that table (with the same DDL) there was no speed difference between DataGrip and Sequel Pro. I compared the DDL of my test table against the tags table and assumed the key must be slowing things down.
Sequel Pro does import the data in 30 seconds, even with the key. I have reached out to their support to try and understand what command Sequel Pro is using to load data.
Is there anything in my table DDL that would slow down an import?
Not that I can see. Were there any errors (you need to turn on the "Write errors to file" checkbox in the import dialog)? Errors may significantly slow down the import.
I enabled that option, but no error file was present after the import, so I take it there were no errors.
I was able to confirm that Sequel Pro uses the same INSERT command:
INSERT INTO `import_perf_test_tags_sequelpro` (`user_id`,`email`) VALUES (?, ?)
I am still very confused as to why it takes so much longer to import a CSV in DataGrip. Could it be that the two SQL clients batch a different number of records into each INSERT statement? I have also noticed that the schema I am creating the new table in has many, many tables. Is there some sort of introspection triggered once DataGrip imports data into a table? Perhaps that could be taking a while?
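For example, the batching difference I'm hypothesizing would look roughly like this (purely a guess on my part):

-- one row per statement execution (one round trip per row):
INSERT INTO import_perf_test_tags_datagrip (user_id, email) VALUES (?, ?);

-- many rows folded into one statement (far fewer round trips); note that with JDBC,
-- MySQL Connector/J only rewrites batches into this multi-row form when the connection
-- parameter rewriteBatchedStatements=true is set:
INSERT INTO import_perf_test_tags_datagrip (user_id, email) VALUES (?, ?), (?, ?), (?, ?);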
Do you still experience the issue? If yes, could you please attach the data file and specify the version of the database?
I am on MySQL 5.6.36. The data I'm testing with is sensitive, so I had to mock up a CSV of user IDs and emails for you. It only has 1,000 rows, but it should still reproduce the issue. The DDL I am testing with is available earlier in this thread.
Mock data here (link will expire after 24 hours):
https://send.firefox.com/download/932e4e4af7/#0ciIOzvwvRON-tpZsv4s-A
Kirill, thank you for the information, and sorry for the delay. It's hard to reproduce, so I've created an issue for a more detailed investigation: https://youtrack.jetbrains.com/issue/DBE-8074
@Zhaojuanwendy,
Could you follow the YouTrack ticket mentioned above (https://youtrack.jetbrains.com/issue/DBE-8074), provide a sample CSV file, and specify your DB engine?