DataGrip feedback from a DWH perspective
Hi JetBrains team,
As a customer and heavy user of DataGrip for a few years, I've recently been rather displeased with the roadmap of the product, so I wanted to share my feedback, which I hope represents a part of the industry (and user base) that is being overlooked. Don't get me wrong, I still believe DataGrip is the best product currently available, but it's a shame its potential is not leveraged.
Let me give you a bit of background so you understand where I'm coming from with this feedback. I'm a Data Warehousing (DWH) expert who has spent the last decade building analytical solutions for a number of clients across industries. Based on my experience working in different roles, I see the product as having the following groups of users:
a) DWH Developers - In the DWH world, these are the people most concerned with creating data pumps (ETL/ELT) and translating mapping documents from analysts into data pumps; heavily SQL focused. I think the product serves this target group well, as developers are used to working with rather complex and technical tools. However, some of the needs in this area are not respected.
b) Data modelers - These are the designers of database objects. As the approach to building a DWH tends to be model-driven architecture (MDA), modelers use tools such as PowerDesigner to construct and iterate from general business concepts down to physical tables and columns (conceptual/logical/physical model). DDLs are then generated from the physical model and applied to the database. DataGrip is only capable of a very basic visualization of tables that provides no added value. The product doesn't help this target group in any way.
c) DWH analysts - Analysts are usually split into technical and business, based on whether they have a good understanding of the technology involved or rather of the business terms used, and also on whether they work closely with the development team or sit in business departments. DWH analysts in general need to perform complex ad-hoc queries on multiple databases using different accounts in order to find root causes of issues, elaborate new features, and check whether each data consumer group has proper access. While analysts close to the technology can get efficient in the product quickly, others will struggle.
e) DataOps - These are the people who have full access to production and can occasionally make slight adjustments to the data, usually by running a script prepared by Developers, and/or who maintain CI/CD pipelines for quick propagation of DMLs and DDLs to higher environments.
f) Data Scientists - I will not cover these, as they either use SQL similarly to analysts, or Python/R, or other products entirely.
As DataGrip is a commercial-only product, it is unlikely to get into the hands of DWH experts. We work in companies that only accept software they have purchased, or freeware as long as it passes security checks, since we work with sensitive data. I've been using the product on my work computers only because I pay for the license myself without any contribution from the company, install it in a hacky way, and my managers tolerate me using it. PyCharm is a different case: because it has a free version, I've seen a couple of colleagues using it - an outdated version installed from a company-approved site. PyCharm Community is actually how I learned about other JetBrains products and what led me to DataGrip.
Now to the actual feedback on features released in 2020:
1. Most of the new features concern ancient technologies like MySQL, Postgres, DB2, etc. In the DWH world, the leading technologies are Teradata, Redshift, Snowflake, BigQuery, MSSQL, etc. There is a lack of support for these: some are ignored, others have buggy connections or syntax highlighting. There are a lot of features aimed at improving the coding of stored procedures, which are rarely used in a modern DWH. I can also see a lot of investment in NoSQL features and nested data structures that are rarely used for DWH.
2. Introspection is really annoying. We tend to use only a few databases, usually split by environment (DEV/TEST/PROD), but we need multiple users to run analyses and checks. As there is only one user per connection, this creates tens of connections to the same physical tables. After I start DataGrip, I have to go do something else for about 20 minutes until all the connections settle down.
3. DDLs, as I explained above, are rarely written by hand. DMLs directly on the database are not allowed unless you're DataOps.
4. Some of the features brought in are a bit of a joke in the DWH world - "Images in database", "Export to Excel".
5. What I do appreciate are the improvements in the Generic dialect. Due to the lack of database support, this is the dialect I suspect is most heavily used by DWH experts. Please focus on not breaking this one, as has happened multiple times.
6. An older feature, "JOIN completion", came with a big announcement a while ago, so I was really excited. It has never worked as expected, so we're still typing JOINs manually. I think this has something to do with the way you test those features: on databases that have foreign keys and/or joins within the same schema. Having database constraints is an anti-pattern in DWH architecture, and JOINs frequently happen across different schemas of a DWH.
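To make the cross-schema point concrete, here is a minimal sketch of the kind of join we type by hand every day. All schema, table, and column names are made up; I'm using SQLite's ATTACH to stand in for schemas, and note that no foreign-key constraint exists anywhere, so FK-driven completion would have nothing to go on:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attached in-memory databases play the role of DWH schemas here.
conn.execute("ATTACH ':memory:' AS stage")
conn.execute("ATTACH ':memory:' AS core")
conn.execute("CREATE TABLE stage.sales (customer_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE core.customer (customer_id INTEGER, name TEXT)")
conn.execute("INSERT INTO stage.sales VALUES (1, 9.5)")
conn.execute("INSERT INTO core.customer VALUES (1, 'ACME')")

# Cross-schema join with no FK constraint defined -- exactly the case
# where constraint-based JOIN completion cannot suggest anything.
rows = conn.execute("""
    SELECT c.name, s.amount
    FROM stage.sales s
    JOIN core.customer c ON c.customer_id = s.customer_id
""").fetchall()
print(rows)  # [('ACME', 9.5)]
```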
Please let me know if some of the points above need more clarification, and I wish you all well at JetBrains.
Kind regards,
Miroslav Smerda
Hello, Miroslav! It is always an honor to receive such detailed feedback. I imagine you spent a lot of time collecting, sorting, and writing it all up! It's exactly this kind of text that drives us to make DataGrip better. We have already discussed your letter within the team, and I'm coming back with an answer. I will try to cover all your points; if something is missed, it happened unintentionally :)
Thanks for describing the roles in the DWH world. The fact is that when we launched DataGrip, our main focus was database developers, and perhaps it still is. That is why some of our features seem not very useful for you and your environment.
On the other hand, we see more and more 'not exactly developer' people and companies adopting DataGrip, and we thank you for helping us understand this perspective. Some comments on the groups you've highlighted.
a) DWH Developers. This is the group closest to our main focus. You say that 'some of the needs in this area are not respected' without elaborating. May we ask you to do so? :)
b) Data modelers. And again you are right: we won't help this segment for now. There is a ticket for the huge design feature: https://youtrack.jetbrains.com/issue/DBE-268. Just a couple of months ago we discussed this with the team. While we have some ideas in that field, I think it won't make it into our roadmap for at least a year or two. And even then, we will still not be the tool data modelers need.
c) DWH analysts. Perhaps the most interesting group you listed. We are useful here, but we can be much more useful! And we are going to be. Of course, there is a learning curve for, as you put it, 'analysts not close to technology', and we know it. I believe that with every release we simplify and polish our UI so that a new user won't feel abandoned by DataGrip being too technical. If you have some particular steps in mind that could help us here, we would be glad to know!
e) DataOps. We think we are good here :)
f) Data Scientists. As you've said, that's a somewhat different field, but just FYI, we at JetBrains have new tools for data scientists. You can learn more about them here: https://www.jetbrains.com/data-tools/
And some comments on your 2020 feature feedback. Some of them may be 'a bit of a joke in the DWH world', but there are other worlds in our galaxy that may need them :)
You've said: "Teradata, Redshift, Snowflake, BigQuery, MSSQL, etc. There is a lack of support for these, some are ignored, others have buggy connections, syntax highlighting." All of these databases except Teradata should not have any buggy connections or issues with syntax highlighting! If there are any, please report them to us. BigQuery may have issues with introspection, but we are going to release our own BigQuery introspection in the next version. Regarding Teradata, since it is not supported, working with it can sometimes be less than 100% smooth. But we do try to fix show-stopper issues even with unsupported databases.
The point about the long introspection of data sources that are actually the same is 100% valid and clear! I think we can come up with something like 'session' types, so that you would need only one data source and could change credentials and other parameters 'inside' it. We'll take a look at it next week.
And JOIN completion. It is still not clear what actually doesn't work in your case! I see that there are no foreign keys, but we also complete using column names and provide the ability to set up your own virtual foreign keys! So if none of those features helps you, please add some details!
Thank you again for your letter and have a nice weekend!
Max.
Miroslav
I also work primarily on a DWH, but my role is not ETL/ELT developer; I'm a report developer. Most of my time is spent writing stored procedures for consumers such as Jasper, Crystal Reports, CSV...
To your point #6, I also don't have any DB constraints, but I found that the regex-based JOIN clauses met my needs.
This is the pattern I use. It matches on column names while excluding UPDATE_DTTM and UPDATE_USER.
In the example below, join completion suggested all of the matching columns, with the exception of the excluded ones.
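(Jim's actual pattern is in a screenshot, so the regex below is my own guess at its shape, not his exact expression; column names are hypothetical too. The idea - suggest a join on any column name shared by both tables, except audit columns - can be sketched like this:)

```python
import re

# Hypothetical audit columns to exclude; the negative lookahead rejects
# them while any other column name passes.
EXCLUDE_AUDIT = re.compile(r"^(?!(?:UPDATE_DTTM|UPDATE_USER)$)\w+$", re.IGNORECASE)

left_cols = ["CUSTOMER_ID", "ORDER_ID", "UPDATE_DTTM", "UPDATE_USER"]
right_cols = ["CUSTOMER_ID", "REGION_CD", "UPDATE_DTTM", "UPDATE_USER"]

# Join candidates: names present on both sides that pass the filter.
candidates = [c for c in left_cols if c in right_cols and EXCLUDE_AUDIT.match(c)]
print(candidates)  # ['CUSTOMER_ID']
```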
Hi Maxim,
Big thanks for reflecting on my feedback. I'm happy to see improvements for Redshift in the current EAP, as that is what I'm using heavily at my current client. I did raise a couple of bugs around the Redshift connection in the past, and I'm not facing those anymore.
So the clarifications and additional points below are based on my experience with DataGrip 2021.1 EAP + Redshift. Sorry that I can't provide any screenshots or the like due to confidentiality.
Firstly, to your question about how to better serve the needs of developers & analysts: from my perspective, it is essential that the UI allows as much space as possible either for working with the code in a "smooth flow" or for presenting actual data with a wide and deep view of the result.
1. Double-clicking a table in the Database view presents results in the top area while the query is shown at the bottom. It is exactly the opposite when you type a query and run it. This still confuses me to this day and forces me to refocus and resize the windows. It would be better to show a simple select * at the top and results at the bottom; then I could quickly adapt the query without needing the bar (WHERE/ORDER BY) on top, which also takes up space in the UI.
2. The left part of the Services panel is very technical, obstructs the view, and catches too much attention. For now I'm shrinking it. I've also found it rather confusing to have multiple sessions per connection. I force my connections to a single session, but then running a query during introspection is impossible.
3. There is no concept of multiple users per database, as you said. Also, working between two systems is complicated, as you have to switch Dialects. I'd expect the Dialect to follow the selected database, not vice versa. I'd like to switch between databases (system/environment) & users in place of the current "Switch attached sessions".
4. When I moved to DataGrip, I had to adapt to live templates to be more productive. But they make you jump back and forth in the generated code to fill in details. What I'd like is to type "sel schema.table", get quick access to "*/count(*)/expand columns" options, and then quickly move below the generated code to add joins, filters, etc. That would give me a "smooth flow" through the code in one direction.
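To illustrate the flow I mean, here is a toy sketch of what such a shortcut could expand to. This is not a DataGrip feature; the function name, the "sel" abbreviation, and the mode options are all my own invention:

```python
def expand(shortcut: str, mode: str = "*") -> str:
    """Expand a hypothetical 'sel schema.table' shortcut into SQL.

    mode is '*', 'count', or an explicit column list -- my sketch of the
    quick-access options, with the cursor meant to land below the text.
    """
    _, target = shortcut.split(maxsplit=1)
    body = "count(*)" if mode == "count" else mode
    return f"select {body}\nfrom {target}"

print(expand("sel dwh.f_sales"))           # select *  /  from dwh.f_sales
print(expand("sel dwh.f_sales", "count"))  # select count(*)  /  from dwh.f_sales
```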
5. When typing column names, code completion gives me a bunch of suggestions for random functions. Sometimes I have to scroll way down to find what I need. Sure, I could type the initial letters to get better suggestions, but I don't remember the exact names of thousands of attributes. I'd like to type a familiar word and have code completion suggest column names containing that word.
Secondly, on the topic of JOIN completion. @Jim, thank you very much for the suggestion. I don't know the data structure behind it, so I went ahead and blindly filled in the list of our technical columns in your regex.
a) Now it takes multiple seconds to load the suggestions. I can type the join faster. :)
b) What I'm getting is a suggested join to one random table in a different schema; what I need is nowhere to be found. I'd expect to see suggestions for joins to multiple tables in the same schema at the top. As an example, I'd like to see this working when I start from a fact table and join in dimension tables (in Kimball) or related tables and other facts (in 3NF). Only later would I add tables from different schemas.
c) The generated join is wrong, because the right table is not qualified (the schema name is missing).
d) So I tried one more thing: typing schema.table first and then JOIN. It only suggests columns, not joins.
Hope this helps to clarify the issues.
Thank you!
Miroslav
Thanks for the answer!
1. Well, usually nobody complains about that :) In 2021.1 we'll introduce "select top N from table" generation from the database explorer; perhaps that can help.
2. If you don't want to see the left part of the Services panel, there is a solution; please check the 'Console output in a separate tab' section here: https://www.jetbrains.com/datagrip/whatsnew/#running-queries. Regarding the problem of introspection happening in the same session in 'single connection mode', we'll think about a solution.
3. We are currently working on the concept of handy work with several users inside one data source. But the problem with dialects is not clear; can you please give some details? Anyway, for now the best solution is to have separate data sources for each environment.
4. To keep the flow straight, you can use postfix completion: https://www.jetbrains.com/help/datagrip/auto-completing-code.html#postfix_completion. Also, for count(*) you can just create a dedicated template.
5. Sometimes that happens, but we'll try to fix all issues of that kind. Can you share at least an abstract example, not from a real database?
6. Regarding your troubles with JOINs, can I ask you to create requests in our tracker? https://youtrack.jetbrains.com/issues/DBE
Thank you again. I understand my suggestions on the UI may or may not be relevant, as you're the experts in the area. I was rather trying to give some food for thought about what I consider a usual workflow in the DWH area.
But since we've moved into issue-solving mode, more detail below. :)
1. It is okay for me to be among the "usually nobody" complaining about this. :) I guess the current behavior is meant for editing features (which I don't need in DWH, as I explained). I would love to be able to make the upcoming "select top/limit" feature the default when double-clicking tables, ideally pasted into the console with the cursor ready for quick joining/filtering and the query already run.
2. & 3. Sorry, I missed that the "services tab" removal feature was added recently; all good there. My actual point was that the whole Services concept seems complicated. An average DWH person just needs to connect to some database with some user and start querying. Regarding the Dialect, I'll give an example. As I said, my current work is mainly in Redshift, but sometimes I need to use Oracle. So I create a new SQL file, then have to scroll through all the Dialects and choose one. Only then am I presented with my connections to Oracle. When I used to work with Teradata or BigQuery back in the day, I first had to realize the database was unsupported, pick Generic, etc. I'd like to directly pick from my connections and have the dialect set accordingly.
4. I created a live template for count(*) the day I installed DataGrip; I can't stress enough how often this is used in DWH. :) I gave "postfix completion" a try, thanks for that, and again pardon my ignorance. I will make my templates and test this more. So far I'm missing an auto-generated alias for the table.
5. I'll give an example of the code completion issues with the postfix feature I'm currently testing. I type "select ccp" (ccp being the initial letters of my schema). I get some 3 objects in pg_catalog and 1 function; my schema is in 5th place. So I go down to pick the schema, and then I get random columns, whereas I would expect a list of tables (ideally sorted by name). Typing a word from a table name still gives me columns, so I have to think of the table name's initials. After picking the table, I'm still getting columns, whereas I'd rather enjoy those cool postfixes. Fast-forward: I'm constructing a WHERE filter, typing "area" to get columns containing that word, but instead I get 5+ functions at the top of the suggestion list.
6. I'd like to play more with the suggested regex; I couldn't find documentation, but I just noticed the check button (wow). I will probably come back later with the tickets.
Edit: Created issue https://youtrack.jetbrains.com/issue/DBE-12700
Based on your findings, I also looked into cross-schema joins using regex. What I found is that it works on tables and views, but it doesn't work on synonyms.
With a table-to-table cross-schema join, I'm presented with a join matching my regex.
In the next two screenshots, ES is the schema with the tables and SCH_GRV is a schema using synonyms to the ES schema. Table-to-synonym doesn't pull up the regex join, while table-to-table does.
Hi, Miroslav! Sorry for the late answer.
1. Context live templates will be released soon. Their text is fully customizable, and you can put the cursor at any place. Double-click invocation seems like a feature outside of the MVP. Please try 2021.1 in a couple of weeks and then report your wishes to our tracker.
2. All news will appear here: https://youtrack.jetbrains.com/issue/DBE-12821
3. The logic that you should only need to choose a target and have the dialect chosen automatically seems super valid to me. But it seems realistic only when you open the file for the first time, am I right? If you have already set the correct dialect, I can hardly imagine you wanting to change it.
4. Do you mean an auto-generated alias for the table in the template? Because we can provide it in code completion.
5. The example of the not-very-helpful code completion: can you please tell me which database it is? I want to reproduce and address it.