1 2 Previous Next 21 Replies Latest reply on Mar 8, 2019 5:26 PM by dbb-Oracle

    Combine two pgql queries

    mohanr.k

      I have a requirement to count the destination vertex by limiting the source vertices to certain number. I wrote below two queries for this but I am not able to figure out how to combine both queries to get desired result.

       

      1) Query to Limit the source vertices  select * MATCH (s) where s.label='Test' LIMIT 5000

      2) Query to count destination vertices after applying some filter

      select count(d) AS destVCount

      MATCH (s)-[r]->(d)

      where s.label='Test' and r.date_num>=17656547654

       

      Lets say if I have total 10000 source vertices of certain type and if user wants to find total number of destination vertices for only 5000 source vertices type. Any help on how to achieve this?

        • 1. Re: Combine two pgql queries
          oskarvanrest-Oracle

          Hi mohanr.k,

           

          Since you want to do aggregation after LIMIT, it would require subquery support in the FROM clause like in SQL, but PGX doesn't support this a.t.m.

           

          However, there may be another way to get what you want. Is it possible to express the LIMIT as a filter?

          For example, if the vertex IDs are numbered from 0 to n, can you do something like id(s) < 1000 and then use a scalar subquery to sum up the counts:

           

          select SUM( ( SELECT count(d) match (s) -[r]->(d) WHERE s.label='Test' and r.date_num>=17656547654) ) AS destVCount

          MATCH (s)

          where id(s) < 1000

           

          So, this is using a scalar subquery in the SELECT clause. Such scalar subqueries are supported from PGX 3.1 (in OSG 18.1 (via patch) & OSG 19.1) onward.

          • 2. Re: Combine two pgql queries
            Gianni Ceresa

            As said by Oskar, I believe you can use subqueries: PGQL 1.1 Specification - 21 December 2017 | PGQL · Property Graph Query Language

            If in the subquery you do your LIMIT 5000 you could, in theory use that subset on the upper query for your counting. Knowing it isn't really doing the top query on the subset but you filter the whole graph matching it with the subset (EXISTS would do that in theory).

            Probably not going to be great in performance.

             

            More globally I would challenge the requirement: doing a count on a query using a LIMIT will, in theory, return you different values if you run the same query again and again, because the LIMIT doesn't come with a fixed sorting and therefore your set of 5000 rows could change between each execution.

            Maybe an average between the COUNT and the number of matched nodes is more reliable/meaningful?

            • 3. Re: Combine two pgql queries
              mohanr.k

              Unfortunately my IDs are not numbered in order

              • 4. Re: Combine two pgql queries
                mohanr.k

                I created below query. But I feel this is not correct because I am getting same results with and without sub query

                 

                select count(d) AS destVCount

                MATCH (s)-[r]->(d)

                where s.label='Test' and r.date_num>=17656547654 and EXISTS(

                select * MATCH (s1) where s1.label='Test' and s1.id=s.id

                LIMIT 5000

                )

                 

                Could you please correct me if something wrong with above query

                • 5. Re: Combine two pgql queries
                  Gianni Ceresa

                  Can't manage to make it work ...

                  I guess that logically it's pushing too much the idea of EXIST (which is more for a row by row check if the row exist for the sub condition).

                  If you do that in PGX shell or any other language you can cut it down in 2 pieces: the LIMIT query first and using results from there for the other one.

                  • 6. Re: Combine two pgql queries
                    mohanr.k

                    Thank you. I was trying to figure out if it is possible to do with one PGQL query.

                     

                    I will try with shell or by writing java program

                    • 7. Re: Combine two pgql queries
                      Gianni Ceresa

                      The analytical inside me still can't digest a query which could generate different results when executed multiple times on the same dataset, but still ...

                      In 3.2.0 there is the ability to create a subgraph from a PGQL result, issue is that I have no idea if that's just a 3.2.0 thing or it was already in 3.1.0 (no detailed change log available AFAIK), but still could be an option to explore.

                       

                      And of course plan B: such a simple query will be easier to write in SQL as your graph is probably stored in the database. Even if it means using a different approach and not really using any graph tool, I still believe in "the right tool for the right job", so SQL doesn't shock me for this kind of need.

                      • 8. Re: Combine two pgql queries
                        Vlad Haprian-Oracle

                        The simplest option we have so far is to create a subgraph from the result of the first query and then run the second query on that sub-graph. From a groovy shell, for example, this can be achieve like that:

                         

                        // Run first query and get a result set

                        resultSet = g.queryPgql("SELECT s MATCH (s) WHERE s.label='Test' LIMIT 5000")

                         

                        // Define a filter on the result set for the column "s". This filter will be used to fetch a subgraph.

                        resultSetVertexFilter = new ResultSetVertexFilter(resultSet, "s")

                         

                        // // Create a subgraph of g containing the matched vertices in the resultSet and the edges that connect them if any.

                        subGraph = g.filter(resultSetVertexFilter)

                         

                        // Run your second query on the subgraph you just created

                        finalResult = subGraph.queryPgql("''

                        select count(d) AS destVCount

                        MATCH (s)-[r]->(d)

                        where s.label='Test' and r.date_num>=17656547654

                        '''")

                        • 9. Re: Combine two pgql queries
                          mohanr.k

                          Thank you. It is throwing below exception while applying filter (subGraph = g.filter(resultSetVertexFilter))

                           

                          11:29:16.391 [pgx-client-thread-2] ERROR oracle.pgx.client.PgxRemoteFuture - caught exception

                          java.lang.UnsupportedOperationException: ResultSetVertexFilter(pgql_21, s) is not supported for subgraph creation code: PGX-ERROR-2Z08B4F8X0DN9

                          at oracle.pgx.common.marshalers.ExceptionMarshaler.toUnserializedException(ExceptionMarshaler.java:77) ~[pgx-api-3.1.0.jar:3.1.0]

                          at oracle.pgx.common.marshalers.ExceptionMarshaler.unmarshal(ExceptionMarshaler.java:105) ~[pgx-api-3.1.0.jar:3.1.0]

                          at oracle.pgx.client.RemoteUtils.parseExceptionalResponse(RemoteUtils.java:123) ~[pgx-client-3.1.0.jar:3.1.0]

                          at oracle.pgx.client.RemoteUtils.getResponseContent(RemoteUtils.java:91) ~[pgx-client-3.1.0.jar:3.1.0]

                          at oracle.pgx.client.RemoteUtils.parse(RemoteUtils.java:146) ~[pgx-client-3.1.0.jar:3.1.0]

                          at oracle.pgx.client.PgxRemoteFuture.executeRequest(PgxRemoteFuture.java:105) ~[pgx-client-3.1.0.jar:3.1.0]

                          at oracle.pgx.client.PgxRemoteFuture.lambda$fetchRemoteResultAsync$7(PgxRemoteFuture.java:181) ~[pgx-client-3.1.0.jar:3.1.0]

                          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]

                          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]

                          at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]

                          java.util.concurrent.ExecutionException: java.lang.UnsupportedOperationException: ResultSetVertexFilter(pgql_21, s) is not supported for subgraph creation code: PGX-ERROR-2Z08B4F8X0DN9

                          • 10. Re: Combine two pgql queries
                            Vlad Haprian-Oracle

                            Which version of PGX are you using ? This feature is available only starting from PGX 3.2.

                            • 11. Re: Combine two pgql queries
                              mohanr.k

                              Oh ok. I am using PGX 3.1.0.Looks like the API is available in 3.1.0.

                               

                              Any idea how to implement it in PGX3.1.0

                              • 12. Re: Combine two pgql queries
                                Vlad Haprian-Oracle

                                What you can do is the following:

                                 

                                // Run first query and get a result set

                                resultSet = g.queryPgql("SELECT s MATCH (s) WHERE s.label='Test' LIMIT 5000")

                                 

                                // Create a boolean property

                                vp = g.createVertexProperty(PropertyType.BOOLEAN, "valid")

                                vp.fill(false)

                                 

                                // Mark the vertices in the subgraph as valid

                                for (r in resultSet) {

                                  v = r.getVertex(1)

                                  vp.set(v, true)

                                }

                                 

                                // Create the subgraph based on the "valid" property

                                subGraph = g.filter(new VertexFilter("vertex.valid"))

                                 

                                // Run your second query on the subgraph you just created

                                finalResult = subGraph.queryPgql("''

                                select count(d) AS destVCount

                                MATCH (s)-[r]->(d)

                                where s.label='Test' and r.date_num>=17656547654

                                '''")

                                 

                                This solution is expected to have worse performance than the one presented before.

                                • 13. Re: Combine two pgql queries
                                  mohanr.k

                                  Thank you. Yes you are correct it is taking lot of time.

                                  I implemented my requirement by iterating over both queries results separately and writing some java logic

                                  • 14. Re: Combine two pgql queries
                                    Vlad Haprian-Oracle

                                    The solution which works only starting in 3.2 would be much faster.

                                    1 2 Previous Next