55

I'm using R to plot some data I pull out of a database (the Stack Exchange data dump, to be specific):

dplyr::tbl(serverfault, 
           dbplyr::sql("
select year(p.CreationDate) year,
       avg(p.AnswerCount*1.0) answers_per_question,
       sum(iif(ClosedDate is null, 0.0, 100.0))/count(*) close_rate
from Posts p
where PostTypeId = 1
group by year(p.CreationDate)
order by year(p.CreationDate)
"))

The query works fine on SEDE, but I get this error in the R console:

Error: <SQL> 'SELECT *
FROM (
select year(p.CreationDate) year,
       avg(p.AnswerCount*1.0) answers_per_question,
       sum(iif(ClosedDate is null, 0.0, 100.0))/count(*) close_rate
from Posts p
where PostTypeId = 1
group by year(p.CreationDate)
order by year(p.CreationDate)
) "zzz11"
WHERE (0 = 1)'
  nanodbc/nanodbc.cpp:1587: 42000: [FreeTDS][SQL Server]Statement(s) could not be prepared. 

I reckoned "Statement(s) could not be prepared." meant that SQL Server didn't like the query for some reason. Unfortunately, it didn't give any hint about what went wrong. After fiddling with the query for a bit, I noticed it was wrapped in a subselect, according to the error message. Copying and executing the full query as constructed by one of the libraries in the chain, SQL Server gave me this more informative error message:

The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.

Now the solution is obvious: remove (or comment out) the order by clause. But where is the detailed error message in the R console? I'm using Rstudio, should that matter. If I could get the full exception right next to the code I'm working on, it would help me fix bug a lot quicker. (And just to be clear, I get cryptic errors from dplyr::tbl often and typically use binary search debugging to fix them.)

8
  • What if you query with DBI::dbGetQuery()? Does that improve anything over dbplyr::sql?
    – Mako212
    Aug 10, 2018 at 19:29
  • 2
    Since the error message you do see is being thrown by nanodbc directly, I wonder how much dbplyr can really do about this. My hunch would be that this isn't something easily fixed by a tweak at the R level. If it were me I'd go straight to the horse's mouth and file a github issue which will get to the expert (Jim Hester) more directly.
    – joran
    Aug 10, 2018 at 19:32
  • 2
    Oh, I have no idea what driver / connection stack SEDE is using for its error messages, but the error in R is coming from the freeTDS driver and is the only information the odbc R package has. There isn't a way to make this error more informative except to use a different driver (maybe try the ones from Microsoft or the RStudio professional drivers)?
    – Jim
    Aug 10, 2018 at 21:54
  • 1
    I think this has since been resolved. When I run the literal code above (SQL Server 2016, DBI-1.1.2, odbc-1.3.3, R-4.1.2), I get the full detailed error message, *The ORDER BY clause ... is also specified.* Unfortunately, I'm using mssqlodbc itself, not freetds (never found it to be stable), so I cannot rule out if it's driver-specific or generic in dbplyr. @JonEricson, can you confirm if the bug still presents itself with FreeTDS? If not, I suggest this question can either be closed or self-answers as resolved-by-updates.
    – r2evans
    Apr 27, 2022 at 16:37
  • 2
    @r2evans: It's been a while. ;-) If I have a chance to reproduce the issue on my new machine, I'd be happy to self-answer. Apr 27, 2022 at 17:00

0

Browse other questions tagged or ask your own question.