
How to hdfs.attach multiple files in the same dir and sub dirs

Hi experts,

I am using ORAAH 2.7. When using hdfs.attach(), I found it cannot attach multiple files in the same directory or in sub-directories. Here are the details:


> spark.connect("yarn-client", memory="512m", dfs.namenode="bigdatalite.localdomain")

> testlr_model <- hdfs.attach("/mytest/testlr_model", force=TRUE)

> testlr_original <- hdfs.get(testlr_model)


If there are multiple files under /mytest/testlr_model, for example test1.txt and test1b.txt, hdfs.get() returns error messages like:

Error: local:"/tmp/orch74bb26f661cc" content != hdfs:"/mytest/testlr_model/[^_.]*"

Error: source file "/tmp/orch74bb26f661cc/test1b.txt" doesn't exist
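A possible workaround I am considering is to attach and fetch each file individually instead of attaching the directory as a whole. This is only a sketch, not something I have verified: it assumes that hdfs.ls() returns the names of the files under the path, and that all the files share the same column layout.

```r
# Sketch of a per-file workaround (not verified on my cluster):
# attach and fetch each file under the directory individually,
# instead of attaching the directory as a whole.
files <- hdfs.ls("/mytest/testlr_model")   # assumed to return file names
for (f in files) {
  obj <- hdfs.attach(paste0("/mytest/testlr_model/", f), force = TRUE)
  print(hdfs.get(obj))
}
```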

hdfs.describe() output:

> hdfs.describe(testlr_model)

           NAME                VALUE
1          path /mytest/testlr_model
2        origin                 HDFS
3         class               matrix
4         types     integer, integer
5         names           val1, val2
6           dim               -1 x 2
7   categorized                FALSE
8       has.key                FALSE
9    key.column              -1:NULL
10    empty.key                FALSE
11 has.rownames                FALSE
12      key.sep
13    value.sep                    ,
14       quoted                FALSE
15     pristine                 TRUE
16      trimmed                FALSE
17       binary                FALSE
18         size                  105
19        parts                    4

If there is only one file, such as test1.txt, in /mytest/testlr_model, and there are two sub-directories named sub1 and sub2 under it, each containing one file, the script above runs fine, but the print shows only the rows of test1.txt; no rows from the files in the sub-directories are shown.
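For the sub-directory case, a workaround I could imagine is to fetch each sub-directory separately and combine the results locally in R. Again, this is only an untested sketch; it assumes the files under sub1 and sub2 have the same two columns as test1.txt.

```r
# Sketch (not verified): attach and fetch each sub-directory on its
# own, then bind the rows together locally, since attaching the parent
# path appears to read only the files directly under it.
paths <- c("/mytest/testlr_model/sub1", "/mytest/testlr_model/sub2")
parts <- lapply(paths, function(p) hdfs.get(hdfs.attach(p, force = TRUE)))
combined <- do.call(rbind, parts)
```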

In hdfs:/mytest/testlr_model I did not define _ORCHMETA__ myself, and the test files contain only a few rows.

What could be causing this error?

Best regards
