How to hdfs.attach multiple files in the same dir and sub dirs

865185, Sep 22 2017

Hi Experts,

I am using ORAAH 2.7. When using hdfs.attach, I found that it cannot attach multiple files in the same directory or in subdirectories. Here are the details:

> library(ORCH)
> spark.connect("yarn-client", memory="512m", dfs.namenode="bigdatalite.localdomain")
> testlr_model <- hdfs.attach("/mytest/testlr_model", force=TRUE)
> hdfs.dim(testlr_model, force=TRUE)
> hdfs.describe(testlr_model)
> testlr_original <- hdfs.get(testlr_model)
> print(testlr_original)

If there are multiple files under /mytest/testlr_model, for example test1.txt and test1b.txt, hdfs.get returns error messages like:

Error: local:"/tmp/orch74bb26f661cc" content != hdfs:"/mytest/testlr_model/[^_.]*"

Error: source file "/tmp/orch74bb26f661cc/test1b.txt" doesn't exist
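For reference, the pattern in the first error, `[^_.]*`, reads like a Hadoop-style glob that selects entries whose names do not start with `_` or `.`. The following is only a minimal sketch in base R of that reading; the directory listing is made up for illustration, and nothing here calls ORCH or claims to show how hdfs.get builds its file list.

```r
# Hypothetical listing of /mytest/testlr_model; these names are illustrative only.
entries <- c("test1.txt", "test1b.txt", "_SUCCESS", ".part.crc", "sub1")

# Read "[^_.]*" as a glob: the first character is neither "_" nor ".",
# followed by anything. The regular expression below applies the same
# test to the first character of each name.
matches_glob <- grepl("^[^_.]", entries)

# Names that the pattern would select under this assumption.
visible <- entries[matches_glob]
print(visible)
```

Under this assumption both test1.txt and test1b.txt are selected, and sub1 matches only as a directory entry; a single-level glob of this kind would not by itself descend into sub1.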

hdfs.describe output:

> hdfs.describe(testlr_model)
           NAME                VALUE
1          path /mytest/testlr_model
2        origin                 HDFS
3         class               matrix
4         types     integer, integer
5         names           val1, val2
6           dim               -1 x 2
7   categorized                FALSE
8       has.key                FALSE
9    key.column              -1:NULL
10    empty.key                FALSE
11 has.rownames                FALSE
12      key.sep                     
13    value.sep                    ,
14       quoted                FALSE
15     pristine                 TRUE
16      trimmed                FALSE
17       binary                FALSE
18         size                  105
19        parts                    4

If there is only one file, such as test1.txt, in /mytest/testlr_model, and there are subdirectories named sub1 and sub2 under it that each contain one file, the script above runs without error, but print shows only the rows from test1.txt. No rows from the files in the subdirectories are shown.

In hdfs:/mytest/testlr_model, I did not define _ORCHMETA__ myself, and the test files contain only a few rows.

What could be causing this error?

Best regards

Post Details

Locked on Oct 20 2017
Added on Sep 22 2017