Hi experts,
I am using ORAAH 2.7. When I use hdfs.attach, I find that it cannot attach multiple files in the same directory or files in subdirectories. Here are the details:
> library(ORCH)
> spark.connect("yarn-client", memory="512m", dfs.namenode="bigdatalite.localdomain")
> testlr_model <- hdfs.attach("/mytest/testlr_model", force=TRUE)
> hdfs.dim(testlr_model, force=TRUE)
> hdfs.describe(testlr_model)
> testlr_original <- hdfs.get(testlr_model)
> print(testlr_original)
If there are multiple files under /mytest/testlr_model, for example test1.txt and test1b.txt, hdfs.get returns error messages like these:
Error: local:"/tmp/orch74bb26f661cc" content != hdfs:"/mytest/testlr_model/[^_.]*"
Error: source file "/tmp/orch74bb26f661cc/test1b.txt" doesn't exist
The hdfs.describe output is:
> hdfs.describe(testlr_model)
   NAME         VALUE
1  path         /mytest/testlr_model
2  origin       HDFS
3  class        matrix
4  types        integer, integer
5  names        val1, val2
6  dim          -1 x 2
7  categorized  FALSE
8  has.key      FALSE
9  key.column   -1:NULL
10 empty.key    FALSE
11 has.rownames FALSE
12 key.sep
13 value.sep    ,
14 quoted       FALSE
15 pristine     TRUE
16 trimmed      FALSE
17 binary       FALSE
18 size         105
19 parts        4
If there is only one file directly under /mytest/testlr_model (e.g. test1.txt), plus two subdirectories sub1 and sub2 that each contain one file, the script above runs without errors. However, print shows only the rows from test1.txt; none of the rows from the files in the subdirectories appear.
I did not create an __ORCHMETA__ file myself in hdfs:/mytest/testlr_model, and the test files contain only a few rows each.
What could be causing this?
Best regards