7 Replies Latest reply: Nov 14, 2012 5:16 AM by darke

algorithm design

I have to write a program that compares n files, where each file contains key,value data. I want to compare all the files and get every line whose key is common across them. For example:

file1        file2           file3         file4
tim,chase    tom,someting    tom,wright    chase,w
tom,jerry    vinay,b         sachin,b      tom,m

The output would be:
tom,chase
tom,jerry
tom,wright
tom,m

What would be an optimized algorithm for this task?

My logic:

Start

Read each line from each file and store it in memory as an array (e.g. array1[0] = "A", array1[1] = "B", and so on).

Since there are 4 files, I create 4 arrays, array1 to array4. Each holds the contents of its corresponding file.

Next, I compare the first word in the first array with the first word in the second array.

Then I compare the first word in the first array with the second word in the second array, and so on until the end of the second array.

I continue this until the last word in the last array.

Whenever I find a match, I add it to an ArrayList.
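
For reference, the steps above could be sketched in Java roughly as follows. This is only an illustration under assumptions: the class and method names (PairwiseCompare, keyOf, find, occursInAnotherFile) are made up, each file is assumed to be already loaded as a list of lines, and a "match" is taken to mean that the key before the comma also appears in at least one other file. Every line is compared against every line of the other files, so the work grows roughly with the square of the total number of lines.

```java
import java.util.ArrayList;
import java.util.List;

final class PairwiseCompare {

    // Returns the key part of a "key,value" line, or null for blank/malformed lines.
    static String keyOf(String line) {
        int comma = line.indexOf(',');
        return comma < 0 ? null : line.substring(0, comma).trim();
    }

    // arrays.get(i) holds the lines of file i (assumed to be loaded already).
    static List<String> find(List<List<String>> arrays) {
        List<String> matches = new ArrayList<>();
        for (int i = 0; i < arrays.size(); i++) {
            for (String line : arrays.get(i)) {
                String key = keyOf(line);
                // Keep the line if its key shows up in any other file.
                if (key != null && occursInAnotherFile(key, i, arrays)) {
                    matches.add(line);
                }
            }
        }
        return matches;
    }

    // Scans every file except 'self' for a line with the same key.
    static boolean occursInAnotherFile(String key, int self, List<List<String>> arrays) {
        for (int j = 0; j < arrays.size(); j++) {
            if (j == self) {
                continue;
            }
            for (String other : arrays.get(j)) {
                if (key.equals(keyOf(other))) {
                    return true;
                }
            }
        }
        return false;
    }
}
```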

Can anybody suggest a better or more complete design, or an alternative? Or is this good enough?
• 1. Re: algorithm design
The explanation of the problem is not clear.

Provide separate, distinct examples of file1 and file2.
For each file provide the lines that match and lines that do not match.

Then provide an explanation of what a match means and/or what a non-match means.
And don't use the same character (comma) both for the input and output. Doing so confuses what is just a separator and what is part of the data.
• 2. Re: algorithm design
file1
tim,chase
tom,jerry

file2
tom,someting
vinay,b

file3
tom,wright
sachin,b

file4

chase,w
tom,m

Basically, every file contains key,value pair entries.

I want every key that is common across the files, together with its values. In this case that key is tom.

Desired output:

tom,jerry
tom,wright
tom,m
• 3. Re: algorithm design
You probably just need a single data structure: a Map<String, List<String>>.

Read file1 and populate the map with its entries.

So the map looks like:
tim -> chase
tom -> jerry

Read files 2 to N. If a file contains a key that is not in the map, ignore that pair. Otherwise, add the value to the list referenced by the key.

So after reading file2, the map will contain:
tim -> chase
tom -> jerry, someting
The key vinay is ignored.

After you have read all files, discard the map entries whose list is not of size N.
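
A minimal Java sketch of these steps might look like the following. The class and method names (KeysInEveryFile, find) are hypothetical, the files are assumed to be already read into lists of lines (for example with Files.readAllLines), and the size-N check only works if each key appears at most once per file.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class KeysInEveryFile {

    // files.get(i) holds the lines of file i; assumes a key occurs at most once per file.
    static Map<String, List<String>> find(List<List<String>> files) {
        Map<String, List<String>> byKey = new LinkedHashMap<>();
        for (int i = 0; i < files.size(); i++) {
            for (String line : files.get(i)) {
                int comma = line.indexOf(',');
                if (comma < 0) {
                    continue; // skip blank or malformed lines
                }
                String key = line.substring(0, comma).trim();
                String value = line.substring(comma + 1).trim();
                if (i == 0) {
                    // The first file seeds the map with every key it contains.
                    byKey.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
                } else if (byKey.containsKey(key)) {
                    // Later files only add to keys already known; other pairs are ignored.
                    byKey.get(key).add(value);
                }
            }
        }
        // Keep only keys that collected one value per file.
        int n = files.size();
        byKey.values().removeIf(list -> list.size() != n);
        return byKey;
    }
}
```

With the four sample files from this thread, the only surviving entry would be tom, with the values jerry, someting, wright and m.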
• 4. Re: algorithm design
I think that is incomplete. My intention is to get all the common keys present in the 4 files. Suppose I leave out the key vinay as you mentioned: what if I add a 5th file tomorrow and it contains vinay? Then this will not work.
• 5. Re: algorithm design
Unless you change file1, file3 and file4 to include the key vinay, it won't be common to all files. Or I don't understand your requirement :)

What do you need?

Keys that occur in every file, i.e. common to all files?
OR
Keys that occur in more than 1 file?
• 6. Re: algorithm design
Keys that occur in more than 1 file :)

Edited by: Vicky on Nov 13, 2012 10:15 PM
• 7. Re: algorithm design
So don't ignore any pairs, and in the final step discard only the entries whose list has size 1, i.e. keys occurring in only 1 file.
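
As a rough Java sketch of that variant (names such as KeysInSeveralFiles and find are made up for illustration, and the files are assumed to be already loaded as lists of lines): note that it counts the distinct files in which each key appears instead of checking the list size, which is a small deviation from the description above, so that a key repeated inside one file is not mistaken for a shared key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class KeysInSeveralFiles {

    // files.get(i) holds the lines of file i. Returns key -> all "key,value" lines
    // for keys that appear in at least two different files.
    static Map<String, List<String>> find(List<List<String>> files) {
        Map<String, List<String>> linesByKey = new LinkedHashMap<>();
        Map<String, Set<Integer>> filesByKey = new HashMap<>();
        for (int i = 0; i < files.size(); i++) {
            for (String line : files.get(i)) {
                int comma = line.indexOf(',');
                if (comma < 0) {
                    continue; // skip blank or malformed lines
                }
                String key = line.substring(0, comma).trim();
                // No pair is ignored: every line is recorded under its key.
                linesByKey.computeIfAbsent(key, k -> new ArrayList<>()).add(line.trim());
                // Remember which files the key was seen in.
                filesByKey.computeIfAbsent(key, k -> new HashSet<>()).add(i);
            }
        }
        // Final step: drop keys that were seen in only one file.
        linesByKey.keySet().removeIf(key -> filesByKey.get(key).size() < 2);
        return linesByKey;
    }
}
```

Both maps are filled in a single pass, so each line is read once, in contrast to comparing every line against every other line.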