[orientdb] Question about ETL example in the docs
Behrang
2018-03-15 12:16:41 UTC
There's an example in the docs <https://orientdb.com/docs/last/Transformer.html>
that looks like this:

{
  "source":{
    "content":{
      "value":"id,name,surname,friendSince,friendId,friendName,friendSurname\n0,Jay,Miner,1996,1,Luca,Garulli"
    }
  },
  "extractor":{
    "row":{
    }
  },
  "transformers":[
    {
      "csv":{
      }
    },
    {
      "vertex":{
        "class":"V1"
      }
    },
    {
      "edge":{
        "unresolvedLinkAction":"CREATE",
        "class":"Friend",
        "joinFieldName":"friendId",
        "lookup":"V2.fid",
        "targetVertexFields":{
          "name":"${input.friendName}",
          "surname":"${input.friendSurname}"
        },
        "edgeFields":{
          "since":"${input.friendSince}"
        }
      }
    },
    {
      "field":{
        "fieldNames":[
          "friendSince",
          "friendId",
          "friendName",
          "friendSurname"
        ],
        "operation":"remove"
      }
    }
  ],
  "loader":{
    "orientdb":{
      "dbURL":"memory:ETLBaseTest",
      "dbType":"graph",
      "useLightweightEdges":false
    }
  }
}

In the *edge* transformer's *lookup* field, what do *V2* and *fid* refer
to? *V2* is not defined in the vertex transforms and *fid* is not a column
in the CSV input. Where are they coming from?

In particular, I have two CSV files:

*users.csv:*
username,first_name,last_name
user1,John,Doe
user2,Jane,Doe
user3,Gene,Doe

*user_friends.csv:*
username,friend_name
user1,user2
user1,user3
user2,user1
user2,user3
user3,user1
user3,user2

I first import the users.csv using this ETL config:

{
  "source": {
    "file": {
      "path": "/tmp/users.csv"
    }
  },
  "extractor": {
    "csv": {}
  },
  "transformers": [
    {
      "vertex": {
        "class": "User"
      }
    }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "plocal:/temp/databases/users_friends",
      "dbType": "graph",
      "classes": [
        {
          "name": "User",
          "extends": "V"
        },
        {
          "name": "HasFriend",
          "extends": "E"
        }
      ],
      "indexes": [
        {
          "class": "User",
          "fields": [
            "username:string"
          ],
          "type": "UNIQUE"
        }
      ]
    }
  }
}

And all the records are imported without any errors. Then I want to import
the friendship CSV using the following ETL config:

{
  "source": {
    "file": {
      "path": "/tmp/user_friends.csv"
    }
  },
  "extractor": {
    "csv": {}
  },
  "transformers": [
    {
      "vertex": {
        "class": "User"
      }
    },
    {
      "edge": {
        "class": "HasFriend",
        "joinFieldName": "friend_name",
        "lookup": "User.username",
        "direction": "in"
      }
    }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "plocal:/temp/databases/users_friends",
      "dbType": "graph",
      "classes": [
        {
          "name": "User",
          "extends": "V"
        },
        {
          "name": "HasFriend",
          "extends": "E"
        }
      ],
      "indexes": [
        {
          "class": "User",
          "fields": [
            "username:string"
          ],
          "type": "UNIQUE"
        }
      ]
    }
  }
}

However, the import fails because the same username can appear in multiple
rows of the second CSV file:

Uncaught exception in thread 'pool-2-thread-1'
com.orientechnologies.orient.core.storage.ORecordDuplicatedException:
Cannot index record User{friend_name:user-2,username:user1}: found
duplicated key 'user-0' in index 'User.username' previously assigned to the
record #25:0
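
To make the failure mode concrete, here is a minimal Python sketch of what seems to be happening. It is not OrientDB code: it assumes the vertex transformer inserts one new User record per input row, and the UniqueIndex and DuplicateKeyError names are made up for illustration. The index starts empty here; in the real database it presumably already holds the keys from the users.csv import, so the collision would come even sooner.

```python
import csv
import io

# Contents of user_friends.csv as listed above.
USER_FRIENDS_CSV = """username,friend_name
user1,user2
user1,user3
user2,user1
user2,user3
user3,user1
user3,user2
"""


class DuplicateKeyError(Exception):
    """Hypothetical stand-in for ORecordDuplicatedException."""


class UniqueIndex:
    """Toy model of a UNIQUE index: each key may be stored at most once."""

    def __init__(self, name):
        self.name = name
        self.keys = set()

    def insert(self, key):
        if key in self.keys:
            raise DuplicateKeyError(
                f"found duplicated key '{key}' in index '{self.name}'"
            )
        self.keys.add(key)


def import_rows(csv_text, index):
    """Insert one record per CSV row (the assumed vertex-transformer behavior).

    Returns the number of rows inserted before the first duplicate,
    together with the error that stopped the import (or None).
    """
    inserted = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            index.insert(row["username"])
        except DuplicateKeyError as err:
            return inserted, err
        inserted += 1
    return inserted, None


index = UniqueIndex("User.username")
inserted, error = import_rows(USER_FRIENDS_CSV, index)
print(inserted, error)
```

Under these assumptions the second row already collides, because user1 appears as the username in both of the first two rows.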

Is there a way to handle scenarios like this?

Thanks in advance.
--
---
You received this message because you are subscribed to the Google Groups "OrientDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to orient-database+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
u***@gmail.com
2018-03-21 03:18:56 UTC
Hi,

What version are you using?

Thanks

Regards,

Michela