Clustering with WAS and Alfresco 3.2 Enterprise

Question asked by joerivg on Feb 26, 2010
Dear,

As the title says, I am trying to set up an Alfresco cluster on top of a WAS cluster …

Unfortunately my efforts have been in vain and I am out of ideas, so I am turning to the community …

I have done / tried the following:

1. I changed alfresco-global.properties and added the following entries:
#Clustering setup
alfresco.cluster.name=Earchiving
alfresco.jgroups.configLocation=classpath:alfresco/jgroups-earchiving-cluster.xml

2. I created jgroups-earchiving-cluster.xml with the following contents :
<config>
    <TCP bind_port="${alfresco.tcp.start_port:7800}"
         loopback="true"
         recv_buf_size="20000000"
         send_buf_size="640000"
         discard_incompatible_packets="true"
         max_bundle_size="64000"
         max_bundle_timeout="30"
         enable_bundling="true"
         use_send_queues="false"
         sock_conn_timeout="300"
         skip_suspected_members="true"
        
         thread_pool.enabled="true"
         thread_pool.min_threads="1"
         thread_pool.max_threads="25"
         thread_pool.keep_alive_time="5000"
         thread_pool.queue_enabled="false"
         thread_pool.queue_max_size="100"
         thread_pool.rejection_policy="run"

         oob_thread_pool.enabled="true"
         oob_thread_pool.min_threads="1"
         oob_thread_pool.max_threads="8"
         oob_thread_pool.keep_alive_time="5000"
         oob_thread_pool.queue_enabled="false"
         oob_thread_pool.queue_max_size="100"
         oob_thread_pool.rejection_policy="run"/>
                        
    <TCPPING timeout="3000"
             initial_hosts="LAAKDAL-TST-16[7800],laakdal-tst-17[7800]"
             port_range="${alfresco.tcp.port_range:3}"
             num_initial_members="2"/>
    <MERGE2 max_interval="30000"
              min_interval="10000"/>
    <FD_SIMPLE timeout="10000" max_missed_hbs="10" />
    <VERIFY_SUSPECT timeout="1500"  />
    <BARRIER />
    <pbcast.NAKACK
                   use_mcast_xmit="false" gc_lag="0"
                   retransmit_timeout="300,600,1200,2400,4800"
                   discard_delivered_msgs="true"/>
    <UNICAST timeout="300,600,1200" />
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                   max_bytes="400000"/>
    <VIEW_SYNC avg_send_interval="60000"/>
    <pbcast.GMS print_local_addr="true" join_timeout="3000"
                view_bundling="true"/>
    <FC max_credits="2000000"
        min_threshold="0.10"/>
    <FRAG2 frag_size="60000"  />
    <pbcast.STREAMING_STATE_TRANSFER/>
    <!-- <pbcast.STATE_TRANSFER/> -->
</config>

3. I renamed ehcache-custom.xml.sample.cluster to ehcache-custom.xml and stored it in my WAR\WEB-INF\classes\alfresco\extension dir

When I launch my servers I notice the following in my WAS log :

-------------------------------------------------------
GMS: address is LAAKDAL-TST-16-13742 (cluster=Earchiving:org.alfresco.enterprise.repo.management.subsystems.PropertyBackedBeanExporter)
-------------------------------------------------------
[2/26/10 14:51:47:594 CET] 00000034 SystemOut     O
-------------------------------------------------------
GMS: address is LAAKDAL-TST-16-42507 (cluster=Earchiving:EHCACHE_HEARTBEAT)
-------------------------------------------------------

but that is the only heartbeat I see, and clustering is not working. When I connect to Alfresco through web services, I get authentication and ticket exceptions, because a ticket issued by server A is presented to server B, which does not know it (the tickets are not shared, so there is no cluster).

And at the end of my deployment I see:


[2/26/10 14:52:02:969 CET] 000000dd HeartBeat     W org.alfresco.enterprise.heartbeat.HeartBeat$HeartBeatJob execute org.alfresco.error.AlfrescoRuntimeException: 01260001 Exception in Transaction.
[2/26/10 14:52:02:969 CET] 000000dd JobRunShell   I org.quartz.core.JobRunShell run Job DEFAULT.heartbeat threw a JobExecutionException:
                                 org.quartz.JobExecutionException: org.alfresco.error.AlfrescoRuntimeException: 01260001 Exception in Transaction. [See nested exception: org.alfresco.error.AlfrescoRuntimeException: 01260001 Exception in Transaction.]
   at org.alfresco.enterprise.heartbeat.HeartBeat$HeartBeatJob.execute(HeartBeat.java:388)
   at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
   at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:529)
Caused by: org.alfresco.error.AlfrescoRuntimeException: 01260001 Exception in Transaction.
   at org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction(RetryingTransactionHelper.java:412)
   at org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction(RetryingTransactionHelper.java:253)
   at org.alfresco.enterprise.heartbeat.HeartBeat.sendData(HeartBeat.java:265)
   at org.alfresco.enterprise.heartbeat.HeartBeat$HeartBeatJob.execute(HeartBeat.java:382)
   … 2 more
Caused by: java.net.SocketTimeoutException: connect timed out
   at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:391)
   at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:252)
   at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:239)
   at java.net.Socket.connect(Socket.java:551)
   at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
   at sun.net.www.http.HttpClient.openServer(HttpClient.java:403)
   at sun.net.www.http.HttpClient.openServer(HttpClient.java:521)
   at sun.net.www.http.HttpClient.<init>(HttpClient.java:246)
   at sun.net.www.http.HttpClient.New(HttpClient.java:320)
   at sun.net.www.http.HttpClient.New(HttpClient.java:337)
   at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:838)
   at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:790)
   at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:715)
   at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:882)
   at org.alfresco.enterprise.heartbeat.HeartBeat$1.execute(HeartBeat.java:282)
   at org.alfresco.repo.transaction.RetryingTransactionHelper.doInTransaction(RetryingTransactionHelper.java:327)
   … 5 more

I don't know if this is related …

I checked on my server and port 7800 is not open or listening for connections, so my guess is that WebSphere blocks this …
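
To separate a network or firewall problem from a WebSphere problem, the same check could be run from the other node with a small standalone program. What follows is only a minimal sketch: the class name ClusterPortCheck, its helper methods, the choice of three consecutive ports per host and the 2 second timeout are my own assumptions for illustration, not part of Alfresco or JGroups, and the exact meaning of port_range may differ between JGroups versions. It needs Java 7 or later and uses only the standard library.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

public class ClusterPortCheck {

    // Expands a TCPPING-style list such as "hostA[7800],hostB[7800]" into
    // host/port pairs, covering portsToProbe consecutive ports per host.
    // portsToProbe is a hypothetical parameter; adjust it to your port_range.
    static List<InetSocketAddress> expand(String initialHosts, int portsToProbe) {
        List<InetSocketAddress> targets = new ArrayList<>();
        for (String entry : initialHosts.split(",")) {
            String e = entry.trim();
            int open = e.indexOf('[');
            int close = e.indexOf(']');
            if (open < 1 || close < open) {
                throw new IllegalArgumentException("Expected host[port], got: " + e);
            }
            String host = e.substring(0, open);
            int startPort = Integer.parseInt(e.substring(open + 1, close));
            for (int i = 0; i < portsToProbe; i++) {
                // Unresolved on purpose: DNS lookup happens at connect time.
                targets.add(InetSocketAddress.createUnresolved(host, startPort + i));
            }
        }
        return targets;
    }

    // Returns true if a TCP connection can be opened within the timeout.
    // Unknown hosts, refused connections and timeouts all count as failure.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults to the initial_hosts value from my jgroups file.
        String initialHosts = args.length > 0
                ? args[0]
                : "LAAKDAL-TST-16[7800],laakdal-tst-17[7800]";
        for (InetSocketAddress t : expand(initialHosts, 3)) {
            boolean ok = canConnect(t.getHostString(), t.getPort(), 2000);
            System.out.println(t.getHostString() + ":" + t.getPort()
                    + " -> " + (ok ? "open" : "not reachable"));
        }
    }
}
```

If every port is reported as not reachable even when run on the node itself, nothing is bound to those ports and the JGroups TCP transport was probably never started with this configuration. If it works locally but not from the other node, something between the two servers is blocking the traffic.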

Any suggestions on how to proceed?

Much appreciated …

With kind regards,

Joeri VG
