Intel® Collaboration Suite for WebRTC
Community support and discussions on the Intel® Collaboration Suite for WebRTC (Intel® CS for WebRTC).

MCU v3.5.1 not publishing in NAT environment; MCU v3.4.1 WebRTC process and port leaks (critical issue)

Kiran_Raj_RA
Beginner
876 Views

Dear WebRTC Team,

I have installed MCU conference v3.4.1 on CentOS Linux release 7.4.1708 (Core), x86_64, kernel 3.10.0-693.11.1.el7.x86_64.
Machine: AWS c3.2xlarge instance (8 cores, 15 GB RAM)
Environment: NAT (internal IP: 72.31.26.106, public IP: 52.36.29.142), all ports open

Browser support (MCU conference):
1. Chrome version 62.0.3202.94 (Official Build) (64-bit): Chrome to Chrome works, with the issues below
2. Mozilla Firefox version 57.0.2 (64-bit): ICE failed, not working

Issues

1. ERROR: woogeen.VCMFrameDecoder - (0x2a127e0)Decode frame error: -1

2. When many WebRTC connections join a single room, the room shows the errors below and then goes into a locked state, after which no one can enter the room. If we restart the WebRTC agent, we can enter the room again.

ERROR: publish failed: Timeout to make rpc to 66c3f2c-3313-358b-11c7-1c3dd51ab4f0.publish

WARN: WebRtcConnection - Bad source SSRC in RTCP feedback packet: 2412642309

3. We found many woogeen_webrtc zombie processes still running without a parent even after the MCU stops, so we have to kill them manually or restart the server.

4. We found many woogeen_webrtc UDP and TCP ports that stay open even after the MCU stops, so we have to kill the owning processes manually or restart the server.

The logs and screenshots are attached for reference.
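
Leftover processes like those in item 3 can be listed by filtering `ps` output for entries that are defunct (state `Z`) or have been re-parented to PID 1. The following is a minimal sketch, not part of the MCU tooling: the function names, the `ps -eo pid,ppid,stat,comm` column choice, and the sample rows are assumptions made for illustration, and the default filter string `woogeen` comes from the process names discussed in this thread.

```python
import subprocess

def leftover_processes(ps_output, needle="woogeen"):
    """Return matching processes that are zombies or have been re-parented to PID 1."""
    found = []
    for line in ps_output.splitlines()[1:]:      # skip the header row
        parts = line.split(None, 3)              # pid, ppid, stat, command
        if len(parts) < 4 or needle not in parts[3]:
            continue
        pid, ppid, stat = int(parts[0]), int(parts[1]), parts[2]
        reasons = []
        if stat.startswith("Z"):
            reasons.append("zombie")
        if ppid == 1:
            reasons.append("orphan")
        if reasons:
            found.append((pid, ppid, stat, reasons))
    return found

def snapshot():
    """Capture the current process table (assumes a Linux procps `ps`)."""
    return subprocess.run(["ps", "-eo", "pid,ppid,stat,comm"],
                          capture_output=True, text=True, check=True).stdout

# Made-up sample rows, for illustration only.
SAMPLE_PS = """\
  PID  PPID STAT COMMAND
 1021     1 Ss   sshd
 4321     1 Sl   woogeen_webrtc
 4400  4321 Z    woogeen_webrtc <defunct>
 4410  4300 Sl   woogeen_webrtc
"""

for pid, ppid, stat, reasons in leftover_processes(SAMPLE_PS):
    print(pid, ppid, stat, ",".join(reasons))
```

On a live host, `leftover_processes(snapshot())` would be run after the MCU has been stopped. A parent PID of 1 can also belong to a normally daemonized service, so the check is only meaningful once the MCU is supposed to be down.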

--------- Issues when installing MCU v3.5.1 ---------
I have installed MCU conference v3.5.1 on CentOS Linux release 7.4.1708 (Core), x86_64, kernel 3.10.0-693.11.1.el7.x86_64.
Machine: AWS c3.2xlarge instance (8 cores, 15 GB RAM)
Environment: NAT (internal IP: 72.31.26.106, public IP: 52.36.29.142), all ports open

1. Streams fail to publish because of NAT; it still does not work even after the changes made in the webrtc and portal toml files.
WARN: WebRtcConnection - Bad source SSRC in RTCP feedback packet: 4227660894
ERROR: V10Client - soac failed: Session 967262438887071600 does NOT exist

Please refer to the attached toml files and logs.
 

 

0 Kudos
18 Replies
Qiujiao_W_Intel
Employee
876 Views

Raj, please provide more information on the following questions:

1. For the 3.4.1 error, what is your testing scenario? Did you use mixed or forward streams, what is the video codec, and how many connections are in the room?

2. For the 3.5.1 error, what is your network environment? Please provide more information on your network topology.

0 Kudos
Kiran_Raj_RA
Beginner
876 Views

Both mix and SFU are used in a single room.

0 Kudos
Kiran_Raj_RA
Beginner
876 Views

3.5.1 is hosted on AWS with all ports open.

0 Kudos
Naresh_R_1
New Contributor I
876 Views

Hi Support Team,

This issue has been reported earlier too

https://software.intel.com/en-us/forums/intel-collaboration-suite-for-webrtc/topic/744178

https://software.intel.com/en-us/forums/intel-collaboration-suite-for-webrtc/topic/746946

These issues started with MCU version 3.3.1 and persist through the latest update.

Thanks

Naresh

0 Kudos
Qiujiao_W_Intel
Employee
876 Views

For the 3.4.1 issue, it may be related to the TCP port not being released. Kiran, please run netstat -pan | grep woogeen to check the port usage.
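
The netstat check can also be scripted so that the sockets are grouped by the PID that still holds them. This is only a sketch under stated assumptions: it expects the Linux net-tools column layout of `netstat -pan`, the function name `sockets_by_pid` is made up for this example, and the sample rows (addresses and PIDs) are invented.

```python
import re
from collections import defaultdict

# PID/Program column as printed by `netstat -p`, e.g. "4321/woogeen_webrtc".
PID_PROGRAM = re.compile(r"^(\d+)/(\S+)")

def sockets_by_pid(netstat_output, needle="woogeen"):
    """Group (protocol, local address) pairs of matching sockets by owning PID."""
    owners = defaultdict(list)
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 6 or not fields[0].startswith(("tcp", "udp")):
            continue                     # header rows and UNIX-domain sockets
        owner = None
        for field in fields[5:]:         # the State column is empty for most UDP rows
            owner = PID_PROGRAM.match(field)
            if owner:
                break
        if owner is None or needle not in owner.group(2):
            continue
        owners[int(owner.group(1))].append((fields[0], fields[3]))
    return dict(owners)

# Invented sample output, for illustration only.
SAMPLE_NETSTAT = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1021/sshd
tcp        0      0 192.0.2.5:40112         0.0.0.0:*               LISTEN      4321/woogeen_webrtc
udp        0      0 192.0.2.5:51234         0.0.0.0:*                           4321/woogeen_webrtc
udp        0      0 192.0.2.5:51236         0.0.0.0:*                           4390/woogeen_webrtc
"""

for pid, sockets in sorted(sockets_by_pid(SAMPLE_NETSTAT).items()):
    print(pid, sockets)
```

Running it against real output captured after the MCU has stopped would show which PIDs need to be inspected or killed. Note that netstat needs root privileges to show the PID column for processes owned by other users.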

For 3.5.1, did you replace the certificate in the webrtc_agent/cert folder with your own?

0 Kudos
Kiran_Raj_RA
Beginner
876 Views

Hi,

1. (MCU 3.4.1) Using netstat -pan | grep woogeen I can find unclosed ports and processes that are not stopped even after the MCU is restarted, so I restarted the entire server.

2. (MCU 3.5.1) I also tried replacing the certificate in the webrtc_agent/cert folder with my own, but it still does not work.

Because of the issues above, I moved to version 3.3 and built a cluster on AWS:

Instance A (4 cores, 7 GB RAM, CentOS 7.4, high bandwidth): nuve, app, mongodb, portal, rabbitmq, cluster-manager. In the cluster strategy I changed the webrtc strategy to round-robin because I am using 2 webrtc agents.

Instance B (8 cores, 16 GB RAM, CentOS 7.4, high bandwidth): video-agent, audio-agent, sip-agent, sip-portal, avstream-agent, recording-agent. This is working, but I need clarification: for an 8-core machine, which of these values need to be changed, and could you give some details about them (maxProcesses = 13, prerunProcesses = 2, max_load = 0.85)?

Instances C and D (4 cores, 7 GB RAM, CentOS 7.4, high bandwidth): webrtc-agent only.

Issue 1: I published 4 screen shares and 4 cameras to room A from one machine (Chrome, 4 tabs) and 3 screen shares and 3 cameras from a second machine (Chrome). The streams sometimes disconnect even though the server load is below 20%. When the process average load is 70%-100% in the top command, room A runs fine, but when the number of users increases the streams start disconnecting. Do I need to change any configuration file to increase resource utilization? Resource utilization is low, yet the streams get disconnected.

Issue 2: Round-robin does not work when there is only a single room; with multiple rooms we can see webrtc processes on both servers. How does round-robin work?

The cluster is working fine, but I need the streams to stay stable without disconnections. Please provide suggestions.

 

0 Kudos
Qiujiao_W_Intel
Employee
877 Views

Raj, please clarify the following questions:

1. The NAT issue only happens in v3.5 and v3.5.1, while v3.4 and v3.3 work well in the same NAT environment, right?

2. Does the port leak issue in 3.4.1 also happen with the 3.3 release package?

3. For 3.3 on AWS, round-robin within one room does not work for the webrtc agent; round-robin for one room works well in the latest 3.5 package. For the resource limit, when you say server load, do you mean instance B's load or the load of the server where webrtc runs? You can raise max_load to a higher value such as 0.9 to check. When the disconnection happened, were there any errors in the MCU logs?

When you add more rooms and connections, please check the following configurations:

1. Tune your OS file descriptor and network limit configuration.

2. Raise maxProcesses to a higher value.

3. In webrtc_agent, video_agent, audio_agent and the other streaming agents, modify the network_max_scale option to limit the network bandwidth.
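
Before tuning the file descriptor limit in item 1, it can help to see how close a running agent process actually is to its limit. The sketch below reads the Linux `/proc` filesystem. The helper names `max_open_files` and `fd_report` are hypothetical, the sample text is invented, and the thread does not state which limit values are appropriate for the MCU.

```python
import os

def max_open_files(limits_text):
    """Parse the 'Max open files' row of /proc/<pid>/limits into (soft, hard)."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line.split()[3:5]
            # "unlimited" is reported as None, numeric limits as int
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    return None

def fd_report(pid):
    """Return (open_fds, soft_limit, hard_limit) for a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as handle:
        soft, hard = max_open_files(handle.read())
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    return open_fds, soft, hard

# Invented excerpt of /proc/<pid>/limits, for illustration only.
SAMPLE_LIMITS = """\
Limit                     Soft Limit           Hard Limit           Units
Max processes             59000                59000                processes
Max open files            1024                 4096                 files
"""

print(max_open_files(SAMPLE_LIMITS))
```

Calling `fd_report(pid)` with the PID of an agent process requires running as the same user or as root. An open descriptor count close to the soft limit would suggest that the limit is worth raising.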

 

0 Kudos
Naresh_R_1
New Contributor I
877 Views

Hi Qiujiao,

In 3.4 and 3.3 the issue is still there, but only in rarer scenarios, such as when there is more traffic to a single room. For example, with 10 or 15 users in a single room continuously logging out and logging in, the room fails and you get the errors below:

ERROR: publish failed: Timeout to make rpc to 66c3f2c-3313-358b-11c7-1c3dd51ab4f0.publish

WARN: WebRtcConnection - Bad source SSRC in RTCP feedback packet: 2412642309

When we check the server with netstat -pltun, there are UDP and TCP ports related to woogeen_webrtc still listening, and we have to kill the PIDs related to woogeen_webrtc to get the room working.

We used a high-end machine (Intel Xeon, 12 cores, 64 GB RAM) and we still face these issues.

This issue was reported long ago, but unfortunately no later version has fixed it, and it is a major, critical problem that blocks going to market.

I believe this issue lies with the components that connect to the rabbitmq service; restarting the rabbitmq service also releases the ports, and the room starts working.

Thanks

Naresh

0 Kudos
Qiujiao_W_Intel
Employee
877 Views

@Raj, for the 3.5.1 NAT issue, is there any IPv6 network in your server deployment or client network?

The TCP port release issue was introduced by a third party. We are working on fixing it, stay tuned.

0 Kudos
Kiran_Raj_RA
Beginner
877 Views

Hi Qiujiao,

Thank you for your kind response.

We tried Intel MCU v3.5.1 on both a dedicated server and an AWS instance (3.4.1 works well on AWS; 3.5 was not tried; 3.5.1 does not work on AWS).

1. On the dedicated server, 3.5.1 works fine with Chrome, Mozilla Firefox and Safari.

2. On AWS, where IPv4 is used, 3.5.1 does not even publish in Chrome. Please find the attached debug-level log; the browser shows ICE failed.

------------------

I think the reason people cannot enter the conference room (the room lock issue) in 3.4.1 is the TCP port release issue.

If I use MCU v3.4.1 with the webrtc round-robin strategy and multiple webrtc agents running on several machines (10 AWS instances), will the room lock issue still happen?

Please treat the room lock issue and the AWS deployment issue as high priority.

These are the major issues we are facing in production (if a room locks, nobody can enter that particular room).

 

 

0 Kudos
Qiujiao_W_Intel
Employee
877 Views

Thanks, Raj. We will soon release a new version that fixes the ICE procedure failure. For the room lock issue, you can tune the server and increase the file descriptor limit.

0 Kudos
Kiran_Raj_RA
Beginner
877 Views

 Hi,

Thank you, Qiujiao, for the response. When can we expect the fixed release to be delivered?

0 Kudos
Qiujiao_W_Intel
Employee
877 Views

Raj, version 3.5.2 has been released; please try it. The NAT issue has been fixed in this version, so please check whether it works well in your environment.

0 Kudos
Kiran_Raj_RA
Beginner
877 Views

Thank you, Qiujiao, for releasing 3.5.2. Will 3.5.2 fix the room lock issue and the port issues?

0 Kudos
Naresh_R_1
New Contributor I
877 Views

Hi Qiujiao,

The port release issue still exists in v3.5.2.

Thanks

Naresh

0 Kudos
Qiujiao_W_Intel
Employee
877 Views

Yes, the port release issue is not fixed in v3.5.2; we will fix it in a future release.

0 Kudos
Chirravuri__Siva
Beginner
877 Views

Hi Support team

We have been working with Intel CS for WebRTC version 3.5.2 on an Ubuntu AMI in the Amazon cloud, behind a symmetric NAT.

The conference server only has a STUN server configuration setting (webrtc_agent/agent.toml), and the JavaScript client SDK does not send TURN candidates either, even when a TURN server is configured in the "Woogeen.ConferenceClient.create({})" function.

How do the RTP packets punch through when there is a symmetric NAT on both sides (in front of the client and in front of the server) without TURN bindings?

 

Siva
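
One way to confirm which candidate types each side actually offers is to count the `typ` field of the ICE candidate lines in the exchanged SDP: `host` candidates are local addresses, `srflx` candidates are learned through STUN, and `relay` candidates are allocated on a TURN server. The following is a rough sketch only; the function names and the sample SDP fragment (which uses documentation IP addresses) are assumptions for illustration and are not taken from the MCU or the client SDK.

```python
from collections import Counter

def candidate_type(line):
    """Return the candidate type (host, srflx, prflx or relay) from one candidate line."""
    text = line.strip()
    if text.startswith("a="):
        text = text[2:]
    if not text.startswith("candidate:"):
        return None
    tokens = text.split()
    if "typ" in tokens[:-1]:             # the type is the token right after "typ"
        return tokens[tokens.index("typ") + 1]
    return None

def count_candidate_types(sdp):
    """Count ICE candidates per type in an SDP blob."""
    return Counter(t for t in map(candidate_type, sdp.splitlines()) if t)

# Invented SDP fragment, for illustration only.
SAMPLE_SDP = """\
a=candidate:1 1 udp 2122260223 192.0.2.10 50000 typ host generation 0
a=candidate:2 1 udp 1686052607 203.0.113.7 50000 typ srflx raddr 192.0.2.10 rport 50000 generation 0
a=mid:audio
"""

counts = count_candidate_types(SAMPLE_SDP)
print(dict(counts), "relay candidates:", counts["relay"])
```

If neither the offer nor the answer contains any `relay` candidates, connectivity has to be established with host or server-reflexive candidates alone, which is the situation described in the question above.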

 

 

 

 

 

0 Kudos