Switch OS Reverse Engineering

It is really hard to get information about the proprietary operating systems that run on many switches. The vendors don't give away much information about how they actually work under the hood. The old model of security by obscurity is still applied here. At the 25C3 conference in Berlin I saw the talk “Cisco IOS attack and defense” by Felix “FX” Lindner, which completely changed my mindset about code quality inside switch operating systems. FX reverse engineered the IOS code and showed in great detail how IOS works and which attack vectors can be leveraged to get control over an IOS-based device. Felix is one of the most talented people in the community when it comes to reverse engineering, and I am very thankful for all the time and effort that he has spent on this project. The talk is about one hour long and is a really deep dive into Cisco's IOS code. I learned more about how IOS works from this talk than from all the presentations that I have ever seen from Cisco.


This talk is from 2008 and was the first in a series of switch OS reverse engineering projects by FX. The next target was the Huawei VRP OS; FX presented the results at DEF CON 2012. Huawei used to have a joint venture with 3Com (H3C), which is now part of HP, and it looks like most of the results also apply to the H3C devices from HP. FX disproved the myth that Huawei had copied the IOS code. He found out that the Huawei VRP OS is based on VxWorks. At the end of the talk his devastating summary is “90's style bugs, 90's style exploration, 0 operating system hardening … no security advisories..”.


Beyond the physical switches, FX also reverse engineered the Cisco Nexus 1000v virtual switch. FX presented the results of that research in the talk “Cisco in the sky with diamonds” at the SIGINT 2013 conference. The NX-OS based Nexus 1000v is built on MontaVista Linux running a 2.6.10 kernel. FX and Greg found a jailbreak, which they show in the talk, and mention that the same jailbreak also works on the physical Nexus devices.


This shows that the level of security embedded in the switches that FX has investigated is very poor. Since I became aware of FX's research, I think very differently about protecting a switch from getting owned by an attacker. It also explains a lot of the bugs that I have experienced in the past. Hopefully FX and Greg will continue their excellent work in the future.

 

 

 

Posted in All, Blog

Packet Pushers Podcast Show 250 – How To Document A Network

I recently took part in Packet Pushers Podcast Show 250 – How To Document A Network with the Packet Pushers hosts Ethan (@ecbanks) and Greg (@etherealmind). It was the third time that I have been a guest on the Packet Pushers podcast. We had an interesting discussion that gives a pretty good overview of the most important topics around network documentation and documentation tools. We nearly hit the 90-minute mark. The show can be downloaded here:

http://packetpushers.net/podcast/podcasts/show-250-document-network/

If we missed your favorite documentation tool, feel free to leave a comment.

 

Posted in All, Blog

Whitebox vs Blackbox

At the moment there is a lot of discussion about whitebox switches and how they change the networking industry. Essentially the idea is that you buy your switch hardware and software separately. Most network vendors already use merchant silicon, e.g. chips from Broadcom, inside their switches. You can buy the same silicon inside a whitebox switch. The main benefit of whitebox switches is that they are cheaper, and you can also use any SFP/QSFP modules from the open market that you like. In addition to the hardware you also need an operating system for your whitebox switches, like Cumulus Linux, or an OS toolbox like the Facebook Open Switching System (FBOSS), that runs on top of your hardware. Another aspect is that you can change your hardware or software supplier separately and are not dependent on a single vendor. On the other hand, with the traditional blackbox appliance model the hardware and software come as a tested package from a single vendor. For the vendors the challenge is that most of them have the same chips inside their switches, so they need to provide additional goodies like management, support, features and protocols to convince customers to buy their product.

 

The Networkautobahn View

For me the idea of whitebox is not new. In the firewall space we have had the whitebox model for a long time. And I have been burned many times in situations where something did not work and the hardware and software vendors were pointing fingers at each other instead of troubleshooting the actual problem. We are facing the same challenges with whitebox switching. I doubt that whitebox will revolutionize the complete network industry. For customers that have the staff and the drive to go down the whitebox road, it has the obvious benefits of lower costs and independence. But that comes with some restrictions. All the testing and development of new features that a vendor does in a blackbox solution is now outsourced to the customer. It can be a benefit to be able to add new features to your switches with your own developers. On the other hand, not every organisation has developers available to do that job. In the classic blackbox model these tasks would be handled by the vendor. As always, your choice will depend on your needs. I think the whitebox model is more attractive for large organisations that have their own developers and enough resources to do extensive testing in their own lab environment. If you don't have these resources available, traditional blackbox switches will serve your needs better in most cases. I think whitebox switching will have a market impact on a specific type of customer, like large cloud and datacenter providers. IMHO whitebox switching will be more of a niche product than something that revolutionizes the entire network industry. If the demand is high enough, we will definitely see more vendors add a separately available switching OS that runs on whitebox switches to their portfolio.

Posted in All, Blog

Interview with Paul Unbehagen

At the Avaya ATF in Vienna I had the chance to interview Paul Unbehagen, Roger Lapuh and Randy Cross. This trio has been deeply involved in the development of SPB inside Avaya and the IEEE.

Networkautobahn: What are typical topologies that are deployed in SPB networks? What have you seen in real-world deployments? Is leaf-spine the usual topology, or do you also see other solutions?

Paul Unbehagen: So there are not any typical designs. There are a lot of networks already in place based on SMLT designs or dual hub and spoke that we simply upgraded to our fabric, and they just keep on running. Often we found that once they had been upgraded, they added links in different places where they couldn't before because the old SMLT design wouldn't allow it. As our reference architecture we have recently tended to steer people more towards a double-helix-looking design. Something that looks more like square SMLTs stacked on each other, and with that comes more organic growth. SPB doesn't care what the topology is. In fact, because we don't have IP addresses on any of the interfaces between the SPB nodes, you can redesign the fabric on the fly. You take one core link and swap it to another one, and the fabric will just adapt to that automatically. We actually do that as a great demo at conferences, where the video surveillance guys are watching a video and we just unplug a link. I wouldn't say there is a typical design, because there are so many different ones out there, so many ways.

Roger Lapuh: So maybe I can add something there: if you take the 7200 or 8400 as a typical 10Gig/40Gig datacenter switch, you will probably see that you have the 7200 as the top-of-rack switch. Since we have 6x 40Gig links, you can use some of the 40Gig links to interconnect them horizontally for fast east/west traffic, and then you have an 8400 as an aggregation layer, also connected with 40Gig, probably dual-homed. So it's not a leaf-spine, it is a leaf with a horizontal detour.

Paul Unbehagen: We call it distributed top of rack.

Networkautobahn: That is exactly my experience. One day you see that there is a lot of traffic between two servers. So OK, I plug in another direct link between these two switches to add more bandwidth.

Paul Unbehagen: So we look at the 7200 as a great example of why it has the 6x 40Gig links at the front. The VSP7k that we are using has the Fabric Interconnect links; there are actually 8x 40Gig interfaces bundled that are exposed when we are running in SPB mode. In the new 7200 model we have native 40Gig interfaces that you can plug into an 8k, 9k... so distributed top of rack becomes more than just a VSP7k. It now becomes end-of-row, middle-of-row and top-of-rack all integrated together, so it becomes an even more flexible detour model.

Networkautobahn: How do you deploy your L3 instances in an SPB network? You also have plenty of choices. In the classic model you have a centralized design. For example, in a classic SMLT design you would have two cores with RSMLT. In an SPB network you can also have L3 in the access. What have you seen here? Is it the same answer as in the topology discussion?

Paul Unbehagen: Yes, the answer is very similar. It all depends on the environment and the emotional attachments.

Randy Cross: And the religion of the people doing it…

Paul Unbehagen: So by religion, the question is the “do you route or switch at the edge” kind of conversation. The point with the fabric was to no longer make it a religious debate. Make it do what you need to do where you need to do it, and marry the two together. Putting the two worlds together. The idea was that you could switch or route at one point, so people did some routing, and now we get more mobility in Layer 2. When you are in the fabric you are not in a two-dimensional world, you are in a three-dimensional world. So you can create a Layer 2 anywhere you want underneath the Layer 3. You are not bound by that anymore. In the datacenter, for example, you can put a VRF right on the top-of-rack, like with the VSP 7200 or 8200. You can also put the same 8200 in the distribution or core and have an 8400 in the closet doing Layer 2. You can also have a Layer 2 that starts in the 8400 and spans multiple closets; for wireless LAN, for example, that is important. We call it unified networking. You have probably heard me talking about that in previous presentations: the problem set in wireless LAN is identical to the one in the datacenter. VMs moving around is the same problem set as moving around with these devices. We are not putting you on the native LAN in the campus, just as we do in the datacenter. The LAN stretch that we do in the datacenter makes perfect sense when we are talking about the campus as well. It gives you more flexibility. Where you do the routing depends on the application type. You might want the device you are holding in your hand to be on the same subnet as the PC in the cube in your office. When you pick up your tablet you are still roaming on the same subnet, using the same DNS and DHCP, so it becomes a simpler but more robust design. When we are not going back to a wireless controller in the datacenter, you are not trying to manage tunnels over a real network and going down to the speed of the tunnel. You are using native switching to do native forwarding. So that means you get more flexibility in where you put your routing. You are doing it more three-dimensionally: here is where I want my routing, and here is my switching stretching beneath.

Roger Lapuh: From a product perspective we will map this. So far the VSP7000 is Layer 2 only, so you are forced into the spine being Layer 3 and the edge being Layer 2. With the new VSP7200 coming we will give you the freedom to really push routing to the edge. It will have VRFs and all the capabilities that you know from the VSP4k or VSP8400 right at that switch, so that you can have a top-of-rack with the VRFs right there.

Networkautobahn: What can you tell us about the new ONA devices that Avaya has recently introduced, and how are they connected to an SPB fabric?

Paul Unbehagen: So the ONA is kind of interesting, because it has multiple use cases depending on your needs. For some people it is about security, for others about automation or simplicity. The ONA uses a technology that we call Fabric Attach, which we have actually taken to the IEEE and which will become 802.1qsg. Basically it is an extension to LLDP that gives a device the capability to say “I want to join VSN 53”. It might be your medical or secure network or whatever you are using it for. It is not limited to the ONAs. This allows things like video surveillance cameras to start embedding the FA technology. Then there are devices that don't have the ability to integrate the FA technology quickly, for example an MRI; it is a multi-million-dollar device that doesn't change very often. For these the ONA allows you to bring the Fabric Attach concepts to devices that normally wouldn't get them. There are two modes, Fabric Attach and Fabric Extend. The Fabric Attach mode allows an MRI to be attached automatically to the medical devices VSN, and I can make sure that the only communication that is allowed is Server-to-MRI and MRI-to-Server and nothing else. For healthcare this is a huge benefit. This applies to healthcare, education, stock exchanges…

Randy Cross: PCI, manufacturing and anything where you need device isolation.

Networkautobahn: So all you need to deploy that is a routed connection?

Paul Unbehagen: In Fabric Attach mode all you really need is one of our switches to plug into. You don't even need a fabric. You can just take an ERS4800 and have it automatically configure the VLAN attachments. If you have a full SPB fabric it gets more powerful, because you can say “connect me to the right Layer 2 VSN” and it attaches you wherever you need. The ONA itself is very powerful, because what it is running is Open vSwitch. It is the same Open vSwitch that we demoed at VMware last year. The 2.4 release of Open vSwitch has Fabric Attach embedded as well. So you can run this in your datacenter and have Fabric Attach going to your 7200 in the future.

Networkautobahn: Can the ONA do encryption?

Paul Unbehagen: Stay tuned, not yet. The other interesting aspect is asset tracking. Suddenly you know where everything is in your network, and if you ever need to move a device to another room, no one has to be involved in that change. Just unplug the ONA, roll it to the next room and plug it back in, and it is automatically attached.

Roger Lapuh: I just want to touch on how it is managed. Basically it doesn't have a CLI and isn't manually configured. It is dumb: you plug it in and it gets all its configuration from a centralized controller. It is really leveraging the SDN approach. So you have a central controller that has some rules for how these particular ONAs connect. If someone steals it, all configuration is lost and it can't be used anymore. It is really a tie-in between the customer's infrastructure controller and the ONA that happens while you connect the ONA to the network.

Paul Unbehagen: Also, if someone tries to hack it: there is no CLI or even a console port on it. If they start manipulating the software on it, it will brick itself, because it is a secure boot device. So a lot of thought was put into it, trying to make sure it is a secure environment, because we don't want someone stealing it from one place and taking it to another place to hack into our virtual network. So in this case we are trying to make sure that it not only provides segmentation and security, it also makes it easier for you to sleep at night.

Networkautobahn: SPB is now getting connected to SDN solutions. Is this the future of SPB? How is it connected to OpenFlow?

Paul Unbehagen: So the communication with the controller that you are talking about looks like this: the ONA talks to the controller via OpenFlow. When you plug it in, the ONA downloads a rule set via OpenFlow that triggers a Fabric Attach message going up to the first SPB switch, and the SPB switch provides the needed VSN. It is a combination of technologies in the right way, versus simply saying there is only one way. It allows the right mixture of tools to make the solution work.
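
(Editor's sketch, not part of the interview.) The sequence Paul describes (controller rule set, Fabric Attach request, VSN assignment) can be pictured with a toy Python model. This is only an illustration under assumptions: the class names, the ONA serial numbers and the rule format are made up for this example, and the code does not speak real OpenFlow or LLDP.

```python
from dataclasses import dataclass, field


@dataclass
class Controller:
    """Hypothetical central controller: maps an ONA serial to the VSN it may join."""
    rules: dict = field(default_factory=dict)

    def rule_for(self, ona_serial: str):
        # Stands in for the rule set the ONA downloads via OpenFlow.
        return self.rules.get(ona_serial)


@dataclass
class SpbSwitch:
    """Hypothetical first-hop SPB switch that answers Fabric Attach requests."""
    attached: dict = field(default_factory=dict)

    def fabric_attach(self, ona_serial: str, vsn: int) -> int:
        # Stands in for the LLDP-based Fabric Attach exchange.
        self.attached[ona_serial] = vsn
        return vsn


def plug_in(ona_serial: str, controller: Controller, switch: SpbSwitch):
    """Return the VSN the ONA ends up in, or None if the controller has no rule."""
    vsn = controller.rule_for(ona_serial)
    if vsn is None:
        return None  # unknown ONA: no configuration, so no attachment
    return switch.fabric_attach(ona_serial, vsn)


# VSN 53 is the example number used earlier in the interview; the serials are invented.
controller = Controller(rules={"ONA-MRI-1": 53})
switch = SpbSwitch()
print(plug_in("ONA-MRI-1", controller, switch))   # known ONA
print(plug_in("ONA-STOLEN", controller, switch))  # ONA without a rule
```

The ONA holds no configuration of its own in this model, which matches the point made above that a stolen unit is useless without the controller.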

Networkautobahn: At the moment everybody is talking about SDN. For me, SPB has solved most of the problems that SDN is trying to address.

Paul Unbehagen: That is our SDN Fx story. The SPB protocol has already solved what everybody else is trying to solve with SDN. We took a lot of time to ask our customers what it really is that they think they need SDN for. It really comes down to this: we need to automate the edge and the connections. That is really what our SDN is.

Posted in All, Avaya, Blog | 4 Comments

10 Days of Troubleshooting

I spent the last 10 days troubleshooting. Sometimes you hit a problem, and until it is fixed you will not get a lot of sleep and coffee is one of your best friends. My little war story starts with a planned upgrade in one of our datacenters. We added 30 additional switches to our SPB-based fabric. The planned job itself worked out as planned. We started on Friday and had nearly finished adding the new switches to the network by Saturday evening. We left with a good feeling and had only 4 switches left for Sunday, which looked like a short day. During the night several ERS4800 stacks that had already been deployed in that network 12 months earlier lost their IS-IS adjacencies and went offline. All the newly added single ERS4800 switches worked without any issues. On the console of the ERS4800 stacks that had gone offline we saw immediately that the CPU utilization was at nearly 100%. With 100% CPU utilization the ERS4800 stacks didn't send their IS-IS hello packets and the adjacencies went down. At the start of the problem we also had VLACP configured on the uplinks, and here we had the same problem: the CPU was too busy to send VLACP packets, and the connected core switch shut down the ports because it didn't receive VLACP packets from the connected ERS4k anymore. Disabling VLACP didn't help very much, because we ran into the same problem with the IS-IS hello mechanism. The workaround that helped to get the affected switches back online was to split the stacks up into single units and so reduce the number of adjacencies per stack. That was a very time-consuming task, especially with large stacks. It was strange that stacks that had worked for 12 months without any problems showed this connection loss after we added devices to a different part of the network.

Bug description

Avaya support was able to find the root cause of the problem. In ERS4k software 5.7.x and 5.8.x, the process that runs when an IS-IS adjacency is formed looks like this: the ERS4k performs a lookup on all interfaces and then programs the ASICs per I-SID, so the interface lookup is repeated for every I-SID. We saw the problem on devices that run ~140 I-SIDs; an ERS4k with 140 configured I-SIDs has to do the interface lookup 140 times. It also depends on how many other devices have the same I-SID configured, because in the background a path calculation runs for all devices that share that particular I-SID, which also causes high CPU load. In my experience the ERS4k runs stably in the network until you reach a breaking point where the CPU spike from the path and interface calculation lasts longer than the IS-IS timer. When you reach that point you end up in a situation where the CPU stays at 100% endlessly, because losing the adjacency triggers a deconfiguration that produces the same CPU spike when it pulls back the I-SID assignments. Avaya was able to provide a bugfix release. Basically, the fix ensures that the interface lookup is done only once, regardless of how many I-SIDs are configured. We tested this bugfix release and saw that there are now only very short CPU spikes, which solves the problem. The same bugfix is already included in the VSP7k 10.3.3 release, where you could run into the same problem.
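
To make the difference between the two behaviours concrete, here is a minimal Python sketch that only counts interface lookups. It is not Avaya's code: the function names and the interface count of 200 are assumptions for illustration, and only the figure of 140 I-SIDs comes from our outage.

```python
def lookups_per_isid(num_isids: int, num_interfaces: int) -> int:
    """Behaviour before the fix: the interface walk is repeated for every I-SID."""
    lookups = 0
    for _isid in range(num_isids):
        for _interface in range(num_interfaces):
            lookups += 1
    return lookups


def lookups_once(num_isids: int, num_interfaces: int) -> int:
    """Behaviour after the fix: one interface walk, reused for every I-SID."""
    lookups = 0
    for _interface in range(num_interfaces):
        lookups += 1
    # The result of this single walk is then reused while programming each I-SID.
    return lookups


def adjacency_at_risk(spike_seconds: float, hold_time_seconds: float) -> bool:
    """The adjacency drops if the CPU is busy for longer than the IS-IS timer."""
    return spike_seconds > hold_time_seconds


ISIDS = 140        # number of I-SIDs on the affected devices
INTERFACES = 200   # assumed port count of a large stack, for illustration only

print(lookups_per_isid(ISIDS, INTERFACES))  # 28000 lookups
print(lookups_once(ISIDS, INTERFACES))      # 200 lookups
```

The work before the fix grows with the product of I-SIDs and interfaces, which would explain why large stacks with many I-SIDs hit the breaking point while single switches did not.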

Here is the link to the release notes of the fixed SW 5.8.1.301s: https://downloads.avaya.com/css/P8/documents/101012182

The Aftermath

With every network outage you lose a lot of trust. When the network has run stably for a long time, customers expect that it will run all the time without any interruption. When you have to scale your network on the fly, you sometimes run into problems. Often that ends in the “it's always the network's fault” discussion. On the day we successfully updated all switches to the fixed software release and solved the problem, a new problem came up on our centralized storage system. That was a good reminder that nearly all IT systems have hidden bugs inside. Finger pointing doesn't help anyone; we are all in the same boat. I have to say that all the people involved in this troubleshooting hunt did an amazing job. Everybody worked after hours and did everything to manage the crisis as well as possible. Everybody from the different IT departments, the management and Avaya support put serious effort into fixing the problem as fast as possible. Thanks to everybody who was involved.

Posted in All, Avaya, Blog | 4 Comments

Passive Fibre Optic Infrastructure with Cross Connects

I recently attended a conference with all the main vendors from the passive fibre optic industry. I normally deal with routers and switches and hadn't looked for a long time at what is going on on the passive infrastructure side. In the past most passive patch panels had 12 or 24 ports, and the density was usually not as high as on the active switch side. A modular switch with, for example, 10x 48 ports has up to 480 ports when it is fully loaded. Here you really need proper cable management or it ends in cabling chaos. In a greenfield installation you can build a nice, clean-looking rack with proper cable management. In my experience, 10 years and several changes later you don't have a nice-looking rack anymore. Over time more and more cables were added, and if that was done in a hurry the result does not look very nice or organized. It is really hard to keep cabling clean over a long period when in a lot of cases you don't have the time to give every patch the love it needs. The people in the ISO standards organization have looked deeply into the problem of cable management and how the chaos can be prevented. The first very interesting fact is that one of the main causes of the chaos is hotspots in the rack. Most of the time people use the shortest possible cable length to connect two ports with each other. Like traffic jams on the Autobahn, that creates spots in the rack that are overloaded. To prevent that, it is recommended to always use the same cable length for all patches. That eliminates the hotspots and makes the patch management easier to maintain. And solid patch management is needed for cross connect installations. In a cross connect rack you can have, depending on the vendor, 1500+ fibres in one rack. When such a rack is fully patched with a standard cable length of 6 m, you end up with 4600 meters of patch cable in the rack. I tested removing and adding cables in a cross connect rack and was very surprised that it was still possible to remove and add cables quickly and easily. For a new datacenter deployment the cross connect brings a lot of nice benefits and is worth taking a look at.
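
The cable length figure can be checked with a few lines of Python. This is a back-of-the-envelope sketch that assumes every patch cable connects exactly two ports in the rack; the function names are made up for this example.

```python
def patch_cable_total_m(ports: int, cable_length_m: float) -> float:
    """Total cable length if every patch cable links two ports of the rack."""
    cables = ports / 2
    return cables * cable_length_m


def ports_for_total(total_m: float, cable_length_m: float) -> int:
    """Number of patched ports that corresponds to a given total cable length."""
    cables = total_m / cable_length_m
    return round(cables * 2)


print(patch_cable_total_m(1500, 6))  # 4500.0 m for exactly 1500 ports
print(ports_for_total(4600, 6))      # 1533 ports for 4600 m of cable
```

Under that assumption, 4600 meters of 6 m cable corresponds to roughly 1530 patched ports, which fits the "1500+" figure.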

Posted in All, Blog

Avaya ATF 2015 WrapUp

I attended the Avaya ATF in Vienna. It was my first ATF, by the way. I will try to provide a summary of the news that Avaya showed at the conference. First I have to say that I really liked the atmosphere at the ATF. It is one of the events where you can still get in contact with the right people. Some events have become so large that you cannot connect with the people who are responsible for a new product or technology. I was really glad to find out that this was not the case at the ATF. It was also very interesting for me to talk to other customers and get feedback on their point of view on several topics.
In the past the ATF covered data networking only; they have changed that. Now you also find voice and video related content at the ATF. Anyway, I have to admit that I only attended breakout sessions on network-related topics. So let's get into the details from Avaya.

Hardware:
Avaya showed 3 new hardware platforms at the ATF: the ERS5900 switch family, the VSP7200 and the ONA adapter. All 3 platforms were announced earlier, but here at the ATF I spotted all the boxes and could play a little bit with the devices.
The replacement for the ERS5500 and 5600 is the new ERS5900 series. It runs the BOSS image and will come in 4 different hardware versions at launch. You can get more details about the ERS5900 here:
http://networkautobahn.com/2015/03/03/news-avaya-ers5900-switches/
The VSP 7200 is a 48-port 10 Gigabit switch with 6x 40Gig QSFP slots. It is available with SFP+ slots or as a fixed RJ45 copper version. It runs the VOSS release.
The ONA adapter is a small device based on Open vSwitch. It can be leveraged to connect a remote site to an SPB fabric and for other interesting use cases. More details about the ONA can be found here:
http://networkautobahn.com/2015/02/26/news-avaya-announced-their-ona-open-network-adapter/
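
As a quick side calculation on the VSP 7200 port layout, the following Python sketch compares the access capacity with the 40Gig capacity. The function name is made up for this example, and the split of 40Gig ports between uplinks and horizontal links is an assumption for illustration, not an Avaya design recommendation.

```python
def oversubscription(access_ports: int, access_gbps: int,
                     uplink_ports: int, uplink_gbps: int) -> float:
    """Ratio of total access bandwidth to total uplink bandwidth."""
    return (access_ports * access_gbps) / (uplink_ports * uplink_gbps)


# All six 40Gig ports used as uplinks: 480 Gbit/s access vs. 240 Gbit/s uplink.
print(oversubscription(48, 10, 6, 40))  # 2.0

# Assumed split: two 40Gig ports for horizontal links, four left as uplinks.
print(oversubscription(48, 10, 4, 40))  # 3.0
```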

SDN:
The SDN strategy that Avaya presented was very different from what you get from most of its competitors. The main goal is to solve problems. Avaya says that most of the problems that people usually try to solve with an SDN solution are already solved by Avaya's SPB-based fabric approach. Instead of solving these problems again with an SDN solution, Avaya is leveraging the benefits of its fabric and extending it. To achieve that, Avaya has been involved in several projects such as OpenFlow, OpenDaylight, KVM and Open vSwitch, and has contributed code like the Fabric Attach feature so that an Avaya fabric can connect to these platforms. To control all that, Avaya has introduced its Fabric Orchestrator SDN Fx product. The SDN Fx architecture will include Fabric Orchestration, RESTful APIs, an SDK, SDN apps, OpenStack and a Discovery Engine. Besides the SDN parts you will also find all the other Avaya management products here, like VPFM, COM and the new Fabric Attach Tunnel Manager. In these times of software-defined everything, this will surprisingly come out as a hardware appliance, which supports 25 concurrent sessions at the beginning.

Certification:
Avaya will introduce its new top certification, ACE-FX. The certification is announced as a 4 ½-day intensive hands-on workshop and exam held at Avaya Networking Lab facilities. To take the exam you need to hold other Avaya certifications or high-level certifications from other vendors. You need to understand the complete technology stack to pass the exam. ACE-FX certified engineers will become mandatory for Platinum Partner status. Another benefit of the new certification is that an ACE-FX certified engineer will get a higher priority in the support case queue.

The Network Autobahn view:

Avaya has steadily improved its SPB-based fabric technology. Sometimes you don't need to reinvent the wheel. It makes more sense to leverage an existing technology and evolve it. Combining the fabric with SDN creates a lot of possibilities. I also like that Avaya has contributed code to several open source projects. The Fabric Attach feature is now available in Open vSwitch. The next step is that Fabric Attach will become an open IEEE standard. I also see a lot of potential in being able to spread your fabric over an IP network. Instead of managing a lot of islands separately, you now have all the benefits of one large fabric. That makes operations easier and solves most of the usual problems. I got the impression that everyone at Avaya is really passionate about their products.

I have tried to convince Avaya to make an emulator for their switches publicly available. Hopefully Avaya will consider making its VSP emulator public. I think that would be a huge benefit for all customers.

 

Posted in All, Avaya, Blog | 2 Comments

Avaya ATF 2015 Vienna

I will be at the Avaya ATF in Vienna from 5 to 8 May 2015.

It is my first ATF, so I am curious what Avaya will show. I hope it will be a deep dive into the technology with a lot of SPB-related content. The agenda looks very promising.

If you are also attending and would like to have a chat, drop me a message.

See you in Vienna.

https://news.avaya.com/eu-avaya-technology-forum-2015-index

Posted in All, Avaya, Blog