The Pure Storage FlashArray is frequently used in VMware environments, so Support will often have cases where they need to troubleshoot and diagnose how the FlashArray interacts with the VMware environment.
During live troubleshooting, both the customer and Pure Support can review the logs together as needed. To investigate events that have already occurred, however, VMware support logs must be provided to Support before the investigation can move forward.
VMware has a detailed KB on how to Gather vCenter and ESXi Logs. Please review VMware's documentation.
Since the vCenter 6.7 release, adoption of the HTML5 Client has increased. The HTML5 Client has no export option in the Monitor tab, so the process is a little different.
Here is an example of the HTML5 Client.
Right-click the vCenter and select Export System Logs.
Check the box to include vCenter logs. If the Support case is related to vVols, SRM, or plugins, these logs are very important to gather.
Then export the logs. The default selections for the hosts are usually enough for Pure Support.
Keep in mind that HTML5 log exports are named a little differently from those produced by the Flash client.
Here is an example using the Flash client and the Monitor tab.
Navigate to the vCenter that the logs are being gathered for.
Go to the Monitor tab and click System Logs. Then click Export System Logs.
Check the box to include vCenter logs. If the Support case is related to vVols, SRM, or plugins, these logs are very important to gather.
Then click Finish. Another window will pop up asking where to save the compressed file.
Once the logs are downloaded, they can be uploaded via Pure1 for Support.
The Uploading Files to a Support Case KB outlines that process.
The SRA's logs are located in:
/var/log/vmware/srm/SRAs/sha256{characterSpamHere}
This directory should have the logs referenced below.
The SRM's logs are located in:
/var/log/vmware/srm/
The SRA's logs are located in:
%PROGRAMDATA%\VMware\VMware vCenter Site Recovery Manager\Logs\SRAs\purestorage
Each invocation of the SRA produces one log file. Sort by Date Modified to see the commands executed in chronological order.
The SRM's logs are located in:
%PROGRAMDATA%\VMware\VMware vCenter Site Recovery Manager\Logs
Look for vmware-dr-##.log files. The file with the largest ## is the most recent. The SRM logs are useful for diagnosing problems when (a) The SRA responded correctly, but SRM still failed an operation, and (b) The SRA crashed on launch before being able to log anything. Before collecting SRM logs, be sure to quit SRM and wait a few seconds for all the logs to be flushed to disk.
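Picking the most recent SRM log by its number can be scripted. The following is a minimal sketch, assuming the file names follow the vmware-dr-##.log pattern described above; the function name `latest_srm_log` is hypothetical and not part of any Pure or VMware tooling.

```python
import re

# Matches file names such as "vmware-dr-7.log" or "vmware-dr-07.log".
SRM_LOG_NAME = re.compile(r"^vmware-dr-(\d+)\.log$")


def latest_srm_log(file_names):
    """Return the vmware-dr-##.log name with the largest number, or None."""
    numbered = []
    for name in file_names:
        match = SRM_LOG_NAME.match(name)
        if match:
            # Compare numerically so that 10 sorts after 2.
            numbered.append((int(match.group(1)), name))
    if not numbered:
        return None
    return max(numbered, key=lambda item: item[0])[1]
```

For example, passing the result of `os.listdir()` on the SRM Logs directory returns the name of the newest vmware-dr log in that directory.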
The SRA installer's logs are located in:
%PROGRAMDATA%\PureStorage\PureSRAInstaller.log
To verify that the SRA has been installed correctly, do the following:
HKEY_LOCAL_MACHINE\SOFTWARE\VMware, Inc.\VMware vCenter Site Recovery Manager\InstallPath
in the Registry. It is usually
C:\Program Files\VMware\VMware vCenter Site Recovery Manager\bin
.
command.pl PureSRA.exe PureSRA.pdb PureSRA.exe.config PureStorage.Rest.dll PureStorage.Rest.pdb Newtonsoft.Json
%windir%\Microsoft.NET\Framework\v3.5
The SRA log starts by logging the SRA version and information about the environment it runs in. The SRA expects to run as an administrator and as a 64-bit process; it will not work correctly otherwise. Look for entries like the following to confirm this is the case:
[02/12/2015 11:03:30,Logging session for discoverArrays,V] Process is 64-Bit.
[02/12/2015 11:03:30,Logging session for discoverArrays,V] Running as the administrator.
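This check can be automated when reviewing many SRA logs. The sketch below assumes the two messages appear exactly as in the sample entries above; the function name `check_sra_environment` is hypothetical.

```python
def check_sra_environment(log_text):
    """Report whether the SRA logged the expected environment messages."""
    return {
        # Wording taken from the sample log entries; other SRA versions
        # may phrase these messages differently.
        "is_64_bit": "Process is 64-Bit." in log_text,
        "is_admin": "Running as the administrator." in log_text,
    }
```

If either value is False, inspect the top of the log by hand before drawing conclusions, since this only matches the literal text.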
The SRA logs the input from SRM. Look for "Received input:" followed by an XML string near the top of the log file. In the XML string, there are usually two Connection nodes, with the ids "localArray" and "peerArray". They are the pair of arrays the operation is being applied to. This information is entered by the user when configuring the SRA from inside SRM. Make sure they are actually the arrays you are using for SRM. Example:
<Connection id="localArray"> <Addresses> <Address id="spA">vss-purity-vm1.dev.purestorage.com</Address> </Addresses> ... </Connection>
<Connection id="peerArray"> <Addresses> <Address id="spA">vss-purity-vm2.dev.purestorage.com</Address> </Addresses> ... </Connection>
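To compare the configured arrays against the expected ones, the Connection nodes can be pulled out of the "Received input:" XML. This is a minimal sketch using Python's standard library; the function names are hypothetical, and because the full input document is not shown here, the code matches element names while ignoring any XML namespace.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def connection_addresses(xml_text):
    """Map each Connection id to the list of Address values it contains."""
    root = ET.fromstring(xml_text.strip().encode("utf-8"))
    result = {}
    for element in root.iter():
        if local_name(element.tag) != "Connection":
            continue
        result[element.get("id")] = [
            child.text.strip()
            for child in element.iter()
            if local_name(child.tag) == "Address" and child.text
        ]
    return result
```

The returned dictionary should contain "localArray" and "peerArray" keys whose addresses match the arrays actually used for SRM.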
Each entry in the SRA log is associated with a verbosity level. Search for ",E]" and ",W]" in the log file to see the logged Errors and Warnings, respectively. They are usually indicative of what went wrong. For example, the entries below indicate that the SRA could not connect to an array:
[02/11/2015 09:40:56,SRMCommandHandlerBase.cs:ConnectToInputArrays,W] Connection failed to FlashArray at 10.66.50.90 using connection localArray
[02/11/2015 09:40:56,SRMCommandHandlerBase.cs:ConnectToInputArrays,E] "PureRestException: HttpStatusCode = 'BadRequest', RestErrorCode = 'InvalidVersion', Details = '', InnerException = ''"
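Filtering a log down to its errors and warnings can be sketched as follows. The entry layout of `[timestamp,source,LEVEL] message` is inferred from the sample entries above, and the function name `errors_and_warnings` is hypothetical.

```python
import re

# [timestamp,source,LEVEL] message, as seen in the sample SRA log entries.
ENTRY = re.compile(
    r"^\[(?P<time>[^,]+),(?P<source>.+?),(?P<level>[A-Z])\]\s?(?P<message>.*)$"
)


def errors_and_warnings(log_lines):
    """Return (level, source, message) tuples for E and W entries only."""
    found = []
    for line in log_lines:
        match = ENTRY.match(line.strip())
        if match and match.group("level") in ("E", "W"):
            found.append(
                (match.group("level"), match.group("source"), match.group("message"))
            )
    return found
```

Lines that do not start with a bracketed header, such as continuation lines of a multi-line message, are skipped by this sketch.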
If the entire operation failed, the SRA will output an error. Look for "Setting output:" followed by some XML string. You should see an Error node with an error ID, such as:
<Error code="1004"> <purestorage:PureExceptionMessage>...</purestorage:PureExceptionMessage> <purestorage:LogFile>...</purestorage:LogFile> </Error>
The meanings of the error codes are listed below. For example, 1004 stands for "array unreachable".
WarningSyncInProgress = 500, // Defined by VMware
ErrorUnhandledException = 1001,
ErrorUnknownCommand = 1002,
ErrorPureException = 1003,
ErrorArrayUnreachable = 1004,
ErrorArrayUnauthorized = 1005,
ErrorArrayIdNotAvailable = 1006,
ErrorArrayIDMissing = 1007,
ErrorBadArrayPair = 1008,
ErrorVolumeNotInPGroup = 1009,
ErrorCannotFindSyncStatus = 1010,
ErrorCannotFindSnapshot = 1011,
ErrorCannotFindVolume = 1012,
ErrorTestFailoverStartInProgress = 1013,
ErrorVolumeConnectionFailed = 1014,
WarningCannotFindPgroup = 1015,
ErrorCannotCreatePgroup = 1016,
WarningDeviceAlreadyFailedOver = 1017,
WarningPrepareFailoverInProgress = 1018,
ErrorArrayInsufficientPermissions = 1019,
WarningHostConnectionFailed = 1020,
ErrorCannotCreateVolume = 1021,
ErrorCannotDisconnectVolume = 1022,
ErrorCannotRenameVolume = 1023,
ErrorVolumeNotDisconnected = 1024,
ErrorCannotDeleteVolume = 1025,
ErrorCannotSnapshotPGroup = 1026,
ErrorEmptyOrMissingDeviceID = 1027,
WarningAlreadyPerformedPrepareRestoreReplication = 1028,
WarningAlreadyPerformedRestoreReplication = 1029,
WarningAlreadyPerformedPrepareReverseReplication = 1030
The "PureExceptionMessage" portion of the error will contain more specific information about why the operation failed. Examples are under the subheadings for specific errors below.
Additionally, the SRA logs the HTTP requests (URL only) and their response codes. Look for "Rest Library transcript:" in the log. In the HTTP transcript that follows, look for "PureStorage.Rest Error:". Note that many HTTP errors are benign and expected (e.g. we test for the existence of a volume by asking the array about it; if the array responds with a does-not-exist error, we know it doesn't exist), so view errors in the HTTP transcript in the context of other clues in the log.
Example error:
[01/04/2018 11:54:01,DiscoverArrays.cs:ProcessCommand,V] Exiting
Setting output:
<?xml version="1.0" encoding="utf-8"?>
<Response xmlns="http://www.vmware.com/srm/sra/v2" xmlns:purestorage="http://www.purestorage.com/sra">
  <Error code="1004">
    <purestorage:PureExceptionMessage>The remote server returned an error: (400) Bad Request. Message from Purity='ctx:,msg:invalid credentials'</purestorage:PureExceptionMessage>
    <purestorage:LogFile>C:\ProgramData\VMware\VMware vCenter Site Recovery Manager\Logs\SRAs\purestorage\discoverArrays_2018-01-04-11-53-58-443789-6607c4a5-ed27-4f4a-8d3b-46c8db303e85.log</purestorage:LogFile>
  </Error>
</Response>
This usually means the array managers are configured incorrectly. If it is a "one to many" or "many to one" configuration, ensure that the Pure FlashArray username and password are the same for each array.
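Extracting the error code and message from the "Setting output:" XML can be sketched as below. The function name `parse_sra_error` is hypothetical, the name table holds only a few of the codes listed earlier (extend it as needed), and element names are matched without regard to namespace.

```python
import xml.etree.ElementTree as ET

# Partial lookup table copied from the error code list in this article.
ERROR_NAMES = {
    500: "WarningSyncInProgress",
    1003: "ErrorPureException",
    1004: "ErrorArrayUnreachable",
    1005: "ErrorArrayUnauthorized",
}


def parse_sra_error(xml_text):
    """Return code, name, message and log file of the first Error node, or None."""
    root = ET.fromstring(xml_text.strip().encode("utf-8"))
    for element in root.iter():
        if element.tag.rsplit("}", 1)[-1] != "Error":
            continue
        # Collect child element text keyed by tag name without namespace.
        details = {
            child.tag.rsplit("}", 1)[-1]: (child.text or "").strip()
            for child in element
        }
        code = int(element.get("code"))
        return {
            "code": code,
            "name": ERROR_NAMES.get(code, "unknown"),
            "message": details.get("PureExceptionMessage", ""),
            "log_file": details.get("LogFile", ""),
        }
    return None
```

Applied to the example output above, this yields code 1004 together with the "invalid credentials" message from Purity, which points at the array manager credentials rather than network reachability.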
If the SRA does not receive a response from the array within 60 seconds for a REST call, it times out (indicated by wording about "timed out" in the logs).
To change the HTTP timeout value from the default value of 60 seconds, do the following (on all machines where the SRA is installed):
HKEY_LOCAL_MACHINE\SOFTWARE\PureStorage\SRA
(or navigate to it if it already exists).
The change will take effect the next time the SRA is invoked.
By default, the SRA prioritizes host group connections when asked by SRM to connect to hosts (e.g. if a hostgroup HG contains a host H, the SRA will connect to HG when asked to connect to H). Most users should use this behavior.
However, if the user wishes to disable this behavior (i.e. only connect to hosts on failover), they can add a DWORD value named "DisableHostGroupConnectionOnFailover" under the registry key HKEY_LOCAL_MACHINE\SOFTWARE\PureStorage\SRA and set its value to 1.
One useful tool for debugging problems is Fiddler. You will need to follow Decrypting HTTPS-protected traffic to set up debugging for HTTPS traffic. Repeat the failed SRM operation with Fiddler running to see the HTTP traffic. You can save the Fiddler trace to a file and give it to the dev team for further debugging.
Another noteworthy reminder is that the user needs to rescan for the SRA after upgrading it (it's covered in the manual accompanying the SRA), or they may see errors.
Additionally, it may be useful to enable a higher level of logging if the existing logs do not provide sufficient information. Note that more detailed logging can fill up the volume the logs are written to, so enable it only for short periods while the customer reproduces the issue, and disable it immediately afterward. More details can be found in this VMware KB.
When you use the Pure plug-in for vCenter, you can view and manage the FlashArray GUI within the vSphere Web Client. Each browser handles SSL certificate authentication differently. There is a known issue with mismatched addresses that we plan to fix in the long term; in the meantime, use the workarounds below.
Note: Certificates used with a FlashArray must be PEM formatted (Base64 encoded).
When logging into the vCenter Web GUI on IE you may see the following:
In order to use the vSphere Plug-In you will need to perform the following:
Do not log in, click on "Certificate Error > View Certificates"
Once the certificate is installed, you should be able to see the Pure GUI in the vCenter Plug-in.
When you log into the vSphere GUI you will see the following:
To fix this, first log into the array directly and bypass the certificate warning:
Once you do this, you will be able to reload the vSphere GUI and see the Pure GUI in the plug-in.
When logging into the vCenter Web GUI you will get the following error:
To fix this, log directly into the Pure GUI and add it as an exception:
Once you've done this you're all set as this will add it as a permanent exception.
If VMware is not configured as per best practice expectations (ATS Enabled) then we may see SCSI-2 Reservations in our logs. This is how you can check to see if that's happening:
quelyn@i-9000a448:/logs/del-valle.k12.tx.us/dvisd-pure01-ct1/2015_10_27$ tgrep -c 'vol.pr_cache inserting registration' core*
core.log-2015102700.gz:1875
core.log-2015102701.gz:1798
core.log-2015102702.gz:1827
core.log-2015102703.gz:1817
core.log-2015102704.gz:1860
core.log-2015102705.gz:1812
core.log-2015102706.gz:1818
core.log-2015102707.gz:2577
core.log-2015102708.gz:8181
core.log-2015102709.gz:15131
core.log-2015102710.gz:21826
core.log-2015102711.gz:19140
core.log-2015102712.gz:12044
core.log-2015102713.gz:13451
core.log-2015102714.gz:22995
core.log-2015102715.gz:33136
core.log-2015102716.gz:18587
core.log-2015102717.gz:5900
core.log-2015102718.gz:7324
core.log-2015102719.gz:2541
core.log-2015102720.gz:2213
core.log-2015102721.gz:1850
core.log-2015102722.gz:1807
core.log-2015102723.gz:1851
quelyn@i-9000a448:/logs/del-valle.k12.tx.us/dvisd-pure01-ct1/2015_10_27$ tgrep 'vol.pr_cache inserting registration' core*
core.log-2015102700.gz:Oct 26 23:18:51.394 7FB1ACCF3700 I vol.pr_cache inserting registration, seq 3097137672 vol 69674 i_t 20000025B5AA000E-spc2-0 res_type 15
core.log-2015102700.gz:Oct 26 23:18:51.431 7FB1AD5F5700 I vol.pr_cache inserting registration, seq 3097137673 vol 69673 i_t 20000025B5BB000E-spc2-0 res_type 15
core.log-2015102700.gz:Oct 26 23:18:51.457 7FB1AE378700 I vol.pr_cache inserting registration, seq 3097137674 vol 69662 i_t 20000025B5AA000E-spc2-0 res_type 15
core.log-2015102700.gz:Oct 26 23:18:51.483 7FB1AE378700 I vol.pr_cache inserting registration, seq 3097137675 vol 69661 i_t 20000025B5AA000E-spc2-0 res_type 15
core.log-2015102700.gz:Oct 26 23:18:51.503 7FB1A73FC700 I vol.pr_cache inserting registration, seq 3097137676 vol 69676 i_t 20000025B5AA000E-spc2-0 res_type 15
core.log-2015102700.gz:Oct 26 23:18:51.522 7FB1AF7FD700 I vol.pr_cache inserting registration, seq 3097137677 vol 69667 i_t 20000025B5BB000E-spc2-0 res_type 15
quelyn@i-9000a448:/logs/del-valle.k12.tx.us/dvisd-pure01-ct1/2015_10_27$ pslun
Volume Name                      pslun Name
-------------------------------- ----------
PURE-STR-LUN01                   pslun69648
PURE-STR-LUN02                   pslun69660
PURE-STR-LUN03                   pslun69661
PURE-STR-LUN04                   pslun69662
PURE-STR-LUN05                   pslun69664
PURE-STR-LUN06                   pslun69665
PURE-STR-LUN07                   pslun69666
PURE-STR-LUN08                   pslun69667
PURE-STR-LUN09                   pslun69668
PUR-STR-LUN10                    pslun69669
PUR-STR-LUN11                    pslun69670
PURE-STR-LUN12                   pslun69671
PURE-STR-LUN13                   pslun69672
PURE-STR-LUN14                   pslun69673
PURE-STR-LUN15                   pslun69674
PURE-STR-LUN16                   pslun69675
PURE-STR-LUN17                   pslun69676
PURE-STR-LUN18                   pslun69677
PURE-STR-LUN19                   pslun69678
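To see which volumes are receiving the reservations, the "vol" number in each registration line can be matched against the pslun listing. The sketch below assumes both outputs have been saved as text in the formats shown above; the function name `registrations_by_volume` is hypothetical and the code does not invoke tgrep or pslun itself.

```python
import re
from collections import Counter

# "... inserting registration, seq <n> vol <id> i_t <initiator> ..."
REGISTRATION = re.compile(
    r"vol\.pr_cache inserting registration, seq \d+ vol (\d+) i_t (\S+)"
)
# "<volume name>   pslun<id>" rows from the pslun listing.
PSLUN_ROW = re.compile(r"^(\S+)\s+pslun(\d+)$")


def registrations_by_volume(log_lines, pslun_lines):
    """Count registration entries per volume name (or pslun id if unmapped)."""
    names = {}
    for line in pslun_lines:
        row = PSLUN_ROW.match(line.strip())
        if row:
            names[row.group(2)] = row.group(1)
    counts = Counter()
    for line in log_lines:
        entry = REGISTRATION.search(line)
        if entry:
            vol_id = entry.group(1)
            counts[names.get(vol_id, "pslun" + vol_id)] += 1
    return counts
```

Calling `most_common()` on the returned counter lists the busiest volumes first, which helps decide which datastores to check for ATS configuration.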
/home/jhop/python/Mr_VMware.py
Since we only care about datastores on Pure Storage (not Raw Device Mappings (RDMs)), we will find the applicable LUNs in the "esxcfg-scsidevs_-m.txt" file under the "commands" folder in a VMware support bundle. Below is an example of what a line from there will look like:
There are several things that we want to identify from this output. The first is the "NAA Identifier", which is important because anything starting with "naa.624a937" is a Pure Storage LUN. Once we have a Pure Storage LUN, we then want to take note of the "VMFS UUID" (e.g. 53c80075-7ddcc5ba-7d03-0025b5000080). We focus on this instead of the "User-Friendly Name" because customers can choose any name they want for that field. The VMFS UUID, by contrast, is a uniquely generated ID for the individual datastore, so using it guarantees we are referring to the intended Pure Storage LUN.
Once we have this information the next step is to search for the "vmkfstools" text file that contains the File System information on this device; this will also be found in the "commands" folder you already reside in. An example of what the text file will look like is as follows:
vmkfstools_-P--v-10-vmfsvolumes53c80075-7ddcc5ba-7d03-0025b5000080.txt
Notice that the "VMFS UUID" is contained at the end of the file name, just before the ".txt" extension. We can now search this file for the "Mode" it is running in. An example of what this line will look like, if configured properly, is as follows:
Mode: public ATS-only
If the datastore is not configured properly it will look as follows:
Mode: public
If the datastore is showing a "public" mode then we know that this datastore is misconfigured and we'll be receiving an excessive amount of SCSI-2 Reservations from the ESXi Hosts. This means that locking tasks are not being offloaded to the FlashArray.
Obviously if the customer has a lot of LUNs this process above can take a while, so it is best to script this. I have listed below a simple one liner that will do this for you if you would like to use this instead of going through each LUN one-by-one:
grep "naa.624a937" esxcfg-scsidevs_-m.txt | awk '{print $3}' > Pure-LUNs.txt;while read f;do cat vmkfs*$f.txt |grep -e "Mode:" -e "naa.624a937";echo;done < Pure-LUNs.txt
NOTE: This command is able to be copied & pasted then used on any ESXi Host that is 5.0 and higher, as long as you are in the "commands" folder of the ESXi Host you want to verify.
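For cases where the bundle is being reviewed on a workstation without a shell, the same check can be sketched in Python. This is an illustrative equivalent rather than a replacement for the one-liner: the function names are hypothetical, and taking the VMFS UUID from the third whitespace-separated column mirrors the `awk '{print $3}'` step above.

```python
def pure_datastore_uuids(scsidevs_text):
    """Return the VMFS UUID column for every Pure Storage LUN line."""
    uuids = []
    for line in scsidevs_text.splitlines():
        fields = line.split()
        # Pure Storage devices are identified by the naa.624a937 prefix.
        if "naa.624a937" in line and len(fields) >= 3:
            uuids.append(fields[2])
    return uuids


def datastore_mode(vmkfstools_text):
    """Return the text after 'Mode:' in vmkfstools output, or None."""
    for line in vmkfstools_text.splitlines():
        if line.strip().startswith("Mode:"):
            return line.split("Mode:", 1)[1].strip()
    return None


def is_ats_only(vmkfstools_text):
    """True when the datastore reports 'public ATS-only' locking."""
    mode = datastore_mode(vmkfstools_text)
    return mode is not None and "ATS-only" in mode
```

A datastore for which `is_ats_only` returns False while `datastore_mode` returns "public" is one of the misconfigured datastores described above.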
Once we have the misconfigured LUNs identified the customer can use the VMware KB listed below to resolve the issue:
Link to VMware KB: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1033665
Follow the steps outlined in the section "Changing an ATS-only volume to public", but change the "0" in the listed command to a "1" to turn ATS-only back on. It is important that the customer read all of the steps, including the notes and caveats, before proceeding.
Alternatively, and with much less of a headache, the customer can simply create a new LUN on the FlashArray and mount it to the applicable ESXi host(s). Once the new VMFS datastore is created, they can verify that ATS is properly configured and then migrate the virtual machines from the misconfigured datastore to the new one. After everything has been moved off the old datastore and they have confirmed all is working well, they can destroy the old LUN. This is much easier and is typically what should be recommended first.
If there are any questions, please reach out to a colleague or a Support Escalations team member for assistance.