Re: [ldm-users] new and unusual problem

  • To: Patrick Finnegan <vax@xxxxxxxxxx>
  • Subject: Re: [ldm-users] new and unusual problem
  • From: Evan Breznyik <evan@xxxxxxxxxxxx>
  • Date: Tue, 16 Feb 2021 10:48:05 -0800
*Both sides would need to have jumbo (not jump).  I often cannot type :)

On Tue, Feb 16, 2021 at 10:47 AM Evan Breznyik <evan@xxxxxxxxxxxx> wrote:

> Patrick raises a plausible scenario.  (We don't move any data -- even
> across our local interfaces -- that I think is large enough to benefit from
> or require a higher MTU than 1500.  However, if this is configured in an
> unexpected manner...it can, indeed, cause issues.)
>
> Both sides would need to have jump frames enabled (but the network would
> also need to allow it across the switches/routers).  You can uncover the
> MTU from ifconfig or similar -- just make sure you're looking at the right
> interface, as many heavy duty servers have 2 or more (e.g.):
>
> evan@sanjose-ca-1:~$ ifconfig
>> enp0s25   Link encap:Ethernet  HWaddr [snip]
>>           [snip]
>>           UP BROADCAST RUNNING MULTICAST  *MTU:1500*  Metric:1
>
>
> On Tue, Feb 16, 2021 at 9:47 AM Patrick Finnegan <vax@xxxxxxxxxx> wrote:
>
>> This sounds like a link MTU size problem.  Try a "ping -s 1500" between
>> the two machines, and see if that works.
>>
>> It's likely that something is set to do jumbo frames (>1500 byte MTU) and
>> something in the middle is limiting that to the standard 1500 byte MTU
>> size.  (or something like 1500 byte MTU size on a link that has VLAN tags
>> or VPLS headers).
>>
>> Patrick Finnegan
>> Data Center Architect
>> Research Computing
>> Purdue University
>>
>>
>> On Tue, Feb 16, 2021 at 11:52 AM Karen Cooper - NOAA Affiliate via
>> ldm-users <ldm-users@xxxxxxxxxxxxxxxx> wrote:
>>
>>> tl;dr  -- LDM setup which worked fine last week, now will not
>>> send/receive any files larger than 1292 bytes.
>>>
>>> Full story:
>>> We get data via LDM from another system.   This setup/connection was
>>> working fine until last week.  As far as we know, no changes were made --
>>> but obviously something has changed, because now we can't get data.
>>>
>>> Both ends are running ldm-6.13.11, which is recent and has been working
>>> well (except for pqact issues, which don't apply here).
>>>
>>> I see connectivity at both ends, and I have restarted and rebuilt the
>>> queues on both ends multiple times during troubleshooting.
>>>
>>> I have enabled traffic both ways, and can ldmping and run notifyme
>>> against the other machine's queue(s).
>>>
>>> Interestingly enough the issue seems to have something to do with
>>> filesize.  In my testing I tried using ldmsend to send files to the
>>> downstream server.  I have an "accept" line there, and I *AM* able to send
>>> files *IF* they are <1293 bytes.   The downstream server receives data
>>> from many other servers, and many of the files it receives are larger than
>>> 1293 bytes.
>>>
>>> Interestingly, smaller files make it through, but are taking a
>>> significantly long time.  For instance a file of 1274 bytes can take more
>>> than a minute.
>>>
>>> When trying to send the larger file, there is nothing in the downstream
>>> logs, but the upstream logs show:
>>>
>>> 20210216T163901.154847Z dontpanic.nssl.noaa.gov(feed)[20925]
>>> up6.c:up6_run:445NOTE  Starting Up(6.13.11/6): 20210216162900.110949
>>> TS_ENDT {{EXP, "/home/operator/ALAtest"}},
>>> SIG=d40ffc815fd74a96c2d7c726dc7012d3, Primary
>>> 20210216T163901.154950Z dontpanic.nssl.noaa.gov(feed)[20925]
>>> up6.c:up6_run:448NOTE  topo:  dontpanic.nssl.noaa.gov {{EXP, (.*)}}
>>> 20210216T164000.271093Z 140.172.25.37[20982]ldmd.c:cleanup:192NOTE
>>>  Exiting
>>> 20210216T164001.213937Z dontpanic.nssl.noaa.gov(feed)[20925]
>>> ldmd.c:cleanup:192NOTE  Exiting
>>>
>>> I tried setting up a second downstream system, but had the same
>>> results.
>>>
>>> I have also tried using ldmsend to send data, but again, the small files
>>> make it through, but larger packets fail.  In verbose mode for ldmsend I
>>> see:
>>>
>>> ldmsend -xxx -h dontpanic.nssl.noaa.gov ALAtestfile7
>>> 20210216T164634.300292Z ldmsend[21540]              error.c:err_log:236
>>>                 INFO  Resolving dontpanic.nssl.noaa.gov to
>>> 140.172.25.37 took 0.000755 seconds
>>> 20210216T164634.329557Z ldmsend[21540]              ldmsend.c:main:437
>>>                DEBUG version 6
>>> 20210216T164634.359151Z ldmsend[21540]
>>>  ldmsend.c:ldmsend:281               INFO  Sending ALAtestfile7, 1293 bytes
>>> 20210216T164634.359234Z ldmsend[21540]
>>>  LdmProxy.c:my_hereis_6:549          DEBUG Sending file via HEREIS_6
>>> 20210216T164734.361874Z ldmsend[21540]
>>>  LdmProxy.c:getStatus:68             ERROR NULLPROC_6 failure to host
>>> "dontpanic.nssl.noaa.gov": RPC: Unable to receive; errno = Connection
>>> reset by peer
>>> 20210216T164734.361940Z ldmsend[21540]
>>>  ldmsend.c:ldmsend:309               ERROR Couldn't flush connection
>>> 20210216T164734.362006Z ldmsend[21540]              ldmsend.c:cleanup:82
>>>                ERROR Message-queue isn't empty
>>>
>>>
>>>
>>> --
>>> *"Outside of a dog, a book is a man's best friend.  Inside of a dog,
>>> it's too dark to read."*
>>> *--Groucho Marx*
>>>
>>> -------------------------------------------
>>> Karen.Cooper@xxxxxxxx
>>>
>>> Phone#:  405-325-6456
>>> Cell:   405-834-8559
>>> National Severe Storms Laboratory
>>>
>>> _______________________________________________
>>> NOTE: All exchanges posted to Unidata maintained email lists are
>>> recorded in the Unidata inquiry tracking system and made publicly
>>> available through the web.  Users who post to any of the lists we
>>> maintain are reminded to remove any personal information that they
>>> do not want to be made public.
>>>
>>>
>>> ldm-users mailing list
>>> ldm-users@xxxxxxxxxxxxxxxx
>>> For list information or to unsubscribe,  visit:
>>> https://www.unidata.ucar.edu/mailing_lists/
>>>
>
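
The MTU checks suggested in this thread (reading the interface MTU and pinging with a large packet) can be sketched as the shell helpers below. This is a rough sketch, not a definitive procedure. The function names are hypothetical and not part of LDM. It assumes Linux, IPv4, the iputils version of ping (for the -M do "don't fragment" option), and sysfs for the interface MTU. Note that the -s option sets the ICMP payload size, so "ping -s 1500" actually produces a 1528-byte IPv4 packet; a payload of 1472 bytes is what exactly fills a 1500-byte MTU.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of LDM.

# ICMP echo payload that exactly fills one IPv4 packet of the given MTU:
# MTU minus 20 bytes of IPv4 header (no options) minus 8 bytes of ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Print the configured MTU of a local interface (assumes Linux sysfs).
iface_mtu() {
    cat "/sys/class/net/$1/mtu"
}

# Send three echo requests sized for the given MTU with fragmentation
# prohibited (assumes Linux iputils ping, where -M do sets Don't Fragment).
probe_mtu() {
    host=$1
    mtu=$2
    ping -M do -c 3 -s "$(icmp_payload_for_mtu "$mtu")" "$host"
}
```

For example, "iface_mtu enp0s25" prints the local setting, and "probe_mtu dontpanic.nssl.noaa.gov 1500" tests the path. If the 1500 probe fails while a smaller value such as 1400 succeeds, that would suggest some link in between carries less than a standard 1500-byte MTU, though a firewall that drops ICMP can produce the same symptom.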